{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Trust Region\n", "For this training course we'll be using the optimization routines in SciPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipy.optimize as opt\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need to define a callback function so that we can plot the optimization steps." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def callback(x):\n", " global xprev\n", " plt.plot([xprev[0],x[0]],[xprev[1],x[1]],'b.-')\n", " xprev = x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.1 Trust Region Subproblem\n", "Consider the trust-region subproblem given by\n", "\n", "$\n", "\\begin{align}\n", "&\\min_{s \\in \\mathbb{R}^n} s^Tg + \\frac{1}{2}s^THs \\\\\n", "&\\text{ s.t. } \\lVert s \\rVert_2 \\le \\Delta \n", "\\end{align}\n", "$\n", "\n", "and recall that any global minimizer $s$ satisfies\n", "\n", "$\n", "\\begin{align}\n", "(H + \\lambda I)s = -g \\\\\n", "(H + \\lambda I) \\succeq 0 \\\\\n", "\\lambda \\left(\\lVert s \\rVert_2 - \\Delta \\right)=0\n", "\\end{align}\n", "$\n", "\n", "from the relevant optimality conditions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1.1 Solving the Subproblem\n", "\n", "#### Coding Task:\n", "\n", "Define the secular function as\n", "\n", "$\n", "\\begin{align}\n", "\\phi(\\lambda) = \\lVert s(\\lambda) \\rVert_2 = \\lVert -(H + \\lambda I)^{-1}g \\rVert_2\n", "\\end{align}\n", "$\n", "\n", "and plot the following on the same plot: \n", "\n", "1. $\\phi(\\lambda)$ against $\\lambda$\n", "\n", "2. the trust region radius $\\Delta$\n", "\n", "3. $-\\lambda_1$ (minus the smallest eigenvalue) of $H$\n", "\n", "4. $\\lambda = 0$\n", "\n", "Use your plot to estimate:\n", "\n", "1. the optimal $\\lambda^*$ and thus the optimal solution $s^*$ to the trust-region subproblem \n", "\n", "2. if the optimal solution $s^*$ is constrained or unconstrained\n", "\n", "for the following input data:\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "1 & 0 & 0 \\\\\n", "0 & 2 & 0 \\\\\n", "0 & 0 & 2\n", "\\end{pmatrix},\\qquad\\quad\n", "g = \n", "\\begin{pmatrix}\n", "1 \\\\\n", "0 \\\\\n", "1\n", "\\end{pmatrix},\\quad\n", "\\text{ and } \\Delta = 2 \n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "1 & 0 & 0 \\\\\n", "0 & 2 & 0 \\\\\n", "0 & 0 & 2\n", "\\end{pmatrix},\\qquad\\quad\n", "g = \n", "\\begin{pmatrix}\n", "1 \\\\\n", "0 \\\\\n", "1\n", "\\end{pmatrix},\\quad\n", "\\text{ and } \\Delta = 5/12 \n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "-2 & 0 & 0 \\\\\n", "0 & -1 & 0 \\\\\n", "0 & 0 & -1\n", "\\end{pmatrix},\\quad\n", "g = \n", "\\begin{pmatrix}\n", "1 \\\\\n", "0 \\\\\n", "1\n", "\\end{pmatrix},\\quad\n", "\\text{ and } \\Delta = 5/12 \n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "-2 & 0 & 0 \\\\\n", "0 & -1 & 0 \\\\\n", "0 & 0 & -1\n", "\\end{pmatrix},\\quad\n", "g = \n", "\\begin{pmatrix}\n", "0 \\\\\n", "0 \\\\\n", "1\n", "\\end{pmatrix},\\quad\n", "\\text{ and } \\Delta = 1/2 \n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "-2 & 0 & 0 \\\\\n", "0 & -1 & 0 \\\\\n", "0 & 0 & -1\n", "\\end{pmatrix},\\quad\n", "g = \n", "\\begin{pmatrix}\n", "0 \\\\\n", "0 \\\\\n", "1\n", "\\end{pmatrix},\\quad\n", "\\text{ and } \\Delta = \\sqrt{2} \n", "\\end{align}\n", "$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "g = np.array() \n", "H = np.array()\n", "I = np.\n", "Delta = \n", "\n", "# Secular function\n", "psi = lambda l: \n", "\n", "# Evaluate secular function\n", "X = np.arange(...)\n", "Y = np.array([psi(l) for l in X])\n", "\n", "# Mask poles\n", "X[Y>100] = np.inf\n", "Y[Y>100] = np.inf\n", "\n", "# Plot secular function\n", "plt.figure()\n", "plt.plot(X,Y)\n", "plt.xlabel(r'$\\lambda$')\n", "plt.ylabel(r'$\\phi(\\lambda)$')\n", "plt.grid()\n", "\n", "# Plot minus leftmost eigenvalue\n", "plt.axvline(..., color='c')\n", "\n", "# Plot trust region radius\n", "plt.axhline(..., color='m')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1.2 Subproblem Evolution\n", "Solve the trust-region subproblem with \n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "1 & 0 \\\\\n", "0 & 3\n", "\\end{pmatrix},\\quad\n", "g = \n", "\\begin{pmatrix}\n", "1 \\\\\n", "1\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n", "\n", "as the trust-region radius $\\Delta$ increases and plot the solution trajectory. Hint: carefully use the fact that\n", "\n", "$\n", "\\begin{align}\n", "s(\\lambda) = -(H + \\lambda I)^{-1}\n", "\\end{align}\n", "$\n", "\n", "For what value of the trust-region radius does the solution become unconstrained?\n", "\n", "What about when we change the problem to the following?\n", "\n", "$\n", "\\begin{align}\n", "H = \n", "\\begin{pmatrix}\n", "-3 & 0 \\\\\n", "0 & -1\n", "\\end{pmatrix},\\quad\n", "g = \n", "\\begin{pmatrix}\n", "1 \\\\\n", "1\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "g = np.array(...) \n", "H = np.array(...)\n", "I = np.\n", "Delta = \n", "\n", "# Constrained solution\n", "s = lambda l: \n", "\n", "# Evaluate solution trajectory until unconstrained solution\n", "X = np.linspace(...)\n", "Y = np.array([s(l) for l in X])\n", "\n", "# Plot solution trajectory\n", "plt.figure()\n", "plt.plot(...)\n", "plt.grid()\n", "\n", "# Evaluate trust-region function\n", "fun = lambda s:\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = fun(np.meshgrid(X,Y))\n", "\n", "# Plot trust-region function\n", "plt.contour(...)\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.2 Trust Region Methods\n", "Consider the Rosenbrock function defined as\n", "\n", "$\n", "f(x) = (a - x_1)^2 + b(x_2 - x_1^2)^2\n", "$\n", "\n", "where usually $a=1$ and $b=100$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.1 Trust Region Exact\n", "\n", "A trust-region algorithm that solves the trust-region subproblem exactly, using Newton's method on the secular equation (as discussed in lectures):\n", "\n", "$\n", "\\begin{align}\n", "&\\text{Factorize } H + \\lambda I = LL^T \\\\\n", "&\\text{Solve } Lu = -g \\\\\n", "&\\text{Solve } L^Ts = u \\\\\n", "&\\text{Solve } Lw = s \\\\\n", "&\\text{Set } \\lambda = \\lambda + \\left( \\frac{\\lVert s \\rVert_2 - \\Delta}{\\Delta} \\right)\\left( \\frac{\\lVert s \\rVert^2_2}{\\lVert w \\rVert^2_2} \\right)\n", "\\end{align}\n", "$\n", "\n", "with suitable safeguards and special handling of the hard case.\n", "\n", "#### Coding Task: \n", "Now apply Scipy's trust-exact starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Scipy's CG\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.2 Trust Region CG\n", "\n", "A trust region algorithm that solves the trust-region subproblem approximately using CG (as discussed in lectures). Essentially this applies CG to solve\n", "\n", "$\n", "\\begin{align}\n", "(H + \\lambda I)s = -g\n", "\\end{align}\n", "$\n", "\n", "truncating to the trust region radius $\\Delta$ before encountering negative curvature or exceeding $\\Delta$.\n", "\n", "#### Coding Task: \n", "Now apply Scipy's trust-ncg starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Scipy's CG\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.3 Trust Region Krylov\n", "\n", "A trust region algorithm that solves the trust-region subproblem approximately using Lanczos (as discussed in lectures).\n", "\n", "#### Coding Task: \n", "Now apply Scipy's trust-krylov starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Scipy's CG\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercises: \n", "\n", "1. Compare the number of objective function, gradient and Hessian evaluations across the different trust-region methods above (these are reported as nfev, njev and nhev in the outputs above). How do these compare to linesearch?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.7" } }, "nbformat": 4, "nbformat_minor": 4 }