{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Trust Region\n",
    "For this training course we'll be using the optimization routines in SciPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.optimize as opt\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also need to define a callback function so that we can plot the optimization steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def callback(x):\n",
    "    global xprev\n",
    "    plt.plot([xprev[0],x[0]],[xprev[1],x[1]],'b.-')\n",
    "    xprev = x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 Trust Region Subproblem\n",
    "Consider the trust-region subproblem given by\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "&\\min_{s \\in \\mathbb{R}^n} s^Tg + \\frac{1}{2}s^THs \\\\\n",
    "&\\text{ s.t. } \\lVert s \\rVert_2 \\le \\Delta \n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "and recall that any global minimizer $s$ satisfies\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "(H + \\lambda I)s = -g \\\\\n",
    "(H + \\lambda I) \\succeq 0 \\\\\n",
    "\\lambda \\left(\\lVert s \\rVert_2 - \\Delta \\right)=0\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "from the relevant optimality conditions. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.1 Solving the Subproblem\n",
    "\n",
    "#### Coding Task:\n",
    "\n",
    "Define the secular function as\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "\\phi(\\lambda) = \\lVert s(\\lambda) \\rVert_2 = \\lVert -(H + \\lambda I)^{-1}g  \\rVert_2\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "and plot the following on the same plot: \n",
    "\n",
    "1. $\\phi(\\lambda)$ against $\\lambda$\n",
    "\n",
    "2. the trust region radius $\\Delta$\n",
    "\n",
    "3. $-\\lambda_1$ (minus the smallest eigenvalue) of $H$\n",
    "\n",
    "4. $\\lambda = 0$\n",
    "\n",
    "Use your plot to estimate:\n",
    "\n",
    "1. the optimal $\\lambda^*$ and thus the optimal solution $s^*$ to the trust-region subproblem \n",
    "\n",
    "2. if the optimal solution $s^*$ is constrained or unconstrained\n",
    "\n",
    "for the following input data:\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "1 & 0 & 0 \\\\\n",
    "0 & 2 & 0 \\\\\n",
    "0 & 0 & 2\n",
    "\\end{pmatrix},\\qquad\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "1 \\\\\n",
    "0 \\\\\n",
    "1\n",
    "\\end{pmatrix},\\quad\n",
    "\\text{ and } \\Delta = 2 \n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "1 & 0 & 0 \\\\\n",
    "0 & 2 & 0 \\\\\n",
    "0 & 0 & 2\n",
    "\\end{pmatrix},\\qquad\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "1 \\\\\n",
    "0 \\\\\n",
    "1\n",
    "\\end{pmatrix},\\quad\n",
    "\\text{ and } \\Delta = 5/12 \n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "-2 & 0 & 0 \\\\\n",
    "0 & -1 & 0 \\\\\n",
    "0 & 0 & -1\n",
    "\\end{pmatrix},\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "1 \\\\\n",
    "0 \\\\\n",
    "1\n",
    "\\end{pmatrix},\\quad\n",
    "\\text{ and } \\Delta = 5/12 \n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "-2 & 0 & 0 \\\\\n",
    "0 & -1 & 0 \\\\\n",
    "0 & 0 & -1\n",
    "\\end{pmatrix},\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "0 \\\\\n",
    "0 \\\\\n",
    "1\n",
    "\\end{pmatrix},\\quad\n",
    "\\text{ and } \\Delta = 1/2 \n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "-2 & 0 & 0 \\\\\n",
    "0 & -1 & 0 \\\\\n",
    "0 & 0 & -1\n",
    "\\end{pmatrix},\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "0 \\\\\n",
    "0 \\\\\n",
    "1\n",
    "\\end{pmatrix},\\quad\n",
    "\\text{ and } \\Delta = \\sqrt{2} \n",
    "\\end{align}\n",
    "$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "g = np.array() \n",
    "H = np.array()\n",
    "I = np.\n",
    "Delta = \n",
    "\n",
    "# Secular function\n",
    "psi = lambda l: \n",
    "\n",
    "# Evaluate secular function\n",
    "X = np.arange(...)\n",
    "Y = np.array([psi(l) for l in X])\n",
    "\n",
    "# Mask poles\n",
    "X[Y>100] = np.inf\n",
    "Y[Y>100] = np.inf\n",
    "\n",
    "# Plot secular function\n",
    "plt.figure()\n",
    "plt.plot(X,Y)\n",
    "plt.xlabel(r'$\\lambda$')\n",
    "plt.ylabel(r'$\\phi(\\lambda)$')\n",
    "plt.grid()\n",
    "\n",
    "# Plot minus leftmost eigenvalue\n",
    "plt.axvline(..., color='c')\n",
    "\n",
    "# Plot trust region radius\n",
    "plt.axhline(..., color='m')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1.2 Subproblem Evolution\n",
    "Solve the trust-region subproblem with \n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "1 & 0 \\\\\n",
    "0 & 3\n",
    "\\end{pmatrix},\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "1 \\\\\n",
    "1\n",
    "\\end{pmatrix}\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "as the trust-region radius $\\Delta$ increases and plot the solution trajectory. Hint: carefully use the fact that\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "s(\\lambda) = -(H + \\lambda I)^{-1}\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "For what value of the trust-region radius does the solution become unconstrained?\n",
    "\n",
    "What about when we change the problem to the following?\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "H = \n",
    "\\begin{pmatrix}\n",
    "-3 & 0 \\\\\n",
    "0 & -1\n",
    "\\end{pmatrix},\\quad\n",
    "g = \n",
    "\\begin{pmatrix}\n",
    "1 \\\\\n",
    "1\n",
    "\\end{pmatrix}\n",
    "\\end{align}\n",
    "$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "g = np.array(...) \n",
    "H = np.array(...)\n",
    "I = np.\n",
    "Delta = \n",
    "\n",
    "# Constrained solution\n",
    "s = lambda l: \n",
    "\n",
    "# Evaluate solution trajectory until unconstrained solution\n",
    "X = np.linspace(...)\n",
    "Y = np.array([s(l) for l in X])\n",
    "\n",
    "# Plot solution trajectory\n",
    "plt.figure()\n",
    "plt.plot(...)\n",
    "plt.grid()\n",
    "\n",
    "# Evaluate trust-region function\n",
    "fun = lambda s:\n",
    "X = np.linspace(...)\n",
    "Y = np.linspace(...)\n",
    "Z = fun(np.meshgrid(X,Y))\n",
    "\n",
    "# Plot trust-region function\n",
    "plt.contour(...)\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 Trust Region Methods\n",
    "Consider the Rosenbrock function defined as\n",
    "\n",
    "$\n",
    "f(x) = (a - x_1)^2 + b(x_2 - x_1^2)^2\n",
    "$\n",
    "\n",
    "where usually $a=1$ and $b=100$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.1 Trust Region Exact\n",
    "\n",
    "A trust-region algorithm that solves the trust-region subproblem exactly, using Newton's method on the secular equation (as discussed in lectures):\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "&\\text{Factorize } H + \\lambda I = LL^T \\\\\n",
    "&\\text{Solve } Lu = -g \\\\\n",
    "&\\text{Solve } L^Ts = u \\\\\n",
    "&\\text{Solve } Lw = s \\\\\n",
    "&\\text{Set } \\lambda = \\lambda + \\left( \\frac{\\lVert s \\rVert_2 - \\Delta}{\\Delta} \\right)\\left( \\frac{\\lVert s \\rVert^2_2}{\\lVert w \\rVert^2_2} \\right)\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "with suitable safeguards and special handling of the hard case.\n",
    "\n",
    "#### Coding Task: \n",
    "Now apply Scipy's trust-exact starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "a = 1\n",
    "b = 100\n",
    "\n",
    "# Initial guess\n",
    "x0 = np.array(...)\n",
    "\n",
    "# Objective function, gradient and Hessian\n",
    "fun = lambda x: \n",
    "jac = lambda x: \n",
    "hess = lambda x: \n",
    "\n",
    "# Plot function contours\n",
    "plt.figure()\n",
    "X = np.linspace(...)\n",
    "Y = np.linspace(...)\n",
    "Z = np.meshgrid(X,Y)\n",
    "plt.contour(...)\n",
    "plt.colorbar()\n",
    "\n",
    "# Call Scipy's CG\n",
    "xprev = x0 # for plotting\n",
    "res = opt.minimize(..., method=..., callback=callback)\n",
    "\n",
    "# Print results and show plot\n",
    "print(res)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.2 Trust Region CG\n",
    "\n",
    "A trust region algorithm that solves the trust-region subproblem approximately using CG (as discussed in lectures). Essentially this applies CG to solve\n",
    "\n",
    "$\n",
    "\\begin{align}\n",
    "(H + \\lambda I)s = -g\n",
    "\\end{align}\n",
    "$\n",
    "\n",
    "truncating to the trust region radius $\\Delta$ before encountering negative curvature or exceeding $\\Delta$.\n",
    "\n",
    "#### Coding Task: \n",
    "Now apply Scipy's trust-ncg starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "a = 1\n",
    "b = 100\n",
    "\n",
    "# Initial guess\n",
    "x0 = np.array(...)\n",
    "\n",
    "# Objective function, gradient and Hessian\n",
    "fun = lambda x: \n",
    "jac = lambda x: \n",
    "hess = lambda x: \n",
    "\n",
    "# Plot function contours\n",
    "plt.figure()\n",
    "X = np.linspace(...)\n",
    "Y = np.linspace(...)\n",
    "Z = np.meshgrid(X,Y)\n",
    "plt.contour(...)\n",
    "plt.colorbar()\n",
    "\n",
    "# Call Scipy's CG\n",
    "xprev = x0 # for plotting\n",
    "res = opt.minimize(..., method=..., callback=callback)\n",
    "\n",
    "# Print results and show plot\n",
    "print(res)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2.3 Trust Region Krylov\n",
    "\n",
    "A trust region algorithm that solves the trust-region subproblem approximately using Lanczos (as discussed in lectures).\n",
    "\n",
    "#### Coding Task: \n",
    "Now apply Scipy's trust-krylov starting from $x^0 = (-a,a)$ to solve the above problem for $a=1$ and $b=100$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameters\n",
    "a = 1\n",
    "b = 100\n",
    "\n",
    "# Initial guess\n",
    "x0 = np.array(...)\n",
    "\n",
    "# Objective function, gradient and Hessian\n",
    "fun = lambda x: \n",
    "jac = lambda x: \n",
    "hess = lambda x: \n",
    "\n",
    "# Plot function contours\n",
    "plt.figure()\n",
    "X = np.linspace(...)\n",
    "Y = np.linspace(...)\n",
    "Z = np.meshgrid(X,Y)\n",
    "plt.contour(...)\n",
    "plt.colorbar()\n",
    "\n",
    "# Call Scipy's CG\n",
    "xprev = x0 # for plotting\n",
    "res = opt.minimize(..., method=..., callback=callback)\n",
    "\n",
    "# Print results and show plot\n",
    "print(res)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercises: \n",
    "\n",
    "1. Compare the number of objective function, gradient and Hessian evaluations across the different trust-region methods above (these are reported as nfev, njev and nhev in the outputs above). How do these compare to linesearch?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}