{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Linesearch\n", "For this training course we'll be using the optimization routines in SciPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipy.optimize as opt\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need to define a callback function so that we can plot the optimization steps." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def callback(x):\n", " global xprev\n", " plt.plot([xprev[0],x[0]],[xprev[1],x[1]],'b.-')\n", " xprev = x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 Naive Linesearch Methods\n", "Consider the quadratic optimization problem\n", "\n", "$\n", "\\begin{align}\n", "\\min_{x} f(x) = \\frac{1}{2}(a x_1^2 + x_2^2)\n", "\\end{align}\n", "$\n", "\n", "for some $a >0$.\n", "\n", "The gradient for this problem is\n", "\n", "$\n", "\\begin{align}\n", "\\nabla f(x) = \n", "\\begin{pmatrix}\n", "a x_1 \\\\\n", "x_2\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n", "\n", "and thus $x^* = 0$ is the stationary point.\n", "\n", "The Hessian for this problem is\n", "\n", "$\n", "\\begin{align}\n", "\\nabla^2 f(x) = \n", "\\begin{pmatrix}\n", "a & 0 \\\\\n", "0 & 1\n", "\\end{pmatrix} \n", "\\succ 0\n", "\\end{align}\n", "$\n", "\n", "and hence (as $a >0$) the problem is convex, so $x^* = 0$ is the unique global minimizer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1.1 Steepest Descent\n", "Recall that steepest descent proceeds as a step of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the steepest descent direction \n", "\n", "$\n", "\\begin{align}\n", "p^k = -\\nabla f(x^k)\n", "\\end{align}\n", "$ \n", "\n", "with step-size $\\alpha^k$ obtained by linesearch. For convex quadratics, we have an explicit formula for exact linesearch:\n", "\n", "$\n", "\\begin{align}\n", "\\alpha^k = -\\frac{(p^k)^T \\nabla f(x^k)}{(p^k)^T\\nabla^2f(x^k)p^k} = \\frac{\\nabla f(x^k)^T\\nabla f(x^k)}{\\nabla f(x^k)^T \\nabla^2 f(x^k) \\nabla f(x^k)} = \\frac{a^2 (x_1^k)^2 + (x_2^k)^2}{a^3 (x_1^k)^2 + (x_2^k)^2}\n", "\\end{align}\n", "$\n", "\n", "for our specific quadratic objective function $f(x) = 0.5(a x_1^2 + x_2^2)$ defined above." 
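 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check (an illustrative sketch, not part of the exercises), the next cell evaluates the general exact-linesearch formula $\\alpha^k = -\\frac{(p^k)^T \\nabla f(x^k)}{(p^k)^T\\nabla^2f(x^k)p^k}$ at a sample point and compares it with the closed-form expression above. The names `a_chk` and `x_chk` and their values are arbitrary choices for this check." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sanity check (illustrative only): the general exact-linesearch formula should\n", "# match the closed-form expression for f(x) = 0.5*(a*x1^2 + x2^2)\n", "a_chk = 3.0\n", "x_chk = np.array([1.0, a_chk])\n", "g_chk = np.array([a_chk*x_chk[0], x_chk[1]])  # gradient at x_chk\n", "H_chk = np.array([[a_chk, 0.0], [0.0, 1.0]])  # Hessian\n", "p_chk = -g_chk                                # steepest descent direction\n", "alpha_general = -p_chk.dot(g_chk)/p_chk.dot(H_chk.dot(p_chk))\n", "alpha_closed = (a_chk**2*x_chk[0]**2 + x_chk[1]**2)/(a_chk**3*x_chk[0]**2 + x_chk[1]**2)\n", "print(alpha_general, alpha_closed)  # the two values should agree"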
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Textbook steepest descent method with exact linesearch for 0.5(ax_1^2+x_2^2)\n", "# fun - objective function \n", "# x0 - starting point\n", "# jac - objective function gradient\n", "# tol - tolerance for termination\n", "# callback - callback function (for plotting)\n", "def steepest_descent(fun, x0, args=(), jac=None, tol=1e-1, callback=None, **options):\n", " \n", " x = x0\n", " it = 0\n", " while np.linalg.norm(jac(x)) > tol: # first-order optimality (zero gradient)\n", " alpha = ((a**2)*x[0]**2 + x[1]**2)/((a**3)*x[0]**2 + x[1]**2) # exact linesearch for 0.5(ax_1^2+x_2^2)\n", " x = x + alpha*(-jac(x)) # steepest descent iteration\n", "\n", " if callback is not None: # callback function (for plotting)\n", " callback(x)\n", " \n", " it += 1 # number of iterations\n", " \n", " return opt.OptimizeResult(x=x, fun=fun(x), jac=jac(x), nit=it, success=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task:\n", "Now apply the steepest descent method starting from $x^0 = (1,a)$ to solve the above problem for various values of $a$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Scaling parameter\n", "a = 10\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function and gradient\n", "fun = lambda x:\n", "jac = lambda x:\n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call steepest descent via scipy\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=steepest_descent, callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is going on here? Steepest descent is sensitive to problem scaling: one can show that each iterate is\n", "\n", "$\n", "\\begin{align}\n", "x^k = \n", "\\left(\n", "\\frac{a-1}{a+1}\n", "\\right)^k\n", "\\begin{pmatrix}\n", "(-1)^k \\\\\n", "a\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n", "\n", "and thus the iterates jump around. Moreover since \n", "\n", "$\n", "\\begin{align}\n", "\\frac{\\lVert x^{k+1} - x^* \\rVert}{\\lVert x^k - x^* \\rVert} = \\frac{\\lVert x^{k+1} - 0 \\rVert}{\\lVert x^k - 0 \\rVert} = \\left| \\frac{a-1}{a+1} \\right| < 1\n", "\\end{align}\n", "$\n", "\n", "the convergence is linear with rate $|(a-1)/(a+1)|$ and the closer to $1$ this is (the larger $a$ is) the more iterations will be needed to converge. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1.2 (Damped) Newton's Method\n", "Recall that (damped) Newton's method proceeds as a step of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the Newton direction $p^k$ which solves\n", "\n", "$\n", "\\begin{align}\n", "\\nabla^2 f(x^k) \\, p^k = -\\nabla f(x^k)\n", "\\end{align}\n", "$ \n", "\n", "with step-size $\\alpha^k$ obtained by linesearch. 
For convex quadratics, we have an explicit formula for exact linesearch:\n", "\n", "$\n", "\\begin{align}\n", "\\alpha^k = -\\frac{(p^k)^T \\nabla f(x^k)}{(p^k)^T\\nabla^2f(x^k)p^k} = \\frac{\\left(\\nabla^2 f(x^k)^{-1}\\nabla f(x^k)\\right)^T\\nabla f(x^k)}{\\left(\\nabla^2 f(x^k)^{-1}\\nabla f(x^k)\\right)^T \\nabla^2 f(x^k) \\nabla^2 f(x^k)^{-1}\\nabla f(x^k)} = \\, 1\n", "\\end{align}\n", "$\n", "\n", "for any convex quadratic." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Textbook (damped) Newton method with exact linesearch for convex quadratics\n", "# fun - objective function \n", "# x0 - starting point\n", "# jac - objective function gradient\n", "# hess - objective function Hessian\n", "# tol - tolerance for termination\n", "# callback - callback function (for plotting)\n", "def newton(fun, x0, args=(), jac=None, hess=None, tol=1e-1, callback=None, **options):\n", "    \n", "    x = x0\n", "    it = 0\n", "    while np.linalg.norm(jac(x)) > tol: # first-order optimality (zero gradient)\n", "        alpha = 1 # exact linesearch for convex quadratics\n", "        x = x + alpha*np.linalg.solve(hess(x),-jac(x)) # Newton iteration\n", "\n", "        if callback is not None: # callback function (for plotting)\n", "            callback(x)\n", "        \n", "        it += 1 # number of iterations\n", "    \n", "    return opt.OptimizeResult(x=x, fun=fun(x), jac=jac(x), hess=hess(x), nit=it, success=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply Newton's method starting from $x^0 = (1,a)$ to solve the above problem for various values of $a$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scaling parameter\n", "a = 10\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Newton's method via scipy\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=newton, callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "What is going on here? Newton's method is not sensitive to problem scaling: the first iterate is\n", "\n", "$\n", "\\begin{align}\n", "x^1 = x^0 - \\nabla^2 f(x^0)^{-1}\\nabla f(x^0) = x^0 - \n", "\\begin{pmatrix}\n", "1/a & 0 \\\\\n", "0 & 1\n", "\\end{pmatrix}\n", "\\begin{pmatrix}\n", "a x_1^0 \\\\\n", "x_2^0\n", "\\end{pmatrix}\n", "= x^0 - x^0 = 0 = x^*\n", "\\end{align}\n", "$\n", "\n", "and thus we converge in one step. Moreover, consider the general convex quadratic\n", "\n", "$\n", "\\begin{align}\n", "f(x) = \\frac{1}{2}x^TAx + b^Tx\n", "\\end{align}\n", "$\n", "\n", "with $A \\succ 0$. Then (solving $\\nabla f(x) = 0$) we get that $x^* = -A^{-1}b$ and \n", "\n", "$\n", "\\begin{align}\n", "x^1 = x^0 - \\nabla^2 f(x^0)^{-1}\\nabla f(x^0) = x^0 - A^{-1}(Ax^0 + b) = -A^{-1}b = x^*\n", "\\end{align}\n", "$\n", "\n", "therefore Newton's method converges in one step for any convex quadratic!" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1.3 Conjugate Gradient Method\n", "Recall that for a general convex quadratic ($A \\succ 0$)\n", "\n", "$\n", "\\begin{align}\n", "f(x) = \\frac{1}{2}x^TAx + b^Tx\n", "\\end{align}\n", "$\n", "\n", "the conjugate gradient method proceeds as steps of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the conjugate directions \n", "\n", "$\n", "\\begin{align}\n", "p^k = -\\nabla f(x^k) + \\beta^k p^{k-1}\n", "\\end{align}\n", "$ \n", "\n", "where $\\beta^k$ is chosen to ensure $A$-conjugacy of the directions\n", "\n", "$\n", "\\begin{align}\n", "\\beta^k = \\frac{\\nabla f(x^k)^T\\nabla f(x^k)}{\\nabla f(x^{k-1})^T\\nabla f(x^{k-1})}\n", "\\end{align}\n", "$\n", "\n", "and the step-size $\\alpha^k$ is given by exact linesearch (from our explicit formula)\n", "\n", "$\n", "\\begin{align}\n", "\\alpha^k = \\frac{\\nabla f(x^k)^T \\nabla f(x^k)}{(p^k)^TAp^k}. \n", "\\end{align}\n", "$" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Textbook conjugate gradient method for convex quadratics\n", "# fun - objective function \n", "# x0 - starting point\n", "# jac - objective function gradient\n", "# hess - objective function Hessian\n", "# tol - tolerance for termination\n", "# callback - callback function (for plotting)\n", "def conjugate_gradient(fun, x0, args=(), jac=None, hess=None, tol=1e-1, callback=None, **options):\n", "    \n", "    x = x0\n", "    g = jac(x)\n", "    A = hess(x)\n", "    p = -g\n", "    it = 0\n", "    while np.linalg.norm(g) > tol: # first-order optimality (zero gradient)\n", "        alpha = g.T.dot(g)/p.T.dot(A.dot(p)) # exact linesearch\n", "        x = x + alpha*p # take step\n", "        gnew = g + alpha*A.dot(p) # new gradient\n", "        beta = gnew.dot(gnew)/g.dot(g) # conjugacy parameter\n", "        p = -gnew + beta*p # new conjugate direction\n", "        g = gnew\n", "        \n", "        if callback is not None: # callback function (for plotting)\n", "            callback(x)\n", "        \n", "        it += 1 # number of iterations\n", "    \n", "    return opt.OptimizeResult(x=x, fun=fun(x), jac=jac(x), hess=hess(x), nit=it, success=True)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply conjugate gradient starting from $x^0 = (1,a)$ to solve the above problem for various values of $a$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Scaling parameter\n", "a = 10\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call conjugate gradient via scipy\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=conjugate_gradient, callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "What is going on here? Conjugate gradient is guaranteed to converge in at most as many iterations as there are dimensions of the problem; since this problem is 2D, that means at most two iterations!\n", "\n", "In practice, the convergence speed depends on the spectrum of the Hessian (and for large problems can be much faster than this dimension bound suggests):\n", "\n", "$\n", "\\begin{align}\n", "\\frac{\\lVert x^k - x^* \\rVert_A}{\\lVert x^0 - x^* \\rVert_A}\n", "\\le 2 \\left( \\frac{\\sqrt{\\kappa(A)}-1}{\\sqrt{\\kappa(A)}+1} \\right)^k\n", "\\end{align}\n", "$\n", "\n", "where $\\kappa(A)$ is the Hessian condition number." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2 More Sophisticated Linesearch Methods\n", "Consider the Rosenbrock function defined as\n", "\n", "$\n", "f(x) = (a - x_1)^2 + b(x_2 - x_1^2)^2\n", "$\n", "\n", "where usually $a=1$ and $b=100$.\n", "\n", "#### Exercises: \n", "1. Find the stationary point $x^*$ of this function.\n", "2. Numerically compute the eigenvalues of the Hessian at $x^*$ to verify that $x^*$ is a minimum.\n", "\n", "#### Answers:\n", "The gradient for this function is\n", "\n", "$\n", "\\begin{align}\n", "\\nabla f(x) = \n", "\\begin{pmatrix}\n", "-2(a - x_1) - 4bx_1(x_2 - x_1^2) \\\\\n", "2b(x_2 - x_1^2)\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n", "\n", "and thus $x^* = (a,a^2)$ is the stationary point.\n", "\n", "The Hessian for this function is\n", "\n", "$\n", "\\begin{align}\n", "\\nabla^2 f(x) = \n", "\\begin{pmatrix}\n", "2 - 4bx_2 + 12bx_1^2 & -4bx_1 \\\\\n", "-4bx_1 & 2b\n", "\\end{pmatrix}\n", "\\end{align}\n", "$\n", "\n", "and since \n", "\n", "$\n", "\\begin{align}\n", "\\nabla^2 f(x^*) = \n", "\\begin{pmatrix}\n", "2 + 8ba^2 & -4ba \\\\\n", "-4ba & 2b\n", "\\end{pmatrix}\n", "\\succ 0\n", "\\end{align}\n", "$\n", "\n", "$x^* = (a,a^2)$ is a local minimizer; it is in fact the global minimizer since $f(x^*) = 0$ and $f(x) \\ge 0$ everywhere." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.1 Nonlinear Conjugate Gradient\n", "Recall that nonlinear conjugate gradient proceeds as a step of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the conjugate direction\n", "\n", "$\n", "\\begin{align}\n", "p^k = -\\nabla f(x^k) + \\beta^k p^{k-1}\n", "\\end{align}\n", "$ \n", "\n", "where $\\beta^k$ is chosen to ensure conjugacy of the directions; several choices are possible here, e.g. Fletcher-Reeves:\n", "\n", "$\n", "\\begin{align}\n", "\\beta^k = \\frac{\\nabla f(x^k)^T\\nabla f(x^k)}{\\nabla f(x^{k-1})^T\\nabla f(x^{k-1})}\n", "\\end{align}\n", "$\n", "\n", "and the step-size $\\alpha^k$ is obtained by suitable linesearch." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply Scipy's CG starting from $x^0 = (2a,2a)$ to solve the above problem for $a=1$ and $b=100$. Then try again from $x^0 = (10a,10a)$."
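 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before tackling this task, the next cell sketches one possible set of definitions for the Rosenbrock function, its gradient and its Hessian, and numerically checks the gradient and the Hessian eigenvalues at $x^* = (a,a^2)$ (Exercise 2 above). This is only a suggestion: the helper names `fun_rb`, `jac_rb` and `hess_rb` are not part of the template cells, and you can write your own definitions in the tasks below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch (one possible choice of definitions) for the Rosenbrock problem\n", "a = 1\n", "b = 100\n", "fun_rb = lambda x: (a - x[0])**2 + b*(x[1] - x[0]**2)**2\n", "jac_rb = lambda x: np.array([-2*(a - x[0]) - 4*b*x[0]*(x[1] - x[0]**2),\n", "                             2*b*(x[1] - x[0]**2)])\n", "hess_rb = lambda x: np.array([[2 - 4*b*x[1] + 12*b*x[0]**2, -4*b*x[0]],\n", "                              [-4*b*x[0], 2*b]])\n", "\n", "# Exercise 2: check the stationary point and the Hessian eigenvalues at x* = (a, a^2)\n", "xstar = np.array([a, a**2])\n", "print(jac_rb(xstar))                        # should be zero\n", "print(np.linalg.eigvalsh(hess_rb(xstar)))   # both eigenvalues should be positive"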
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function and gradient\n", "fun = lambda x: \n", "jac = lambda x:\n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Scipy's CG\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.2 Truncated Newton\n", "Recall that Truncated Newton methods proceed as a step of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the approximate Newton direction $p^k$ which approximately solves\n", "\n", "$\n", "\\begin{align}\n", "\\nabla^2 f(x^k) \\, p^k \\approx -\\nabla f(x^k)\n", "\\end{align}\n", "$ \n", "\n", "for example, using a limited number of conjugate-gradient iterations." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply Scipy's Newton-CG starting from $x^0 = (2a,2a)$ to solve the above problem for $a=1$ and $b=100$. Then try again from $x^0 = (10a,10a)$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function, gradient and Hessian\n", "fun = lambda x: \n", "jac = lambda x: \n", "hess = lambda x:\n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call Scipy's Newton-CG\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.3 Quasi-Newton\n", "Recall that Quasi-Newton methods proceed as a step of the form\n", "\n", "$\n", "\\begin{align}\n", "x^{k+1} = x^k + \\alpha^k p^k\n", "\\end{align}\n", "$\n", "\n", "along the Quasi-Newton direction $p^k$ which solves\n", "\n", "$\n", "\\begin{align}\n", "B^k \\, p^k = -\\nabla f(x^k)\n", "\\end{align}\n", "$ \n", "\n", "where $B^k$ is a suitable approximation to $\\nabla^2 f(x^k)$, chosen to satisfy the secant equation\n", "\n", "$\n", "\\begin{align}\n", "B^{k+1} (x^{k+1} - x^k) = \\nabla f(x^{k+1}) - \\nabla f(x^k)\n", "\\end{align}\n", "$\n", "\n", "For example, one of the most popular approximations is BFGS (a rank-two update)\n", "\n", "$\n", "\\begin{align}\n", "B^{k+1} = B^k + \\frac{y^k(y^k)^T}{(y^k)^Ts^k} - \\frac{B^ks^k(B^ks^k)^T}{(s^k)^TB^ks^k}\n", "\\end{align}\n", "$\n", "\n", "where $s^k = x^{k+1} - x^k$ and $y^k = \\nabla f(x^{k+1}) - \\nabla f(x^k)$." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply Scipy's BFGS starting from $x^0 = (2a,2a)$ to solve the above problem for $a=1$ and $b=100$. Then try again from $x^0 = (10a,10a)$."
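 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the BFGS update in action before calling SciPy's implementation, the next cell performs a single rank-two update by hand and checks that the updated matrix satisfies the secant equation $B^{k+1} s^k = y^k$. This is an illustrative sketch only: the two points `xk` and `xk1` and the starting matrix $B^k = I$ are arbitrary choices, and the gradient hard-codes $a=1$, $b=100$." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch: one BFGS update on the Rosenbrock gradient (a=1, b=100),\n", "# starting from B = I, followed by a check of the secant equation B_new s = y\n", "grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),\n", "                           200*(x[1] - x[0]**2)])\n", "xk = np.array([2.0, 2.0])   # arbitrary current point\n", "xk1 = np.array([1.5, 2.0])  # arbitrary next point\n", "s = xk1 - xk                # step s^k\n", "y = grad(xk1) - grad(xk)    # gradient change y^k\n", "B = np.eye(2)               # initial Hessian approximation\n", "Bs = B.dot(s)\n", "B_new = B + np.outer(y, y)/y.dot(s) - np.outer(Bs, Bs)/s.dot(Bs)\n", "print(B_new.dot(s) - y)     # should be (numerically) zero"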
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function and gradient\n", "fun = lambda x: \n", "jac = lambda x:\n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call SciPy's BFGS\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback)\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2.4 Limited-Memory Quasi-Newton\n", "Recall that limited-memory Quasi-Newton is a variant of Quasi-Newton which stores only the last $m$ values of $s^k$ and $y^k$ for some small $m$. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Coding Task: \n", "Now apply Scipy's L-BFGS-B starting from $x^0 = (2a,2a)$ to solve the above problem for $a=1$ and $b=100$. Then try again from $x^0 = (10a,10a)$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "a = 1\n", "b = 100\n", "m = 10\n", "\n", "# Initial guess\n", "x0 = np.array(...)\n", "\n", "# Objective function and gradient\n", "fun = lambda x: \n", "jac = lambda x: \n", "\n", "# Plot function contours\n", "plt.figure()\n", "X = np.linspace(...)\n", "Y = np.linspace(...)\n", "Z = np.meshgrid(X,Y)\n", "plt.contour(...)\n", "plt.colorbar()\n", "\n", "# Call SciPy's L-BFGS-B\n", "xprev = x0 # for plotting\n", "res = opt.minimize(..., method=..., callback=callback, options=dict(maxcor=m))\n", "\n", "# Print results and show plot\n", "print(res)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercises: \n", "\n", "1. Compare the number of objective function, gradient and Hessian evaluations across the different linesearch methods above (these are reported as nfev, njev and nhev in the outputs above). What do you observe?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.7" } }, "nbformat": 4, "nbformat_minor": 4 }