LSMR - Sparse Least Squares LSMR Solver

#include <spral_lsmr.h> /* or <spral.h> for all packages */

Purpose

This package uses the LSMR iterative method to solve sparse linear equations and sparse least-squares problems of the form:

\[\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|x\|_2 \mbox{ subject to }Ax = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|Ax - b\|_2^2,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|Ax - b\|_2^2 + \lambda^2\|x\|_2^2, \end{array}\end{split}\]

where the \(m \times n\) matrix \(A\) may be square or rectangular, and may have any rank. The scalar \(\lambda\) is a damping parameter. If \(\lambda > 0\), the solution is regularized in the sense that a unique soluton always exists, and \(\|x\|_2\) is always bounded.

Preconditioning may be used to try to reduce the number of iterations. A suitable choice for the preconditioner depends on the user’s knowledge of \(A\). For a user-chosen \(n \times n\) nonsingular matrix \(P\), LSMR solves

\[\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|Py\|_2 \mbox{ subject to }APy = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|APy - b\|_2^2 ,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|APy - b\|_2^2 + \lambda^2\|Py\|_2^2 , \end{array}\end{split}\]

The user must then recover the final solution \(x\) by computing \(x=Py\). \(P\) will be a good preconditioner if \(AP\) is significantly better conditioned than \(A\).

Reverse communication is used for preconditioning operations \(Pz\) and \(P^Tz\) and matrix-vector products of the form \(Av\) and \(A^Tu\).

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation \((A^TA+\lambda^2I)x=A^Tb\) (or \(((AP)^T(AP)+\lambda^2I)y=(AP)^Tb\)), but has better numerical properties, especially if \(A\) is ill-conditioned.

Notation

In the rest of this documentation, we use the following notation:

\[\begin{split}\bar{A} = \left( \begin{array}{c} A \\ \lambda I \end{array} \right)P, \hspace{1cm} \bar{b} = \left( \begin{array}{c} b \\ 0 \end{array} \right),\end{split}\]

and

\[r = b - APy , \hspace{1cm} \bar{r} = \bar{b} - \bar{A}y, \hspace{1cm} x = Py.\]

Version history

2016-05-10 Version 1.0.0.: Initial release.

[For detailed history, see ChangeLog]

Subroutines

void spral_lsmr_default_options(struct spral_lsmr_options *options)

Initialises options to default values.

Parameters:

options – data structure to be initialised.

int spral_lsmr_solve(int *action, int m, int n, double u[m], double v[n], double y[n], void **keep, struct spral_lsmr_options const *options, struct spral_lsmr_inform *inform, double *damp)

Solve the least squares problem. Uses reverse communication. Upon return the user must perform a task specified by the action parameter and recall the routine. Possible values of action and associated tasks are:

action	Task to be performed
0	None. Computation has terminated. Check inform.flag for reason.
1	The user must compute \[v = v + P^TA^Tu\] without altering \(u\).
2	The user must compute \[u = u + APv\] without altering \(v\).
3	The user may test for convergence. To continue, recall without changing any of the arguments.

Parameters:

action – reverse communication task (see table above). Must be 0 on first call.
m – number of rows in \(A\).
n – number of columns in \(A\).
u – the vector \(u\). Must contain \(b\) on first call.
v – the vector \(v\).
y – the current solution \(y\) of the preconditioned problem (note \(x=Py\)).
keep – private internal data for LSMR.
options – controls execution of algorithm.
inform – information about execution of algorithm. Must not be changed by the user.
damp – Damping parameter \(\lambda\).

Returns:

Value of spral_lsmr_inform.flag.

int lsmr_free(void **keep)

Free memory allocated in keep. If a series of problems is being solved sequentially, the same keep may be used without calling spral_lsmr_free() between each solution.

Parameters:

keep – private data to be freed.

Returns:

0 on success, or Fortran stat parameter on failed deallocation.

Derived types

struct spral_lsmr_options

Specify options used by LSMR algorithm.

int print_freq_head: Frequency of printing heading information (that is, how many lines are printed before the heading information is reprinted). Default is 20.

int print_freq_itn: Frequency of printing status. There is printing on each of the first print_freq_itn iterations and then printing every print_freq_itn iterations. Default is 10.

int unit_diagnostics: Fortran unit for diagnostic printing. Printing is suppressed if negative. Default is 6.

int unit_error: Fortran unit for printing error messages. Printing is suppressed if negative. Default is 6.

double atol: Relative error in \(A\). i.e. if \(A\) is accurate to about 6 digits, set atol to 1.0e-6. Only used if spral_lsmr_options.ctest =3. Default is sqrt(DBL_EPSILON).

double btol: Relative error in \(b\). i.e. if \(b\) is accurate to about 6 digits, set btol to 1.0e-6. Only used if spral_lsmr_options.ctest =3. Default is sqrt(DBL_EPSILON).

double conlim: Upper limit on \(cond(\bar{A})\), apparent condition number of \(\bar{A}\). Only used if spral_lsmr_options.ctest =3. Default is 1/(10*sqrt(DBL_EPSILON)):

int ctest

Convergence test to use. Options are:

1	User to test convergence (action=3). Without computation of norms in inform.
2	User to test convergence (action=3). With computation of norms in inform.
3 (default)	Convergence test of Fond and Saunders (in preconditioner norm).

Default is 3.

int itnlim: Maximum number of iterations. If negative, the limit used is \(4n\). Default is -1.

int itn_test: Number of iterations between user convergence tests. If negative, use \(\min(n,10)\). Default is -1.

int localSize: Number of historical vectors to use for reorthogonalization. Default is 0.

struct spral_lsmr_inform

Information about progress of algorithm.

double condAP: Estimate of \(cond(\bar{A})\). A very high value of condAP may again indicate an error in the products with \(A\), \(A^T\), \(P\), or \(P^T\). A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

int flag: Exit status of algorithm. See table below.

int itn: Number of iterations performed

double normAP: Estimate of Frobenius norm of \(\bar{A}\). If \(\lambda\) is small and the columns of \(AP\) have all been scaled to have length 1.0, normAP should increase to roughly \(\sqrt{n}\). A radically different value for normAP may indicate an error in the user-supplied products with \(A\), \(A^T\), \(P\), or \(P^T\). A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normAP: Estimate of \(\|\bar{A}^T\bar{r}\|_2\), (normal equations residual). This should be small in all cases. Note that normAPr will often be smaller than the true value. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normr: Estimate of \(\|\bar{r}\|_2\). This will be small if \(Ax = b\) has a solution. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normy: Estimate of \(\|y\|_2\). A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

int stat: The Fortran stat parameter in the event of a failed allocation.

inform.flag	Interpretation
0	\(x = 0.0\) is the exact solution. No iterations were performed.
1	The equations \(Ax = b\) are probably compatible. \(\\|Ax - b\\|_2\) is sufficiently small, given the values of `spral_lsmr_options.atol` and `spral_lsmr_options.btol`. (`spral_lsmr_options.ctest` =3 only).
2	If damp is not present or is zero then the system \(Ax = b\) is probably not compatible. A least-squares solution has been obtained that is sufficiently accurate, given the value of `spral_lsmr_options.atol`. Otherwise, damped least-squares solution has been obtained that is sufficiently accurate, given the value of `spral_lsmr_options.atol`. (`spral_lsmr_options.ctest` =3 only).
3	An estimate of cond(\(\bar{A}\)) has exceeded `spral_lsmr_options.conlim`. The system \(Ax = b\) appears to be ill-conditioned, or there could be an error in the products with \(A\), \(A^T\), \(P\), or \(P^T\). (`spral_lsmr_options.ctest` =3 only).
4	\(\\|APy - b \\|_2\) is small enough for this machine. (`spral_lsmr_options.ctest` =3 only).
5	The least-squares solution is good enough for this machine. (`spral_lsmr_options.ctest` =3 only).
6	The estimate `spral_lsmr_inform.condAP` appears to be too large for this machine. (`spral_lsmr_options.ctest` =3 only).
7	The iteration limit `spral_lsmr_options.itnlim` has been reached.
8	An array allocation failed.
9	An array deallocation failed.
10	Either m<0 or n<0.

Example

The following code illustrates the use of LMSR

/* examples/C/lsmr.c - Example code for SPRAL_LSMR package */
#include "spral.h"

#include <stdlib.h>
#include <stdio.h>

void matrix_mult(int m, int n, int const *ptr, int const *row,
      double const *val, double const *v, double *u);
void matrix_mult_trans(int m, int n, int const *ptr, int const *row,
      double const *val, double const *u, double *v);

int main(void) {
   /* Derived types */
   void *keep;
   struct spral_lsmr_options options;
   struct spral_lsmr_inform inform;

   /* Initialize derived types */
   keep = NULL;
   spral_lsmr_default_options(&options);
   options.unit_diagnostics = -1; /* Switch off diagnostic printing */

   /* Data for matrix:
    * ( 1.0     -1.0 )
    * (     2.0      )
    * ( 2.0      2.0 )
    * ( 5.0 3.0 -2.0 )
    * (          6.0 ) */
   const int m = 5, n = 3;
   int ptr[]    = {   0,             3,         5,                 9 };
   int row[]    = {   0,   2,   3,   1,   3,    0,   2,    3,   4 };
   double val[] = { 1.0, 2.0, 5.0, 2.0, 3.0, -1.0, 2.0, -2.0, 6.0 };
   /* Data for rhs b */
   double b[] = { 1.0, 1.0, 1.0, 1.0, 1.0 };

   /* prepare for LSMR calls (using no preconditioning) */
   int action = 0;
   double u[m], v[n], x[n];
   for(int i=0; i<m; ++i) u[i] = b[i];

   bool done = false;
   while(!done) {
      spral_lsmr_solve(&action, m, n, u, v, x, &keep, &options, &inform, NULL);

      switch(action) {
      case 0: /* we are done */
         printf("Exit LSMR with inform.flag = %d and inform.itn = %d\n",
               inform.flag, inform.itn);
         printf("LS solution is:\n");
         for(int i=0; i<n; ++i) printf(" %10.2f", x[i]);
         printf("\n");
         done = true;
         break;

      case 1: /* Compute v = v + A'*u without altering u */
         matrix_mult_trans(m, n, ptr, row, val, u, v);
         break;

      case 2: /* Compute u = u + A*v  without altering v */
         matrix_mult(m, n, ptr, row, val, v, u);
         break;
      }
   }

   spral_lsmr_free(&keep);
   return 0; // Success
}

/* Takes b and computes u = u + A * v (A in CSC format) */
void matrix_mult(int m, int n, int const *ptr, int const *row,
      double const *val, double const *v, double *u) {
   for(int j=0; j<n; ++j) {
      for(int k=ptr[j]; k<ptr[j+1]; ++k) {
         int i = row[k];
         u[i] += val[k]*v[j];
      }
   }
}

/* Takes b and computes v = v + A^T * u (A in CSC format) */
void matrix_mult_trans(int m, int n, int const *ptr, int const *row,
      double const *val, double const *u, double *v) {
   for(int j=0; j<n; ++j) {
      for(int k=ptr[j]; k<ptr[j+1]; ++k) {
         int i = row[k];
         v[j] += val[k]*u[i];
      }
   }
}

This returns the following output:

Exit LSMR with inform.flag = 2 and inform.itn = 3
LS solution is:
       0.18      0.26      0.17

Method

Algorithm

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation \((A^TA+\lambda^2I)x=A^Tb\) (or \(((AP)^T(AP)+\lambda^2I)y=(AP)^Tb\), \(Py = x\), if preconditioning is used), but has better numerical properties, especially if \(A\) is ill-conditioned. Full details may be found in [1].

Scaling

LSMR uses an iterative method to approximate the solution. The number of iterations required to reach a certain accuracy depends strongly on the scaling of the problem. Poor scaling of the rows or columns of \(A\) should therefore be avoided where possible. For example, in problem 1 the solution is unaltered by row-scaling. If a row of \(A\) is very small or large compared to the other rows of \(A\), the corresponding row of \(( A\; b )\) should be scaled up or down.

In problems 1 and 2, the solution \(x\) is easily recovered following column-scaling. Unless better information is known, the nonzero columns of \(A\) should be scaled so that they all have the same Euclidean norm (e.g., 1.0). In problem 3, there is no freedom to re-scale if damp is nonzero. However, the value of damp should be assigned only after attention has been paid to the scaling of \(A\).

The parameter damp is intended to help regularize ill-conditioned systems, by preventing the true solution from being very large. Another aid to regularization is provided by the spral_lsmr_inform.condAP, which may be used to terminate iterations before the computed solution becomes very large.

Initial estimate

Note that \(x\) (or \(y\) for the preconditioned problem) is not an input parameter. If some initial estimate \(x_0\) of \(x\) is known and if \(\lambda = 0\), one could proceed as follows:

Compute a residual vector \(r_0 = b - Ax_0\).
Use LSMR to solve the system \(A \delta x = r_0\).
Add the correction \(\delta x\) to obtain a final solution \(x = x_0 + \delta x\).

This can be generalized for the preconditioned case. The guess \(x_0\) has to be available before and after the calls to spral_lsmr_solve(). To judge the benefits, suppose spral_lsmr_solve() takes \(k_1\) iterations to solve \(Ax = b\) and \(k_2\) iterations to solve \(A \delta x = r_0\). If \(x_0\) is “good”, \(\|r_0\|_2\) will be smaller than \(\|b\|_2\). If the same stopping tolerances spral_lsmr_options.atol and spral_lsmr_options.btol are used for each system, \(k_1\) and \(k_2\) will be similar, but the final solution \(x = x_0 + \delta x\) should be more accurate. The only way to reduce the total work is to use a larger stopping tolerance for the second system. If some value spral_lsmr_options.btol is suitable for \(Ax=b\), the larger value \(\mathrm{spral\_lsmr\_options.btol}*\|b\|_2 / \|r_0\|_2\) should be suitable for \(A \delta x = r_0\).

LSMR - Sparse Least Squares LSMR Solver

Purpose

Notation

Version history

Subroutines

Derived types

Example

Method

Algorithm

Scaling

Initial estimate

References