# LSMR - Sparse Least Squares LSMR Solver¶

#include <spral_lsmr.h> /* or <spral.h> for all packages */


## Purpose¶

This package uses the LSMR iterative method to solve sparse linear equations and sparse least-squares problems of the form:

$\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|x\|_2 \mbox{ subject to }Ax = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|Ax - b\|_2^2,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|Ax - b\|_2^2 + \lambda^2\|x\|_2^2, \end{array}\end{split}$

where the $$m \times n$$ matrix $$A$$ may be square or rectangular, and may have any rank. The scalar $$\lambda$$ is a damping parameter. If $$\lambda > 0$$, the solution is regularized in the sense that a unique soluton always exists, and $$\|x\|_2$$ is always bounded.

Preconditioning may be used to try to reduce the number of iterations. A suitable choice for the preconditioner depends on the user’s knowledge of $$A$$. For a user-chosen $$n \times n$$ nonsingular matrix $$P$$, LSMR solves

$\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|Py\|_2 \mbox{ subject to }APy = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|APy - b\|_2^2 ,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|APy - b\|_2^2 + \lambda^2\|Py\|_2^2 , \end{array}\end{split}$

The user must then recover the final solution $$x$$ by computing $$x=Py$$. $$P$$ will be a good preconditioner if $$AP$$ is significantly better conditioned than $$A$$.

Reverse communication is used for preconditioning operations $$Pz$$ and $$P^Tz$$ and matrix-vector products of the form $$Av$$ and $$A^Tu$$.

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation $$(A^TA+\lambda^2I)x=A^Tb$$ (or $$((AP)^T(AP)+\lambda^2I)y=(AP)^Tb$$), but has better numerical properties, especially if $$A$$ is ill-conditioned.

### Notation¶

In the rest of this documentation, we use the following notation:

$\begin{split}\bar{A} = \left( \begin{array}{c} A \\ \lambda I \end{array} \right)P, \hspace{1cm} \bar{b} = \left( \begin{array}{c} b \\ 0 \end{array} \right),\end{split}$

and

$r = b - APy , \hspace{1cm} \bar{r} = \bar{b} - \bar{A}y, \hspace{1cm} x = Py.$

### Version history¶

2016-05-10 Version 1.0.0.
Initial release.

[For detailed history, see ChangeLog]

## Subroutines¶

void spral_lsmr_default_options(struct spral_lsmr_options *options)

Initialises options to default values.

Parameters: options – data structure to be initialised.
int spral_lsmr_solve(int *action, int m, int n, double u[m], double v[n], double y[n], void **keep, struct spral_lsmr_options const *options, struct spral_lsmr_inform *inform, double *damp)

Solve the least squares problem. Uses reverse communication. Upon return the user must perform a task specified by the action parameter and recall the routine. Possible values of action and associated tasks are:

0 None. Computation has terminated. Check inform.flag for reason.
1

The user must compute

$v = v + P^TA^Tu$

without altering $$u$$.

2

The user must compute

$u = u + APv$

without altering $$v$$.

3

The user may test for convergence.

To continue, recall without changing any of the arguments.

Parameters: action – reverse communication task (see table above). Must be 0 on first call. m – number of rows in $$A$$. n – number of columns in $$A$$. u – the vector $$u$$. Must contain $$b$$ on first call. v – the vector $$v$$. y – the current solution $$y$$ of the preconditioned problem (note $$x=Py$$). keep – private internal data for LSMR. options – controls execution of algorithm. inform – information about execution of algorithm. Must not be changed by the user. damp – Damping parameter $$\lambda$$. Value of spral_lsmr_inform.flag.
int lsmr_free(void **keep)

Free memory allocated in keep. If a series of problems is being solved sequentially, the same keep may be used without calling spral_lsmr_free() between each solution.

Parameters: keep – private data to be freed. 0 on success, or Fortran stat parameter on failed deallocation.

## Derived types¶

struct spral_lsmr_options

Specify options used by LSMR algorithm.

int print_freq_head

Frequency of printing heading information (that is, how many lines are printed before the heading information is reprinted). Default is 20.

int print_freq_itn

Frequency of printing status. There is printing on each of the first print_freq_itn iterations and then printing every print_freq_itn iterations. Default is 10.

int unit_diagnostics

Fortran unit for diagnostic printing. Printing is suppressed if negative. Default is 6.

int unit_error

Fortran unit for printing error messages. Printing is suppressed if negative. Default is 6.

double atol

Relative error in $$A$$. i.e. if $$A$$ is accurate to about 6 digits, set atol to 1.0e-6. Only used if spral_lsmr_options.ctest =3. Default is sqrt(DBL_EPSILON).

double btol

Relative error in $$b$$. i.e. if $$b$$ is accurate to about 6 digits, set btol to 1.0e-6. Only used if spral_lsmr_options.ctest =3. Default is sqrt(DBL_EPSILON).

double conlim

Upper limit on $$cond(\bar{A})$$, apparent condition number of $$\bar{A}$$. Only used if spral_lsmr_options.ctest =3. Default is 1/(10*sqrt(DBL_EPSILON)):

int ctest

Convergence test to use. Options are:

 1 User to test convergence (action=3). Without computation of norms in inform. 2 User to test convergence (action=3). With computation of norms in inform. 3 (default) Convergence test of Fond and Saunders (in preconditioner norm).

Default is 3.

int itnlim

Maximum number of iterations. If negative, the limit used is $$4n$$. Default is -1.

int itn_test

Number of iterations between user convergence tests. If negative, use $$\min(n,10)$$. Default is -1.

int localSize

Number of historical vectors to use for reorthogonalization. Default is 0.

struct spral_lsmr_inform

double condAP

Estimate of $$cond(\bar{A})$$. A very high value of condAP may again indicate an error in the products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

int flag

Exit status of algorithm. See table below.

int itn

Number of iterations performed

double normAP

Estimate of Frobenius norm of $$\bar{A}$$. If $$\lambda$$ is small and the columns of $$AP$$ have all been scaled to have length 1.0, normAP should increase to roughly $$\sqrt{n}$$. A radically different value for normAP may indicate an error in the user-supplied products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normAP

Estimate of $$\|\bar{A}^T\bar{r}\|_2$$, (normal equations residual). This should be small in all cases. Note that normAPr will often be smaller than the true value. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normr

Estimate of $$\|\bar{r}\|_2$$. This will be small if $$Ax = b$$ has a solution. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

double normy

Estimate of $$\|y\|_2$$. A negative value indicates that no estimate is currently available. This component is not used if spral_lsmr_options.ctest =1.

int stat

The Fortran stat parameter in the event of a failed allocation.

inform.flag Interpretation
0 $$x = 0.0$$ is the exact solution. No iterations were performed.
1 The equations $$Ax = b$$ are probably compatible. $$\|Ax - b\|_2$$ is sufficiently small, given the values of spral_lsmr_options.atol and spral_lsmr_options.btol. (spral_lsmr_options.ctest =3 only).
2 If damp is not present or is zero then the system $$Ax = b$$ is probably not compatible. A least-squares solution has been obtained that is sufficiently accurate, given the value of spral_lsmr_options.atol. Otherwise, damped least-squares solution has been obtained that is sufficiently accurate, given the value of spral_lsmr_options.atol. (spral_lsmr_options.ctest =3 only).
3 An estimate of cond($$\bar{A}$$) has exceeded spral_lsmr_options.conlim. The system $$Ax = b$$ appears to be ill-conditioned, or there could be an error in the products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. (spral_lsmr_options.ctest =3 only).
4 $$\|APy - b \|_2$$ is small enough for this machine. (spral_lsmr_options.ctest =3 only).
5 The least-squares solution is good enough for this machine. (spral_lsmr_options.ctest =3 only).
6 The estimate spral_lsmr_inform.condAP appears to be too large for this machine. (spral_lsmr_options.ctest =3 only).
7 The iteration limit spral_lsmr_options.itnlim has been reached.
8 An array allocation failed.
9 An array deallocation failed.
10 Either m<0 or n<0.

## Example¶

The following code illustrates the use of LMSR

/* examples/C/lsmr.c - Example code for SPRAL_LSMR package */
#include "spral.h"

#include <stdlib.h>
#include <stdio.h>

void matrix_mult(int m, int n, int const *ptr, int const *row,
double const *val, double const *v, double *u);
void matrix_mult_trans(int m, int n, int const *ptr, int const *row,
double const *val, double const *u, double *v);

int main(void) {
/* Derived types */
void *keep;
struct spral_lsmr_options options;
struct spral_lsmr_inform inform;

/* Initialize derived types */
keep = NULL;
spral_lsmr_default_options(&options);
options.unit_diagnostics = -1; /* Switch off diagnostic printing */

/* Data for matrix:
* ( 1.0     -1.0 )
* (     2.0      )
* ( 2.0      2.0 )
* ( 5.0 3.0 -2.0 )
* (          6.0 ) */
const int m = 5, n = 3;
int ptr[]    = {   0,             3,         5,                 9 };
int row[]    = {   0,   2,   3,   1,   3,    0,   2,    3,   4 };
double val[] = { 1.0, 2.0, 5.0, 2.0, 3.0, -1.0, 2.0, -2.0, 6.0 };
/* Data for rhs b */
double b[] = { 1.0, 1.0, 1.0, 1.0, 1.0 };

/* prepare for LSMR calls (using no preconditioning) */
int action = 0;
double u[m], v[n], x[n];
for(int i=0; i<m; ++i) u[i] = b[i];

bool done = false;
while(!done) {
spral_lsmr_solve(&action, m, n, u, v, x, &keep, &options, &inform, NULL);

switch(action) {
case 0: /* we are done */
printf("Exit LSMR with inform.flag = %d and inform.itn = %d\n",
inform.flag, inform.itn);
printf("LS solution is:\n");
for(int i=0; i<n; ++i) printf(" %10.2f", x[i]);
printf("\n");
done = true;
break;

case 1: /* Compute v = v + A'*u without altering u */
matrix_mult_trans(m, n, ptr, row, val, u, v);
break;

case 2: /* Compute u = u + A*v  without altering v */
matrix_mult(m, n, ptr, row, val, v, u);
break;
}
}

spral_lsmr_free(&keep);
return 0; // Success
}

/* Takes b and computes u = u + A * v (A in CSC format) */
void matrix_mult(int m, int n, int const *ptr, int const *row,
double const *val, double const *v, double *u) {
for(int j=0; j<n; ++j) {
for(int k=ptr[j]; k<ptr[j+1]; ++k) {
int i = row[k];
u[i] += val[k]*v[j];
}
}
}

/* Takes b and computes v = v + A^T * u (A in CSC format) */
void matrix_mult_trans(int m, int n, int const *ptr, int const *row,
double const *val, double const *u, double *v) {
for(int j=0; j<n; ++j) {
for(int k=ptr[j]; k<ptr[j+1]; ++k) {
int i = row[k];
v[j] += val[k]*u[i];
}
}
}


This returns the following output:

Exit LSMR with inform.flag = 2 and inform.itn = 3
LS solution is:
0.18      0.26      0.17


## Method¶

### Algorithm¶

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation $$(A^TA+\lambda^2I)x=A^Tb$$ (or $$((AP)^T(AP)+\lambda^2I)y=(AP)^Tb$$, $$Py = x$$, if preconditioning is used), but has better numerical properties, especially if $$A$$ is ill-conditioned. Full details may be found in [1].

### Scaling¶

LSMR uses an iterative method to approximate the solution. The number of iterations required to reach a certain accuracy depends strongly on the scaling of the problem. Poor scaling of the rows or columns of $$A$$ should therefore be avoided where possible. For example, in problem 1 the solution is unaltered by row-scaling. If a row of $$A$$ is very small or large compared to the other rows of $$A$$, the corresponding row of $$( A\; b )$$ should be scaled up or down.

In problems 1 and 2, the solution $$x$$ is easily recovered following column-scaling. Unless better information is known, the nonzero columns of $$A$$ should be scaled so that they all have the same Euclidean norm (e.g., 1.0). In problem 3, there is no freedom to re-scale if damp is nonzero. However, the value of damp should be assigned only after attention has been paid to the scaling of $$A$$.

The parameter damp is intended to help regularize ill-conditioned systems, by preventing the true solution from being very large. Another aid to regularization is provided by the spral_lsmr_inform.condAP, which may be used to terminate iterations before the computed solution becomes very large.

### Initial estimate¶

Note that $$x$$ (or $$y$$ for the preconditioned problem) is not an input parameter. If some initial estimate $$x_0$$ of $$x$$ is known and if $$\lambda = 0$$, one could proceed as follows:

1. Compute a residual vector $$r_0 = b - Ax_0$$.
2. Use LSMR to solve the system $$A \delta x = r_0$$.
3. Add the correction $$\delta x$$ to obtain a final solution $$x = x_0 + \delta x$$.

This can be generalized for the preconditioned case. The guess $$x_0$$ has to be available before and after the calls to spral_lsmr_solve(). To judge the benefits, suppose spral_lsmr_solve() takes $$k_1$$ iterations to solve $$Ax = b$$ and $$k_2$$ iterations to solve $$A \delta x = r_0$$. If $$x_0$$ is “good”, $$\|r_0\|_2$$ will be smaller than $$\|b\|_2$$. If the same stopping tolerances spral_lsmr_options.atol and spral_lsmr_options.btol are used for each system, $$k_1$$ and $$k_2$$ will be similar, but the final solution $$x = x_0 + \delta x$$ should be more accurate. The only way to reduce the total work is to use a larger stopping tolerance for the second system. If some value spral_lsmr_options.btol is suitable for $$Ax=b$$, the larger value $$\mathrm{spral\_lsmr\_options.btol}*\|b\|_2 / \|r_0\|_2$$ should be suitable for $$A \delta x = r_0$$.

### References¶

 [1] D.C.-L. Fong and M.A. Saunders (2011). LSMR: An iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33:5, 2950-2971. [DOI: 10.1137/10079687X]