# spral_lsmr - Sparse Least Squares LSMR Solver¶

## Purpose¶

This package uses the LSMR iterative method to solve sparse linear equations and sparse least-squares problems of the form:

$\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|x\|_2 \mbox{ subject to }Ax = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|Ax - b\|_2^2,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|Ax - b\|_2^2 + \lambda^2\|x\|_2^2, \end{array}\end{split}$

where the $$m \times n$$ matrix $$A$$ may be square or rectangular, and may have any rank. The scalar $$\lambda$$ is a damping parameter. If $$\lambda > 0$$, the solution is regularized in the sense that a unique soluton always exists, and $$\|x\|_2$$ is always bounded.

Preconditioning may be used to try to reduce the number of iterations. A suitable choice for the preconditioner depends on the user’s knowledge of $$A$$. For a user-chosen $$n \times n$$ nonsingular matrix $$P$$, LSMR solves

$\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|Py\|_2 \mbox{ subject to }APy = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|APy - b\|_2^2 ,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|APy - b\|_2^2 + \lambda^2\|Py\|_2^2 , \end{array}\end{split}$

The user must then recover the final solution $$x$$ by computing $$x=Py$$. $$P$$ will be a good preconditioner if $$AP$$ is significantly better conditioned than $$A$$.

Reverse communication is used for preconditioning operations $$Pz$$ and $$P^Tz$$ and matrix-vector products of the form $$Av$$ and $$A^Tu$$.

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation $$(A^TA+\lambda^2I)x=A^Tb$$ (or $$((AP)^T(AP)+\lambda^2I)y=(AP)^Tb$$), but has better numerical properties, especially if $$A$$ is ill-conditioned.

### Notation¶

In the rest of this documentation, we use the following notation:

$\begin{split}\bar{A} = \left( \begin{array}{c} A \\ \lambda I \end{array} \right)P, \hspace{1cm} \bar{b} = \left( \begin{array}{c} b \\ 0 \end{array} \right),\end{split}$

and

$r = b - APy , \hspace{1cm} \bar{r} = \bar{b} - \bar{A}y, \hspace{1cm} x = Py.$

### Version history¶

2016-05-10 Version 1.0.0.
Initial release.

[For detailed history, see ChangeLog]

## Subroutines¶

subroutine lsmr_solve(action, m, n, u, v, y, keep, options, inform[, damp])

Solve the least squares problem. Uses reverse communication. Upon return the user must perform a task specified by the action parameter and recall the routine. Possible values of action and associated tasks are:

0 None. Computation has terminated. Check inform%flag for reason.
1

The user must compute

$v = v + P^TA^Tu$

without altering $$u$$.

2

The user must compute

$u = u + APv$

without altering $$v$$.

3

The user may test for convergence.

To continue, recall without changing any of the arguments.

Parameters: action [integer,inout] :: reverse communication task (see table above). Must be 0 on first call. m [integer,in] :: number of rows in $$A$$. n [integer,in] :: number of columns in $$A$$. u (m) [real,inout] :: the vector $$u$$. Must contain $$b$$ on first call. v (n) [real,inout] :: the vector $$v$$. y (n) [real,inout] :: the current solution $$y$$ of the preconditioned problem (note $$x=Py$$). keep [lsmr_keep,inout] :: private internal data for LSMR. options [lsmr_options,in] :: controls execution of algorithm. inform [lsmr_inform,inout] :: information about execution of algorithm. Must not be changed by the user. damp [real,in] :: Damping parameter $$\lambda$$.
subroutine lsmr_free(keep, stat)

Free memory allocated in keep (unnecessary if keep is going out of scope).

Parameters: lsmr_keep [inout] :: private data to be freed. stat [integer,out] :: return 0 on success, or Fortran stat parameter on failed deallocation.

Note

If a series of problems is being solved sequentially, the same keep may be used without calling lsmr_free() between each solution.

## Derived types¶

type lsmr_options

Specify options used by LSMR algorithm.

Type fields:
• % print_freq_head [integer,default=20] :: frequency of printing heading information (that is, how many lines are printed before the heading information is reprinted).
• % print_freq_itn [integer,default=10] :: frequency of printing status. There is printing on each of the first print_freq_itn iterations and then printing every print_freq_itn iterations.
• % unit_diagnostics [integer,default=6] :: Fortran unit for diagnostic printing. Printing is suppressed if negative.
• % unit_error [integer,default=6] :: Fortran unit for printing error messages. Printing is suppressed if negative.
• % atol [real,default=sqrt(epsilon(1.0d0))] :: Relative error in $$A$$. i.e. if $$A$$ is accurate to about 6 digits, set atol to 1.0d-6. Only used if options%ctest=3.
• % btol [real,default=sqrt(epsilon(1.0d0))] :: Relative error in $$b$$. i.e. if $$b$$ is accurate to about 6 digits, set btol to 1.0d-6. Only used if options%ctest=3.
• % conlim [real,default=1/(10*sqrt(epsilon(1.0d0))] :: Upper limit on $$cond(\bar{A})$$, apparent condition number of $$\bar{A}$$. Only used if options%ctest=3.
• % ctest [integer,default=3] ::

Convergence test to use, one of:

 1 User to test convergence (action=3). Without computation of norms in inform. 2 User to test convergence (action=3). With computation of norms in inform. 3 (default) Convergence test of Fond and Saunders (in preconditioner norm).
• % itnlim [integer,default=-1] :: Maximum number of iterations. If negative, the limit used is $$4n$$.
• % itn_test [integer,default=-1] :: Number of iterations between user convergence tests. If negative, use $$min(n,10)$$.
• % localSize [integer,default=0] :: Number of historical vectors to use for reorthogonalization.
type lsmr_inform

Type fields: % condAP [real] :: Estimate of $$cond(\bar{A})$$. A very high value of condAP may again indicate an error in the products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1. % flag [integer] :: exit status of algorithm. See table below. % itn [integer] :: number of iterations performed % normAP [real] :: Estimate of Frobenius norm of $$\bar{A}$$. If $$\lambda$$ is small and the columns of $$AP$$ have all been scaled to have length 1.0, normAP should increase to roughly $$\sqrt{n}$$. A radically different value for normAP may indicate an error in the user-supplied products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1. % normAP :: Estimate of $$\|\bar{A}^T\bar{r}\|_2$$, (normal equations residual). This should be small in all cases. Note that normAPr will often be smaller than the true value. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1. % normr [real] :: Estimate of $$\|\bar{r}\|_2$$. This will be small if $$Ax = b$$ has a solution. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1. % normy [real] :: Estimate of $$\|y\|_2$$. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1. % stat [integer] :: The Fortran stat parameter in the event of a failed allocation.
inform%flag Interpretation
0 $$x = 0.0$$ is the exact solution. No iterations were performed.
1 The equations $$Ax = b$$ are probably compatible. $$\|Ax - b\|_2$$ is sufficiently small, given the values of options%atol and options%btol. (options%ctest=3 only).
2 If damp is not present or is zero then the system $$Ax = b$$ is probably not compatible. A least-squares solution has been obtained that is sufficiently accurate, given the value of options%atol. Otherwise, damped least-squares solution has been obtained that is sufficiently accurate, given the value of options%atol. (options%ctest=3 only).
3 An estimate of cond($$\bar{A}$$) has exceeded options%conlim. The system $$Ax = b$$ appears to be ill-conditioned, or there could be an error in the products with $$A$$, $$A^T$$, $$P$$, or $$P^T$$. (options%ctest=3 only).
4 $$\|APy - b \|_2$$ is small enough for this machine. (options%ctest=3 only).
5 The least-squares solution is good enough for this machine. (options%ctest=3 only).
6 The estimate inform%condAP appears to be too large for this machine. (options%ctest=3 only).
7 The iteration limit options%itnlim has been reached.
8 An array allocation failed.
9 An array deallocation failed.
10 Either m<0 or n<0.

## Example¶

The following code illustrates the use of LMSR

! examples/Fortran/lsmr.f90 - Example code for SPRAL_LSMR package
program spral_lsmr_example
use  spral_lsmr
implicit none

integer, parameter :: wp = kind( 1.0d+0 )

type ( lsmr_keep )    :: keep
type ( lsmr_options ) :: options
type ( lsmr_inform )  :: inform

integer :: ptr(4), row(9)
real(wp) :: b(5), u(5), v(3), val(9), x(3)

integer :: action, m, n, stat

! Data for matrix:
! ( 1.0     -1.0 )
! (     2.0      )
! ( 2.0      2.0 )
! ( 5.0 3.0 -2.0 )
! (          6.0 )
m = 5; n = 3
ptr(1:n+1)        = (/ 1,               4,         6,                 10 /)
row(1:ptr(n+1)-1) = (/ 1,     3,   4,   2,   4,    1,   3,    4,   5 /)
val(1:ptr(n+1)-1) = (/ 1.0, 2.0, 5.0, 2.0, 3.0, -1.0, 2.0, -2.0, 6.0 /)
! Data for rhs b
b(1:m) = (/ 1.0, 1.0, 1.0, 1.0, 1.0 /)

! prepare for LSMR calls (using no preconditioning and default
! settings, except switch off diagnostic printing)
options%unit_diagnostics = -1
action = 0
u(1:m) = b(1:m)

do

call lsmr_solve(action, m, n, u, v, x, keep, options, inform)

if (action.eq.0) then
! we are done.
write (*,'(a,i3,a,i3)') ' Exit LSMR with inform%flag = ',inform%flag,&
' and inform%itn = ',inform%itn
write (*,'(a)') ' LS solution is:'
write (*,'(10f10.2)') x(1:n)
exit

else if (action.eq.1) then

! Compute v = v + A'*u without altering u
call matrix_mult_trans(m,n,ptr,row,val,u,v)

else if (action.eq.2) then

! Compute u = u + A*v  without altering v
call matrix_mult(m,n,ptr,row,val,v,u)

end if

end do

call lsmr_free(keep,stat)

contains
!**************************************************************
! Takes b and computes u = u + A * v (A in CSC format)

subroutine matrix_mult(m,n,ptr,row,val,v,u)

integer,  intent(in) :: m,n
integer,  intent(in) :: ptr(n+1),row(:)
real(wp), intent(in) :: val(:),v(n)
real(wp), intent(inout) :: u(m)

integer:: i,j,k
real(wp) :: temp

do j = 1,n
temp = v(j)
do k = ptr(j),ptr(j+1)-1
i = row(k)
u(i) = u(i) + val(k)*temp
end do
end do

end subroutine matrix_mult

!**************************************************************
! Takes b and computes v = v + A^T * u (A in CSC format)

subroutine matrix_mult_trans(m,n,ptr,row,val,u,v)

integer,  intent(in) :: m,n
integer,  intent(in) :: ptr(n+1),row(:)
real(wp), intent(in) :: val(:),u(m)
real(wp), intent(inout) :: v(n)

integer:: i,j,k
real(wp) :: sum

do j = 1,n
sum = 0.0_wp
do k = ptr(j),ptr(j+1)-1
i = row(k)
sum = sum + val(k)*u(i)
end do
v(j) = v(j) + sum
end do

end subroutine matrix_mult_trans

end program spral_lsmr_example


This returns the following output:

Exit LSMR with inform%flag =   2 and inform%itn =   3
LS solution is:
0.18      0.26      0.17


## Method¶

### Algorithm¶

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation $$(A^TA+\lambda^2I)x=A^Tb$$ (or $$((AP)^T(AP)+\lambda^2I)y=(AP)^Tb$$, $$Py = x$$, if preconditioning is used), but has better numerical properties, especially if $$A$$ is ill-conditioned. Full details may be found in [1].

### Scaling¶

LSMR uses an iterative method to approximate the solution. The number of iterations required to reach a certain accuracy depends strongly on the scaling of the problem. Poor scaling of the rows or columns of $$A$$ should therefore be avoided where possible. For example, in problem 1 the solution is unaltered by row-scaling. If a row of $$A$$ is very small or large compared to the other rows of $$A$$, the corresponding row of $$( A\; b )$$ should be scaled up or down.

In problems 1 and 2, the solution $$x$$ is easily recovered following column-scaling. Unless better information is known, the nonzero columns of $$A$$ should be scaled so that they all have the same Euclidean norm (e.g., 1.0). In problem 3, there is no freedom to re-scale if damp is nonzero. However, the value of damp should be assigned only after attention has been paid to the scaling of $$A$$.

The parameter damp is intended to help regularize ill-conditioned systems, by preventing the true solution from being very large. Another aid to regularization is provided by the inform%condAP, which may be used to terminate iterations before the computed solution becomes very large.

### Initial estimate¶

Note that $$x$$ (or $$y$$ for the preconditioned problem) is not an input parameter. If some initial estimate $$x_0$$ of $$x$$ is known and if $$\lambda = 0$$, one could proceed as follows:

1. Compute a residual vector $$r_0 = b - Ax_0$$.
2. Use LSMR to solve the system $$A \delta x = r_0$$.
3. Add the correction $$\delta x$$ to obtain a final solution $$x = x_0 + \delta x$$.

This can be generalized for the preconditioned case. The guess $$x_0$$ has to be available before and after the calls to lsmr_solve(). To judge the benefits, suppose lsmr_solve() takes $$k_1$$ iterations to solve $$Ax = b$$ and $$k_2$$ iterations to solve $$A \delta x = r_0$$. If $$x_0$$ is “good”, $$\|r_0\|_2$$ will be smaller than $$\|b\|_2$$. If the same stopping tolerances options%atol and options%btol are used for each system, $$k_1$$ and $$k_2$$ will be similar, but the final solution $$x = x_0 + \delta x$$ should be more accurate. The only way to reduce the total work is to use a larger stopping tolerance for the second system. If some value options%btol is suitable for $$Ax=b$$, the larger value $$\mathrm{options\%btol}*\|b\|_2 / \|r_0\|_2$$ should be suitable for $$A \delta x = r_0$$.

### References¶

 [1] D.C.-L. Fong and M.A. Saunders (2011). LSMR: An iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33:5, 2950-2971. [DOI: 10.1137/10079687X]