spral_lsmr - Sparse Least Squares LSMR Solver

Purpose

This package uses the LSMR iterative method to solve sparse linear equations and sparse least-squares problems of the form:

\[\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|x\|_2 \mbox{ subject to }Ax = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|Ax - b\|_2^2,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|Ax - b\|_2^2 + \lambda^2\|x\|_2^2, \end{array}\end{split}\]

where the \(m \times n\) matrix \(A\) may be square or rectangular, and may have any rank. The scalar \(\lambda\) is a damping parameter. If \(\lambda > 0\), the solution is regularized in the sense that a unique soluton always exists, and \(\|x\|_2\) is always bounded.

Preconditioning may be used to try to reduce the number of iterations. A suitable choice for the preconditioner depends on the user’s knowledge of \(A\). For a user-chosen \(n \times n\) nonsingular matrix \(P\), LSMR solves

\[\begin{split}\begin{array}{ll} \mbox{1. Nonsymmetric equations:} & \mbox{minimize } \|Py\|_2 \mbox{ subject to }APy = b, \\ \mbox{2. Linear least squares:} & \mbox{minimize } \|APy - b\|_2^2 ,\\ \mbox{3. Regularized least squares:} & \mbox{minimize } \|APy - b\|_2^2 + \lambda^2\|Py\|_2^2 , \end{array}\end{split}\]

The user must then recover the final solution \(x\) by computing \(x=Py\). \(P\) will be a good preconditioner if \(AP\) is significantly better conditioned than \(A\).

Reverse communication is used for preconditioning operations \(Pz\) and \(P^Tz\) and matrix-vector products of the form \(Av\) and \(A^Tu\).

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation \((A^TA+\lambda^2I)x=A^Tb\) (or \(((AP)^T(AP)+\lambda^2I)y=(AP)^Tb\)), but has better numerical properties, especially if \(A\) is ill-conditioned.

Notation

In the rest of this documentation, we use the following notation:

\[\begin{split}\bar{A} = \left( \begin{array}{c} A \\ \lambda I \end{array} \right)P, \hspace{1cm} \bar{b} = \left( \begin{array}{c} b \\ 0 \end{array} \right),\end{split}\]

and

\[r = b - APy , \hspace{1cm} \bar{r} = \bar{b} - \bar{A}y, \hspace{1cm} x = Py.\]

Version history

2016-05-10 Version 1.0.0.
Initial release.

[For detailed history, see ChangeLog]

Subroutines

subroutine lsmr_solve(action, m, n, u, v, y, keep, options, inform[, damp])

Solve the least squares problem. Uses reverse communication. Upon return the user must perform a task specified by the action parameter and recall the routine. Possible values of action and associated tasks are:

action Task to be performed
0 None. Computation has terminated. Check inform%flag for reason.
1

The user must compute

\[v = v + P^TA^Tu\]

without altering \(u\).

2

The user must compute

\[u = u + APv\]

without altering \(v\).

3

The user may test for convergence.

To continue, recall without changing any of the arguments.

Parameters:
  • action [integer,inout] :: reverse communication task (see table above). Must be 0 on first call.
  • m [integer,in] :: number of rows in \(A\).
  • n [integer,in] :: number of columns in \(A\).
  • u (m) [real,inout] :: the vector \(u\). Must contain \(b\) on first call.
  • v (n) [real,inout] :: the vector \(v\).
  • y (n) [real,inout] :: the current solution \(y\) of the preconditioned problem (note \(x=Py\)).
  • keep [lsmr_keep,inout] :: private internal data for LSMR.
  • options [lsmr_options,in] :: controls execution of algorithm.
  • inform [lsmr_inform,inout] :: information about execution of algorithm. Must not be changed by the user.
Options:

damp [real,in] :: Damping parameter \(\lambda\).

subroutine lsmr_free(keep, stat)

Free memory allocated in keep (unnecessary if keep is going out of scope).

Parameters:
  • lsmr_keep [inout] :: private data to be freed.
  • stat [integer,out] :: return 0 on success, or Fortran stat parameter on failed deallocation.

Note

If a series of problems is being solved sequentially, the same keep may be used without calling lsmr_free() between each solution.

Derived types

type lsmr_options

Specify options used by LSMR algorithm.

Type fields:
  • % print_freq_head [integer,default=20] :: frequency of printing heading information (that is, how many lines are printed before the heading information is reprinted).
  • % print_freq_itn [integer,default=10] :: frequency of printing status. There is printing on each of the first print_freq_itn iterations and then printing every print_freq_itn iterations.
  • % unit_diagnostics [integer,default=6] :: Fortran unit for diagnostic printing. Printing is suppressed if negative.
  • % unit_error [integer,default=6] :: Fortran unit for printing error messages. Printing is suppressed if negative.
  • % atol [real,default=sqrt(epsilon(1.0d0))] :: Relative error in \(A\). i.e. if \(A\) is accurate to about 6 digits, set atol to 1.0d-6. Only used if options%ctest=3.
  • % btol [real,default=sqrt(epsilon(1.0d0))] :: Relative error in \(b\). i.e. if \(b\) is accurate to about 6 digits, set btol to 1.0d-6. Only used if options%ctest=3.
  • % conlim [real,default=1/(10*sqrt(epsilon(1.0d0))] :: Upper limit on \(cond(\bar{A})\), apparent condition number of \(\bar{A}\). Only used if options%ctest=3.
  • % ctest [integer,default=3] ::

    Convergence test to use, one of:

    1 User to test convergence (action=3). Without computation of norms in inform.
    2 User to test convergence (action=3). With computation of norms in inform.
    3 (default) Convergence test of Fond and Saunders (in preconditioner norm).
  • % itnlim [integer,default=-1] :: Maximum number of iterations. If negative, the limit used is \(4n\).
  • % itn_test [integer,default=-1] :: Number of iterations between user convergence tests. If negative, use \(min(n,10)\).
  • % localSize [integer,default=0] :: Number of historical vectors to use for reorthogonalization.
type lsmr_inform

Information about progress of algorithm.

Type fields:
  • % condAP [real] :: Estimate of \(cond(\bar{A})\). A very high value of condAP may again indicate an error in the products with \(A\), \(A^T\), \(P\), or \(P^T\). A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1.
  • % flag [integer] :: exit status of algorithm. See table below.
  • % itn [integer] :: number of iterations performed
  • % normAP [real] :: Estimate of Frobenius norm of \(\bar{A}\). If \(\lambda\) is small and the columns of \(AP\) have all been scaled to have length 1.0, normAP should increase to roughly \(\sqrt{n}\). A radically different value for normAP may indicate an error in the user-supplied products with \(A\), \(A^T\), \(P\), or \(P^T\). A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1.
  • % normAP :: Estimate of \(\|\bar{A}^T\bar{r}\|_2\), (normal equations residual). This should be small in all cases. Note that normAPr will often be smaller than the true value. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1.
  • % normr [real] :: Estimate of \(\|\bar{r}\|_2\). This will be small if \(Ax = b\) has a solution. A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1.
  • % normy [real] :: Estimate of \(\|y\|_2\). A negative value indicates that no estimate is currently available. This component is not used if options%ctest=1.
  • % stat [integer] :: The Fortran stat parameter in the event of a failed allocation.
inform%flag Interpretation
0 \(x = 0.0\) is the exact solution. No iterations were performed.
1 The equations \(Ax = b\) are probably compatible. \(\|Ax - b\|_2\) is sufficiently small, given the values of options%atol and options%btol. (options%ctest=3 only).
2 If damp is not present or is zero then the system \(Ax = b\) is probably not compatible. A least-squares solution has been obtained that is sufficiently accurate, given the value of options%atol. Otherwise, damped least-squares solution has been obtained that is sufficiently accurate, given the value of options%atol. (options%ctest=3 only).
3 An estimate of cond(\(\bar{A}\)) has exceeded options%conlim. The system \(Ax = b\) appears to be ill-conditioned, or there could be an error in the products with \(A\), \(A^T\), \(P\), or \(P^T\). (options%ctest=3 only).
4 \(\|APy - b \|_2\) is small enough for this machine. (options%ctest=3 only).
5 The least-squares solution is good enough for this machine. (options%ctest=3 only).
6 The estimate inform%condAP appears to be too large for this machine. (options%ctest=3 only).
7 The iteration limit options%itnlim has been reached.
8 An array allocation failed.
9 An array deallocation failed.
10 Either m<0 or n<0.

Example

The following code illustrates the use of LMSR

! examples/Fortran/lsmr.f90 - Example code for SPRAL_LSMR package
program spral_lsmr_example
    use  spral_lsmr
    implicit none
  
    integer, parameter :: wp = kind( 1.0d+0 )

    type ( lsmr_keep )    :: keep 
    type ( lsmr_options ) :: options
    type ( lsmr_inform )  :: inform

    integer :: ptr(4), row(9)
    real(wp) :: b(5), u(5), v(3), val(9), x(3)

    integer :: action, m, n, stat

    ! Data for matrix:
    ! ( 1.0     -1.0 )
    ! (     2.0      )
    ! ( 2.0      2.0 )
    ! ( 5.0 3.0 -2.0 )
    ! (          6.0 )
    m = 5; n = 3
    ptr(1:n+1)        = (/ 1,               4,         6,                 10 /)
    row(1:ptr(n+1)-1) = (/ 1,     3,   4,   2,   4,    1,   3,    4,   5 /)
    val(1:ptr(n+1)-1) = (/ 1.0, 2.0, 5.0, 2.0, 3.0, -1.0, 2.0, -2.0, 6.0 /)
    ! Data for rhs b
    b(1:m) = (/ 1.0, 1.0, 1.0, 1.0, 1.0 /)

    ! prepare for LSMR calls (using no preconditioning and default
    ! settings, except switch off diagnostic printing)
    options%unit_diagnostics = -1
    action = 0
    u(1:m) = b(1:m)

    do

       call lsmr_solve(action, m, n, u, v, x, keep, options, inform)

       if (action.eq.0) then
          ! we are done.
          write (*,'(a,i3,a,i3)') ' Exit LSMR with inform%flag = ',inform%flag,&
            ' and inform%itn = ',inform%itn
          write (*,'(a)') ' LS solution is:'
          write (*,'(10f10.2)') x(1:n)
          exit

      else if (action.eq.1) then

            ! Compute v = v + A'*u without altering u
             call matrix_mult_trans(m,n,ptr,row,val,u,v)

      else if (action.eq.2) then

            ! Compute u = u + A*v  without altering v
            call matrix_mult(m,n,ptr,row,val,v,u)

      end if

   end do

   call lsmr_free(keep,stat)

  contains
!**************************************************************
      ! Takes b and computes u = u + A * v (A in CSC format)

      subroutine matrix_mult(m,n,ptr,row,val,v,u)

      integer,  intent(in) :: m,n
      integer,  intent(in) :: ptr(n+1),row(:)
      real(wp), intent(in) :: val(:),v(n)
      real(wp), intent(inout) :: u(m)

      integer:: i,j,k
      real(wp) :: temp
    
         do j = 1,n
            temp = v(j)
            do k = ptr(j),ptr(j+1)-1
                i = row(k)
                u(i) = u(i) + val(k)*temp
            end do
         end do

      end subroutine matrix_mult

!**************************************************************
      ! Takes b and computes v = v + A^T * u (A in CSC format)

      subroutine matrix_mult_trans(m,n,ptr,row,val,u,v)

      integer,  intent(in) :: m,n
      integer,  intent(in) :: ptr(n+1),row(:)
      real(wp), intent(in) :: val(:),u(m)
      real(wp), intent(inout) :: v(n)

      integer:: i,j,k
      real(wp) :: sum
      
         do j = 1,n
            sum = 0.0_wp
            do k = ptr(j),ptr(j+1)-1
               i = row(k)
               sum = sum + val(k)*u(i)
            end do
            v(j) = v(j) + sum
         end do

      end subroutine matrix_mult_trans

end program spral_lsmr_example

This returns the following output:

Exit LSMR with inform%flag =   2 and inform%itn =   3
LS solution is:
     0.18      0.26      0.17

Method

Algorithm

The method used is based on the Golub-Kahan bidiagonalization process. It is algebraically equivalent to applying MINRES to the normal equation \((A^TA+\lambda^2I)x=A^Tb\) (or \(((AP)^T(AP)+\lambda^2I)y=(AP)^Tb\), \(Py = x\), if preconditioning is used), but has better numerical properties, especially if \(A\) is ill-conditioned. Full details may be found in [1].

Scaling

LSMR uses an iterative method to approximate the solution. The number of iterations required to reach a certain accuracy depends strongly on the scaling of the problem. Poor scaling of the rows or columns of \(A\) should therefore be avoided where possible. For example, in problem 1 the solution is unaltered by row-scaling. If a row of \(A\) is very small or large compared to the other rows of \(A\), the corresponding row of \(( A\; b )\) should be scaled up or down.

In problems 1 and 2, the solution \(x\) is easily recovered following column-scaling. Unless better information is known, the nonzero columns of \(A\) should be scaled so that they all have the same Euclidean norm (e.g., 1.0). In problem 3, there is no freedom to re-scale if damp is nonzero. However, the value of damp should be assigned only after attention has been paid to the scaling of \(A\).

The parameter damp is intended to help regularize ill-conditioned systems, by preventing the true solution from being very large. Another aid to regularization is provided by the inform%condAP, which may be used to terminate iterations before the computed solution becomes very large.

Initial estimate

Note that \(x\) (or \(y\) for the preconditioned problem) is not an input parameter. If some initial estimate \(x_0\) of \(x\) is known and if \(\lambda = 0\), one could proceed as follows:

  1. Compute a residual vector \(r_0 = b - Ax_0\).
  2. Use LSMR to solve the system \(A \delta x = r_0\).
  3. Add the correction \(\delta x\) to obtain a final solution \(x = x_0 + \delta x\).

This can be generalized for the preconditioned case. The guess \(x_0\) has to be available before and after the calls to lsmr_solve(). To judge the benefits, suppose lsmr_solve() takes \(k_1\) iterations to solve \(Ax = b\) and \(k_2\) iterations to solve \(A \delta x = r_0\). If \(x_0\) is “good”, \(\|r_0\|_2\) will be smaller than \(\|b\|_2\). If the same stopping tolerances options%atol and options%btol are used for each system, \(k_1\) and \(k_2\) will be similar, but the final solution \(x = x_0 + \delta x\) should be more accurate. The only way to reduce the total work is to use a larger stopping tolerance for the second system. If some value options%btol is suitable for \(Ax=b\), the larger value \(\mathrm{options\%btol}*\|b\|_2 / \|r_0\|_2\) should be suitable for \(A \delta x = r_0\).

References

[1]D.C.-L. Fong and M.A. Saunders (2011). LSMR: An iterative algorithm for sparse least-squares problems. SIAM J. Sci. Comput. 33:5, 2950-2971. [DOI: 10.1137/10079687X]