# SSMFE_CORE: Sparse Symmetric Matrix-Free Eigensolver (Core Algorithm)

## Sparse Symmetric Matrix-Free Eigensolver, core interface

Fortran User Guide
This package computes extreme (leftmost and/or rightmost) eigenpairs $\left\{{\lambda }_{i},{x}_{i}\right\}$ of the following eigenvalue problems:
• the standard eigenvalue problem $\begin{array}{rcll}Ax=\lambda x,& & & \text{(7.1)}\text{}\text{}\end{array}$

• the generalized eigenvalue problem $\begin{array}{rcll}Ax=\lambda Bx,& & & \text{(7.2)}\text{}\text{}\end{array}$

• the eigenvalue problem $\begin{array}{rcll}ABx=\lambda x,& & & \text{(7.3)}\text{}\text{}\end{array}$

where $A$ and $B$ are real symmetric (or Hermitian) matrices and $B$ is positive deﬁnite.

SPRAL_SSMFE_CORE delegates a considerable part of the computation to the user, which makes its solver procedures rather diﬃcult to use. For solving problems (7.1) and (7.2), the user is advised to consider using the package SPRAL_SSMFE, which oﬀers a simple interface to SPRAL_SSMFE_CORE, or the package SPRAL_SSMFE_EXPERT, which delegates less work to the user.

### Major version history

2014-11-20 Version 1.0.0
Initial release

### 7.1 Installation

Please see the SPRAL install documentation. In particular note that:

• A BLAS library is required.
• A LAPACK library is required.

### 7.2 Usage overview

SPRAL_SSMFE_CORE implements a block iterative algorithm for simultaneous computation of several eigenpairs, i.e. eigenvalues and corresponding eigenvectors, of problems (7.1)–(7.3). The block nature of this algorithm allows the user to beneﬁt from highly optimized linear algebra subroutines and from the ubiquitous multicore architecture of modern computers. It also makes this algorithm more reliable than Krylov-based algorithms employed by e.g. ARPACK in the presence of clustered eigenvalues. However, convergence of the iterations may be slow if the density of the spectrum is high.

Thus, good performance (in terms of speed) is contingent on the following two factors: (i) the number of desired eigenpairs must be substantial (i.e. not less than the number of CPU cores), and (ii) the employment of a convergence acceleration technique. SPRAL_SSMFE_CORE allows the user to beneﬁt from two acceleration techniques: shift-and-invert and preconditioning. The shift-and-invert technique rewrites the eigenvalue problem for a matrix $M$ as the eigenvalue problem (7.1) for the matrix $A={\left(M-\sigma I\right)}^{-1}$, where $I$ is the identity matrix and $\sigma$ is a real value near eigenvalues of interest, and the generalized problem $Mx=\mu Bx$ as the problem (7.3) for $B$ and $A={\left(M-\sigma B\right)}^{-1}$. The preconditioning technique applies to (7.1) and (7.2) with positive deﬁnite $A$ and requires a matrix or an operator (that is, an algorithm producing a vector $v=Tu$ for a given vector $u$) $T$, called a preconditioner, such that the vector $v=Tf$ is an approximation to the solution $u$ of the system $Au=f$. This technique is more sophisticated and is likely to be of interest only to experienced users.

For futher detail on the algorithm, see the outline in Section 7.6 and associated references.

SPRAL_SSMFE_CORE delegates more computation to the user than the packages SPRAL_SSMFE and SPRAL_SSMFE_EXPERT, which are built upon it. Not only are the storage and operations on all vectors of size $n$ user’s responsibility, but also the decision on whether a suﬃcient number of eigenpairs have been computed to suﬃcient accuracy. The amount of computation performed by the solver subroutines of SPRAL_SSMFE_CORE and the memory they use are negligible. These features facilitate the use of these subroutines for shared-memory, out-of-core and hybrid computation.

#### 7.2.1 Calling sequences

use SPRAL_SSMFE_CORE

The following procedures are available to the user:

• ssmfe() computes speciﬁed numbers of left- and right-most eigenvalues and the corresponding eigenvectors)
• ssmfe_largest() computes a speciﬁed number of eigenvalues of largest magnitude and the corresponding eigenvectors
• ssmfe_free() should be called after all other calls are complete. It frees memory referenced by keep and inform.

The solver procedures ssmfe() and ssmfe_largest() must be called repeatedly using a reverse communication interface. The procedure ssmfe_free() should be called once after the ﬁnal call to a solver procedure to free all memory that has been internally allocated by the solver procedure.

#### 7.2.2 Package types

INTEGER denotes default INTEGER, and REAL denotes REAL(kind=kind(0d0)). We use the term package type to mean REAL(kind=kind(0d0)) for calls to the double precision real valued interface, and to mean COMPLEX(kind=kind(0d0)) for calls to the double precision complex valued interface.

#### 7.2.3 Derived types

For each problem, the user must employ the derived types deﬁned by the module to declare scalars of the types ssmfe_rcid (real version) or ssmfe_rciz (complex version), ssmfe_core_keep, ssmfe_core_options and ssmfe_inform. The following pseudocode illustrates this.

use SPRAL_SSMFE_CORE
...
type (ssmfe_rcid        ) :: rcid
type (ssmfe_core_keep   ) :: keep
type (ssmfe_core_options) :: options
type (ssmfe_inform      ) :: inform
...

The components of ssmfe_options and ssmfe_inform are explained in Sections 7.4.1 and 7.4.2. Components of ssmfe_rcid are used for the reverse communication interface and are explained in Section 7.3.1. The components of keep are used to pass data between repeated calls to the subroutines and must not be altered by the user.

### 7.3 Argument lists

#### 7.3.1 ssmfe() and ssmfe_largest()

To compute speciﬁed numbers of leftmost and rightmost eigenvalues and corresponding eigenvectors, the following call must be made repeatedly:

call ssmfe( rci, problem, left, right, m, lambda, rr, ind, keep, options, inform )

To compute a speciﬁed number of eigenvalues of largest magnitude and corresponding eigenvectors, the following call must be made repeatedly:

call ssmfe_largest( rci, problem, nep, m, lambda, rr, ind, keep, options, inform )

To use the solver procedures, the user needs to maintain a workspace W containing kw + 1 blocks of m vectors of size n. A value kw = 7 is always suﬃcient. However, if options%minAprod $=$ .false. and either options%minBprod $=$ .false. or the standard eigenvalue problem (7.1) is solved, then kw = 3 is suﬃcient; if options%minAprod $=$ .true. and either options%minBprod $=$ .true. or (7.3) is solved, then kw must be at least 7; otherwise kw = 5 is suﬃcient. Solver procedures use indices 1 to m to refer to vectors inside each block and indices 0 to kw to refer to particular blocks. The ﬁrst (zero-indexed) block holds the eigenvector approximations: the user must ﬁll this block with m linearly independent vectors before the ﬁrst call to a solver procedure.

The number of desired eigenpairs may exceed m: whenever converged eigenpairs have been detected, a solver procedure reports the indices of these eigenpairs and they must be moved by the user to a separate eigenvectors’ storage X.

When $B\ne I$, it is expedient to have storage BX for the $B$-images of the converged eigenvectors, i.e. BX = B*X.

To simplify the description of the reverse communication interface, below we assume that an array W(n, m, 0:kw) of package type is used as a workspace, and that arrays X(n, mep) and BX(n, mep) of package type are used for storing the computed eigenvectors and their $B$-images, where mep is not less than the number of desired eigenpairs. The transpose (real or complex, depending on the package type) of a matrix H is denoted by H${}^{\prime }$.

The meaning of the arguments of the solver procedures is as follows.

rci
is is an INTENT(INOUT) scalar of type ssmfe_rcid in real version and ssmfe_rciz in complex version. Before the ﬁrst call, rci%job must be set to 0. No other values may be assigned to rci%job by the user. After each call, the value of rci%job must be inspected by the user’s code and the appropriate action taken:
-3
: fatal error return, the computation must be terminated;
-2
: not all desired eigenpairs converged to required accuracy, see Section 7.5;
-1
: the computation is complete and successful.
1
: the user must compute $V=AU$, where

$U=$ W(:, ix:jx, rci%kx),  with  ix$=$ rci%jx and  jx$=$ ix + rci%nx - 1,

$V=$ W(:, iy:jy, rci%ky),  with  iy$=$ rci%jy and  jy$=$ iy + rci%nx - 1.

2
: the user must compute $V=TU$ if preconditioning is used or copy $U$ to $V$ otherwise, where $U$ and $V$ are as for rci%job = 1.
3
: the user must compute $V=BU$, where $U$ and $V$ are as for rci%job = 1.
4
: for each i = 1,…,m such that inform%converged(i) is zero, the user must set inform%converged(i) to a positive value if the residual norm inform%res_norms(i) and the estimated eigenvalue and eigenvector errors inform%err_lambda(i) and inform%err_x(i) are small enough for this eigenpair to be deemed as converged.
5
: the user must save the converged eigenvectors to the eigenvector storage X and, optionally, for problems (7.2) and (7.3), save their $B$-images. The converged eigenvectors are columns of the $n×m$ matrix W(:,:,rci%kx) and their $B$-images are respective columns of W(:,:,rci%ky) that are identiﬁed by rci%i, rci%jx and rci%nx as follows: if rci%i > 0, then the column numbers run from rci%jx to rci%jx + rci%nx - 1, and if rci%i < 0, then they run from rci%jx - rci%nx + 1 to rci%jx.
11
: if rci%i = 0, then the user must perform a copy $V←U$, where $U$ and $V$ are as for rci%job = 1, otherwise the columns of W(:,:,rci%kx) and W(:,:,rci%ky) (if rci%kx $\ne$ rci%ky) must be reordered using the index array ind(:) so that the column ind(j) becomes column j for j = 1,…,rci%nx.
12
: for each i = 0, 1,..., rci%nx - 1, the user must compute the dot product of the columns

W(:, rci%jx + i, rci%kx)

and

W(:, rci%jy + i, rci%ky)

and place it in

rr(rci%i + i, rci%j + i, rci%k).

13
: if rci%kx$=$ rci%ky, then for each i = 0, 1,..., rci%nx - 1, the user must perform the scaling

W(:, rci%jx + i, rci%kx) = W(:, rci%jx + i, rci%kx)$∕{s}_{i}$,

where ${s}_{i}$ is the 2-norm of the column W(:, rci%jx + i, rci%kx), otherwise the user must perform the scalings

W(:, rci%jx + i, rci%kx) = W(:, rci%jx + i, rci%kx)$∕{s}_{i}$

W(:, rci%jy + i, rci%ky) = W(:, rci%jy + i, rci%ky)$∕{s}_{i}$,

where ${s}_{i}$ is the square root of the dot product of the columns W(:, rci%jx + i, rci%kx) and W(:, rci%jy + i, rci%ky). No scaling is to be applied to zero columns.

14: for each i = 0, 1,..., rci%nx - 1, the user must perform axpy-updates:

W(:, rci%jy + i, rci%ky) = W(:, rci%jy + i, rci%ky) +

rr(rci%i + i, rci%j + i, rci%k) * W(:, rci%jx + i, rci%kx).

15: the user must perform the matrix multiplication:

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny-1, rci%k) =

rci%alpha * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx)’ *

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) +

rci%beta * rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k).

16: the user must perform the matrix multiplication:

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) =

rci%alpha * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) *

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k) +

rci%beta * W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky).

17: the user must perform the multiplication:

W(:, rci%jx $:$ rci%jx + rci%ny - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) *

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k).

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) can be used as a workspace.

21: the user must $B$-orthogonalize the columns of W speciﬁed by rci%nx, rci%jx and rci%kx to all vectors stored in X by solving the system

(X’ * BX) Q = X’ * W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky)

for Q and updating

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) - X * Q.

For problems (7.2) and (7.3), the respective columns of W(:,:,rci%ky), which store $B$-images of the respective columns of W(:,:,rci%kx), must be updated accordingly, either by applying B to these vectors or using the columns of BX, i.e.

W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky) =

W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky) - BX * Q;

22: the user must solve the system

(X’ * BX) Q = X’ * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx)

for Q and perform the update

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) - BX * Q,

where X and BX are same as in the case rci%job = 21 (in the case of problem (7.1), rci%job = 21 and 22 require exactly the same computation).

999: If rci%k > 0, then a restart, normally with a larger block size m, is suggested with the aim of achieving better convergence. If the suggestion is accepted, the user must compute the new block size as m = rci%nx + k + l, where k $\ge$ rci%i and l $\ge$ rci%j, reallocate the workspace array W if the new block size is diﬀerent from the old one, and set rci%i = 0 and rci%j = 0. The ﬁrst block W(:,:,0) of the new workspace should retain the vectors W(:,i:j,0), where i = rci%jx and j = i + rci%nx - 1, from the old workspace. The remaining m - rci%nx columns of W(:,:,0) must be ﬁlled with arbitrary vectors that are linearly independent from the converged eigenvectors and such that the entire set of the columns of W(:,:,0) is linearly independent. If the restart is not acceptable (e.g. the new block size exceeds a certain limit set by the user), then nothing needs to be done.

Restriction: rci%job = 0, rci%i = 0 and rci%j = 0 are the only assignments to the components of rci that can be done by the user. The ﬁrst one can only be done before the ﬁrst call. The other two can only be done if rci%job = 999 and rci%k > 0.

problem
is an INTENT(IN) scalar of type INTEGER that speciﬁes the problem to be solved: if problem is zero, then (7.1) is solved, if it is positive then (7.2) is solved, otherwise (7.3) is solved.
left
is an INTENT(IN) scalar of type default INTEGER that speciﬁes the number of desired leftmost eigenvalues. On return with rci%job = 5, can be set to zero if a suﬃcient number of leftmost eigenvalues have been computed.
right
is an INTENT(IN) scalar of type default INTEGER that speciﬁes the number of desired rightmost eigenvalues. On return with rci%job = 5, can be set to zero if suﬃcient number of rightmost eigenvalues have been computed.
nep
is an INTENT(IN) scalar of type default INTEGER that speciﬁes the number of desired largest eigenvalues.
lambda(:)
is an INTENT(INOUT) array of type REAL and size m that is used to return the computed eigenvalues. After a successful completion of the computation it contains eigenvalues in ascending order. This array must not be changed by the user.
m
is an INTENT(IN) scalar of type INTEGER that holds the block size of the user’s workspace W. Restriction: if both left and right are non-zero or ssmfe_largest() is called then m $>1$, otherwise m $>0$.
rr(:,:,:)
is an INTENT(INOUT) work array of package type, and dimensions 2*m, 2*m and 3. It must only be changed by the user when instructed to do so by rci%job.
ind(:)
is an INTENT(INOUT) array of default integer type, and size at least m. It must not be changed by the user. It is used for reordering the columns of some blocks of W.
keep
is an INTENT(INOUT) scalar of type ssmfe_core_keep that holds private data.
options
is an INTENT(IN) scalar of type ssmfe_core_options. Its components oﬀer the user a range of options, see Section 7.4.1. It must not be changed by the user between calls.
inform
is an INTENT(INOUT) scalar of type ssmfe_inform. Its components provide information about the execution of the subroutine, see Section 7.4.2. It must not be changed by the user.

#### 7.3.2 ssmfe_free()

At the end of the computation, the memory allocated by the solver procedures should be released by making the following subroutine call:

ssmfe_free( keep, inform )

keep
is an INTENT(INOUT) scalar of type ssmfe_core_keepk, optional. On exit, its components that are allocatable arrays will have been deallocated.
inform
is an INTENT(INOUT) scalar of type ssmfe_inform, optional. On exit, its components that are allocatable arrays will have been deallocated.

### 7.4 Derived types

#### 7.4.1 ssmfe_core_options

The derived data type ssmfe_core_options is used to specify the options used within SSMFE_CORE. The components, that are automatically given default values in the deﬁnition of the type, are:

err_est
is a scalar of type default INTEGER that deﬁnes which error estimation scheme for eigenvalues and eigenvectors is to be used by the stopping criterion. Two schemes are implemented. If err_est = 1, residual error bounds are used, namely, a modiﬁed Davis-Kahan estimate for the eigenvector error and the Lehmann bounds for the eigenvalue error. (see Section 7.6.2). If err_est = 2, then the eigenvector and eigenvalue errors are estimated by analyzing the convergence curve for the eigenvalues (see Section 7.6.2). The default is err_est = 2. Restriction: err_est = 1 or 2.
extra_left
is a scalar of type default INTEGER that holds the number of extra approximate eigenvectors corresponding to leftmost eigenvalues that are of no interest to the user and are iterated solely to enhance convergence. The default is extra_left = 0. Restriction: extra_left $\ge$ 0.
extra_right
is a scalar of type default INTEGER that holds the number of extra approximate eigenvectors corresponding to rightmost eigenvalues that are of no interest to the user and are iterated solely to enhance convergence. The default is extra_right = 0. Restriction: extra_right $\ge$ 0.
minAprod
is a scalar of type default LOGICAL that determines whether the number of multiplications by $A$ is to be reduced at the expense of memory. If $minAprod=.false.$, on each iteration three returns to the user with rci%job = 1 are made for multiplications of rci%nx vectors by $A$. Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s work array W must be increased by 2. The default is minAprod = .true.. Restriction: minAprod = .true. if problem < 0.
minBprod
is a scalar of type default LOGICAL that determines whether the number of multiplications by $B$ is to be reduced at the expense of memory. If $minBprod=.false.$, on each iteration at least three returns to the user with rci%job = 3 are made for multiplications of rci%nx vectors by $B$. Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s work array W must be increased by 2. The default is minBprod = .true..
min_gap
is a scalar of type REAL that controls the restart. If the relative distance between the last eigenvalue of interest on either margin of the spectrum and the rest of the spectrum is smaller than min_gap, the solver procedure suggests restart (cf. rci%job $=999$). The values rci%i and rci%j are set to the numbers of eigenvalues on the left and right margin of the spectrum that are too close to the eigenvalues of interest, causing slow convergence. The default is min_gap = 0, i.e. by default no restart is ever suggested. Restriction: $0\le$ min_gap $\le 1$.
cf_max
is a scalar of type REAL that is used to detect the convergence stagnation based on the estimated asymptotic convergence factor ${q}_{ij}$ given by (7.10) (see §7.6.2). If ${q}_{ij}$ is greater than cf_max for $i>5$, the eigenpair is marked as stagnated by setting inform%converged(j) to a negative value. The default is cf_max = 1, i.e. the estimated asymptotic convergence factor is not used for stagnation detection. Restriction: $0.5\le$ cf_max $\le 1$.

#### 7.4.2 ssmfe_infrom

The derived data type ssmfe_inform is used to hold information from the execution of the solver procedures. The components are:

converged
is a rank-1 allocatable array of type default INTEGER that is allocated to have size m on a call with rci%job = 0 or 999. If, on some iteration i, an eigenpair (lambda(j), X(j)) has been accepted as converged, then converged(j) = i; if the convergence stagnated, then converged(j) = -i; otherwise converged(j) = 0.
err_lambda
is a rank-1 allocatable array of type REAL that is allocated to have size m on a call with rci%job = 0 or 999. err_lambda(i) contains the estimated eigenvalue error for the approximate eigenvalue lambda(i) if info%converged(i) is non-zero, and -1.0 otherwise.
err_x
is a rank-1 allocatable array of type REAL. This array is allocated to have size m on a call with rci%job = 0 or 999, and is used for storing the eigenvector errors in the same way as err_lambda(:) is used for storing the eigenvalue errors.
flag
is a scalar of type default INTEGER that is used as an error ﬂag. If a call is successful, flag has value 0. A nonzero value of flag indicates an error or a warning (see Section 7.5).
iteration
is a scalar of type default INTEGER that holds the number of iterations since the previous rci%job = 0 or rci%job = 999 call.
residual_norms
is a rank-1 allocatable array of type REAL that is allocated to have size m on a call with rci%job = 0 or 999. On returns with rci%job = 4 or 5, res_norms(i) contains the Euclidean norm of the residual for lambda(i), X(i).
stat
is a scalar of type default INTEGER that holds the allocation status (see Section 7.5).

### 7.5 Error ﬂags

A successful return from ssmfe and ssmfe_largest is indicated by inform%flag$=$0. A negative value indicates an error, a positive value indicates a warning.

Possible negative values of inform%flag are as follows:

-1 Block size m is out-of-range.
-2 Incorrect value of rci%job.
-3 Incorrect value of options%err_est.
-4 Incorrect value of options%minAprod.
-5 Incorrect value of options%extra_left or options%extra_right.
-6 Incorrect value of options%min_gap.
-7 Incorrect value of options%cf_max.
-11 Incorrect value of left (for ssmfe) or nep (for ssmfe_largest).
-12 Incorrect value of right.
-100 Not enough memory; inform%stat contains the value of the Fortran stat parameter.
-200 $B$ is not positive deﬁnite or initial eigenvectors are linearly dependent.

The value inform%flag = 1 indicates that the iterations have been terminated because no further improvement in accuracy is possible (this may happen if $B$ or the preconditioner is not positive deﬁnite, or if the components of the residual vectors are so small that the round-oﬀ errors make them essentially random).

### 7.6 Method

#### 7.6.1 The algorithm

The algorithm implemented by ssmfe() and ssmfe_largest() is based on the Jacobi-conjugate preconditioned gradients (JCPG) method. The outline of the algorithm, assuming for simplicity that 0 < left $\le$ m and right = 0 and using Matlab notation, is as follows.

• Initialization. Perform the Rayleigh-Ritz procedure in the trial subspace spanned by the columns of an $n×m$ matrix X i.e. compute
• L = X’*A*X, if problem $\ge$ 0, and L = X’*B*A*B*X otherwise,
• M = X’*B*X (B=I if problem = 0),

and solve the generalized eigenvalue problem for the matrix pencil $L-t\ast M$, i.e. compute an $m×m$ matrix Q such that Q’*M*Q is the identity matrix and D = Q’*L*Q is a diagonal matrix. Compute X = X*Q and set Z = F = [].

• Main loop.

DO

1. If problem$\ge$0, compute the residual matrix R = A*X - B*X*D, where D is a diagonal matrix with entries  $\text{D(j,j)}=\frac{\text{X(:,j)’*A*X(:,j)}}{\text{X(:,j)’*B*X(:,j)}}$ (7.4)

else compute R = A*B*X - X*D, where D is a diagonal matrix with entries

 $\text{D(j,j)}=\frac{\text{X(:,j)’*B*A*B*X(:,j)}}{\text{X(:,j)’*B*X(:,j)}}.$ (7.5)

2. Perform the orthogonalization of R to constraints C by updating R = R - B*C*(C’*R).
3. If options%err_est $=$1, compute R’*B*R and use it to compute error bounds; otherwise only compute the diagonal of this matrix and use to compute error bounds. Test for converged eigenpairs and move converged eigenvectors from X to C and reduce m accordingly. Exit the main loop if X = [].
4. If problem is non-negative, compute the preconditioned gradient matrix Y = T*R.
5. If Z is not empty, conjugate Y to Z, i.e.
1. if problem is non-negative, then compute P = Z’*A*Y, otherwise compute P = Z’*B*A*B*Y;
2. compute S = Z’*B*Y;
3. update Y = Y + Z*H, where  (7.6)

where F is described in step 8.

6. Perform orthogonalization of the search direction matrix Y to constraints C by updating
Y = Y - C*(C’*B*Y).
7. B-normalize the columns of Y and reorder them so that the B-norm of the projection of Y(:,j) onto the linear span of X and Y(:,1:j-1) is an increasing function of j. Compute M = [X Y]’*B*[X Y]. If the condition number of M is greater than the allowed limit of $1{0}^{4}$ then start removing the last columns of Y and M and respective rows of M until the condition number falls below the limit.
8. Perform the Rayleigh-Ritz procedure in the linear span of columns of [X Y]. Update X by selecting Ritz vectors corresponding to the leftmost m Ritz values, and place the remaining Ritz vectors into Z and corresponding Ritz values onto the diagonal of F.

END DO

The orthogonalization to constraints on step 2 ensures that the algorithm deals with the residuals for the constrained problem rather than with those for the unconstrained one, which may not go to zeros if the constraints are not close enough to exact eigenvectors.

(7.6) ensures optimality of the new search direction vector Y(:,j), notably, the search for the minimum of the Rayleigh quotient (7.4) or, in the case of problem (7.3), (7.5) in the direction of this vector produces asymptotically smallest possible value.

The search directions cleanup procedure employed on step 7 ensures that the convergence of iterations is not damaged by the computational errors of the LAPACK eigensolver _SYGV, in the Rayleigh-Ritz procedure of step 8. The accuracy of _SYGV is aﬀected by the condition number of M. If the latter is very large, poor accuracy in the computed Ritz vectors may lead to convergence failure. The ordering of Y(:,j) ensures that the ‘least important’ search direction is dropped if the condition number of M is unacceptably large. In practice, the loss of search direction is a rare and does not lead to convergence problems.

If the number of sought eigenpairs exceeds m, then m is not reduced on step 3. Instead, the approximate eigenvectors moved to C are replaced with vectors from Z.

#### 7.6.2 Error estimation

##### Standard problem

If options%err_est = 1, the error estimates for the eigenvalues are based on the eigenvalues of a matrix of the form

$\begin{array}{rcll}Â={\stackrel{̃}{\Lambda }}_{k}-{S}_{k}^{T}{S}_{k},& & & \text{(7.7)}\text{}\text{}\end{array}$

where ${\stackrel{̃}{\Lambda }}_{k}$ is a diagonal matrix with the $k-1$ leftmost Ritz values ${\stackrel{̃}{\lambda }}_{j}$ on the diagonal, and the columns of ${S}_{k}$ are the respective residual vectors ${r}_{j}=A{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}{\stackrel{̃}{x}}_{j}$ divided by $\sqrt{{\lambda }_{k}-{\stackrel{̃}{\lambda }}_{j}}$. If $k$ is such that ${\stackrel{̃}{\lambda }}_{k-1}<{\lambda }_{k}$, then the eigenvalues of $Â$ are the left-hand side bounds for eigenvalues ${\lambda }_{i}$, and thus the diﬀerence ${\stackrel{̃}{\lambda }}_{j}-{\stackrel{̂}{\lambda }}_{j}$ estimates the eigenvalue error ${\stackrel{̃}{\lambda }}_{j}-{\lambda }_{j}$. The unknown ${\lambda }_{k}$ is replaced by ${\stackrel{̃}{\lambda }}_{k}$, and select the maximal $k\le m$ for which the distance between ${\stackrel{̃}{\lambda }}_{k-1}$ and ${\stackrel{̃}{\lambda }}_{k}$ exceeds the sum of the absolute error tolerance for eigenvalues and the Frobenius norm of the matrix formed by the residuals ${r}_{j},j=1,\dots ,k-1$. If ${\stackrel{̃}{\lambda }}_{j}-{\stackrel{̂}{\lambda }}_{j}$ is close to the machine accuracy, it may be too polluted by round-oﬀ errors to rely upon. In such case, we use instead

$\begin{array}{rcll}{\stackrel{̃}{\lambda }}_{j}-{\lambda }_{j}\le {\delta }_{j}\approx \frac{\parallel {r}_{j}{\parallel }^{2}}{{\stackrel{̃}{\lambda }}_{k}-{\lambda }_{j}}.& & & \text{(7.8)}\text{}\text{}\end{array}$

The eigenvector errors are estimated based on the Davis-Kahan inequality:

$\begin{array}{rcll}\underset{x\in {\mathsc{𝒳}}_{k-1}}{min}sin\left\{{\stackrel{̃}{x}}_{j};x\right\}\le \frac{\parallel {r}_{j}\parallel }{{\lambda }_{k}-{\stackrel{̃}{\lambda }}_{j}}\approx \frac{\parallel {r}_{j}\parallel }{{\stackrel{̃}{\lambda }}_{k}-{\stackrel{̃}{\lambda }}_{j}},& & & \text{(7.9)}\text{}\text{}\end{array}$

where ${\mathsc{𝒳}}_{k-1}$ is the invariant subspace corresponding to $k-1$ leftmost eigenvalues.

If options%err_est$=$2 the errors are estimated based on the eigenvalue decrements history, which produces an estimate for the asymptotic convergence facotr, the geometrical average of the eigenvalue error reduction per iteration:

 ${q}_{ij}={\left|\frac{{\lambda }_{j}^{i}-{\lambda }_{j}^{i-1}}{{\lambda }_{j}^{i}-{\lambda }_{j}^{0}}\right|}^{\frac{1}{i}}\approx {\left|\frac{{\lambda }_{j}-{\lambda }_{j}^{i-1}}{{\lambda }_{j}-{\lambda }_{j}^{0}}\right|}^{\frac{1}{i}}={\left|\frac{{\lambda }_{j}-{\lambda }_{j}^{i-1}}{{\lambda }_{j}-{\lambda }_{j}^{i-2}}\cdots \frac{{\lambda }_{j}-{\lambda }_{j}^{1}}{{\lambda }_{j}-{\lambda }_{j}^{0}}\right|}^{\frac{1}{i}}$ (7.10)

where ${\lambda }_{j}^{i}$ is the approximation to ${\lambda }_{j}$ on $i$-th iteration (see Technical Report RAL-TR-2010-19 for further details). Unlike the residual estimates mentioned in this section, such ‘kinematic’ error estimates are not guaranteed to be upper bounds for the actual errors. However, the numerical tests have demonstrated that kinematic error estimates are signiﬁcantly more accurate, i.e. closer to the actual error, than the residual-based estimates. Furthermore, they straightforwardly apply to the generalized case as well.

##### Generalized problems

In the case of the generalized eigenvalue problem (7.2), all of the residual norms in the previous section must be replaced with $\parallel \cdot {\parallel }_{{B}^{-1}}$-norm of the residual ${r}_{j}=A{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}B{\stackrel{̃}{x}}_{j}$ ($\parallel {r}_{j}{\parallel }_{{B}^{-1}}^{2}={r}_{j}^{\ast }{B}^{-1}{r}_{j}$) or its upper estimate, e.g. ${\beta }_{1}^{-1∕2}\parallel \cdot \parallel$, where ${\beta }_{1}$ is the smallest eigenvalue of $B$. Hence, if ${\beta }_{1}$ is known, then the error tolerances for eigenvalues and eigenvectors must be multiplied by ${\beta }_{1}$ and $\sqrt{{\beta }_{1}}$ respectively. If no estimate for $\parallel \cdot {\parallel }_{{B}^{-1}}$-norm is available, then the use of non-zero residual tolerances and options%err_est$=$1 is not recommended. In the case of problem (7.3), the residuals are computed as ${r}_{j}=AB{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}{\stackrel{̃}{x}}_{j}$, $B$-norms of ${r}_{j}$ are used in (7.8) and (7.9), and Lehmann matrix becomes $Â={\stackrel{̃}{\Lambda }}_{k}-{S}_{k}^{T}B\phantom{\rule{1em}{0ex}}{S}_{k}$.

#### References

[1] E. E. Ovtchinnikov and J. Reid. A preconditioned block conjugate gradient algorithm for computing extreme eigenpairs of symmetric and Hermitian problems. Technical Report RAL-TR-2010-19, 2010.

[2] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian eigenvalue computation I: Computing an extreme eigenvalue, SIAM J. Numer. Anal., 46:2567–2592, 2008.

[3] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian eigenvalue computation II: Computing several extreme eigenvalues, SIAM J. Numer. Anal., 46:2593–2619, 2008.

### 7.7 Examples

#### 7.7.1 Preconditioning example

The following code computes the 5 leftmost eigenpairs of the matrix $A$ of order 100 that approximates the two-dimensional Laplacian operator on a 20-by-20 grid. One forward and one backward Gauss-Seidel update are used for preconditioning, which halves the number of iterations compared with solving the same problem without preconditioning. The module laplace2d (examples/Fortran/ssmfe/laplace2d.f90) supplies the subroutine apply_laplacian() that multiplies a block of vectors by $A$, and a subroutine apply_gauss_seidel_step() that computes $y=Tx$ for a block of vectors $x$ by applying one forward and one backward update of the Gauss-Seidel method to the system $Ay=x$.

! examples/Fortran/ssmfe/precond_core.f90
! Laplacian on a square grid (using SPRAL_SSMFE_CORE routines)
program ssmfe_core_precond_example
use spral_random
use spral_ssmfe_core
use laplace2d ! implement Laplacian and preconditioners
implicit none

integer, parameter :: wp = kind(0d0) ! working precision is double

integer, parameter :: l   = 20    ! grid points along each side
integer, parameter :: n   = l*l   ! problem size
integer, parameter :: nep = 5     ! eigenpairs wanted
integer, parameter :: m = 3       ! dimension of the iterated subspace
real(wp), parameter :: tol = 1e-6 ! eigenvector tolerance

type(random_state) :: state       ! PRNG state

real(wp), external :: dnrm2, ddot ! BLAS functions

integer :: ncon                 ! number of converged eigenpairs
integer :: i, j, k
integer :: ind(m)               ! permutation index
real(wp) :: s
real(wp) :: lambda(n)           ! eigenvalues
real(wp) :: X(n, n)             ! eigenvectors
real(wp) :: lmd(m)              ! work array
real(wp) :: rr(2*m, 2*m, 3)     ! work array
real(wp) :: W(n, m, 0:5)        ! work array
real(wp) :: U(n, m)             ! work array
type(ssmfe_rcid        ) :: rci       ! reverse communication data
type(ssmfe_core_options) :: options   ! options
type(ssmfe_core_keep   ) :: keep      ! private data
type(ssmfe_inform      ) :: inform    ! information

! Initialize W to lin indep vectors by randomizing
do i=1,n
do j=1,m
W(i,j,0) = random_real(state, positive=.true.)
end do
end do

ncon = 0
rci%job = 0
do ! reverse communication loop
!    call ssmfe( rci, 0, nep, 0, m, lmd, rr, ind, keep, options, inform )
!    call ssmfe( rci, 0, 0, nep, m, lmd, rr, ind, keep, options, inform )
call ssmfe_largest( rci, 0, nep, m, lmd, rr, ind, keep, options, inform )
select case ( rci%job )
case ( 1 )
call apply_laplacian &
( l, l, rci%nx, W(1 : n, rci%jx : rci%jx + rci%nx - 1, rci%kx), &
W(1 : n, rci%jy : rci%jy + rci%nx - 1, rci%ky ) )
call dscal( n*rci%nx, -1.0D0, W(1, rci%jy, rci%ky), 1 )
case ( 2 )
call dcopy( n*rci%nx, W(1, rci%jx, rci%kx), 1, W(1, rci%jy, rci%ky), 1 )
!      call apply_gauss_seidel_step &
!        ( l, l, rci%nx, W(1 : n, rci%jx : rci%jx + rci%nx - 1, rci%kx), &
!          W(1 : n, rci%jy : rci%jy + rci%nx - 1, rci%ky ) )
case ( 999 )
print *, rci%i, rci%j, rci%k
exit
case ( 4 )
print *, ’iteration’, inform%iteration
do j = 1, m
print ’(e14.7, 2e10.1)’, lmd(j), inform%residual_norms(j), inform%err_X(j)
if ( inform%converged(j) /= 0 ) then
cycle
else if ( inform%err_X(j) > 0 .and. inform%err_X(j) < tol ) then
inform%converged(j) = 1
end if
end do
case ( 5 )
if ( rci%i < 0 ) cycle
do k = 0, rci%nx - 1
j = ncon + k + 1
lambda(j) = lmd(rci%jx + k)
call dcopy( n, W(1, rci%jx + k, 0), 1, X(1, j), 1 )
end do
ncon = ncon + rci%nx
!      if ( ncon > 0 .or. inform%iteration > 250 ) exit
if ( ncon >= nep .or. inform%iteration > 10 ) exit
case ( 11 )
if ( rci%i == 0 ) then
if ( rci%kx /= rci%ky .or. rci%jx > rci%jy ) then
call dcopy &
( n * rci%nx, W(1, rci%jx, rci%kx), 1, W(1, rci%jy, rci%ky), 1 )
else if ( rci%jx < rci%jy ) then
do j = rci%nx - 1, 0, -1
call dcopy &
( n, W(1, rci%jx + j, rci%kx), 1, W(1, rci%jy + j, rci%ky), 1 )
end do
end if
else
do i = 1, n
do j = 1, rci%nx
U(i, j) = W(i, ind(j), rci%kx)
end do
do j = 1, rci%nx
W(i, j, rci%kx) = U(i, j)
end do
if ( rci%ky /= rci%kx ) then
do j = 1, rci%nx
U(i, j) = W(i, ind(j), rci%ky)
end do
do j = 1, rci%nx
W(i, j, rci%ky) = U(i, j)
end do
end if
end do
end if
case ( 12 )
do i = 0, rci%nx - 1
rr(rci%i + i, rci%j + i, rci%k) = &
ddot(n, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1)
end do
case ( 13 )
do i = 0, rci%nx - 1
if ( rci%kx == rci%ky ) then
s = dnrm2(n, W(1, rci%jx + i, rci%kx), 1)
if ( s > 0 ) &
call dscal( n, 1/s, W(1, rci%jx + i, rci%kx), 1 )
else
s = sqrt(abs(ddot &
(n, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1)))
if ( s > 0 ) then
call dscal( n, 1/s, W(1, rci%jx + i, rci%kx), 1 )
call dscal( n, 1/s, W(1, rci%jy + i, rci%ky), 1 )
else
call dcopy( n, 0.0D0, 0, W(1, rci%jy + i, rci%ky), 1 )
end if
end if
end do
case ( 14 )
do i = 0, rci%nx - 1
s = -rr(rci%i + i, rci%j + i, rci%k)
call daxpy&
( n, s, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1 )
end do
case ( 15 )
if ( rci%nx > 0 .and. rci%ny > 0 ) &
call dgemm &
( ’T’, ’N’, rci%nx, rci%ny, n, &
rci%alpha, W(1, rci%jx, rci%kx), n, W(1, rci%jy, rci%ky), n, &
rci%beta, rr(rci%i, rci%j, rci%k), 2*m )
case ( 16, 17 )
if ( rci%ny < 1 ) cycle
if ( rci%nx < 1 ) then
if ( rci%job == 17 ) cycle
if ( rci%beta == 1.0D0 ) cycle
do j = rci%jy, rci%jy + rci%ny - 1
call dscal( n, rci%beta, W(1, j, rci%ky), 1 )
end do
cycle
end if
if ( rci%job == 17 ) then
call dgemm &
( ’N’, ’N’, n, rci%ny, rci%nx, &
1.0D0, W(1, rci%jx, rci%kx), n, rr(rci%i, rci%j, rci%k), 2*m, &
0.0D0, W(1, rci%jy, rci%ky), n )
call dcopy &
( n * rci%ny, W(1, rci%jy, rci%ky), 1, W(1, rci%jx, rci%kx), 1 )
else
call dgemm &
( ’N’, ’N’, n, rci%ny, rci%nx, &
rci%alpha, W(1, rci%jx, rci%kx), n, rr(rci%i, rci%j, rci%k), 2*m, &
rci%beta, W(1, rci%jy, rci%ky), n )
end if
case ( 21, 22 )
if ( ncon > 0 ) then
call dgemm &
( ’T’, ’N’, ncon, rci%nx, n, &
1.0D0, X, n, W(1, rci%jy, rci%ky), n, 0.0D0, U, n )
call dgemm &
( ’N’, ’N’, n, rci%nx, ncon, &
-1.0D0, X, n, U, n, 1.0D0, W(1, rci%jx, rci%kx), n )
end if
case ( :-1 )
exit
end select
end do
print ’(i3, 1x, a, 1x, i3, 1x, a)’, ncon, ’eigenpairs converged in’, inform%iteration, ’iterations’
print ’(1x, a, i1, a, es14.7)’, &
(’lambda(’, i, ’) = ’, lambda(i), i = 1, ncon)
call ssmfe_free( keep, inform )
end program ssmfe_core_precond_example
The code generates the following output:
5 eigenpairs converged in  72 iterations
lambda(1) = 4.4676695E-02
lambda(2) = 1.1119274E-01
lambda(3) = 1.1119274E-01
lambda(4) = 1.7770878E-01
lambda(5) = 2.2040061E-01