SSMFE_EXPERT: Sparse Symmetric Matrix-Free Eigensolver (Expert Interface)

Sparse Symmetric Matrix-Free Eigensolver, expert interface

Fortran User Guide
This package computes extreme (leftmost and/or rightmost) eigenpairs $\left\{{\lambda }_{i},{x}_{i}\right\}$ of the following eigenvalue problems:
• the standard eigenvalue problem $\begin{array}{rcll}Ax=\lambda x,& & & \text{(8.1)}\text{}\text{}\end{array}$

• the generalized eigenvalue problem $\begin{array}{rcll}Ax=\lambda Bx,& & & \text{(8.2)}\text{}\text{}\end{array}$

• the buckling problem $\begin{array}{rcll}Bx=\lambda Ax,& & & \text{(8.3)}\text{}\text{}\end{array}$

where $A$ and $B$ are real symmetric (or Hermitian) matrices and $B$ is positive deﬁnite.

SPRAL_SSMFE_EXPERT delegates a considerable part of the computation to the user, which makes its solver procedures rather diﬃcult to use. The user is advised to consider ﬁrst using the package SPRAL_SSMFE, which oﬀers a simple interface to SPRAL_SSMFE_EXPERT.

Major version history

2014-11-20 Version 1.0.0
Initial release

8.1 Installation

Please see the SPRAL install documentation. In particular note that:

• A BLAS library is required.
• A LAPACK library is required.

8.2 Usage overview

The eigensolver subroutines behind SPRAL_SSMFE_EXPERT implement a block iterative algorithm. The block nature of this algorithm allows the user to beneﬁt from highly optimized linear algebra subroutines and from the ubiquitous multicore architecture of modern computers. It also makes this algorithm more reliable than Krylov-based algorithms employed e.g. by ARPACK in the presence of clustered eigenvalues. However, convergence of the iterations may be slow if the density of the spectrum is high.

Thus, good performance (in terms of speed) is contingent on the following two factors: (i) the number of desired eigenpairs must be substantial (e.g. not less than the number of CPU cores), and (ii) the employment of a convergence acceleration technique. The acceleration techniques that can be used are shift-and-invert and preconditioning. The former requires the direct solution of linear systems with the matrix $A$ or its linear combination with $B$, for which a sparse symmetric indeﬁnite solver (such as HSL_MA97 or SPRAL_SSIDS) can be employed. The latter applies to the case of positive deﬁnite $A$ and requires a matrix or an operator (that is, an algorithm producing a vector $v=Tu$ for a given vector $u$) $T$, called a preconditioner, such that the vector $v=Tf$ is an approximation to the solution $u$ of the system $Au=f$. This technique is more sophisticated and is likely to be of interest only to experienced users.

Further information on the algorithm used by SPRAL_SSMFE_EXPERT can be found in Section 8.6.

SPRAL_SSMFE_EXPERT delegates a considerable part of the computation to the user. The user’s code stores all vectors of size equal to the problem size $n$. SPRAL_SSMFE_EXPERT is not “aware” of $n$ or how these vectors are stored; all operations on these vectors are performed by the user. The amount of computation performed by the solver subroutines of SPRAL_SSMFE_EXPERT and the memory they use are negligible. These features facilitate the use of these subroutines for shared-memory, out-of-core and hybrid computation. A simpler but less ﬂexible interface to SPRAL_SSMFE_EXPERT is oﬀered by SPRAL_SSMFE.

8.2.1 Calling sequences

use SPRAL_SSMFE_EXPERT

The following procedures are available to the user:

• ssmfe_standard() computes leftmost eigenpairs of (8.1), optionally using preconditioning if $A$ is positive deﬁnite
• ssmfe_standard_shift() computes eigenpairs of (8.1) near a given shift using the shift-and-invert technique
• ssmfe_generalized() computes leftmost eigenpairs of (8.2), optionally using preconditioning if $A$ is positive deﬁnite
• ssmfe_generalized_shift() computes eigenpairs of (8.2) near a given shift using the shift-and-invert technique
• ssmfe_buckling() computes eigenpairs of (8.3) near a given shift using the shift-and-invert technique
• ssmfe_free() should be called after all other calls are complete. It frees memory referenced by keep and inform.

The main solver procedures must be called repeatedly using a reverse communication interface. The procedure ssmfe_free() should be called once after the ﬁnal call to a solver procedure to deallocate all arrays that have been allocated by the solver procedure.

8.2.2 Package types

INTEGER denotes default INTEGER, and REAL denotes REAL(kind=kind(0d0)). The term call type means REAL(kind=kind(0d0)) for calls to the double precision real interface, and COMPLEX(kind=kind(0d0)) for calls to the double precision complex interface.

8.2.3 Derived types

For each problem, the user must employ the derived types deﬁned by the module to declare scalars of the types ssmfe_rcid (real version) or ssmfe_rciz (complex version), ssmfe_expert_keep, ssmfe_options and ssmfe_inform. The following pseudocode illustrates this.

use SPRAL_SSMFE_EXPERT
...
type (ssmfe_rcid       ) :: rcid
type (ssmfe_expert_keep) :: keep
type (ssmfe_options    ) :: options
type (ssmfe_inform     ) :: inform
...

8.3 Argument lists

8.3.1 ssmfe_standard(), ssmfe_standard_shift(), ssmfe_generalized(),ssmfe_generalized_shift(), and ssmfe_buckling()

To compute the leftmost eigenpairs of (8.1), optionally using preconditioning, the following call must be made repeatedly:

call ssmfe_standard( rci, left, mep, lambda, m, rr, ind, keep, options, inform )

To compute the eigenvalues of (8.1) in the vicinity of a given value sigma and the corresponding eigenvectors using the shift-and-invert technique, the following call must be made repeatedly:

call ssmfe_standard_shift &

( rci, sigma, left, right, mep, lambda, m, rr, ind, keep, options, inform )

To compute the leftmost eigenpairs of (8.2), optionally using preconditioning, the following call must be made repeatedly:

call ssmfe_generalized( rci, left, mep, lambda, m, rr, ind, keep, options, inform )

To compute the eigenvalues of (8.2) in the vicinity of a given value sigma and the corresponding eigenvectors using the shift-and-invert technique, the following call must be made repeatedly:

call ssmfe_generalized_shift &

( rci, sigma, left, right, mep, lambda, m, rr, ind, keep, options, inform )

To compute the eigenvalues of (8.3) in the vicinity of a given value sigma and the corresponding eigenvectors the following call must be made repeatedly:

call ssmfe_buckling &

( rci, sigma, left, right, mep, lambda, m, rr, ind, keep, options, inform )

To use the solver procedures, the user needs to maintain a workspace W containing kw + 1 blocks of m vectors of size $n$. A value kw = 7 is always suﬃcient. However, if options%minAprod $=$ .false. and either options%minBprod $=$ .false. or the standard eigenvalue problem (8.1) is solved, then kw = 3 is suﬃcient; if options%minAprod $=$ .true. and either options%minBprod $=$ .true. or ssmfe_generalized_shift or ssmfe_buckling are used, then kw must be at least 7; otherwise kw = 5 is suﬃcient. Solver procedures use indices 1 to m to refer to vectors inside each block and indices 0 to kw to refer to particular blocks. The ﬁrst (zero-indexed) block holds the eigenvector approximations: the user must ﬁll this block with m linearly independent vectors before the ﬁrst call to a solver procedure.

The number of desired eigenpairs may exceed m: whenever converged eigenpairs have been detected, a solver procedure reports the indices of these eigenpairs and they must be moved by the user to a separate eigenvectors’ storage X.

When $B\ne I$, it is expedient to have storage BX for the $B$-images of the converged eigenvectors, i.e. BX = B*X.

To simplify the description of the reverse communication interface, below we assume that an array W(n,m,0:kw) of package type is used as a workspace, and that arrays X(n, mep) and BX(n, mep) of package type are used for storing the computed eigenvectors and their $B$-images. The transpose (real or complex, depending on the package type) of a matrix H is denoted by H${}^{\prime }$.

The meaning of the arguments of the solver procedures is as follows.

rci
is an INTENT(INOUT) scalar of type ssmfe_rcid in the real version and ssmfe_rciz in the complex version. Before the ﬁrst call, rci%job must be set to 0. No other values may be assigned to rci%job by the user. After each call, the value of rci%job must be inspected by the user’s code and the appropriate action taken:
-3
: fatal error return, the computation must be terminated;
-2
: not all desired eigenpairs converged to required accuracy, see Section 8.5;
-1
: the computation is complete and successful.
1
: (ssmfe_standard() and ssmfe_generalized() only) the user must compute $V=AU$, where

$U=$ W(:, ix:jx, rci%kx),  with  ix$=$ rci%jx and  jx$=$ ix + rci%nx - 1,

$V=$ W(:, iy:jy, rci%ky),  with  iy$=$ rci%jy and  jy$=$ iy + rci%nx - 1.

2
: (ssmfe_standard() and ssmfe_generalized() only) the user must compute $V=TU$ if preconditioning is used or copy $U$ to $V$ otherwise, where $U$ and $V$ are as for rci%job = 1.
3
: (ssmfe_generalized(), ssmfe_generalized_shift() and ssmfe_buckling() only) the user must compute $V=BU$ where $U$ and $V$ are as for rci%job = 1.
5
: the user must save the converged eigenvectors to the eigenvector storage X and, optionally, for problems (8.2) and (8.3), save their $B$-images. The converged eigenvectors are columns of the $n×m$ matrix W(:,:,rci%kx) and their $B$-images are respective columns of W(:,:,rci%ky) that are identiﬁed by rci%i, rci%jx and rci%nx as follows: if rci%i > 0, then the column numbers run from rci%jx to rci%jx + rci%nx - 1, and if rci%i < 0, then they run from rci%jx - rci%nx + 1 to rci%jx.
9
: (ssmfe_standard_shift(), ssmfe_generalized_shift() and ssmfe_buckling() only) the user must compute $V={A}_{\sigma }^{-1}U$, where ${A}_{\sigma }=A-\sigma I$ and $I$ is $n×n$ identity, for problem (8.1), ${A}_{\sigma }=A-\sigma B$ for problem (8.2), and ${A}_{\sigma }=B-\sigma A$ for problem (8.3).
11
: if rci%i = 0, then the user must perform a copy $V←U$, where $U$ and $V$ are as for rci%job = 1, otherwise the columns of W(:,:,rci%kx) and W(:,:,rci%ky) (if rci%kx $\ne$ rci%ky) must be reordered using the index array ind so that the column ind(j) becomes column j for j = 1, …, rci%nx.
12
: for each i = 0, 1,..., rci%nx - 1, the user must compute the dot product of the columns

W(:, rci%jx + i, rci%kx)

and

W(:, rci%jy + i, rci%ky)

and place it in

rr(rci%i + i, rci%j + i, rci%k).

13
: if rci%kx$=$ rci%ky, then for each i = 0, 1,..., rci%nx - 1, the user must perform the scaling

W(:, rci%jx + i, rci%kx) = W(:, rci%jx + i, rci%kx)$∕{s}_{i}$,

where ${s}_{i}$ is the 2-norm of the column W(:, rci%jx + i, rci%kx), otherwise the user must perform the scalings

W(:, rci%jx + i, rci%kx) = W(:, rci%jx + i, rci%kx)$∕{s}_{i}$

W(:, rci%jy + i, rci%ky) = W(:, rci%jy + i, rci%ky)$∕{s}_{i}$,

where ${s}_{i}$ is the square root of the dot product of the columns W(:, rci%jx + i, rci%kx) and W(:, rci%jy + i, rci%ky). No scaling is to be applied to zero columns.

14: for each i = 0, 1,..., rci%nx - 1, the user must perform axpy-updates:

W(:, rci%jy + i, rci%ky) = W(:, rci%jy + i, rci%ky) +

rr(rci%i + i, rci%j + i, rci%k) * W(:, rci%jx + i, rci%kx).

15: the user must perform the matrix multiplication:

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny-1, rci%k) =

rci%alpha * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx)’ *

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) +

rci%beta * rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k).

16: the user must perform the matrix multiplication:

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) =

rci%alpha * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) *

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k) +

rci%beta * W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky).

17: the user must perform the multiplication:

W(:, rci%jx $:$ rci%jx + rci%ny - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) *

rr(rci%i $:$ rci%i + rci%nx - 1, rci%j $:$ rci%j + rci%ny - 1, rci%k).

W(:, rci%jy $:$ rci%jy + rci%ny - 1, rci%ky) can be used as a workspace.

21: the user must $B$-orthogonalize the columns of W speciﬁed by rci%nx, rci%jx and rci%kx to all vectors stored in X by solving the system

(X’ * BX) Q = X’ * W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky)

for Q and updating

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) - X * Q.

For problems (8.2) and (8.3), the respective columns of W(:,:,rci%ky), which store $B$-images of the respective columns of W(:,:,rci%kx), must be updated accordingly, either by applying B to these vectors or using the columns of BX, i.e.

W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky) =

W(:, rci%jy $:$ rci%jy + rci%nx - 1, rci%ky) - BX * Q;

22: the user must solve the system

(X’ * BX) Q = X’ * W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx)

for Q and perform the update

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) =

W(:, rci%jx $:$ rci%jx + rci%nx - 1, rci%kx) - BX * Q,

where X and BX are same as in the case rci%job = 21 (in the case of problem (8.1), rci%job = 21 and 22 require exactly the same computation).

999: If rci%k > 0, then a restart, normally with a larger block size m, is suggested with the aim of achieving better convergence. If the suggestion is accepted, the user must compute the new block size as m = rci%nx + k + l, where k $\ge$ rci%i and l $\ge$ rci%j, reallocate the workspace array W if the new block size is diﬀerent from the old one, and set rci%i = 0 and rci%j = 0. If the restart is not acceptable (e.g. the new block size exceeds a certain limit set by the user), then nothing needs to be done. If rci%k == 0, then the restart with the same block size m is required. In both restart cases, the ﬁrst block W(:,:,0) of the new workspace should retain the vectors W(:,i:j,0), where i = rci%jx and j = i + rci%nx - 1, from the old workspace. The remaining m - rci%nx columns of W(:,:,0) must be ﬁlled with arbitrary vectors that are linearly independent from the converged eigenvectors and such that the entire set of the columns of W(:,:,0) is linearly independent.

Restriction: rci%job = 0, rci%i = 0 and rci%j = 0 are the only assignments to the components of rci that can be done by the user. The ﬁrst one can only be done before the ﬁrst call. The other two can only be done if rci%job = 999 and rci%k > 0.

sigma
is an INTENT(IN) scalar of type REAL that holds the shift, a value around which the desired eigenvalues are situated.
left
is an INTENT(IN) scalar of type default INTEGER that holds the number of desired eigenvalues to the left of sigma. Restriction: $0<$ left + right $\le$ min(mep, n/2), where right is zero for ssmfe_standard() and ssmfe_generalized().
right
is an INTENT(IN) scalar of type default INTEGER that holds the number of desired eigenvalues to the right of sigma. Restriction: $0<$ left + right $\le$ min(mep, n/2).
mep
is an INTENT(IN) scalar of type default INTEGER that holds the size of the array lambda. See Section 8.6 for guidance on setting mep. Restriction: mep is not less than the number of desired eigenpairs.
lambda(:)
is an INTENT(INOUT) array of type REAL and size mep that is used to store the computed eigenvalues. After a successful completion of the computation it contains eigenvalues in ascending order. This array must not be changed by the user.
m
is an INTENT(IN) scalar of type INTEGER that holds the block size of the user’s workspace W. Restriction:
2 $\le$ m $<$ n.
rr(:,:,:)
is an INTENT(INOUT) work array of package type, and dimensions 2*m, 2*m and 3. It can only be changed by the user when instructed to do so by rci%job.
ind(:)
is an INTENT(INOUT) array of default integer type, and size at least m. It must not be changed by the user. It is used for reordering the columns of some blocks of W.
keep
is an INTENT(INOUT) scalar of type ssmfe_expert_keep that holds private data.
options
is an INTENT(IN) scalar of type ssmfe_options. Its components oﬀer the user a range of options, see Section 8.4.1. It must not be changed by the user between calls.
inform
is an INTENT(INOUT) scalar of type ssmfe_inform. Its components provide information about the execution of the subroutine, see Section 8.4.2. It must not be changed by the user.

8.3.2 ssmfe_free()

At the end of the computation, the memory allocated by the solver procedures should be released by making the following subroutine call:

ssmfe_free( keep, inform )

keep
is an INTENT(INOUT) scalar of type ssmfe_expert_keep, optional. On exit, its components that are allocatable arrays will have been deallocated.
inform
is an INTENT(INOUT) scalar of type ssmfe_inform, optional. On exit, its components that are allocatable arrays will have been deallocated.

8.4 Derived types

8.4.1 type(ssmfe_options)

The derived data type ssmfe_options has the following components.

Convergence control options

abs_tol_lambda
is a scalar of type REAL that holds an absolute tolerance used when testing the estimated eigenvalue error, see Section 8.6. The default value is 0. Negative values are treated as the default.
abs_tol_residual
is a scalar of type REAL that holds an absolute tolerance used when testing the residual, see Section 8.6. The default value is 0. Negative values are treated as the default.
max_iterations
is a scalar of type default INTEGER that contains the maximum number of iterations to be performed. The default value is 100. Restriction: max_it $\ge$ 0.
rel_tol_lambda
is a scalar of type REAL that holds a relative tolerance used when testing the estimated eigenvalue error, see Section 8.6. The default value is 0. Negative values are treated as the default.
rel_tol_residual
is a scalar of type REAL that holds a relative tolerance used when testing the residual, see Section 8.6. If both abs_tol_residual and rel_tol_residual are set to 0, then the residual norms are not taken into consideration by the convergence test, see Section 8.6. The default value is 0. Negative values are treated as the default.
tol_x
is a scalar of type REAL that holds a tolerance used when testing the estimated eigenvector error, see Section 8.6. If tol_x is set to zero, the eigenvector error is not estimated. If a negative value is assigned, the tolerance is set to 10*epsilon(lambda). The default value is -1.0.

Printing options

print_level
is a scalar of type default INTEGER that determines the amount of printing. Possible values are:
 $<0$ : no printing; $0$ : error and warning messages only; $1$ : the type (standard or generalized) and the size of the problem, the number of eigenpairs requested, the error tolerances and the size of the subspace are printed before the iterations start; $2$ : as $1$ but, for each eigenpair tested for convergence (see Section 8.6), the iteration number, the index of the eigenpair, the eigenvalue, whether it has converged, the residual norm, and the error estimates are printed; $>2$ : as $1$ but with all eigenvalues, whether converged, residual norms and eigenvalue/eigenvector error estimates printed on each iteration.

The default value is 0. Note that for eigenpairs that are far from convergence, ‘rough’ error estimates are printed (the estimates that are actually used by the stopping criteria, see Section 8.6, only become available on the last few iterations).

unit_error
is a scalar of type default INTEGER that holds the unit number for error messages. Printing is suppressed if unit_error < 0. The default value is 6.
unit_diagnostic
is a scalar of type default INTEGER that holds the unit number for messages monitoring the convergence. Printing is suppressed if unit_diagnostics < 0. The default value is 6.
unit_warning
is a scalar of type default INTEGER that holds the unit number for warning messages. Printing is suppressed if unit_warning < 0. The default value is 6.

err_est
is a scalar of type default INTEGER that deﬁnes which error estimation scheme for eigenvalues and eigenvectors is to be used by the stopping criterion. Two schemes are implemented. If err_est = 1, residual error bounds are used, namely, a modiﬁed Davis-Kahan estimate for the eigenvector error and the Lehmann bounds for the eigenvalue error. (see Section 8.6.5). If err_est = 2, then the eigenvector and eigenvalue errors are estimated by analyzing the convergence curve for the eigenvalues (see Section 8.6.5). The default is err_est = 2. Restriction: err_est = 1 or 2.
extra_left
is a scalar of type default INTEGER that holds the number of extra approximate eigenvectors corresponding to leftmost eigenvalues that are of no interest to the user and are iterated solely to enhance convergence. The default is extra_left = 0. Restriction: extra_left $\ge$ 0.
extra_right
is a scalar of type default INTEGER that holds the number of extra approximate eigenvectors corresponding to rightmost eigenvalues that are of no interest to the user and are iterated solely to enhance convergence. The default is extra_right = 0. Restriction: extra_right $\ge$ 0.
left_gap
is a scalar of type REAL that is only used when left is non-zero, and speciﬁes the minimal acceptable distance between the last computed left eigenvalue and the rest of the spectrum. For ssmfe_standard() and ssmfe_generalized(), the last computed left eigenvalue is the rightmost of the computed ones, and for the other procedures it is the leftmost. If set to a negative value $\delta$, the minimal distance is taken as $|\delta |$ times the average distance between the computed eigenvalues. Note that for this option to have any eﬀect, the value of mep must be larger than left + right: see Section 8.6 for further explanation. The default value is 0.
max_left
is a scalar of type default INTEGER that holds the number of eigenvalues to the left from $\sigma$, or a negative value, if this number is not known (cf. Section 8.6.4). The default is max_left = -1.
max_right
is a scalar of type default INTEGER that holds the number of eigenvalues to the right from $\sigma$, or a negative value, if this number is not known. (cf. Section 8.6.4). The default is max_right = -1.
minAprod
is a scalar of type default LOGICAL that determines whether the number of multiplications by $A$ is to be reduced at the expense of memory. If $minAprod=.false.$, on each iteration three returns to the user with rci%job = 1 are made for multiplications of rci%nx vectors by $A$. Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s work array W must be increased by 2. The default is minAprod = .true.. Restriction: minAprod = .true. for ssmfe_standard_shift(), ssmfe_generalized_shift() and ssmfe_buckling().
minBprod
is a scalar of type default LOGICAL that determines whether the number of multiplications by $B$ is to be reduced at the expense of memory. If $minBprod=.false.$, on each iteration at least three returns to the user with rci%job = 3 are made for multiplications of rci%nx vectors by $B$. Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s work array W must be increased by 2. The default is minBprod = .true..
right_gap
is a scalar of type REAL that is only used by ssmfe_shift, ssmfe_gen_shift and ssmfe_buckling with a non-zero right, and has the same meaning as options%left_gap but for the rightmost computed eigenvalue. The default value is 0.
user_x
is a scalar of type default INTEGER. If user_x > 0 then the ﬁrst user_x columns of x(:,:) will be used as initial guesses for eigenvectors. Hence, if the user has good approximations to some of the required eigenvectors, the computation time may be reduced by putting these approximations into the ﬁrst user_x columns of x(:,:). The default value is 0, i.e. the columns of x(:,:) are overwritten by the solver. Restriction: 0 $\le$ user_x $\le$ m, the ﬁrst user_x columns in x(:,:) must be linearly independent.

8.4.2 type(ssmfe_inform)

The derived data type ssmfe_inform is used to hold information from the execution of the solver procedures. The components are:

converged
is a rank-1 allocatable array of type default INTEGER that is allocated to have size mep on a call with rci%job = 0 or 999. If, on some iteration i, an eigenpair (lambda(j), X(j)) has been accepted as converged, then converged(j) = i; if the convergence stagnated then converged(j) = -i; otherwise converged(j) = 0.
err_lambda
is a rank-1 allocatable array of type REAL that is allocated to have size mep on a call with rci%job = 0 or 999. err_lmd(i) contains the estimated eigenvalue error for the approximate eigenvalue lambda(i) if info%converged(i) is non-zero, and -1.0 otherwise.
err_x
is a rank-1 allocatable array of type REAL. This array is allocated to have size mep on a call with rci%job = 0 or 999, and is used for storing the eigenvector errors in the same way as err_lmd is used for storing the eigenvalue errors.
flag
is a scalar of type default INTEGER that is used as an error ﬂag. If a call is successful, flag has value 0. A nonzero value of flag indicates an error or a warning (see Section 8.5).
iteration
is a scalar of type default INTEGER that holds the number of iterations since the previous rci%job = 0 or rci%job = 999 call.
left
is a scalar of type default INTEGER that holds the number of converged eigenvalues on the left, i.e. the total number of converged eigenpairs of (8.1) or the number of the converged eigenvalues of (8.2) or (8.3) to the left of sigma.
next_left
is a scalar of type default REAL that holds the non-converged eigenvalue next to the last converged on the left (cf. options%left_gap).
next_right
is a scalar of type default REAL that holds the non-converged eigenvalue next to the last converged on the right (cf. options%right_gap).
non_converged
is a scalar of type default INTEGER that holds the number of non-converged eigenpairs (see Section 8.5).
residual_norms
is a rank-1 allocatable array of type default REAL that is allocated to have size mep on a call with rci%job = 0 or 999. On returns with rci%job = 5, residual_norms(i) contains the Euclidean norm of the residual for lambda(i), X(i).
right
is a scalar of type default INTEGER that holds the number of converged eigenvalues of (8.2) or (8.3) to the right of sigma.
stat
is a scalar of type default INTEGER that holds the allocation status (see Section 8.5).

8.5 Error ﬂags

A successful return from a solver procedure is indicated by inform%flag$=$0. A negative value indicates an error, a positive value indicates a warning.

Possible negative values of inform%flag are as follows:

-1 Incorrect value of rci%job.
-2 Block size m is out-of-range.
-3 Incorrect value of options%err_est.
-4 Incorrect value of options%minAprod.
-5 Incorrect value of options%extra_left or options%extra_right.
-6 Incorrect value of options%min_gap.
-11 Incorrect value of left.
-12 Incorrect value of right.
-13 mep is less than the number of desired eigenpairs.
-100 Not enough memory; inform%stat contains the value of the Fortran stat parameter.
-200 $B$ is not positive deﬁnite or initial eigenvectors are linearly dependent.

Possible positive values are:

1 The iterations have been terminated because no further improvement in accuracy is possible (this may happen if the preconditioner is not positive deﬁnite, or if the components of the residual vectors are so small that the round-oﬀ errors make them essentially random). The value of inform%non_converged is set to the number of non-converged eigenpairs.
2 The maximal number of iterations has been exceeded. The value of inform%non_converged is set to the number of non-converged eigenpairs.
3 Out of storage space for the converged eigenpairs. The value of inform%non_converged is set to the number of non-converged eigenpairs.

8.6 Method

8.6.1 The algorithm

The solver procedures of SPRAL_SSMFE_EXPERT are interfaces to solver procedures of SPRAL_SSMFE_CORE, which implement a block iterative algorithm based on the Jacobi-conjugate preconditioned gradients method [2,3]. Further information on the algorithm used by SPRAL_SSMFE_EXPERT can be found in the speciﬁcation document for SPRAL_SSMFE_CORE and in [1].

8.6.2 Stopping criteria

An approximate eigenpair $\left\{x,\lambda \right\}$ is considered to have converged if the following three conditions are all satisﬁed:

1. if options%abs_tol_lambda and options%rel_tol_lambda are not both equal to zero, then the estimated error in the approximate eigenvalue must be less than or equal to

max(options%abs_tol_lambda, $\delta$*options%rel_tol_lambda),

where $\delta$ is the estimated average distance between eigenvalues.

2. if options%tol_x is not zero, then the estimated sine of the angle between the approximate eigenvector and the invariant subspace corresponding to the eigenvalue approximated by $\lambda$ must be less than or equal to options%tol_x.
3. if options%abs_tol_residual and options%rel_tol_residual are not both equal to zero, then the Euclidean norm of the residual, $\parallel Ax-\lambda Bx{\parallel }_{2}$, must be less than or equal to

max(options%abs_tol_residual, options%rel_tol_residual*$\parallel \lambda Bx{\parallel }_{2}$).

The extra eigenpairs are not checked for convergence, as their role is purely auxiliary.

8.6.3 Improving eigenvector accuracy

If the gap between the last computed eigenvalue and the rest of the spectrum is small, then the accuracy of the corresponding eigenvector may be very low. To prevent this from happening, the user should set the eigenpairs storage size mep to a value that is larger than the number of desired eigenpairs, and set the options options%left_gap and options%right_gap to non-zero values ${\delta }_{l}$ and ${\delta }_{r}$. These values determine the size of the minimal acceptable gaps between the computed eigenvalues and the rest of the spectrum, ${\delta }_{l}$ referring to either leftmost eigenvalues (for ssmfe_standard() and ssmfe_generalized() only) or those to the left of the shift sigma, and ${\delta }_{r}$ to those to the right of the shift sigma. Positive values of ${\delta }_{l}$ and ${\delta }_{r}$ set the gap explicitly, and negative values require the gap to be not less than their absolute value times the average distance between the computed eigenvalues. A recommended value of ${\delta }_{l}$ and ${\delta }_{r}$ is $-0.1$. The value of mep has little eﬀect on the speed of computation, hence it might be set to any reasonably large value. The larger the value of mep, the larger the size of an eigenvalue cluster for which accurate eigenvectors can be computed, notably: to safeguard against clusters of size up to $k$, it is suﬃcient to set mep to the number of desired eigenpairs plus $k-1$.

8.6.4 The use of shifted matrix factorization

When using the solver procedures that employ the shift-and-invert technique, it is very important to ensure that the numbers of desired eigenvalues each side of the shift do not exceed the actual numbers of these eigenvalues, as the eigenpairs ‘approximating’ non-existing eigenpairs of the problem will not converge. It is therefore strongly recommended that the user employs a linear system solver that performs the LDLT factorization of the shifted system, e.g. HSL_MA97 or SPRAL_SSIDS. The LDLT factorization of the matrix $A-\sigma B$ consists in ﬁnding a unit lower triangular matrix $L$, a block-diagonal matrix $D$ with $1×1$ and $2×2$ blocks on the main diagonal and a permutation matrix $P$ such that ${P}^{T}\left(A-\sigma B\right)P=LD{L}^{T}$. By inertia theorem, the number of eigenvalues to the left and right from the shift $\sigma$ is equal to the number of negative and positive eigenvalues of $D$, which allows quick computation of the eigenvalue numbers each side of the shift.

8.6.5 Error estimation

Standard problem

If options%err_est = 1, the error estimates for the eigenvalues are based on the eigenvalues of a matrix of the form

$\begin{array}{rcll}Â={\stackrel{̃}{\Lambda }}_{k}-{S}_{k}^{T}{S}_{k},& & & \text{(8.4)}\text{}\text{}\end{array}$

where ${\stackrel{̃}{\Lambda }}_{k}$ is a diagonal matrix with the $k-1$ leftmost Ritz values ${\stackrel{̃}{\lambda }}_{j}$ on the diagonal, and the columns of ${S}_{k}$ are the respective residual vectors ${r}_{j}=A{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}{\stackrel{̃}{x}}_{j}$ divided by $\sqrt{{\lambda }_{k}-{\stackrel{̃}{\lambda }}_{j}}$. If $k$ is such that ${\stackrel{̃}{\lambda }}_{k-1}<{\lambda }_{k}$, then the eigenvalues of $Â$ are the left-hand side bounds for eigenvalues ${\lambda }_{i}$, and thus the diﬀerence ${\stackrel{̃}{\lambda }}_{j}-{\stackrel{̂}{\lambda }}_{j}$ estimates the eigenvalue error ${\stackrel{̃}{\lambda }}_{j}-{\lambda }_{j}$. The unknown ${\lambda }_{k}$ is replaced by ${\stackrel{̃}{\lambda }}_{k}$, and select the maximal $k\le m$ for which the distance between ${\stackrel{̃}{\lambda }}_{k-1}$ and ${\stackrel{̃}{\lambda }}_{k}$ exceeds the sum of the absolute error tolerance for eigenvalues and the Frobenius norm of the matrix formed by the residuals ${r}_{j},j=1,\dots ,k-1$. If ${\stackrel{̃}{\lambda }}_{j}-{\stackrel{̂}{\lambda }}_{j}$ is close to the machine accuracy, it may be too polluted by round-oﬀ errors to rely upon. In such case, we use instead

$\begin{array}{rcll}{\stackrel{̃}{\lambda }}_{j}-{\lambda }_{j}\le {\delta }_{j}\approx \frac{\parallel {r}_{j}{\parallel }^{2}}{{\stackrel{̃}{\lambda }}_{k}-{\lambda }_{j}}.& & & \text{(8.5)}\text{}\text{}\end{array}$

The eigenvector errors are estimated based on the Davis-Kahan inequality:

$\begin{array}{rcll}\underset{x\in {\mathsc{𝒳}}_{k-1}}{min}sin\left\{{\stackrel{̃}{x}}_{j};x\right\}\le \frac{\parallel {r}_{j}\parallel }{{\lambda }_{k}-{\stackrel{̃}{\lambda }}_{j}}\approx \frac{\parallel {r}_{j}\parallel }{{\stackrel{̃}{\lambda }}_{k}-{\stackrel{̃}{\lambda }}_{j}},& & & \text{(8.6)}\text{}\text{}\end{array}$

where ${\mathsc{𝒳}}_{k-1}$ is the invariant subspace corresponding to $k-1$ leftmost eigenvalues.

If options%err_est$=$2 the errors are estimated based on the eigenvalue decrements history, which produces an estimate for the average eigenvalue error reduction per iteration, which in turn yields error estimates for both eigenvalues and eigenvectors. Unlike the residual estimates mentioned in this section, such ‘kinematic’ error estimates are not guaranteed to be upper bounds for the actual errors. However, the numerical tests have demonstrated that kinematic error estimates are signiﬁcantly more accurate, i.e. closer to the actual error, than the residual-based estimates. Furthermore, they straightforwardly apply to the generalized case as well.

Generalized problem

In the case of the generalized eigenvalue problem (8.2) solved by iterations with preconditioning, all of the residual norms in the previous section must be replaced with $\parallel \cdot {\parallel }_{{B}^{-1}}$-norm of the residual ${r}_{j}=A{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}B{\stackrel{̃}{x}}_{j}$ ($\parallel {r}_{j}{\parallel }_{{B}^{-1}}^{2}={r}_{j}^{\ast }{B}^{-1}{r}_{j}$) or its upper estimate, e.g. ${\beta }_{1}^{-1∕2}\parallel \cdot \parallel$, where ${\beta }_{1}$ is the smallest eigenvalue of $B$. Hence, if ${\beta }_{1}$ is known, then the error tolerances for eigenvalues and eigenvectors must be multiplied by ${\beta }_{1}$ and $\sqrt{{\beta }_{1}}$ respectively. If no estimate for $\parallel \cdot {\parallel }_{{B}^{-1}}$-norm is available, then the use of non-zero residual tolerances and options%err_est$=$1 is not recommended. In the case of problems (8.2) solved by iterations with shift-and-invert and the problem (8.3), the residuals are computed as ${r}_{j}=TB{\stackrel{̃}{x}}_{j}-{\stackrel{̃}{\lambda }}_{j}{\stackrel{̃}{x}}_{j}$ where $T={\left(A-\sigma B\right)}^{-1}$ for (8.2) and $T={\left(B-\sigma A\right)}^{-1}$ for (8.3), and $B$-norms of ${r}_{j}$ are used, so that Lehmann matrix becomes $Â={\stackrel{̃}{\Lambda }}_{k}-{S}_{k}^{T}B\phantom{\rule{1em}{0ex}}{S}_{k}$.

References

[1] E. E. Ovtchinnikov and J. Reid. A preconditioned block conjugate gradient algorithm for computing extreme eigenpairs of symmetric and Hermitian problems. Technical Report RAL-TR-2010-19, 2010.

[2] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian eigenvalue computation I: Computing an extreme eigenvalue, SIAM J. Numer. Anal., 46:2567–2592, 2008.

[3] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian eigenvalue computation II: Computing several extreme eigenvalues, SIAM J. Numer. Anal., 46:2593–2619, 2008.

8.7 Examples

8.7.1 Preconditioning example

The following code computes the 5 leftmost eigenpairs of the matrix $A$ of order 100 that approximates the two-dimensional Laplacian operator on a 20-by-20 grid. One forward and one backward Gauss-Seidel update are used for preconditioning, which halves the number of iterations compared with solving the same problem without preconditioning. The module laplace2d (examples/Fortran/ssmfe/laplace2d.f90) supplies the subroutine apply_laplacian() that multiplies a block of vectors by $A$, and the subroutine apply_gauss_seidel_step() that computes $y=Tx$ for a block of vectors $x$ by applying one forward and one backward update of the Gauss-Seidel method to the system $Ay=x$.

! examples/Fortran/ssmfe/precond_expert.f90
! Laplacian on a square grid (using SPRAL_SSMFE_EXPERT routines)
program ssmfe_expert_precond_example
use spral_random
use spral_ssmfe_expert
use laplace2d ! implement Lapalacian and preconditioners
implicit none

integer, parameter :: wp = kind(0d0) ! Working precision is double

integer, parameter :: ngrid = 20           ! grid points along each side
integer, parameter :: n     = ngrid*ngrid  ! problem size
integer, parameter :: nep   = 5            ! eigenpairs wanted
integer, parameter :: m     = 3            ! dimn of the iterated subspace

type(random_state) :: state                ! PRNG state

real(wp), external :: dnrm2, ddot          ! BLAS functions

integer :: ncon                 ! number of converged eigenpairs
integer :: i, j, k
integer :: ind(m)               ! permutation index
real(wp) :: s
real(wp) :: lambda(n)           ! eigenvalues
real(wp) :: X(n, n)             ! eigenvectors
real(wp) :: rr(2*m, 2*m, 3)     ! work array
real(wp) :: W(n, m, 0:7)        ! work array
real(wp) :: U(n, m)             ! work array
type(ssmfe_rcid       ) :: rci      ! reverse communication data
type(ssmfe_options    ) :: options  ! options
type(ssmfe_expert_keep) :: keep     ! private data
type(ssmfe_inform     ) :: inform   ! information

! the gap between the last converged eigenvalue and the rest of the spectrum
! must be at least 0.1 times average gap between computed eigenvalues
options%left_gap = -0.1
! block size is small, convergence may be slow, allow more iterations
options%max_iterations = 200

! Initialize W to lin indep vectors by randomizing
do i=1,n
do j=1,m
W(i,j,0) = random_real(state, positive=.true.)
end do
end do

ncon = 0
rci%job = 0
do ! reverse communication loop
!    call ssmfe_standard &
call ssmfe_generalized &
( rci, nep, n, lambda, m, rr, ind, keep, options, inform )
select case ( rci%job )
case ( 1 )
call apply_laplacian &
( ngrid, ngrid, rci%nx, W(1 : n, rci%jx : rci%jx + rci%nx - 1, rci%kx),&
W(1 : n, rci%jy : rci%jy + rci%nx - 1, rci%ky ) )
case ( 2 )
call apply_gauss_seidel_step &
( ngrid, ngrid, rci%nx, W(1 : n, rci%jx : rci%jx + rci%nx - 1, rci%kx),&
W(1 : n, rci%jy : rci%jy + rci%nx - 1, rci%ky ) )
case ( 3 )
call dcopy( n*rci%nx, W(1, rci%jx, rci%kx), 1, W(1, rci%jy, rci%ky), 1 )
case ( 5 )
if ( rci%i < 0 ) cycle
do k = 0, rci%nx - 1
j = ncon + k + 1
call dcopy( n, W(1, rci%jx + k, 0), 1, X(1, j), 1 )
end do
ncon = ncon + rci%nx
case ( 11 )
if ( rci%i == 0 ) then
if ( rci%kx /= rci%ky .or. rci%jx > rci%jy ) then
call dcopy &
( n * rci%nx, W(1, rci%jx, rci%kx), 1, W(1, rci%jy, rci%ky), 1 )
else if ( rci%jx < rci%jy ) then
do j = rci%nx - 1, 0, -1
call dcopy &
( n, W(1, rci%jx + j, rci%kx), 1, W(1, rci%jy + j, rci%ky), 1 )
end do
end if
else
do i = 1, n
do j = 1, rci%nx
U(i, j) = W(i, ind(j), rci%kx)
end do
do j = 1, rci%nx
W(i, j, rci%kx) = U(i, j)
end do
if ( rci%ky /= rci%kx ) then
do j = 1, rci%nx
U(i, j) = W(i, ind(j), rci%ky)
end do
do j = 1, rci%nx
W(i, j, rci%ky) = U(i, j)
end do
end if
end do
end if
case ( 12 )
do i = 0, rci%nx - 1
rr(rci%i + i, rci%j + i, rci%k) = &
ddot(n, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1)
end do
case ( 13 )
do i = 0, rci%nx - 1
if ( rci%kx == rci%ky ) then
s = dnrm2(n, W(1, rci%jx + i, rci%kx), 1)
if ( s > 0 ) &
call dscal( n, 1/s, W(1, rci%jx + i, rci%kx), 1 )
else
s = sqrt(abs(ddot &
(n, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1)))
if ( s > 0 ) then
call dscal( n, 1/s, W(1, rci%jx + i, rci%kx), 1 )
call dscal( n, 1/s, W(1, rci%jy + i, rci%ky), 1 )
else
call dcopy( n, 0.0D0, 0, W(1, rci%jy + i, rci%ky), 1 )
end if
end if
end do
case ( 14 )
do i = 0, rci%nx - 1
s = -rr(rci%i + i, rci%j + i, rci%k)
call daxpy&
( n, s, W(1, rci%jx + i, rci%kx), 1, W(1, rci%jy + i, rci%ky), 1 )
end do
case ( 15 )
if ( rci%nx > 0 .and. rci%ny > 0 ) &
call dgemm &
( ’T’, ’N’, rci%nx, rci%ny, n, &
rci%alpha, W(1, rci%jx, rci%kx), n, W(1, rci%jy, rci%ky), n, &
rci%beta, rr(rci%i, rci%j, rci%k), 2*m )
case ( 16, 17 )
if ( rci%ny < 1 ) cycle
if ( rci%nx < 1 ) then
if ( rci%job == 17 ) cycle
if ( rci%beta == 1.0D0 ) cycle
do j = rci%jy, rci%jy + rci%ny - 1
call dscal( n, rci%beta, W(1, j, rci%ky), 1 )
end do
cycle
end if
if ( rci%job == 17 ) then
call dgemm &
( ’N’, ’N’, n, rci%ny, rci%nx, &
1.0D0, W(1, rci%jx, rci%kx), n, rr(rci%i, rci%j, rci%k), 2*m, &
0.0D0, W(1, rci%jy, rci%ky), n )
call dcopy &
( n * rci%ny, W(1, rci%jy, rci%ky), 1, W(1, rci%jx, rci%kx), 1 )
else
call dgemm &
( ’N’, ’N’, n, rci%ny, rci%nx, &
rci%alpha, W(1, rci%jx, rci%kx), n, rr(rci%i, rci%j, rci%k), 2*m, &
rci%beta, W(1, rci%jy, rci%ky), n )
end if
case ( 21, 22 )
if ( ncon > 0 ) then
call dgemm &
( ’T’, ’N’, ncon, rci%nx, n, &
1.0D0, X, n, W(1, rci%jy, rci%ky), n, 0.0D0, U, n )
call dgemm &
( ’N’, ’N’, n, rci%nx, ncon, &
-1.0D0, X, n, U, n, 1.0D0, W(1, rci%jx, rci%kx), n )
call dgemm &
( ’N’, ’N’, n, rci%nx, ncon, &
-1.0D0, X, n, U, n, 1.0D0, W(1, rci%jy, rci%ky), n )
end if
case ( 999 )
if ( rci%k == 0 ) then
if ( rci%jx > 1 ) then
do j = 1, rci%jx-1
do i = 1, n
W(i, j, 0) = random_real(state, positive=.true.)
end do
end do
endif
end if
case ( :-1 )
exit
end select
end do
print ’(i3, 1x, a, i4, 1x, a)’, &
inform%left, ’eigenpairs converged in’, inform%iteration, ’iterations’
print ’(1x, a, i2, a, es13.7)’, &
(’lambda(’, i, ’) = ’, lambda(i), i = 1, inform%left)
call ssmfe_free( keep, inform )
end program ssmfe_expert_precond_example
This code produces the following output:
6 eigenpairs converged in 129 iterations
lambda( 1) = 4.4676695E-02
lambda( 2) = 1.1119274E-01
lambda( 3) = 1.1119274E-01
lambda( 4) = 1.7770878E-01
lambda( 5) = 2.2040061E-01
lambda( 6) = 2.2040061E-01

Note that the code computed one extra eigenpair because of the insuﬃcient gap between the 5th and 6th eigenvalues.