SPRAL_SSMFE_EXPERTv1.0.0
Sparse Symmetric Matrix-Free Eigensolver,
expert interface
C User Guide
This package computes extreme (leftmost and/or rightmost) eigenpairs
of the
following eigenvalue problems:
- the standard eigenvalue problem
- the generalized eigenvalue problem
- the buckling problem
where and
are real symmetric (or
Hermitian) matrices and
is positive definite.
SPRAL_SSMFE_EXPERT delegates a considerable part of the computation to the user, which makes its solver
procedures rather difficult to use. The user is advised to consider first using the package SPRAL_SSMFE, which offers a
simple interface to SPRAL_SSMFE_EXPERT.
Evgueni Ovtchinnikov (STFC Rutherford Appleton Laboratory)
Major version history
-
2014-11-20 Version 1.0.0
- Initial release
8.1 Installation
Please see the SPRAL install documentation. In particular note that:
- A BLAS library is required.
- A LAPACK library is required.
8.2 Usage overview
The eigensolver subroutines behind SPRAL_SSMFE_EXPERT implement a block iterative algorithm. The block nature of
this algorithm allows the user to benefit from highly optimized linear algebra subroutines and from the ubiquitous
multicore architecture of modern computers. It also makes this algorithm more reliable than Krylov-based algorithms
employed e.g. by ARPACK in the presence of clustered eigenvalues. However, convergence of the iterations may be
slow if the density of the spectrum is high.
Thus, good performance (in terms of speed) is contingent on the following two factors: (i) the number
of desired eigenpairs must be substantial (e.g. not less than the number of CPU cores), and (ii) the
employment of a convergence acceleration technique. The acceleration techniques that can be used are
shift-and-invert and preconditioning. The former requires the direct solution of linear systems with the matrix
or its linear
combination with ,
for which a sparse symmetric indefinite solver (such as HSL_MA97 or SPRAL_SSIDS) can be employed. The latter applies to the case
of positive definite
and requires a matrix or an operator (that is, an algorithm producing a vector
for a given
vector )
, called a preconditioner,
such that the vector is an
approximation to the solution
of the system .
This technique is more sophisticated and is likely to be of interest only to experienced users.
Further information on the algorithm used by SPRAL_SSMFE_EXPERT can be found in Section 8.6.
SPRAL_SSMFE_EXPERT delegates a considerable part of the computation to
the user. The user’s code stores all vectors of size equal to the problem size
. SPRAL_SSMFE_EXPERT
is not “aware” of
or how these vectors are stored; all operations on these vectors are performed by the user. The amount of
computation performed by the solver subroutines of SPRAL_SSMFE_EXPERT and the memory they use
are negligible. These features facilitate the use of these subroutines for shared-memory, out-of-core
and hybrid computation. A simpler but less flexible interface to SPRAL_SSMFE_EXPERT is offered by
SPRAL_SSMFE.
8.2.1 Calling sequences
Access to the package requires inclusion of either spral.h (for the entire SPRALlibrary) or spral_ssmfe.h (for just
the SSMFE routines), i.e.
#include "spral.h"
The following procedures are available to the user:
- spral_ssmfe_default_options() initializes the options structure to default values
- spral_ssmfe_standard_type() computes leftmost eigenpairs of (8.1), optionally using preconditioning
if
is positive definite
- spral_ssmfe_standard_shift_type() computes eigenpairs of (8.1) near a given shift using the
shift-and-invert technique
- spral_ssmfe_generalized_type() computes leftmost eigenpairs of (8.2), optionally using preconditioning
if
is positive definite
- spral_ssmfe_generalized_shift_type() computes eigenpairs of (8.2) near a given shift using the
shift-and-invert technique
- spral_ssmfe_buckling_type() computes eigenpairs of (8.3) near a given shift using the
shift-and-invert technique
- spral_ssmfe_expert_free() should be called after all other calls are complete. It frees memory
referenced by keep and inform.
The main solver procedures must be called repeatedly using a reverse communication interface. The procedure
spral_ssmfe_expert_free() should be called once after the final call to a solver procedure to deallocate all arrays
that have been allocated by the solver procedure.
8.2.2 Derived types
For each problem, the user must employ the derived types defined by the module to declare scalars of the types
struct spral_ssmfe_rcid (real version) or struct spral_ssmfe_rciz (complex version), struct
spral_ssmfe_options and struct spral_ssmfe_inform. The user must also declare a void * pointer keep for
the package’s private data structure. The options data structure must be initalized through a call to
spral_ssmfe_default_options(), and the pointer keep must be initialized to NULL. The following pseudocode
illustrates this.
#include "spral.h"
...
struct spral_ssmfe_rcid rcid;
struct spral_ssmfe_options options;
struct spral_ssmfe_inform inform;
void *keep = NULL;
...
spral_ssmfe_default_options(&options);
The components of struct spral_ssmfe_options and struct spral_ssmfe_inform are explained in Sections 8.4.1
and 8.4.2. Components of struct spral_ssmfe_rcid are used for the reverse communication interface and are
explained in Section 8.3.2. The void * pointer keep is allocated using Fortran, and must be passed to
spral_ssmfe_expert_free() to free the associated memory.
8.3 Argument lists
8.3.1 spral_ssmfe_default_options()
To initialize a variable of type struct spral_ssmfe_options the following routine is provided.
void spral_ssmfe_default_options(struct spral_ssmfe_core_options *options);
-
*options
- is the instance to be initialized.
8.3.2 spral_ssmfe_standard_type(), spral_ssmfe_standard_shift_type(),
spral_ssmfe_generalized_type(), spral_ssmfe_generalized_shift_type(), and spral_ssmfe_buckling_type()
To compute the leftmost eigenpairs of (8.1), optionally using preconditioning, one of the following
routines must be called repeatedly:
void spral_ssmfe_expert_standard_double (struct spral_ssmfe_rcid *rci, int left,
int mep, double *lambda, int m, double *rr, int *ind, void **keep,
const struct spral_ssmfe_options *options, struct spral_ssmfe_inform *inform);
void spral_ssmfe_expert_standard_double_complex(struct spral_ssmfe_rciz *rci, int left,
int mep, double *lambda, int m, double complex *rr, int *ind, void **keep,
const struct spral_ssmfe_options *options, struct spral_ssmfe_inform *inform);
To compute the eigenvalues of (8.1) in the vicinity of a given value sigma and the corresponding
eigenvectors using the shift-and-invert technique, one of the following routines must be called
repeatedly:
void spral_ssmfe_expert_standard_shift_double (struct spral_ssmfe_rcid *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
void spral_ssmfe_expert_standard_shift_double_complex(struct spral_ssmfe_rciz *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double complex *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
To compute the leftmost eigenpairs of (8.2), optionally using preconditioning, one of the following
routines must be called repeatedly:
void spral_ssmfe_expert_generalized_double (struct spral_ssmfe_rcid *rci,
int left, int mep, double *lambda, int m, double *rr, int *ind, void **keep,
const struct spral_ssmfe_options *options, struct spral_ssmfe_inform *inform);
void spral_ssmfe_expert_generalized_double_complex(struct spral_ssmfe_rciz *rci,
int left, int mep, double *lambda, int m, double complex *rr, int *ind, void **keep,
const struct spral_ssmfe_options *options, struct spral_ssmfe_inform *inform);
To compute the eigenvalues of (8.2) in the vicinity of a given value sigma and the corresponding
eigenvectors using the shift-and-invert technique, one of the following routines must be called
repeatedly:
void spral_ssmfe_expert_generalized_shift_double (struct spral_ssmfe_rcid *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
void spral_ssmfe_expert_generalized_shift_double_complex(struct spral_ssmfe_rciz *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double complex *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
To compute the eigenvalues of (8.3) in the vicinity of a given value sigma and the corresponding
eigenvectors one of the following routines must be called repeatedly:
void spral_ssmfe_expert_buckling_double (struct spral_ssmfe_rcid *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
void spral_ssmfe_expert_buckling_double_complex(struct spral_ssmfe_rciz *rci,
double sigma, int left, int right, int mep, double *lambda, int m, double complex *rr,
int *ind, void **keep, const struct spral_ssmfe_options *options,
struct spral_ssmfe_inform *inform);
To use the solver procedures, the user needs to maintain a workspace W containing kw + 1 blocks of m vectors of
size .
A value kw = 7 is always sufficient. However, if options.minAprod
false and either
options.minBprod
false or the standard eigenvalue problem (8.1) is solved, then kw = 3 is sufficient; if options.minAprod
true and either
options.minBprod
true or ssmfe_generalized_shift or ssmfe_buckling are used, then kw must be at least 7; otherwise kw = 5 is
sufficient. Solver procedures use indices 1 to m to refer to vectors inside each block and indices 0 to kw to refer to
particular blocks. The first (zero-indexed) block holds the eigenvector approximations: the user must fill this block
with m linearly independent vectors before the first call to a solver procedure.
The number of desired eigenpairs may exceed m: whenever converged eigenpairs have been detected, a solver
procedure reports the indices of these eigenpairs and they must be moved by the user to a separate eigenvectors’
storage X.
When , it is expedient to
have storage BX for the -images
of the converged eigenvectors, i.e. BX = B*X.
To simplify the description of the reverse communication interface, below we assume that
an array W[kw+1][m][n] of package type is used as a workspace, and that arrays X[mep][n]
and BX[mep][n] of the call type are used for storing the computed eigenvectors and their
-images.
For convienence of notation we use the convention that x[i:j] denotes indices i through j (inclusive) of
the vector x. The transpose (real or complex, depending on call type ) of a matrix H is denoted by
H.
The meaning of the arguments of the solver procedures is as follows.
-
rci
- is used for the reverse communication interface. Before the first call, rci.job must be set to 0. No other values
may be assigned to rci.job by the user. After each call, the value of rci.job must be inspected by the user’s
code and the appropriate action taken:
-
-3
- : fatal error return, the computation must be terminated;
-
-2
- : not all desired eigenpairs converged to required accuracy, see Section 8.5;
-
-1
- : the computation is complete and successful.
-
1
- : (spral_ssmfe_standard_type() and spral_ssmfe_generalized_type() only) the user must
compute ,
where
W[rci.kx][ix:jx][:], with ix
rci.jx and jx
ix + rci.nx - 1,
W[rci.ky][iy:jy][:], with iy
rci.jy and jy
iy + rci.nx - 1.
-
2
- : (spral_ssmfe_standard_type() and spral_ssmfe_generalized_type() only) the user must
compute
if preconditioning is used or copy
to
otherwise, where
and
are as for rci.job = 1.
-
3
- : (spral_ssmfe_generalized_type(), spral_ssmfe_generalized_shift_type() and
spral_ssmfe_buckling_type() only) the user must compute
where
and
are as for rci.job = 1.
-
5
- : the user must save the converged eigenvectors to the eigenvector storage X and, optionally, for
problems (8.2) and (8.3), save their -images.
The converged eigenvectors are columns of the
matrix W[rci.kx][:][:] and their -images
are respective columns of W[rci.ky][:][:] that are identified by rci.i, rci.jx and rci.nx as
follows: if rci.i > 0, then the column numbers run from rci.jx to rci.jx + rci.nx - 1, and
if rci.i < 0, then they run from rci.jx - rci.nx + 1 to rci.jx.
-
9
- : (spral_ssmfe_standard_shift_type(), spral_ssmfe_generalized_shift_type() and
spral_ssmfe_buckling_type() only) the user must compute ,
where
and
is
identity, for problem (8.1),
for problem (8.2), and
for problem (8.3).
-
11
- : if rci.i = 0, then the user must perform a copy ,
where
and
are as for rci.job = 1, otherwise the columns of W[rci.kx][:][:] and W[rci.ky][:][:] (if
rci.kx
rci.ky) must be reordered using the index array ind so that the column ind[j] becomes column
j for j = 0, …, rci.nx-1.
-
12
- : for each i = 0, …, rci.nx-1, the user must compute the dot product of the columns
W[rci.kx][rci.jx+i][:]
and
W[rci.ky][rci.jy+i][:]
and place it in
rr[rci.k][rci.j+i][rci.i+i].
-
13
- : if rci.kx
rci.ky, then for each i = 0, …, rci.nx - 1, the user must perform the scaling
W[rci.kx][rci.jx+i][:] = W[rci.kx][rci.jx+i][:],
where
is the 2-norm of the column W[rci.kx][rci.jx+i][:], otherwise the user must perform the
scalings
W[rci.kx][rci.jx+i][:] = W[rci.kx][rci.jx+i][:]
W[rci.ky][rci.jy+i][:] = W[rci.ky][rci.jy+i][:],
where
is the square root of the dot product of the columns W[rci.kx][rci.jx+i][:] and W[rci.ky][rci.jy+i][:].
No scaling is to be applied to zero columns.
-
- 14: for each i = 0, …, rci.nx - 1, the user must perform axpy-updates:
W[rci.ky][rci.jy+i][:] = W[rci.ky][rci.jy+i][:] +
rr[rci.k][rci.j+i][rci.i+i] * W[rci.kx][rci.jx + i][:].
-
- 15: the user must perform the matrix multiplication:
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1] =
rci.alpha * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:]’ *
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] +
rci.beta * rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1].
-
- 16: the user must perform the matrix multiplication:
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] =
rci.alpha * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] *
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1] +
rci.beta * W[rci.ky][rci.jy
rci.jy+rci.ny-1][:].
-
- 17: the user must perform the multiplication:
W[rci.kx][rci.jx
rci.jx+rci.ny-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] *
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1].
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] can be used as a workspace.
-
- 21: the user must -orthogonalize
the columns of W specified by rci.nx, rci.jx and rci.kx to all vectors stored in X by solving the
system
(X’ * BX) Q = X’ * W[rci.ky][rci.jy
rci.jy+rci.nx-1][:]
for Q and updating
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] - X * Q.
For problems (8.2) and (8.3), the respective columns of W[rci.ky][:][:], which store -images
of the respective columns of W[rci.kx][:][:]), must be updated accordingly, either by applying
B to these vectors or using the columns of BX, i.e.
W[rci.ky][rci.jy
rci.jy+rci.nx-1][:] =
W[rci.ky][rci.jy
rci.jy+rci.nx-1][:] - BX * Q;
-
- 22: the user must solve the system
(X’ * BX) Q = X’ * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:])
for Q and perform the update
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] - BX * Q,
where X and BX are same as in the case rci.job = 21 (in the case of problem (8.1), rci.job =
21 and 22 require exactly the same computation).
-
- 999: If rci.k > 0, then a restart, normally with a larger block size m, is suggested with the aim of
achieving better convergence. If the suggestion is accepted, the user must compute the new block
size as m = rci.nx + k + l, where k
rci.i and l
rci.j, reallocate the workspace array W if the new block size is different from the old one, and
set rci.i = 0 and rci.j = 0. If the restart is not acceptable (e.g. the new block size exceeds
a certain limit set by the user), then nothing needs to be done. If rci.k == 0, then the restart
with the same block size m is required. In both restart cases, the first block W[0][:][:] of the new
workspace should retain the vectors W[0][i:j][:], where i = rci.jx and j = i + rci.nx - 1,
from the old workspace. The remaining m - rci.nx columns of W[0][:][:] must be filled with
arbitrary vectors that are linearly independent from the converged eigenvectors and such that the
entire set of the columns of W[0][:][:] is linearly independent.
Restriction: rci.job = 0, rci.i = 0 and rci.j = 0 are the only assignments to the components of rci
that can be done by the user. The first one can only be done before the first call. The other two can only be
done if rci.job = 999 and rci.k > 0.
-
sigma
- holds the shift, a value around which the desired eigenvalues are situated.
-
left
- holds the number of desired eigenvalues to the left of sigma. Restriction:
left + right
min(mep, n/2),
with right0
for spral_ssmfe_standard_type() and spral_ssmfe_generalized_type().
-
right
- holds the number of desired eigenvalues to the right of sigma. Restriction:
left +
right
min(mep, n/2).
-
mep
- holds the size of the array lambda. See Section 8.6 for guidance on setting mep. Restriction: mep is not less
than the number of desired eigenpairs.
-
lambda[mep]
- is used to store the computed eigenvalues. After a successful completion of the computation it contains
eigenvalues in ascending order. This array must not be changed by the user.
-
m
- holds the block size of the user’s workspace W. Restriction: 2
m
n.
-
rr[3][2*m][2*m]
- is a work array used as part of the reverse communication interface. It must only be changed by
the user when instructed to do so by rci.job.
-
ind[m]
- is a work array used as part of the reverse communication interface. It must not be changed by the user. It is
used for reordering the columns of some blocks of W.
-
*keep
- must be initialized to NULL before the first call. It holds private data and must not be modified by the
user.
-
*options
- specifies the algorithmic options used by the routines, as explained in Section 8.4.1. It must not be
changed by the user between calls.
-
*inform
- is used to return information about the execution of the routine, as explained in Section 8.4.2. It must not
be changed by the user.
8.3.3 spral_ssmfe_expert_free()
At the end of the computation, the memory allocated by the solver procedures should be released by
making a call to the following subroutine:
void spral_ssmfe_expert_free(void **keep, struct spral_ssmfe_inform *inform);
-
*keep
- must be unchanged since the last call to a solver routine.
-
*inform
- must be unchanged since the last call to a solver routine.
8.4 Derived types
8.4.1 struct spral_ssmfe_options
The structure struct spral_ssmfe_options is used to specify the options used within SSMFE_EXPERT. The
components, that must be given default values through a call to spral_ssmfe_default_options(),
are:
Convergence control options
-
double abs_tol_lambda
- holds an absolute tolerance used when testing the estimated eigenvalue error, see
Section 8.6. The default value is 0. Negative values are treated as the default.
-
double abs_tol_residual
- holds an absolute tolerance used when testing the residual, see Section 8.6. The
default value is 0. Negative values are treated as the default.
-
int max_iterations
- holds the maximum number of iterations to be performed. The default value is 100.
Restriction: max_it
0.
-
double rel_tol_lambda
- holds a relative tolerance used when testing the estimated eigenvalue error, see
Section 8.6. The default value is 0. Negative values are treated as the default.
-
double rel_tol_residual
- holds a relative tolerance used when testing the residual, see Section 8.6. If both
abs_tol_residual and rel_tol_residual are set to 0, then the residual norms are not taken into
consideration by the convergence test, see Section 8.6. The default value is 0. Negative values are
treated as the default.
-
double tol_x
- holds a tolerance used when testing the estimated eigenvector error, see Section 8.6. If tol_x
is set to zero, the eigenvector error is not estimated. If a negative value is assigned, the tolerance is set
to 10*epsilon(lambda). The default value is -1.0.
Printing options
-
int print_level
- determines the amount of printing. Possible values are:
: | no printing; |
: | error and warning messages only; |
: | the type (standard or generalized) and the size of the problem, the number of eigenpairs
requested, the error tolerances and the size of the subspace are printed before the iterations
start; |
: | as
but, for each eigenpair tested for convergence (see Section 8.6), the iteration number, the
index of the eigenpair, the eigenvalue, whether it has converged, the residual norm, and the
error estimates are printed; |
: | as
but with all eigenvalues, whether converged, residual norms and eigenvalue/eigenvector error
estimates printed on each iteration. |
The default value is 0. Note that for eigenpairs that are far from convergence, ‘rough’ error estimates are
printed (the estimates that are actually used by the stopping criteria, see Section 8.6, only become available on
the last few iterations).
-
int unit_error
- holds the Fortran unit number for error messages. Printing is suppressed if unit_error < 0. The
default value is 6.
-
int unit_diagnostic
- holds the Fortran unit number for messages monitoring the convergence. Printing is
suppressed if unit_diagnostics < 0. The default value is 6.
-
int unit_warning
- holds the Fortran unit number for warning messages. Printing is suppressed if unit_warning <
0. The default value is 6.
Advanced options
-
int err_est
- defines which error estimation scheme for eigenvalues and eigenvectors is to be used by the
stopping criterion. Two schemes are implemented. If err_est = 1, residual error bounds are used,
namely, a modified Davis-Kahan estimate for the eigenvector error and the Lehmann bounds for the
eigenvalue error. (see Section 8.6.5). If err_est = 2, then the eigenvector and eigenvalue errors are
estimated by analyzing the convergence curve for the eigenvalues (see Section 8.6.5). The default is
err_est = 2. Restriction: err_est = 1 or 2.
-
int extra_left
- holds the number of extra approximate eigenvectors corresponding to leftmost eigenvalues
that are of no interest to the user and are iterated solely to enhance convergence. The default is
extra_left = 0. Restriction: extra_left
0.
-
int extra_right
- holds the number of extra approximate eigenvectors corresponding to rightmost eigenvalues
that are of no interest to the user and are iterated solely to enhance convergence. The default is
extra_right = 0. Restriction: extra_right
0.
-
double left_gap
- that is only used when left is non-zero, and specifies the minimal acceptable distance
between the last computed left eigenvalue and the rest of the spectrum. For spral_ssmfe_standard_type()
and spral_ssmfe_generalized_type(), the last computed left eigenvalue is the rightmost of the
computed ones, and for the other procedures it is the leftmost. If set to a negative value ,
the minimal distance is taken as
times the average distance between the computed eigenvalues. Note that for this option to have any
effect, the value of mep must be larger than left + right: see Section 8.6 for further explanation. The
default value is 0.
-
int max_left
- holds the number of eigenvalues to the left from ,
or a negative value, if this number is not known (cf. Section 8.6.4). The default is max_left = -1.
-
int max_right
- holds the number of eigenvalues to the right from ,
or a negative value, if this number is not known. (cf. Section 8.6.4). The default is max_right = -1.
-
bool minAprod
- determines whether the number of multiplications by
is to be reduced at the expense of memory. If ,
on each iteration three returns to the user with rci.job = 1 are made for multiplications of rci.nx
vectors by .
Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s
work array W must be increased by 2. The default is minAprod = true. Restriction: minAprod = true
for spral_ssmfe_standard_shift_type(),
spral_ssmfe_generalized_shift_type() and spral_ssmfe_buckling_type().
-
bool minBprod
- determines whether the number of multiplications by
is to be reduced at the expense of memory. If ,
on each iteration at least three returns to the user with rci.job = 3 are made for multiplications of
rci.nx vectors by .
Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s
work array W must be increased by 2. The default is minBprod = true.
-
double right_gap
- has the same meaning as left_gap but for the rightmost eigenvalue. It is only used by
the routines spral_ssmfe_standard_shift_type(), spral_ssmfe_generalized_shift_type() and
spral_ssmfe_buckling_type() and only when right0.
The default value is 0.
-
int user_x
- specifies whether an initial guess for the eigenvectors is supplied. If user_x > 0 then the first
user_x columns of x[][] will be used as initial guesses for eigenvectors. Hence, if the user has good
approximations to some of the required eigenvectors, the computation time may be reduced by putting
these approximations into the first user_x columns of x[][]. The default value is 0, i.e. the columns of
x[][] are overwritten by the solver. Restriction: 0
user_x
m, the first user_x columns in x[][] must be linearly independent.
8.4.2 struct spral_ssmfe_inform
The structure spral_ssmfe_inform is used to hold information from the execution of the solver procedures. The
components are:
-
int converged[mep]
- stores convergence information. If, on some iteration i, an eigenpair (lambda[j],
X[j]) has been accepted as converged, then converged[j] = i; if the convergence stagnated then
converged[j] = -i; otherwise converged[j] = 0.
-
double err_lambda[mep]
- contains the estimated eigenvalue error for the approximate eigenvalue lambda[i]
if info.converged(i) is non-zero, and -1.0 otherwise.
-
double err_X[m]
- is used for storing the eigenvector errors in the same way as err_lamda[] is used for
storing the eigenvalue errors.
-
int flag
- is used as an error flag. If a call is successful, flag has value 0. A nonzero value of flag indicates
an error or a warning (see Section 8.5).
-
int iteration
- holds the number of iterations since the previous rci.job = 0 or rci.job = 999 call.
-
int left
- holds the number of converged eigenvalues on the left, i.e. the total number of converged eigenpairs
of (8.1) or the number of the converged eigenvalues of (8.2) or (8.3) to the left of sigma.
-
double next_left
- holds the non-converged eigenvalue next to the last converged on the left
(see options.left_gap).
-
double next_right
- holds the non-converged eigenvalue next to the last converged on the right
(see options.right_gap).
-
int non_converged
- holds the number of non-converged eigenpairs (see Section 8.5).
-
double residual_norms[mep]
- holds, on return with rci.job = 5, the Euclidean norm of the residual. The
norm for for lambda(i), X(i) is returned as residual_norms[i].
-
int right
- holds the number of converged eigenvalues of (8.2) or (8.3) to the right of sigma.
-
int stat
- holds the Fortran allocation status (see Section 8.5).
8.5 Error flags
A successful return from a solver procedure is indicated by
inform.flag0.
A negative value indicates an error, a positive value indicates a warning.
Possible negative values of inform.flag are as follows:
-
- -1 Incorrect value of rci.job.
-
- -2 Block size m is out-of-range.
-
- -3 Incorrect value of options.err_est.
-
- -4 Incorrect value of options.minAprod.
-
- -5 Incorrect value of options.extra_left or options.extra_right.
-
- -6 Incorrect value of options.min_gap.
-
- -11 Incorrect value of left.
-
- -12 Incorrect value of right.
-
- -13 mep is less than the number of desired eigenpairs.
-
- -100 Not enough memory; inform.stat contains the value of the Fortran stat parameter.
-
- -200
is not positive definite or initial eigenvectors are linearly dependent.
Possible positive values are:
-
- 1 The iterations have been terminated because no further improvement in accuracy is possible (this may
happen if the preconditioner is not positive definite, or if the components of the residual vectors are so
small that the round-off errors make them essentially random). The value of inform.non_converged
is set to the number of non-converged eigenpairs.
-
- 2 The maximal number of iterations has been exceeded. The value of inform.non_converged is set to
the number of non-converged eigenpairs.
-
- 3 Out of storage space for the converged eigenpairs. The value of inform.non_converged is set to the
number of non-converged eigenpairs.
8.6 Method
8.6.1 The algorithm
The solver procedures of SPRAL_SSMFE_EXPERT are interfaces to solver procedures of SPRAL_SSMFE_CORE, which
implement a block iterative algorithm based on the Jacobi-conjugate preconditioned gradients method [2,3]. Further
information on the algorithm used by SPRAL_SSMFE_EXPERT can be found in the specification document for
SPRAL_SSMFE_CORE and in [1].
8.6.2 Stopping criteria
An approximate eigenpair
is considered to have converged if the following three conditions are all satisfied:
- if options.abs_tol_lambda and options.rel_tol_lambda are not both equal to zero, then the
estimated error in the approximate eigenvalue must be less than or equal to
max(options.abs_tol_lambda, *options.rel_tol_lambda),
where
is the estimated average distance between eigenvalues.
- if options.tol_x is not zero, then the estimated sine of the angle between the approximate eigenvector
and the invariant subspace corresponding to the eigenvalue approximated by
must be less than or equal to options.tol_x.
- if options.abs_tol_residual and options.rel_tol_residual are not both equal to zero, then the
Euclidean norm of the residual, ,
must be less than or equal to
max(options.abs_tol_residual, options.rel_tol_residual*).
The extra eigenpairs are not checked for convergence, as their role is purely auxiliary.
8.6.3 Improving eigenvector accuracy
If the gap between the last computed eigenvalue and the rest of the spectrum is small, then the
accuracy of the corresponding eigenvector may be very low. To prevent this from happening, the user
should set the eigenpairs storage size mep to a value that is larger than the number of desired
eigenpairs, and set the options options.left_gap and options.right_gap to non-zero values
and
. These
values determine the size of the minimal acceptable gaps between the computed eigenvalues and the rest of the spectrum,
referring to either leftmost eigenvalues (for spral_ssmfe_standard_type() and
spral_ssmfe_generalized_type() only) or those to the left of the shift sigma, and
to those to the right of the
shift sigma. Positive values of
and
set the gap explicitly, and negative values require the gap to be not less than their absolute
value times the average distance between the computed eigenvalues. A recommended value of
and
is
.
The value of mep has little effect on the speed of computation, hence it might be set to any
reasonably large value. The larger the value of mep, the larger the size of an eigenvalue cluster for
which accurate eigenvectors can be computed, notably: to safeguard against clusters of size up to
, it is sufficient to set mep to the
number of desired eigenpairs plus .
8.6.4 The use of shifted matrix factorization
When using the solver procedures that employ the shift-and-invert technique, it is very important to ensure that
the numbers of desired eigenvalues each side of the shift do not exceed the actual numbers of these
eigenvalues, as the eigenpairs ‘approximating’ non-existing eigenpairs of the problem will not converge. It is
therefore strongly recommended that the user employs a linear system solver that performs the LDLT
factorization of the shifted system, e.g. HSL_MA97 or SPRAL_SSIDS. The LDLT factorization of the matrix
consists in finding a unit
lower triangular matrix ,
a block-diagonal matrix
with and
blocks on the main diagonal
and a permutation matrix
such that .
By inertia theorem, the number of eigenvalues to the left and right from the shift
is equal to the number of negative
and positive eigenvalues of ,
which allows quick computation of the eigenvalue numbers each side of the shift.
8.6.5 Error estimation
Standard problem
If options.err_est = 1, the error estimates for the eigenvalues are based on the eigenvalues of a matrix of the form
where is a diagonal
matrix with the leftmost
Ritz values on the diagonal,
and the columns of are the
respective residual vectors
divided by . If
is such that
, then the eigenvalues of
are the left-hand side bounds
for eigenvalues , and thus
the difference estimates
the eigenvalue error .
The unknown is
replaced by , and select
the maximal for which
the distance between
and exceeds
the sum of the absolute error tolerance for eigenvalues and the Frobenius norm of the matrix formed by the residuals
. If
is close
to the machine accuracy, it may be too polluted by round-off errors to rely upon. In such case, we use instead
The eigenvector errors are estimated based on the Davis-Kahan inequality:
where is the invariant
subspace corresponding to
leftmost eigenvalues.
If options.err_est2
the errors are estimated based on the eigenvalue decrements history, which produces an estimate for the average
eigenvalue error reduction per iteration, which in turn yields error estimates for both eigenvalues and eigenvectors.
Unlike the residual estimates mentioned in this section, such ‘kinematic’ error estimates are not guaranteed to be
upper bounds for the actual errors. However, the numerical tests have demonstrated that kinematic error estimates
are significantly more accurate, i.e. closer to the actual error, than the residual-based estimates. Furthermore, they
straightforwardly apply to the generalized case as well.
Generalized problem
In the case of the generalized eigenvalue problem (8.2) solved by iterations with
preconditioning, all of the residual norms in the previous section must be replaced with
-norm of the
residual
() or its upper
estimate, e.g. , where
is the smallest
eigenvalue of .
Hence, if
is known, then the error tolerances for eigenvalues and eigenvectors must be multiplied by
and
respectively. If no
estimate for -norm
is available, then the use of non-zero residual tolerances and
options.err_est1
is not recommended. In the case of problems (8.2) solved by iterations with shift-and-invert and the problem (8.3), the residuals
are computed as
where for (8.2)
and for (8.3),
and -norms of
are used, so that Lehmann
matrix becomes .
References
[1] E. E. Ovtchinnikov and J. Reid. A preconditioned block conjugate gradient algorithm for computing extreme
eigenpairs of symmetric and Hermitian problems. Technical Report RAL-TR-2010-19, 2010.
[2] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian
eigenvalue computation I: Computing an extreme eigenvalue, SIAM J. Numer. Anal., 46:2567–2592,
2008.
[3] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian
eigenvalue computation II: Computing several extreme eigenvalues, SIAM J. Numer. Anal., 46:2593–2619,
2008.
8.7 Examples
8.7.1 Preconditioning example
The following code computes the 5 leftmost eigenpairs of the matrix
of
order 100 that approximates the two-dimensional Laplacian operator on a 20-by-20 grid. One forward
and one backward Gauss-Seidel update are used for preconditioning, which halves the number of
iterations compared with solving the same problem without preconditioning. The header laplace2d.h
(examples/C/ssmfe/laplace2d.h) supplies the subroutine apply_laplacian() that multiplies a block of vectors by
,
and the subroutine apply_gauss_seidel_step() that computes
for a block
of vectors
by applying one forward and one backward update of the Gauss-Seidel method to the system
.
/* examples/C/ssmfe/precond_expert.c */
/* Laplacian on a square grid (using SPRAL_SSMFE_EXPERT routines) */
#include "spral.h"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <cblas.h>
/* Header that implements Laplacian and preconditioners */
#include "laplace2d.h"
void main(void) {
const int ngrid = 20; /* grid points along each side */
const int n = ngrid*ngrid; /* problem size */
const int nep = 5; /* eigenpairs wanted */
const int m = 3; /* dimension of the iterated subspace */
int state = SPRAL_RANDOM_INITIAL_SEED; /* PRNG state */
int ind[m]; /* permutation index */
double lambda[n]; /* eigenvalues */
double X[n][n]; /* eigenvectors */
/* Work arrays */
double rr[3][2*m][2*m];
double W[6][m][n];
double U[m][n];
/* Derived types */
struct spral_ssmfe_rcid rci; /* reverse communication data */
struct spral_ssmfe_options options; /* options */
void *keep; /* private data */
struct spral_ssmfe_inform inform; /* information */
/* Initialize options to default values */
spral_ssmfe_default_options(&options);
/* gap between the last converged eigenvalue and the rest of the spectrum
* must be at least 0.1 times average gap between computed eigenvalues */
options.left_gap = -0.1;
/* block size is small, convergence may be slow, allow more iterations */
options.max_iterations = 200;
/* Initialize W to lin indep vectors by randomizing */
for(int i=0; i<n; i++)
for(int j=0; j<m; j++)
W[0][j][i] = spral_random_real(&state, true);
int ncon = 0; /* number of converged eigenpairs */
rci.job = 0; keep = NULL;
while(true) { /* reverse communication loop */
spral_ssmfe_expert_standard_double(&rci, nep, n, lambda, m, &rr[0][0][0],
ind, &keep, &options, &inform);
switch ( rci.job ) {
case 1:
apply_laplacian(
ngrid, ngrid, rci.nx, &W[rci.kx][rci.jx][0], &W[rci.ky][rci.jy][0]
);
break;
case 2:
apply_gauss_seidel_step (
ngrid, ngrid, rci.nx, &W[rci.kx][rci.jx][0], &W[rci.ky][rci.jy][0]
);
break;
case 5:
if ( rci.i < 0 ) continue;
for(int k=0; k<rci.nx; k++) {
int j = ncon + k;
cblas_dcopy( n, &W[0][rci.jx+k][0], 1, &X[j][0], 1 );
}
ncon += rci.nx;
break;
case 11:
if ( rci.i == 0 ) {
if ( rci.kx != rci.ky || rci.jx > rci.jy ) {
cblas_dcopy(n*rci.nx, &W[rci.kx][rci.jx][0], 1, &W[rci.ky][rci.jy][0], 1);
} else if ( rci.jx < rci.jy ) {
for(int j=rci.nx-1; j>=0; j--)
cblas_dcopy(n, &W[rci.kx][rci.jx+j][0], 1, &W[rci.ky][rci.jy+j][0], 1);
}
} else {
for(int i=0; i<n; i++) {
for(int j=0; j<rci.nx; j++)
U[j][i] = W[rci.kx][ind[j]][i];
for(int j=0; j<rci.nx; j++)
W[rci.kx][j][i] = U[j][i];
if(rci.ky != rci.kx) {
for(int j=0; j<rci.nx; j++)
U[j][i] = W[rci.ky][ind[j]][i];
for(int j=0; j<rci.nx; j++)
W[rci.ky][j][i] = U[j][i];
}
}
}
break;
case 12:
for(int i=0; i<rci.nx; i++)
rr[rci.k][rci.j+i][rci.i+i] =
cblas_ddot(n, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1);
break;
case 13:
for(int i=0; i<rci.nx; i++) {
if( rci.kx == rci.ky ) {
double s = cblas_dnrm2(n, &W[rci.kx][rci.jx+i][0], 1);
if( s > 0 )
cblas_dscal(n, 1/s, &W[rci.kx][rci.jx+i][0], 1);
} else {
double s = sqrt(fabs(cblas_ddot(
n, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1)
));
if ( s > 0 ) {
cblas_dscal(n, 1/s, &W[rci.kx][rci.jx+i][0], 1);
cblas_dscal(n, 1/s, &W[rci.ky][rci.jy+i][0], 1);
} else {
for(int j=0; j<n; j++)
W[rci.ky][rci.jy+i][j] = 0.0;
}
}
}
break;
case 14:
for(int i=0; i<rci.nx; i++) {
double s = -rr[rci.k][rci.j+i][rci.i+i];
cblas_daxpy(n, s, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1);
}
break;
case 15:
if ( rci.nx > 0 && rci.ny > 0 )
cblas_dgemm(
CblasColMajor, CblasTrans, CblasNoTrans, rci.nx, rci.ny, n,
rci.alpha, &W[rci.kx][rci.jx][0], n, &W[rci.ky][rci.jy][0], n,
rci.beta, &rr[rci.k][rci.j][rci.i], 2*m
);
break;
case 16: // Fall through to 17
case 17:
if( rci.ny < 1 ) continue;
if( rci.nx < 1 ) {
if( rci.job == 17 ) continue;
if( rci.beta == 1.0 ) continue;
for(int j=rci.jy; j<rci.jy+rci.ny; j++)
cblas_dscal(n, rci.beta, &W[rci.ky][j][0], 1);
continue;
}
if( rci.job == 17 ) {
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
1.0, &W[rci.kx][rci.jx][0], n, &rr[rci.k][rci.j][rci.i], 2*m,
0.0, &W[rci.ky][rci.jy][0], n
);
cblas_dcopy(n*rci.ny, &W[rci.ky][rci.jy][0], 1, &W[rci.kx][rci.jx][0], 1);
} else {
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
rci.alpha, &W[rci.kx][rci.jx][0], n, &rr[rci.k][rci.j][rci.i],
2*m, rci.beta, &W[rci.ky][rci.jy][0], n
);
}
break;
case 21: // Fall through to 22
case 22:
if( ncon > 0 ) {
cblas_dgemm(
CblasColMajor, CblasTrans, CblasNoTrans, ncon, rci.nx, n,
1.0, &X[0][0], n, &W[rci.ky][rci.jy][0], n, 0.0, &U[0][0], n
);
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.nx, ncon,
-1.0, &X[0][0], n, &U[0][0], n, 1.0, &W[rci.kx][rci.jx][0], n
);
}
break;
case 999:
if( rci.k == 0 ) {
if( rci.jx > 1 ) {
for(int j=0; j<rci.jx; j++)
for(int i=0; i<n; i++)
W[0][j][i] = spral_random_real(&state, true);
}
}
break;
default:
goto finished;
}
}
finished:
if(inform.flag != 0) printf("inform.flag = %d\n", inform.flag);
printf("%3d eigenpairs converged in %d iterations\n", ncon, inform.iteration);
for(int i=0; i<ncon; i++)
printf(" lambda[%1d] = %13.7e\n", i, lambda[i]);
spral_ssmfe_expert_free(&keep, &inform);
}
This code produces the following output:
6 eigenpairs converged in 129 iterations
lambda[0] = 4.4676695e-02
lambda[1] = 1.1119274e-01
lambda[2] = 1.1119274e-01
lambda[3] = 1.7770878e-01
lambda[4] = 2.2040061e-01
lambda[5] = 2.2040061e-01
Note that the code computed one extra eigenpair because of the insufficient gap between the 5th and 6th
eigenvalues.