SPRAL_SSMFE_COREv1.0.0
Sparse Symmetric Matrix-Free Eigensolver,
core interface
C User Guide
This package computes extreme (leftmost and/or rightmost) eigenpairs
of the
following eigenvalue problems:
- the standard eigenvalue problem
- the generalized eigenvalue problem
- the eigenvalue problem
where and
are real symmetric (or
Hermitian) matrices and
is positive definite.
SPRAL_SSMFE_CORE delegates a considerable part of the computation to the user, which makes its solver
procedures rather difficult to use. For solving problems (7.1) and (7.2), the user is advised to consider using the
package SPRAL_SSMFE, which offers a simple interface to SPRAL_SSMFE_CORE, or the package SPRAL_SSMFE_EXPERT,
which delegates less work to the user.
Evgueni Ovtchinnikov (STFC Rutherford Appleton Laboratory)
Major version history
-
2014-11-20 Version 1.0.0
- Initial release
7.1 Installation
Please see the SPRAL install documentation. In particular note that:
- A BLAS library is required.
- A LAPACK library is required.
7.2 Usage overview
SPRAL_SSMFE_CORE implements a block iterative algorithm for simultaneous computation of several eigenpairs, i.e.
eigenvalues and corresponding eigenvectors, of problems (7.1)–(7.3). The block nature of this algorithm allows the
user to benefit from highly optimized linear algebra subroutines and from the ubiquitous multicore architecture of
modern computers. It also makes this algorithm more reliable than Krylov-based algorithms employed by e.g.
ARPACK in the presence of clustered eigenvalues. However, convergence of the iterations may be slow if the density
of the spectrum is high.
Thus, good performance (in terms of speed) is contingent on the following two factors: (i) the number of desired
eigenpairs must be substantial (i.e. not less than the number of CPU cores), and (ii) the employment of a
convergence acceleration technique. SPRAL_SSMFE_CORE allows the user to benefit from two acceleration techniques:
shift-and-invert and preconditioning. The shift-and-invert technique rewrites the eigenvalue problem for a matrix
as the eigenvalue problem
(7.1) for the matrix ,
where is the
identity matrix and
is a real value near eigenvalues of interest, and the generalized problem
as the problem
(7.3) for
and .
The preconditioning technique applies to (7.1) and (7.2) with positive definite
and requires a matrix or an operator (that is, an algorithm producing a vector
for a given
vector )
, called a preconditioner,
such that the vector is an
approximation to the solution
of the system .
This technique is more sophisticated and is likely to be of interest only to experienced users.
For futher detail on the algorithm, see the outline in Section 7.6 and associated references.
SPRAL_SSMFE_CORE delegates more computation to the user than the packages SPRAL_SSMFE and
SPRAL_SSMFE_EXPERT, which are built upon it. Not only are the storage and operations on all vectors of size
user’s
responsibility, but also the decision on whether a sufficient number of eigenpairs have been computed to sufficient
accuracy. The amount of computation performed by the solver subroutines of SPRAL_SSMFE_CORE and the memory
they use are negligible. These features facilitate the use of these subroutines for shared-memory, out-of-core and
hybrid computation.
7.2.1 Calling sequences
Access to the package requires inclusion of either spral.h (for the entire SPRALlibrary) or spral_ssmfe.h (for just
the SSMFE routines), i.e.
#include "spral.h"
The following procedures are available to the user:
- spral_ssmfe_core_default_options() initializes the options structure to default values
- spral_ssmfe_double() and spral_ssmfe_double_complex() computues specified numbers of left-
and right-most eigenvalues and the corresponding eigenvectors
- spral_ssmfe_largest_double() and spral_ssmfe_largest_double_complex() computes a specified
number of eigenvalues of largest magnitude and the corresponding eigenvectors
- spral_ssmfe_core_free() should be called after all other calls are complete. It frees memory
referenced by keep and inform.
The solver procedures spral_ssmfe_type() and spral_ssmfe_largest_type() must be called repeatedly using a
reverse communication interface. The procedure ssmfe_core_free() should be called once after the
final call to a solver procedure to free all memory that has been internally allocated by the solver
procedure.
7.2.2 Derived types
For each problem, the user must employ the derived types defined by the module to declare scalars of the types
struct spral_ssmfe_rcid (real version) or struct spral_ssmfe_rciz (complex version), struct
spral_ssmfe_core_options and struct spral_ssmfe_inform. The user must also declare a void * pointer
keep for the package’s private data structure. The options data structure must be initalized using
spral_ssmfe_core_default_options(), and the pointer keep must be initialized to NULL. The following
pseudocode illustrates this.
#include "spral.h"
...
struct spral_ssmfe_rcid rcid;
struct spral_ssmfe_core_options options;
struct spral_ssmfe_inform inform;
void *keep = NULL;
...
spral_ssmfe_core_default_options(&options);
The components of struct spral_ssmfe_core_options and struct spral_ssmfe_inform are explained in
Sections 7.4.1 and 7.4.2. Components of spral_ssmfe_rcid are used for the reverse communication interface and
are explained in Section 7.3.2. The void * pointer keep is allocated using Fortran, and must be passed to
spral_ssmfe_core_free() to free the associated memory.
7.3 Argument lists
7.3.1 spral_ssmfe_core_default_options()
To initialize a variable of type struct spral_ssmfe_core_options the following routine is
provided.
void spral_ssmfe_core_default_options(struct spral_ssmfe_core_options *options);
-
*options
- is the instance to be initialized.
7.3.2 spral_ssmfe_type() and spral_ssmfe_largest_type()
To compute specified numbers of leftmost and rightmost eigenvalues and corresponding eigenvectors,
one of the following routines must be called repeatedly:
void spral_ssmfe_double(struct spral_ssmfe_rcid *rci, int problem,
int left, int right, int m, double *lambda, double *rr, int *ind,
void **keep, const struct spral_ssmfe_core_options *options,
struct spral_ssmfe_inform *inform);
void spral_ssmfe_double_complex(struct spral_ssmfe_rciz *rci, int problem,
int left, int right, int m, double *lambda, double complex *rr, int *ind,
void **keep, const struct spral_ssmfe_core_options *options,
struct spral_ssmfe_inform *inform);
To compute a specified number of eigenvalues of largest magnitude and corresponding eigenvectors,
one of the following routine must be called repeatedly:
void spral_ssmfe_largest_double(struct spral_ssmfe_rcid *rci, int problem,
int nep, int m, double *lambda, double *rr, int *ind,
void **keep, const struct spral_ssmfe_core_options *options,
struct spral_ssmfe_inform *inform);
void spral_ssmfe_largest_double_complex(struct spral_ssmfe_rciz *rci,
int problem, int nep, int m, double *lambda, double complex *rr, int *ind,
void **keep, const struct spral_ssmfe_core_options *options,
struct spral_ssmfe_inform *inform);
To use the solver procedures, the user needs to maintain a workspace W containing kw + 1 blocks
of m vectors of size n. A value kw = 7 is always sufficient. However, if options.minAprod
false and either
options.minBprod
false or the standard eigenvalue problem (7.1) is solved, then kw = 3 is sufficient; if options.minAprod
true and either
options.minBprod
true or (7.3) is solved, then kw must be at least 7; otherwise kw = 5 is sufficient. Solver procedures use indices 0 to
m-1 to refer to vectors inside each block and indices 0 to kw to refer to particular blocks. The first (zero-indexed)
block holds the eigenvector approximations: the user must fill this block with m linearly independent vectors before
the first call to a solver procedure.
The number of desired eigenpairs may exceed m: whenever converged eigenpairs have been detected, a solver
procedure reports the indices of these eigenpairs and they must be moved by the user to a separate eigenvectors’
storage X.
When , it is expedient to
have storage BX for the -images
of the converged eigenvectors, i.e. BX = B*X.
To simplify the description of the reverse communication interface, below we assume that
an array W[kw+1][m][n] of package type is used as a workspace, and that arrays X[mep][n]
and BX[mep][n] of package type are used for storing the computed eigenvectors and their
-images,
where mep is not less than the number of desired eigenpairs. For convienence of notation
we use the convention that x[i:j] denotes indices i through j (inclusive) of the vector x.
The transpose (real or complex, depending on the package type) of a matrix H is denoted by
H.
The meaning of the arguments of the solver procedures is as follows.
-
rci
- is used for the reverse communication interface. Before the first call, rci.job must be set to 0. No other values
may be assigned to rci.job by the user. After each call, the value of rci.job must be inspected by the user’s
code and the appropriate action taken:
-
-3
- : fatal error return, the computation must be terminated;
-
-2
- : not all desired eigenpairs converged to required accuracy, see Section 7.5;
-
-1
- : the computation is complete and successful.
-
1
- : the user must compute ,
where
W[rci.kx][ix:jx][:], with ix
rci.jx and jx
ix + rci.nx - 1,
W[rci.kx][iy:jy][:], with iy
rci.jy and jy
iy + rci.nx - 1.
-
2
- : the user must compute
if preconditioning is used or copy
to
otherwise, where
and
are as for rci.job = 1.
-
3
- : the user must compute ,
where
and
are as for rci.job = 1.
-
4
- : for each i = 0,…,m-1 such that inform.converged[i] is zero, the user must set
inform.converged[i] to a positive value if the residual norm inform.res_norms[i] and the
estimated eigenvalue and eigenvector errors inform.err_lambda[i] and inform.err_x[i] are
small enough for this eigenpair to be deemed as converged.
-
5
- : the user must save the converged eigenvectors to the eigenvector storage X and, optionally, for
problems (7.2) and (7.3), save their -images.
The converged eigenvectors are columns of the
matrix W[rci.kx][:][:] and their -images
are respective columns of W[rci.ky][:][:] that are identified by rci.i, rci.jx and rci.nx as
follows: if rci.i > 0, then the column numbers run from rci.jx to rci.jx + rci.nx - 1, and
if rci.i < 0, then they run from rci.jx - rci.nx + 1 to rci.jx.
-
11
- : if rci.i = 0, then the user must perform a copy ,
where
and
are as for rci.job = 1, otherwise the columns of W[rci.kx][:][:] and W[rci.ky][:][:] (if
rci.kx
rci.ky) must be reordered using the index array ind[] so that the column ind[j] becomes
column j for j = 0,…,rci.nx-1.
-
12
- : for each i = 0, 1,..., rci.nx - 1, the user must compute the dot product of the columns
W[rci.kx][rci.jx+i][:]
and
W[rci.ky][rci.jy+i][:]
and place it in
rr[rci.k][rci.j+i][rci.i+i].
-
13
- : if rci.kx
rci.ky, then for each i = 0, 1,..., rci.nx - 1, the user must perform the scaling
W[rci.kx][rci.jx+i][:] = W[rci.kx][rci.jx+i][:],
where
is the 2-norm of the column W[rci.kx][rci.jx+i][:], otherwise the user must perform the
scalings
W[rci.kx][rci.jx+i][:] = W[rci.kx][rci.jx+i][:]
W[rci.ky][rci.jy+i][:] = W[rci.ky][rci.jy+i][:],
where
is the square root of the dot product of the columns W[rci.kx][rci.jx+i][:] and W[rci.ky][rci.jy+i][:].
No scaling is to be applied to zero columns.
-
- 14: for each i = 0, 1,..., rci.nx - 1, the user must perform axpy-updates:
W[rci.ky][rci.jy+i][:] = W[rci.ky][rci.jy+i][:] +
rr[rci.k][rci.j+i][rci.i+i] * W[rci.kx][rci.jx+i][:].
-
- 15: the user must perform the matrix multiplication:
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1] =
rci.alpha * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:]’ *
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] +
rci.beta * rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1].
-
- 16: the user must perform the matrix multiplication:
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] =
rci.alpha * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] *
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1] +
rci.beta * W[rci.ky][rci.jy
rci.jy+rci.ny-1][:].
-
- 17: the user must perform the multiplication:
W[rci.kx][rci.jx
rci.jx+rci.ny-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] *
rr[rci.k][rci.j
rci.j+rci.ny-1][rci.i
rci.i+rci.nx-1].
W[rci.ky][rci.jy
rci.jy+rci.ny-1][:] can be used as a workspace.
-
- 21: the user must -orthogonalize
the columns of W specified by rci.nx, rci.jx and rci.kx to all vectors stored in X by solving the
system
(X’ * BX) Q = X’ * W[rci.ky][rci.jy
rci.jy+rci.nx-1][:]
for Q and updating
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] - X * Q.
For problems (7.2) and (7.3), the respective columns of W[rci.ky][:][:], which store -images
of the respective columns of W[rci.kx][:][:], must be updated accordingly, either by applying
B to these vectors or using the columns of BX, i.e.
W[rci.ky][rci.jy
rci.jy+rci.nx-1][:] =
W[rci.ky][rci.jy
rci.jy+rci.nx-1][:] - BX * Q;
-
- 22: the user must solve the system
(X’ * BX) Q = X’ * W[rci.kx][rci.jx
rci.jx+rci.nx-1][:]
for Q and perform the update
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] =
W[rci.kx][rci.jx
rci.jx+rci.nx-1][:] - BX * Q,
where X and BX are same as in the case rci.job = 21 (in the case of problem (7.1), rci.job =
21 and 22 require exactly the same computation).
-
- 999: If rci.k > 0, then a restart, normally with a larger block size m, is suggested with the aim of
achieving better convergence. If the suggestion is accepted, the user must compute the new block
size as m = rci.nx + k + l, where k
rci.i and l
rci.j, reallocate the workspace array W if the new block size is different from the old one, and set
rci.i = 0 and rci.j = 0. The first block W[0][:][:] of the new workspace should retain the
vectors W[0][i:j][:]), where i = rci.jx and j = i + rci.nx - 1, from the old workspace.
The remaining m - rci.nx columns of W[0][:][:] must be filled with arbitrary vectors that are
linearly independent from the converged eigenvectors and such that the entire set of the columns
of W[0][:][:] is linearly independent. If the restart is not acceptable (e.g. the new block size
exceeds a certain limit set by the user), then nothing needs to be done.
Restriction: rci.job = 0, rci.i = 0 and rci.j = 0 are the only assignments to the components of rci
that can be done by the user. The first one can only be done before the first call. The other two can only be
done if rci.job = 999 and rci.k > 0.
-
problem
- specifies the problem to be solved: if problem is zero, then (7.1) is solved, if it is positive then (7.2) is
solved, otherwise (7.3) is solved.
-
left
- specifies the number of desired leftmost eigenvalues. On return with rci.job = 5, can be set to zero if a
sufficient number of leftmost eigenvalues have been computed.
-
right
- specifies the number of desired rightmost eigenvalues. On return with rci.job = 5, can be set to zero if
sufficient number of rightmost eigenvalues have been computed.
-
nep
- specifies the number of desired largest eigenvalues.
-
lambda[m]
- is used to return the computed eigenvalues. After a successful completion of the computation it contains
eigenvalues in ascending order. This array must not be changed by the user.
-
m
- holds the block size of the user’s workspace W. Restriction: if both left and right are non-zero or spral_ssmfe_largest_type()
is called then m ,
otherwise m .
-
rr[3][2*m][2*m]
- is a work array used as part of the reverse communication interface. It must only be changed by
the user when instructed to do so by rci.job.
-
ind[m]
- is a work array used as part of the reverse communication interface. It must not be changed by the user. It is
used for reordering the columns of some blocks of W.
-
*keep
- must be initialized to NULL before the first call. It holds private data and must not be modified by the
user.
-
*options
- specifies the algorithmic options used by the routines, as explained in Section 7.4.1. It must not be
changed by the user between calls.
-
*inform
- is used to return information about the execution of the routine, as explained in Section 7.4.2. It must not
be changed by the user.
7.3.3 spral_ssmfe_core_free()
At the end of the computation, the memory allocated by the solver procedures should be released by
making a call to the following subroutine:
void spral_ssmfe_core_free(void **keep, struct spral_ssmfe_inform *inform);
-
*keep
- must be unchanged since the last call to spral_ssmfe_type() or spral_ssmfe_largest_type().
-
*inform
- must be unchanged since the last call to spral_ssmfe_type() or spral_ssmfe_largest_type().
7.4 Derived types
7.4.1 struct spral_ssmfe_core_options
The structure struct spral_ssmfe_core_options is used to specify the options used within SSMFE_CORE. The
components, that must be given default values through a call to spral_ssmfe_core_default_options(),
are:
-
int array_base
- specifies the array indexing base. It must have the value either 0 (C indexing) or 1 (Fortran
indexing). If array_base1,
the entries of the array ind[], and the array indices given in the reverse communication interface (e.g.
rci.jx) will start at 1, not 0. The default value is array_base0.
-
int err_est
- defines which error estimation scheme for eigenvalues and eigenvectors is to be used by the
stopping criterion. Two schemes are implemented. If err_est = 1, residual error bounds are used,
namely, a modified Davis-Kahan estimate for the eigenvector error and the Lehmann bounds for the
eigenvalue error. (see Section 7.6.2). If err_est = 2, then the eigenvector and eigenvalue errors are
estimated by analyzing the convergence curve for the eigenvalues (see Section 7.6.2). The default is
err_est = 2. Restriction: err_est = 1 or 2.
-
int extra_left
- holds the number of extra approximate eigenvectors corresponding to leftmost eigenvalues
that are of no interest to the user and are iterated solely to enhance convergence. The default is
extra_left = 0. Restriction: extra_left
0.
-
int extra_right
- holds the number of extra approximate eigenvectors corresponding to rightmost eigenvalues
that are of no interest to the user and are iterated solely to enhance convergence. The default is
extra_right = 0. Restriction: extra_right
0.
-
bool minAprod
- determines whether the number of multiplications by
is to be reduced at the expense of memory. If ,
on each iteration three returns to the user with rci.job = 1 are made for multiplications of rci.nx
vectors by .
Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s
work array W must be increased by 2. The default is minAprod = true. Restriction: minAprod = true
if problem < 0.
-
bool minBprod
- determines whether the number of multiplications by
is to be reduced at the expense of memory. If ,
on each iteration at least three returns to the user with rci.job = 3 are made for multiplications of
rci.nx vectors by .
Otherwise, only one such return is made at each iteration but the number kw of blocks in the user’s
work array W must be increased by 2. The default is minBprod = true. Restriction: minBprod = true
if problem < 0.
-
double min_gap
- controls the restart. If the relative distance between the last eigenvalue of interest on either
margin of the spectrum and the rest of the spectrum is smaller than min_gap, the solver procedure
suggests restart (cf. rci.job ).
The values rci.i and rci.j are set to the numbers of eigenvalues on the left and right margin of
the spectrum that are too close to the eigenvalues of interest, causing slow convergence. The default is
min_gap = 0, i.e. by default no restart is ever suggested. Restriction:
min_gap .
-
double cf_max
- is used to detect the convergence stagnation based on the estimated asymptotic convergence
factor
given by (7.10) (see Section 7.6.2). If
is greater than cf_max for ,
the eigenpair is marked as stagnated by setting inform.converged(j) to a negative value. The default
is cf_max = 1, i.e. the estimated asymptotic convergence factor is not used for stagnation detection.
Restriction:
cf_max .
7.4.2 struct spral_ssmfe_inform
The structure spral_ssmfe_inform is used to hold information from the execution of the solver procedures. The
components are:
-
int converged[m]
- stores convergence information. If, on some iteration i, an eigenpair (lambda[j],
X[j]) has been accepted as converged, then converged[j] = i; if the convergence stagnated, then
converged[j] = -i; otherwise converged[j] = 0.
-
double err_lambda[m]
- contains the estimated eigenvalue error for the approximate eigenvalue lambda[i]
if info.converged[i] is non-zero, and -1.0 otherwise.
-
double err_X[m]
- is used for storing the eigenvector errors in the same way as err_lambda[] is used for
storing the eigenvalue errors.
-
int flag
- is used as an error flag. If a call is successful, flag has value 0. A nonzero value of flag indicates
an error or a warning (see Section 7.5).
-
int iteration
- holds the number of iterations since the previous rci.job = 0 or rci.job = 999 call.
-
double residual_norms[m]
- holds, on return with rci.job=4 or 5, the Euclidean norm of the residual. The
norm for lambda[i], X[i] is returned as residual_norms[i].
-
int stat
- holds the allocation status (see Section 7.5).
7.5 Error flags
A successful return from ssmfe and ssmfe_largest is indicated by
inform.flag0.
A negative value indicates an error, a positive value indicates a warning.
Possible negative values of inform.flag are as follows:
-
- -1 Block size m is out-of-range.
-
- -2 Incorrect value of rci.job.
-
- -3 Incorrect value of options.err_est.
-
- -4 Incorrect value of options.minAprod or options.minBprod.
-
- -5 Incorrect value of options.extra_left or options.extra_right.
-
- -6 Incorrect value of options.min_gap.
-
- -7 Incorrect value of options.cf_max.
-
- -11 Incorrect value of left (for ssmfe) or nep (for ssmfe_largest).
-
- -12 Incorrect value of right.
-
- -100 Not enough memory; inform.stat contains the value of the Fortran stat parameter.
-
- -200
is not positive definite or initial eigenvectors are linearly dependent.
The value inform.flag = 1 indicates that the iterations have been terminated because no further improvement in accuracy is
possible (this may happen if
or the preconditioner is not positive definite, or if the components of the residual vectors are so small that the
round-off errors make them essentially random).
7.6 Method
7.6.1 The algorithm
The algorithm implemented by spral_ssmfe_type() and spral_ssmfe_largest_type() is based on the
Jacobi-conjugate preconditioned gradients (JCPG) method. The outline of the algorithm, assuming for simplicity that 0
< left
m and right = 0 and using Matlab notation, is as follows.
The orthogonalization to constraints on step 2 ensures that the algorithm deals with the residuals for the
constrained problem rather than with those for the unconstrained one, which may not go to zeros if the constraints
are not close enough to exact eigenvectors.
(7.6) ensures optimality of the new search direction vector Y(:,j), notably, the search for the minimum of the
Rayleigh quotient (7.4) or, in the case of problem (7.3), (7.5) in the direction of this vector produces asymptotically
smallest possible value.
The search directions cleanup procedure employed on step 7 ensures that the convergence of iterations is
not damaged by the computational errors of the LAPACK eigensolver _SYGV, in the Rayleigh-Ritz
procedure of step 8. The accuracy of _SYGV is affected by the condition number of M. If the latter is very
large, poor accuracy in the computed Ritz vectors may lead to convergence failure. The ordering of
Y(:,j) ensures that the ‘least important’ search direction is dropped if the condition number of M is
unacceptably large. In practice, the loss of search direction is a rare and does not lead to convergence
problems.
If the number of sought eigenpairs exceeds m, then m is not reduced on step 3. Instead, the approximate
eigenvectors moved to C are replaced with vectors from Z.
7.6.2 Error estimation
Standard problem
If options.err_est = 1, the error estimates for the eigenvalues are based on the eigenvalues of a matrix of the form
where is a diagonal
matrix with the leftmost
Ritz values on the diagonal,
and the columns of are the
respective residual vectors
divided by . If
is such that
, then the eigenvalues of
are the left-hand side bounds
for eigenvalues , and thus
the difference estimates
the eigenvalue error .
The unknown is
replaced by , and select
the maximal for which
the distance between
and exceeds
the sum of the absolute error tolerance for eigenvalues and the Frobenius norm of the matrix formed by the residuals
. If
is close
to the machine accuracy, it may be too polluted by round-off errors to rely upon. In such case, we use instead
The eigenvector errors are estimated based on the Davis-Kahan inequality:
where is the invariant
subspace corresponding to
leftmost eigenvalues.
If options.err_est2
the errors are estimated based on the eigenvalue decrements history, which produces an estimate for the
asymptotic convergence facotr, the geometrical average of the eigenvalue error reduction per iteration:
| (7.10) |
where is the
approximation to
on -th
iteration (see Technical Report RAL-TR-2010-19 for further details). Unlike the residual estimates mentioned in this
section, such ‘kinematic’ error estimates are not guaranteed to be upper bounds for the actual errors. However, the
numerical tests have demonstrated that kinematic error estimates are significantly more accurate, i.e. closer to the
actual error, than the residual-based estimates. Furthermore, they straightforwardly apply to the generalized case as
well.
Generalized problems
In the case of the generalized eigenvalue problem (7.2), all of the residual norms in the previous section must be replaced
with -norm of
the residual
() or its upper
estimate, e.g. , where
is the smallest
eigenvalue of .
Hence, if
is known, then the error tolerances for eigenvalues and eigenvectors must be multiplied by
and
respectively. If no
estimate for -norm
is available, then the use of non-zero residual tolerances and
options.err_est1
is not recommended. In the case of problem (7.3), the residuals are computed as
,
-norms of
are used in (7.8) and (7.9), and
Lehmann matrix becomes .
References
[1] E. E. Ovtchinnikov and J. Reid. A preconditioned block conjugate gradient algorithm for computing extreme
eigenpairs of symmetric and Hermitian problems. Technical Report RAL-TR-2010-19, 2010.
[2] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian
eigenvalue computation I: Computing an extreme eigenvalue, SIAM J. Numer. Anal., 46:2567–2592,
2008.
[3] E. E. Ovtchinnikov, Jacobi correction equation, line search and conjugate gradients in Hermitian
eigenvalue computation II: Computing several extreme eigenvalues, SIAM J. Numer. Anal., 46:2593–2619,
2008.
7.7 Examples
7.7.1 Preconditioning example
The following code computes the 5 leftmost eigenpairs of the matrix
of
order 100 that approximates the two-dimensional Laplacian operator on a 20-by-20 grid. One forward
and one backward Gauss-Seidel update are used for preconditioning, which halves the number of
iterations compared with solving the same problem without preconditioning. The header laplace2d.h
(examples/C/ssmfe/laplace2d.h) supplies the subroutine apply_laplacian() that multiplies a block of vectors by
,
and a subroutine apply_gauss_seidel_step() that computes
for a block
of vectors
by applying one forward and one backward update of the Gauss-Seidel method to the system
.
/* examples/C/ssmfe/precond_core.f90 */
/* Laplacian on a square grid (using SPRAL_SSMFE_CORE routines) */
#include "spral.h"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <cblas.h>
/* Header that implements Laplacian and preconditioners */
#include "laplace2d.h"
void main(void) {
const int ngrid = 20; /* grid points along each side */
const int n = ngrid*ngrid; /* problem size */
const int nep = 5; /* eigenpairs wanted */
const int m = 3; /* dimension of the iterated subspace */
const double tol = 1.e-6; /* eigenvector tolerance */
int state = SPRAL_RANDOM_INITIAL_SEED; /* PRNG state */
int ind[m]; /* permutation index */
double lambda[n]; /* eigenvalues */
double X[n][n]; /* eigenvectors */
/* Work arrays */
double lmd[m];
double rr[3][2*m][2*m];
double W[7][m][n];
double U[m][n];
/* Derived types */
struct spral_ssmfe_rcid rci; /* reverse communication data */
struct spral_ssmfe_core_options options; /* options */
void *keep; /* private data */
struct spral_ssmfe_inform inform; /* information */
/* Initialize options to default values */
spral_ssmfe_core_default_options(&options);
/* Initialize W to lin indep vectors by randomizing */
for(int i=0; i<n; i++)
for(int j=0; j<m; j++)
W[0][j][i] = spral_random_real(&state, true);
int ncon = 0; /* number of converged eigenpairs */
rci.job = 0; keep = NULL;
while(true) { /* reverse communication loop */
spral_ssmfe_double(&rci, 0, nep, 0, m, lmd, &rr[0][0][0], ind,
&keep, &options, &inform);
switch ( rci.job ) {
case 1:
apply_laplacian(
ngrid, ngrid, rci.nx, &W[rci.kx][rci.jx][0], &W[rci.ky][rci.jy][0]
);
break;
case 2:
apply_gauss_seidel_step (
ngrid, ngrid, rci.nx, &W[rci.kx][rci.jx][0], &W[rci.ky][rci.jy][0]
);
break;
case 4:
for(int j=0; j<m; j++) {
if ( inform.converged[j] != 0 ) continue;
if ( inform.err_X[j] > 0 && inform.err_X[j] < tol )
inform.converged[j] = 1;
}
break;
case 5:
if ( rci.i < 0 ) continue;
for(int k=0; k<rci.nx; k++) {
int j = ncon + k;
lambda[j] = lmd[rci.jx + k];
cblas_dcopy( n, &W[0][rci.jx+k][0], 1, &X[j][0], 1 );
}
ncon += rci.nx;
if ( ncon >= nep || inform.iteration > 300 ) goto finished;
break;
case 11:
if ( rci.i == 0 ) {
if ( rci.kx != rci.ky || rci.jx > rci.jy ) {
cblas_dcopy(n*rci.nx, &W[rci.kx][rci.jx][0], 1, &W[rci.ky][rci.jy][0], 1);
} else if ( rci.jx < rci.jy ) {
for(int j=rci.nx-1; j>=0; j--)
cblas_dcopy(n, &W[rci.kx][rci.jx+j][0], 1, &W[rci.ky][rci.jy+j][0], 1);
}
} else {
for(int i=0; i<n; i++) {
for(int j=0; j<rci.nx; j++)
U[j][i] = W[rci.kx][ind[j]][i];
for(int j=0; j<rci.nx; j++)
W[rci.kx][j][i] = U[j][i];
if(rci.ky != rci.kx) {
for(int j=0; j<rci.nx; j++)
U[j][i] = W[rci.ky][ind[j]][i];
for(int j=0; j<rci.nx; j++)
W[rci.ky][j][i] = U[j][i];
}
}
}
break;
case 12:
for(int i=0; i<rci.nx; i++)
rr[rci.k][rci.j+i][rci.i+i] =
cblas_ddot(n, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1);
break;
case 13:
for(int i=0; i<rci.nx; i++) {
if( rci.kx == rci.ky ) {
double s = cblas_dnrm2(n, &W[rci.kx][rci.jx+i][0], 1);
if( s > 0 )
cblas_dscal(n, 1/s, &W[rci.kx][rci.jx+i][0], 1);
} else {
double s = sqrt(fabs(cblas_ddot(
n, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1)
));
if ( s > 0 ) {
cblas_dscal(n, 1/s, &W[rci.kx][rci.jx+i][0], 1);
cblas_dscal(n, 1/s, &W[rci.ky][rci.jy+i][0], 1);
} else {
for(int j=0; j<n; j++)
W[rci.ky][rci.jy+i][j] = 0.0;
}
}
}
break;
case 14:
for(int i=0; i<rci.nx; i++) {
double s = -rr[rci.k][rci.j+i][rci.i+i];
cblas_daxpy(n, s, &W[rci.kx][rci.jx+i][0], 1, &W[rci.ky][rci.jy+i][0], 1);
}
break;
case 15:
if ( rci.nx > 0 && rci.ny > 0 )
cblas_dgemm(
CblasColMajor, CblasTrans, CblasNoTrans, rci.nx, rci.ny, n,
rci.alpha, &W[rci.kx][rci.jx][0], n, &W[rci.ky][rci.jy][0], n,
rci.beta, &rr[rci.k][rci.j][rci.i], 2*m
);
break;
case 16: // Fall through to 17
case 17:
if( rci.ny < 1 ) continue;
if( rci.nx < 1 ) {
if( rci.job == 17 ) continue;
if( rci.beta == 1.0 ) continue;
for(int j=rci.jy; j<rci.jy+rci.ny; j++)
cblas_dscal(n, rci.beta, &W[rci.ky][j][0], 1);
continue;
}
if( rci.job == 17 ) {
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
1.0, &W[rci.kx][rci.jx][0], n, &rr[rci.k][rci.j][rci.i], 2*m,
0.0, &W[rci.ky][rci.jy][0], n
);
cblas_dcopy(n*rci.ny, &W[rci.ky][rci.jy][0], 1, &W[rci.kx][rci.jx][0], 1);
} else {
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
rci.alpha, &W[rci.kx][rci.jx][0], n, &rr[rci.k][rci.j][rci.i],
2*m, rci.beta, &W[rci.ky][rci.jy][0], n
);
}
break;
case 21: // Fall through to 22
case 22:
if( ncon > 0 ) {
cblas_dgemm(
CblasColMajor, CblasTrans, CblasNoTrans, ncon, rci.nx, n,
1.0, &X[0][0], n, &W[rci.ky][rci.jy][0], n, 0.0, &U[0][0], n
);
cblas_dgemm(
CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.nx, ncon,
-1.0, &X[0][0], n, &U[0][0], n, 1.0, &W[rci.kx][rci.jx][0], n
);
}
break;
default:
goto finished;
}
}
finished:
if(inform.flag != 0) printf("inform.flag = %d\n", inform.flag);
printf("%3d eigenpairs converged in %d iterations\n", ncon, inform.iteration);
for(int i=0; i<ncon; i++)
printf(" lambda[%1d] = %13.7e\n", i, lambda[i]);
spral_ssmfe_core_free(&keep, &inform);
}
The code generates the following output:
5 eigenpairs converged in 72 iterations
lambda(1) = 4.4676695E-02
lambda(2) = 1.1119274E-01
lambda(3) = 1.1119274E-01
lambda(4) = 1.7770878E-01
lambda(5) = 2.2040061E-01