SPRAL_SSIDSv1.0.0
Sparse Symmetric Indefinite Direct Solver
C User Guide
This package solves one or more sets of
sparse
symmetric
equations
using a multifrontal method on an
NVIDIA GPU. The following cases are covered:
-
is indefinite. SSIDS computes the sparse factorization
where
is a permutation matrix,
is unit lower triangular, and
is block diagonal with blocks of size
and .
-
is positive definite. SSIDS computes the sparse Cholesky factorization
where
is a permutation matrix and
is lower triangular. However, as SSIDS is designed primarily for indefinite systems, this may be slower
than a dedicated Cholesky solver.
SSIDS returns bit-compatible results.
An option exists to scale the matrix. In this case, the factorization of the scaled matrix
is computed,
where is
a diagonal scaling matrix.
Jonathan Hogg (STFC Rutherford Appleton Laboratory)
Evgueni Ovtchinnikov (STFC Rutherford Appleton Laboratory)
Jennifer Scott (STFC Rutherford Appleton Laboratory)
Major version history
-
2014-03-17 Version 1.0.0
- Initial release
5.1 Installation
Please see the SPRAL install documentation. In particular note that:
- A CUDA compiler (nvcc) is required.
- A METIS library is required.
- A BLAS library is required (in addition to CUBLAS).
5.2 Usage overview
5.2.1 Calling sequences
Access to the package requires inclusion of either spral.h (for the entire SPRALlibrary) or spral_ssids.h (for just
the relevant routines). i.e.
#include "spral.h"
The following functions are available to the user:
- spral_ssids_default_options() initializes the options structure to default values.
- spral_ssids_analyse() accepts the matrix data in compressed sparse column format and optionally
checks it for duplicates and out-of-range entries. The user may supply an elimination order; otherwise
one is generated. Using this elimination order, spral_ssids_analyse() analyses the sparsity pattern
of the matrix and prepares the data structures for the factorization.
- spral_ssids_analyse_coord() is an alternative to spral_ssids_analyse() that may be used if the
user has the matrix data in coordinate format. Again, the user may supply an elimination order;
otherwise one is generated. spral_ssids_analyse_coord() checks the matrix data for duplicates and
out-of-range entries, stores it in compressed sparse column format and then proceeds in the same way
as spral_ssids_analyse().
- spral_ssids_factor() uses the data structures set up by spral_ssids_analyse() to compute
a sparse factorization. More than one call to spral_ssids_factor() may follow a call to
spral_ssids_analyse() (allowing more than one matrix with the same sparsity pattern but different
numerical values to be factorized without multiple calls to spral_ssids_analyse()). An option exists
to scale the matrix.
- spral_ssids_solve1() and spral_ssids_solve() use the computed factors generated by spral_ssids_factor()
to solve systems
for one or more right-hand sides .
Multiple calls to spral_ssids_solve1() and spral_ssids_solve() may follow a call to spral_ssids_factor().
An option is available to perform a partial solution.
- spral_ssids_free() should be called after all other calls are complete for a problem (including after
an error return that does not allow the computation to continue). It frees memory and resources used
by internal data structures.
In addition, the following routines may be called:
- spral_ssids_free_akeep() and spral_ssids_free_fkeep() may be called to free the memory and
resources allocated by spral_ssids_analyse() or spral_ssids_factorize() to be freed separately.
- spral_ssids_enquire_posdef() may be called in the positive-definite case to obtain the pivots used.
- spral_ssids_enquire_indef() may be called in the indefinite case to obtain the pivot sequence used
by the factorization and the entries of .
- spral_ssids_alter() may be called in the indefinite case to alter the entries of .
Note that this means that
is no longer a factorization of .
5.2.2 Derived types
For each problem, the user must employ the derived types defined by the package to declare scalars of the types
struct spral_ssids_options and struct spral_ssids_inform. The user must also declare two void * pointers
akeep and fkeep for the package’s private data structures. The options data structure must be initialized using
spral_ssids_default_options(), and the pointers akeep and fkeep must be initialized to NULL. The following
pseudo-code illustrates this.
#include "spral.h"
...
struct spral_ssids_options options;
struct spral_ssids_inform inform;
void *akeep = NULL;
void *fkeep = NULL;
...
spral_ssids_default_options(&options);
...
The components of spral_ssids_options and spral_ssids_inform are explained in Sections 5.5.1 and 5.5.2.
The void * pointers are allocated using Fortran, and must be passed to spral_ssids_free_akeep(),
spral_ssids_free_fkeep() and/or spral_ssids_free() to free the associated memory.
5.2.3 Achieving bit-compatibility
Care has been taken to ensure bit-compatibility is achieved using this solver. That is, consecutive runs with the same
data on the same machine produces exactly the same solution.
5.2.4 Data formats
Compressed Sparse Column (CSC) Format
This standard data format consists of the following data:
int n; /* size of matrix */
int ptr[n+1]; /* column pointers */
int row[ ptr[n]-1 ]; /* row indices */
double val[ ptr[n]-1 ]; /* numerical values */
Non-zero matrix entries are ordered by increasing column index and stored in the arrays row[] and val[] such that
row[k] holds the row number and val[k] holds the value of the k-th entry. The ptr[] array stores column
pointers such that ptr[i] is the position in row[] and val[] of the first entry in the i-th column, and
ptr[n] is the total number of entries. Entries that are zero, including those on the diagonal, need not be
specified.
If this format is used, SSIDS requires only the lower triangular entries of
, and
there should be no duplicate entries. If the check argument to ssids_analyse() is true, out-of-range entries
(including those in the upper triangle) will be discarded and any duplicates will be summed.
To illustrate the CSC format, the following arrays describe the matrix shown in Figure 5.1.
int n = 5;
int ptr[] = { 0, 3, 4, 6, 8, 9 };
int row[] = { 0, 1, 3, 2, 2, 4, 3, 4, 4 };
double val[] = { 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9 };
Coordinate Format
This standard data format consists of the following data:
int n; /* size of matrix */
int ne; /* number of non-zero entries */
int row[ne]; /* row indices */
int col[ne]; /* column indices */
double val[ne]; /* numerical values */
The arrays should be set such that the k-th entry is in row row[k] and column col[k] with value val[k]. Entries
that are zero, including those on the diagonal, need not be specified.
If this format is used, SSIDS requires that each entry of
should
be present only in the lower or upper triangular part. Entries present in both will be summed, as will any duplicate
entries. Out-of-range entries are ignored.
To illustrate the coordinate format, the following arrays describe the matrix shown in Figure 5.1.
int n = 5;
int ne = 9;
int row[] = { 1, 2, 3, 4, 3, 5, 4, 5, 5 };
int col[] = { 1, 1, 2, 1, 3, 3, 4, 4, 5 };
double val[] = { 1.1, 2.2, 4.4, 3.3, 5.5, 6.6, 7.7, 8.8, 9.9 };
5.3 Basic Subroutines
5.3.1 spral_ssids_default_options()
To initialize a variable of type struct spral_ssids_options, the following routine is provided.
void spral_ssids_default_options(struct spral_ssids_options *options);
-
*options
- is the instance to be initialized.
5.3.2 spral_ssids_analyse() and spral_ssids_analyse_coord()
To analyse the sparsity pattern and prepare for the factorization, the following routines are
provided.
- for Compressed Sparse Column (CSC) format:
void spral_ssids_analyse(bool check, int n, int *order, const int *ptr,
const int *row, const double *val, void **akeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform);
- for Coordinate format:
void spral_ssids_analyse_coord(int n, int *order, int ne, const int *row,
const int *col, const double *val, void **akeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform);
Matrix data should be supplied as described in Section 5.2.4. As the package uses CSC format internally,
spral_ssids_analyse() with checking disabled provides the most efficient interface.
-
check
- determines if the matrix is checked for errors (true) or not (false). If set to true the matrix data is
checked for errors and the cleaned matrix (duplicates are summed and out-of-range entries discarded)
is stored in akeep. Otherwise, for data in CSC format, no checking of the matrix data is carried out and
ptr[] and row[] must be passed unchanged to the factorization routines. Checking is always performed
when the coordinate format is used.
-
n, ptr[n+1], row[ptr[n]]
- specify the lower triangular part of
in CSC format (see Section 5.2.4).
-
n, ne, row[ne], col[ne]
- specify the lower (or upper) triangular part of
in coordinate format (see Section 5.2.4).
-
order[n]
- may be NULL. If options.ordering0,
order(:) must non-NULL and order[i] must hold the position of variable
in the elimination order. On exit, order[] contains the elimination order that spral_ssids_factor()
will be given (it is passed to these routines as part of *akeep); this order may give slightly more fill-in
than any user-supplied order and, in the indefinite case, may be modified by spral_ssids_factor()
to maintain numerical stability.
-
val[ne]
- may be NULL, if non-NULL it must hold the numerical values of the entries of the matrix, as
described in Section 5.2.4. val[] must not be NULL if a matching-based elimination ordering is required
(options.ordering2),
and is otherwise ignored.
-
*akeep
- is allocated to hold data about the problem being solved and must be passed unchanged to the other
subroutines.
-
*options
- specifies the algorithmic options used by the routine, as explained in Section 5.5.1.
-
*inform
- is used to return information about the execution of the routine, as explained in Section 5.5.2.
5.3.3 spral_ssids_factor()
To factorize the matrix, the following routine is provided.
void spral_ssids_factor(bool posdef, const int *ptr, const int *row, const double *val,
double *scale, void *akeep, void **fkeep, const struct spral_ssids_options *options,
struct spral_ssids_inform *inform);
-
posdef
- must be true if the matrix is positive-definite, and false if it is indefinite.
-
ptr[n+1]
- and row[ptr[n]] may both be NULL unless spral_ssids_analyse() was called with check set
to false. In this case, they must both be non-NULL and must be unchanged since that call.
-
val[ptr[n]]
- must hold the numerical values of the entries of the matrix, as described in Section 5.2.4.
-
scale[n]
- may be NULL. If non-NULL and options.scaling0,
it must contain the diagonal entries of the scaling matrix .
On exit, scale[i] will contain the i-th diagonal entry of the scaling matrix .
-
akeep
- must be unchanged since the call to spral_ssids_analyse() or spral_ssids_analyse_coord().
-
*fkeep
- is allocated to hold data about the problem being solved and must be passed unchanged to the other
subroutines.
-
*options
- specifies the algorithmic options used by the subroutine, as explained in Section 5.5.1.
-
*inform
- is used to return information about the execution of the subroutine, as explained in Section 5.5.2.
5.3.4 spral_ssids_solve()
To solve the linear system ,
after a call to spral_ssids_factor(), the following routines are provided.
- for a single right-hand side:
void spral_ssids_solve1(int job, double *x1, void *akeep, void *fkeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform);
- for one or more right-hand sides:
void spral_ssids_solve(int job, int nrhs, double *x, int ldx, void *akeep,
void *fkeep, const struct spral_ssids_options *options,
struct spral_ssids_inform *inform);
-
job
- is used to specify the solve to be performed. In the positive-definite case, the Cholesky factorization that
has been computed may be expressed in the form
where
is a permutation matrix and
is lower triangular. In the indefinite case, the factorization that has been computed may be expressed
in the form
where is a permutation matrix,
is unit lower triangular,
and is block diagonal with
blocks of order 1 and 2. is
a diagonal scaling matrix (
is equal to the identity, if options.scaling=0 and scale is NULL on the last call to spral_ssids_factor()).
A partial solution may be computed by setting job to have one of the following values:
-
0
- for solving
-
1
- for solving
-
2
- for solving
(indefinite case only)
-
3
- for solving
-
4
- for solving
(indefinite case only)
-
x1[n]
- must be set such that x1[i] holds the component of the right-hand side for variable
. On exit, x1[i] holds
the solution for variable .
-
nrhs
- holds the number of right-hand sides.
-
x[nrhs][ldx]
- must be set so that x[j][i] holds the component of the right-hand side for variable
to the
-th system. On exit, x[j][i]
holds the solution for variable
to the -th
system.
-
ldx
- must be set to the leading dimension of array x[].
-
akeep
- must be unchanged since the last call to spral_ssids_factor().
-
fkeep
- must be unchanged since the last call to spral_ssids_factor().
-
*options
- specifies the algorithmic options used by the subroutine, as explained in Section 5.5.1.
-
inform
- is used to return information about the execution of the subroutine, as explained in Section 5.5.2.
5.3.5 spral_ssids_free(), spral_ssids_free_akeep(), and spral_ssids_free_fkeep()
Once all other calls are complete for a problem or after an error return that does not allow the computation to
continue, a call should be made to free memory and CUDA resources allocated by SSIDS and associated with the
derived data types akeep and/or fkeep using calls to spral_ssids_free().
To free memory and resources, the following routines are provided.
- allocated by spral_ssids_analyse() or spral_ssids_analyse_coord():
int spral_ssids_free_akeep(void **akeep);
- allocated by spral_ssids_factor():
int spral_ssids_free_fkeep(void **fkeep);
- both at once:
int spral_ssids_free(void **akeep, void **fkeep);
A non-zero return value gives a CUDA error code. This may indicate either a failure to deallocate GPU memory, or a
pre-existing CUDA error condition. Note that due to the asynchronous nature of GPU execution, the reported error
may have a cause errors external to SSIDS.
-
*akeep
- must be passed unchanged by the user. On exit, associated memory and CUDA resources will have
been released, and *akeep set to NULL.
-
*fkeep
- that must be passed unchanged by the user. On exit, associated memory and CUDA resources will
have been released, and *fkeep set to NULL.
5.4 Advanced subroutines
5.4.1 spral_ssids_enquire_posdef()
To obtain the matrix
following a positive-definite factorization, the following routine is provided.
void spral_ssids_enquire_posdef(const void *akeep, const void *fkeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform,
double *d);
-
akeep
- must be unchanged since the call to spral_ssids_factor().
-
fkeep
- must be unchanged since the call to spral_ssids_factor().
-
*options
- specifies the algorithmic options used by the subroutine, as explained in Section 5.5.1.
-
*inform
- is used to return information about the execution of the subroutine, as explained in Section 5.5.2.
-
d[n]
- holds, on exit, the diagonal entries of the matrix .
5.4.2 spral_ssids_enquire_indef()
To obtain the matrix
and/or the pivot order following an indefinite factorization, the following routine is provided.
void spral_ssids_enquire_indef(const void *akeep, const void *fkeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform,
int *piv_order, double *d);
-
akeep
- must be unchanged since the call to spral_ssids_factor().
-
fkeep
- must be unchanged since the call to spral_ssids_factor().
-
*options
- specifies the algorithmic options used by the subroutine, as explained in Section 5.5.1.
-
*inform
- is used to return information about the execution of the subroutine, as explained in Section 5.5.2.
-
piv_order[n]
- may be NULL. If non-NULL then, on exit, piv_order[i] holds the position of variable
in the pivot order.
-
d[n][2]
- may be NULL. If non-NULL, on exit diagonal entries of
will be placed in d[i][0], ,
the off-diagonal entries of
will be placed in d[i][1], ,
and d[n][1] will be set to zero.
5.4.3 spral_ssids_alter()
To alter
following an indefinite factorization, the following routine is provided.
void spral_ssids_alter(const double *d, const void *akeep, void *fkeep,
const struct spral_ssids_options *options, struct spral_ssids_inform *inform);
Note that this routine is not compatabile with the option
options.presolve1.
-
d[n][2]
- must hold the new entries of the matrix .
The diagonal entries of
will be altered to d[i][0], ,
and the off-diagonal entries will be altered to d[i][1],
(and
will no longer be a factorization of ).
-
akeep
- must be unchanged since the call to spral_ssids_factor().
-
fkeep
- must be unchanged since the call to spral_ssids_factor().
-
*options
- specifies the algorithmic options used by the subroutine, as explained in Section 5.5.1.
-
*inform
- is used to return information about the execution of the subroutine, as explained in Section 5.5.2.
5.5 Derived types
5.5.1 struct spral_ssids_options
The structure spral_ssids_options is used to specify the options used within SSIDS. The components, that must
be given default values through a call to spral_ssids_default_options(), are:
C specific options
-
int array_base
- specifies the array indexing base. It must have the value either 0 (C indexing) or 1 (Fortran
indexing). If array_base1,
the entries of arrays ptr[], row[], col[], order[], and piv_order[] start at 1, not 0. Further, entries
of piv_order[] may also have negative sign to indicate they are part of a
pivot. The default value is array_base0.
Printing options
-
int print_level
- is used to control the level of printing. The different levels are:
-
-
0 No printing.
-
- = 0 Error and warning messages only.
-
- = 1 As 0, plus basic diagnostic printing.
-
-
1 As 1, plus some additional diagnostic printing.
The default is print_level0.
-
int unit_diagnostics
- holds Fortran the unit number for diagnostic printing. Printing is suppressed if
unit_diagnostics. The
default is unit_diagnostics6.
-
int unit_error
- holds the Fortran unit number for error messages. Printing of error messages is suppressed if
unit_error0. The
default is unit_error6.
-
int unit_warning
- holds the Fortran unit number for warning messages. Printing of warning messages is suppressed if
unit_warning0. The
default is unit_warning6.
Options used by spral_ssids_analyse() and spral_ssids_analyse_coord()
-
int ordering
- controls the ordering algorithm used. If set to 0, the user must supply an elimination order in
order[]; otherwise ssids_analyse() or ssids_analyse_coord() will an compute an elimination order. The
options are:
-
- 0 User-supplied ordering is used.
-
- 1 METIS ordering with default settings is used.
-
- 2 A matching-based elimination ordering is computed (the Hungarian algorithm is used to identify
large off-diagonal entries. A restricted METIS ordering is then used that forces these on to the
subdiagonal). This option should only be chosen for indefinite systems. A scaling is also computed
that may be used in spral_ssids_factor() (see options.scaling below).
The default is ordering1.
Restriction: ordering0,
1, 2.
-
int nemin
- controls node amalgamation. Two neighbours in the elimination tree
are merged if they both involve fewer than nemin eliminations. The default is
nemin8. The default
is used if nemin1.
Options used by spral_ssids_factor()
-
int scaling
- controls the use of scaling. The available options are:
-
0
-
No scaling (if scale is NULL), or user-supplied scaling (if scale is non-NULL).
-
1
-
Compute a scaling using a weighted bipartite matching via the Hungarian Algorithm (MC64
algorithm).
-
2
-
Compute a scaling using a weighted bipartite matching via the Auction Algorithm (may be lower
quality than that computed using the Hungarian Algorithm, but can be considerably faster).
-
3
-
A matching-based ordering has been generated during the analyse phase using options.ordering
2. Use the scaling generated as a side-effect of this process. The scaling will be the same as that
generated with options.scaling
1 if the matrix values have not changed. This option will generate an error if a matching-based
ordering was not used.
-
4
-
Compute a scaling using the norm-equilibration algorithm of Ruiz.
The default is scaling0.
Options used by spral_ssids_factor() with
posdeffalse
(
indefinite)
-
bool action
- controls the behaviour for singular matrices. If the matrix is found to be singular (has rank less
than the number of non-empty rows), the computation continues after issuing a warning if action has
the value true or terminates with an error if it has the value false. The default is actiontrue.
-
double u
- holds the relative pivot tolerance .
The default is u0.01.
Values outside the range
are treated as the closest value in that range.
Options used by spral_ssids_factor() and spral_ssids_solve()
-
bool use_gpu_solve
- controls whether to use the CPU or GPU for the spral_ssids_solve(). If true,
the GPU is used, but some more advanced features are not available (see the description of the job
parameter in Section 5.3.4). Setting use_gpu_solvefalse
is only compatible with options.presolve0.
The default value is use_gpu_solvetrue.
-
int presolve
- controls the amount of extra work performed during the factorization to accelerate the solve. It can
take the following values:
-
0
- Minimal work is performed during spral_ssids_factor() to prepare for the solve.
-
1
- The explicit inverse of the nelimnelim
diagonal block in each supernode is precalculated during spral_ssids_factor() (where nelim is
the number of variables eliminated at that supernode). As the matrix
is overwritte, the routine spral_ssids_alter() cannot be used. This option is not compatible
with options.use_gpu_solvefalse.
The default option is presolve0.
5.5.2 struct spral_ssids_inform
The structure spral_ssids_inform is used to hold parameters that give information about the progress
and needs of the algorithm. The components of struct spral_ssids_inform (in alphabetical order)
are:
-
ing flag
- gives the exit status of the algorithm (details in Section 5.6).
-
int matrix_dup
- holds, on exit from spral_ssids_analyse() with check set to true or on exit from
spral_ssids_analyse_coord(), the number of duplicate entries that were found and summed.
-
int matrix_missing_diag
- holds, on exit from spral_ssids_analyse() with check set to true, or exit
from ssids_analyse_coord(), the number of diagonal entries without an explicitly provided value.
-
int matrix_outrange
- holds, on exit from spral_ssids_analyse() with check set to true or exit from
spral_ssids_analyse_coord(), the number of out-of-range entries that were found and discarded.
-
int matrix_rank
- holds, on exit from spral_ssids_analyse() or on exit from spral_ssids_analyse_coord(),
the structural rank of ,
if available (otherwise, it is set to n). On exit from spral_ssids_factor(), it holds the computed rank
of the factorized matrix.
-
int maxdepth
- holds, on exit from spral_ssids_analyse() or spral_ssids_analyse_coord(), the
maximum depth of the assembly tree.
-
int maxfront
- holds, on exit from spral_ssids_analyse() or spral_ssids_analyse_coord(), the
maximum front size in the positive-definite case (or in the indefinite case with the same pivot sequence).
On exit from spral_ssids_factor(), it holds the maximum front size.
-
int num_delay
- holds, on exit from spral_ssids_factor(), the number of eliminations that were delayed,
that is, the total number of fully-summed variables that were passed to the father node because of
stability considerations. If a variable is passed further up the tree, it will be counted again.
-
long num_factor
- holds, on exit from spral_ssids_analyse() or spral_ssids_analyse_coord(), the
number of entries that will be in the factor
in the positive-definite case (or in the indefinite case with the same pivot sequence). On exit from
ssids_factor(), it holds the actual number of entries in the factor .
In the indefinite case, 2n entries of
are also held.
-
long num_flops
- holds, on exit from spral_ssids_analyse() or spral_ssids_analyse_coord(), the
number of floating-point operations that will be needed to perform the factorization in the
positive-definite case (or in the indefinite case with the same pivot sequence). On exit from
spral_ssids_factor(), it holds the number of floating-point operations performed.
-
int num_neg
- holds, on exit from spral_ssids_factor(), the number of negative eigenvalues of the matrix
.
-
int num_sup
- holds, on exit from spral_ssids_analyse() or spral_ssids_analyse_coord(), the number
of supernodes in the problem.
-
int num_two
- holds, on exit from spral_ssids_factor(), the number of
pivots used by the factorization, that is, the number of
blocks in .
-
int stat
- holds, in the event of an allocation or deallocation error, the Fortran stat parameter if it is
available (and is set to 0 otherwise).
-
int cublas_error
- holds, in the event of an error return from the CUBLAS library, the error code returned
(and is 0 otherwise).
-
int cuda_error
- holds, in the event of a CUDA error, the error code returned (and is 0 otherwise). Note
that due to the asynchronous nature of GPU execution, the reported error may have a cause errors
external to SSIDS.
5.6 Error flags
A successful return from a routine in the package is indicated by inform.flag having the value zero. A negative
value is associated with an error message that by default will be output on unit options.unit_error.
Possible negative values are:
-
- 1
An error has been made in the sequence of calls (this includes calling a subroutine after an error that
cannot be recovered from).
-
- 2
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if n0.
Also returned by spral_ssids_analyse_coord() if ne1.
-
- 3
Returned by spral_ssids_analyse() if there is an error in ptr[].
-
- 4
Returned by spral_ssids_analyse() if all the variable indices in one or more columns are out-of-range.
Also returned by spral_ssids_analyse_coord() if all entries are out-of-range.
-
- 5
Returned by spral_ssids_factor() if posdeffalse
and options.action = false when the matrix is found to be singular. The user may reset the matrix
values in val[] and recall spral_ssids_factor().
-
- 6
Returned by spral_ssids_factor() if posdeftrue
and the matrix is found to be not positive definite. This may be because the Hungarian scaling
determined that the matrix was structurally singular. The user may reset the matrix values in val[]
and recall spral_ssids_factor().
-
- 7
Returned by spral_ssids_factor() if spral_ssids_analyse() was called with check set to false
but ptr[] and/or row[] was NULL.
-
- 8
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if options.ordering is
out-of-range, or options.ordering0
and the user has either failed to provide an elimination order or an error has been found in the
user-supplied elimination order (as supplied in order[]).
-
- 9
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if options.ordering2,
but val[] was NULL.
-
- 10
Returned by spral_ssids_solve() if there is an error in the size of array x[][] (that is, ldxn
or nrhs1).
The user may reset ldx and/or nrhs and recall spral_ssids_solve().
-
- 11
Returned by spral_ssids_solve() if job is out-of-range. The user may reset job and recall
spral_ssids_solve().
-
- 12
Returned by spral_ssids_solve() and spral_ssids_alter() if the selected combination of options.use_gpu_solve
and options.presolve are not compatible with the requested operation.
-
- 13
Returned by spral_ssids_enquire_posdef() if posdeffalse
on the last call to spral_ssids_factor().
-
- 14
Returned by spral_ssids_enquire_indef() if posdeftrue
on the last call to spral_ssids_factor().
-
- 15
Returned by spral_ssids_factor() if options.scaling3
but a matching based ordering was not used during the call to ssids_analyse() or ssids_analyse_coord()
(i.e. was called with
options.ordering2).
-
- 50
Allocation error. If available, the Fortran stat parameter is returned in inform.stat.
-
- 51
CUDA error. The CUDA error return value is returned in inform.cuda_error.
-
- 52
CUBLAS error. The CUBLAS error return value is returned in inform.cublas_error.
A positive value of inform.flag is used to warn the user that the input matrix data may be faulty or that the
subroutine cannot guarantee the solution obtained. Possible values are:
-
- 1
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if out-of-range variable
indices found. Any such entries are ignored and the computation continues. inform.matrix_outrange
is set to the number of such entries.
-
- 2
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if duplicated indices found.
Duplicates are recorded and the corresponding entries are summed. inform.matrix_dup is set to the
number of such entries.
-
- 3
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if both out-of-range and
duplicated variable indices found.
-
- 4
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if one and more diagonal
entries of
is missing.
-
- 5
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if one and more diagonal
entries of
is missing and out-of-range and/or duplicated variable indices have been found.
-
- 6
Returned by spral_ssids_analyse() and spral_ssids_analyse_coord() if
is found be (structurally) singular. This will overwrite any of the above warnings.
-
- 7
Returned by spral_ssids_factor() if options.action is set to true and the matrix is found to be
(structurally or numerically) singular.
-
- 8
Returned by spral_ssids_factor() if options.ordering2
(i.e. a matching-based ordering was used) but the associated scaling was not (i.e. options.scaling
3).
5.7 Method
spral_ssids_analyse() and spral_ssids_analyse_coord()
If check is set to true on the call to spral_ssids_analyse() or if spral_ssids_analyse_coord() is called, the
user-supplied matrix data is checked for errors. The cleaned integer matrix data (duplicates are summed
and out-of-range indices discarded) is stored in akeep. The use of checking is optional on a call to
spral_ssids_analyse() as it incurs both time and memory overheads. However, it is recommended since the
behaviour of the other routines in the package is unpredictable if duplicates and/or out-of-range variable indices are
entered.
If the user has supplied an elimination order it is checked for errors. Otherwise, an elimination order is generated
by the package. The elimination order is used to construct an assembly tree. On exit from spral_ssids_analyse()
(and spral_ssids_analyse_coord()), order[] is set so that order[i] holds the position of variable
in the
elimination order. If an ordering was supplied by the user, this order may differ, but will be equivalent in terms of
fill-in.
spral_ssids_factor()
spral_ssids_factor() optionally computes a scaling and then performs the numerical factorization. The user must
specify whether or not the matrix is positive definite. If posdef is set to true, no pivoting is performed and the
computation will terminate with an error if a non-positive pivot is encountered.
The factorization uses the assembly tree that was set up by the analyse phase. At each node, entries from
and, if
it is not a leaf node, the generated elements and any delayed pivots from its child nodes must be assembled. Separate
kernels handle each of these.
The kernel that performs the assembly from the child nodes considers one parent-child assembly at a time. Each
generated element from a child is divided into a number of tiles, and a thread block launched to assemble each tile
into a dense submatrix using a simple mapping array to determine the destination row and column of each
entry. Bit-compatibility is achieved by ensuring the child entries are always assembled in the same
order.
A dense partial factorization of the fully summed columns is then performed. The fully summed
columns are split into a number of tiles that are each handled by an associated block. Factorization
proceeds one column of tiles at a time. The pivoting condition is chosen to ensure that all entries of
have absolute value less
than . This limits the
growth of the entries of the
factor and ensures that any solves will be backwards stable. The details are described in [1].
If a pivot candidate does not pass the pivot tests at a node, it is delayed to its parent node, where further
elimination operations may make it acceptable. Delaying pivots leads to additional fill-in and floating-point
operations beyond that predicted by spral_ssids_analyse() (and spral_ssids_analyse_coord()), and may
result in additional memory allocations being required. The number of delayed pivots can often be reduced by using
appropriate scaling.
At each non-root node, the majority of the floating-point operations involve the formation of the generated
element. This is handled by a single dedicated kernel; again, see [1] for details.
At the end of the factorization, data structures for use in future calls to spral_ssids_solve() are prepared. If options.presolve=1,
the block of
corresponding to the eliminated variables is explicitly inverted to accelerate future calls to spral_ssids_solve() at
the cost of making spral_ssids_factor() slower.
spral_ssids_solve()
If options.use_gpu_solvefalse,
data is moved to the CPU if required and the BLAS calls are used to perform a solve using the assembly tree and
factors generated on previous calls.
Otherwise, the solve is conducted on the GPU in a similar fashion. If options.presolve=0, custom GPU
implementations of _trsv() and _gemv() are used to handle multiple independent operations. If multiple right-hand
sides are to be solved for, the single right-hand side solve is looped over. If options.presolve=1, _trsv() can be
replaced by the much more parallel (and hence faster) _gemv(). In this case multiple right-hand sides are handled at
the same time.
References
[1] J.D. Hogg, E. Ovtchinnikov and J.A. Scott. (2014). A sparse symmetric indefinite direct solver for GPU
architectures. RAL Technical Report. RAL-P-2014-0xx, to appear.
5.8 Example
Suppose we wish to factorize the matrix
and then solve for the right-hand side
The following code may be used.
/* examples/C/ssids.c - Example code for SPRAL_SSIDS package */
#include "spral.h"
#include <stdlib.h>
#include <stdio.h>
void main(void) {
/* Derived types */
void *akeep, *fkeep;
struct spral_ssids_options options;
struct spral_ssids_inform inform;
/* Initialize derived types */
akeep = NULL; fkeep = NULL; /* Important that these are NULL to start with */
spral_ssids_default_options(&options);
/* Data for matrix:
* ( 2 1 )
* ( 1 4 1 1 )
* ( 1 3 2 )
* ( 2 )
* ( 1 2 ) */
bool posdef = false;
int n = 5;
int ptr[] = { 1, 3, 6, 8,8, 9 };
int row[] = { 1, 2, 2, 3, 5, 3, 4, 5 };
double val[] = { 2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, 2.0 };
/* The right-hand side with solution (1.0, 2.0, 3.0, 4.0, 5.0) */
double x[] = { 4.0, 17.0, 19.0, 6.0, 12.0 };
/* Perform analyse and factorise with data checking */
bool check = true;
spral_ssids_analyse(check, n, NULL, ptr, row, NULL, &akeep, &options,
&inform);
if(inform.flag<0) {
spral_ssids_free(&akeep, &fkeep);
exit(1);
}
spral_ssids_factor(posdef, NULL, NULL, val, NULL, akeep, &fkeep, &options,
&inform);
if(inform.flag<0) {
spral_ssids_free(&akeep, &fkeep);
exit(1);
}
/* Solve */
spral_ssids_solve1(0, x, akeep, fkeep, &options, &inform);
if(inform.flag<0) {
spral_ssids_free(&akeep, &fkeep);
exit(1);
}
printf("The computed solution is:\n");
for(int i=0; i<n; i++) printf(" %18.10e", x[i]);
printf("\n");
/* Determine and print the pivot order */
int piv_order[5];
spral_ssids_enquire_indef(akeep, fkeep, &options, &inform, piv_order, NULL);
printf("Pivot order:");
for(int i=0; i<n; i++) printf(" %5d", piv_order[i]);
printf("\n");
int cuda_error = spral_ssids_free(&akeep, &fkeep);
if(cuda_error!=0) exit(1);
}
This produces the following output:
Warning from ssids_analyse. Warning flag = 4
one or more diagonal entries is missing
The computed solution is:
1.0000000000e+00 2.0000000000e+00 3.0000000000e+00 4.0000000000e+00 5.0000000000e+00
Pivot order: 3 4 1 0 2
Funded by EPSRC grant EP/J010553/1