************************************************* SSIDS - Sparse Symmetric Indefinite Direct Solver ************************************************* .. code-block:: C #include /* or for all packages */ ======= Purpose ======= This package solves one or more sets of :math:`n\times n` sparse **symmetric** equations :math:`AX = B` using a multifrontal method. The following cases are covered: 1. :math:`A` is **indefinite**. SSIDS computes the sparse factorization .. math:: A = PLD(PL)^T where :math:`P` is a permutation matrix, :math:`L` is unit lower triangular, and :math:`D` is block diagonal with blocks of size :math:`1 \times 1` and :math:`2 \times 2`. 2. :math:`A` is **positive definite**. SSIDS computes the **sparse Cholesky factorization** .. math:: A = PL(PL)^T where :math:`P` is a permutation matrix and :math:`L` is lower triangular. *However, as SSIDS is designed primarily for indefinite systems, this may be slower than a dedicated Cholesky solver.* The code optionally supports hybrid computation using one or more NVIDIA GPUs. SSIDS returns bit-compatible results. An option exists to scale the matrix. In this case, the factorization of the scaled matrix :math:`\overline{A} = S A S` is computed, where :math:`S` is a diagonal scaling matrix. Version history --------------- [For detail, see ChangeLog] 2016-09-23 Version 2.0.0 * Add support for CPU and hybrid execution. * Add support for 64-bit ptr(:) on input matrix :math:`A`. * options%presolve removed. 2014-03-17 Version 1.0.0 * Initial release: GPU only ============== Usage overview ============== Solving :math:`AX=B` using SSIDS is a four stage process. 1. Call :c:func:`spral_ssids_analyse()` to perform a symbolic factorization, stored in `akeep`. 2. Call :c:func:`spral_ssids_factor()` to perform a numeric factorization, stored in `fkeep`. More than one numeric factorization can refer to the same `akeep`. 3. Call :c:func:`spral_ssids_solve1()` or :c:func:`spral_ssids_solve()` to perform a solve with the factors. More than one solve can be performed with the same `fkeep`. 4. Once all desired solutions have been performed, free memory with :c:func:`spral_ssids_free()`. In addition, advanced users may use the following routines: * :c:func:`spral_ssids_enquire_posdef()` and :c:func:`spral_ssids_enquire_indef()` return the diagonal entries of the factors and the pivot sequence. * :c:func:`spral_ssids_alter()` allows altering the diagonal entries of the factors. .. note:: **Bit-compatibility:** If used with bit-compatible BLAS and compiler options, this routine will return bit compatible results. That is, consecutive runs with the same data on the same machine produces exactly the same solution. ================= Basic Subroutines ================= .. note:: For the most efficient use of the package, CSC format should be used without checking. .. c:function:: void spral_ssids_default_options(struct spral_ssids_options *options) Intialises members of options structure to default values. :param options: Structure to be initialised. .. c:function:: void spral_ssids_analyse(bool check, int n, int *order, const long *ptr, const int *row, const double *val, void **akeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) Perform the analyse (symbolic) phase of the factorization for a matrix supplied in :doc:`CSC format`. The resulting symbolic factors stored in :c:type:`akeep` should be passed unaltered in the following call to :c:func:`spral_ssids_factor()`. :param check: if true, matrix data is checked. Out-of-range entries are dropped and duplicate entries are summed. :param n: number of columns in :math:`A`. :param order: may be `NULL`; otherwise must be an array of size `n` used on entry a user-supplied ordering (:c:type:`options.ordering=0 `). On return, the actual ordering used. :param ptr[n+1]: column pointers for :math:`A` (see :doc:`CSC format`). :param row[ptr[n]]: row indices for :math:`A` (see :doc:`CSC format`). :param val: may be `NULL`; otherwise must be an array of size `ptr[n]` containing non-zero values for :math:`A` (see :doc:`CSC format`). Only used if a matching-based ordering is requested. :param akeep: returns symbolic factorization, to be passed unchanged to subsequent routines. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). .. note:: If a user-supplied ordering is used, it may be altered by this routine, with the altered version returned in `order[]`. This version will be equivalent to the original ordering, except that some supernodes may have been amalgamated, a topographic ordering may have been applied to the assembly tree and the order of columns within a supernode may have been adjusted to improve cache locality. .. c:function:: void spral_ssids_analyse_ptr32(bool check, int n, int *order, const int *ptr, const int *row, const double *val, void **akeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) As :c:func:`spral_ssids_analyse()`, except ptr has type ``int``. Provided for backwards comptability, users are encourage to use 64-bit ptr in new code. .. c:function:: void spral_ssids_analyse_coord(int n, int *order, long ne, const int *row, const int *col, const double *val, void **akeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) As :c:func:`spral_ssids_analyse()`, but for coordinate data. The variant parameters are: :param ne: number of non-zero entries in :math:`A`. :param row[ne]: row indices for :math:`A` (see :doc:`Coordinate format`). :param col[ne]: column indices for :math:`A` (see :doc:`Coordinate format`). .. c:function:: void spral_ssids_factor(bool posdef, const long *ptr, const int *row, const double *val, double *scale, void *akeep, void **fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) :param posdef: true if matrix is positive-definite :param ptr: may be `NULL`; otherwise a length `n+1` array of column pointers for :math:`A`, only required if :f:type:`akeep` was obtained by running :c:func:`spral_ssids_analyse()` with `check=true`, in which case it must be unchanged since that call. :param row: may be `NULL`; otherwise a length `ptr[n]` array of row indices for :math:`A`, only required if :f:type:`akeep` was obtained by running :c:func:`spral_ssids_analyse()` with `check=true`, in which case it must be unchanged since that call. :param val[]: non-zero values for :math:`A` in same format as for the call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param scale: may be `NULL`; otherwise a length `n` array for diagonal scaling. `scale[i-1]` contains entry :math:`S_ii` of :math:`S`. Must be supplied by user on entry if :c:member:`options.scaling=0 ` (user-supplied scaling). On exit, returns scaling used. :param akeep: symbolic factorization returned by preceding call to :c:func:`ssids_analyse()` or :c:func:`ssids_analyse_coord()`. :param fkeep: returns numeric factorization, to be passed unchanged to subsequent routines. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). .. c:function:: void spral_ssids_solve1(int job, double *x1, void *akeep, void *fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) Solve (for a single right-hand side) one of the following equations: +---------------+--------------------------+ | `job` | Equation solved | +===============+==========================+ | 0 | :math:`Ax=b` | +---------------+--------------------------+ | 1 | :math:`PLx=Sb` | +---------------+--------------------------+ | 2 | :math:`Dx=b` | +---------------+--------------------------+ | 3 | :math:`(PL)^TS^{-1}x=b` | +---------------+--------------------------+ | 4 | :math:`D(PL)^TS^{-1}x=b` | +---------------+--------------------------+ Recall :math:`A` has been factorized as either: * :math:`SAS = (PL)(PL)^T~` (positive-definite case); or * :math:`SAS = (PL)D(PL)^T` (indefinite case). :param job: specifies equation to solve, as per above table. :param x1[n]: right-hand side :math:`b` on entry, solution :math:`x` on exit. :param akeep: symbolic factorization returned by preceding call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param fkeep: numeric factorization returned by preceding call to :c:func:`spral_ssids_factor()`. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). .. c:function:: void spral_ssids_solve(int job, int nrhs, double *x, int ldx, void *akeep, void *fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) Solve (for multiple right-hand sides) one of the following equations: +---------------+--------------------------+ | `job` | Equation solved | +===============+==========================+ | 0 | :math:`AX=B` | +---------------+--------------------------+ | 1 | :math:`PLX=SB` | +---------------+--------------------------+ | 2 | :math:`DX=B` | +---------------+--------------------------+ | 3 | :math:`(PL)^TS^{-1}X=B` | +---------------+--------------------------+ | 4 | :math:`D(PL)^TS^{-1}X=B` | +---------------+--------------------------+ Recall :math:`A` has been factorized as either: * :math:`SAS = (PL)(PL)^T~` (positive-definite case); or * :math:`SAS = (PL)D(PL)^T` (indefinite case). :param job: specifies equation to solve, as per above table. :param nrhs: number of right-hand sides. :param x[ldx*nrhs]: right-hand sides :math:`B` on entry, solutions :math:`X` on exit. The `i`-th entry of right-hand side `j` is in position `x[j*ldx+i]`. :param ldx: leading dimension of `x`. :param akeep: symbolic factorization returned by preceding call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param fkeep: numeric factorization returned by preceding call to :c:func:`spral_ssids_factor()`. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). .. c:function:: int spral_ssids_free_akeep(void **akeep) Frees memory and resources associated with :c:type:`akeep`. :param akeep: symbolic factors to be freed. :returns: 0 on success, or a CUDA error code on failure. .. c:function:: int spral_ssids_free_fkeep(void **fkeep) Frees memory and resources associated with :c:type:`fkeep`. :param fkeep: numeric factors to be freed. :returns: 0 on success, or a CUDA error code on failure. .. c:function:: int spral_ssids_free(void **akeep, void **fkeep) Frees memory and resources associated with :c:type:`akeep` and :c:type:`fkeep`. :param akeep: symbolic factors to be freed. :param fkeep: numeric factors to be freed. :returns: 0 on success, or a CUDA error code on failure. .. warning:: The free routine(s) must be called by the user. Merely deallocating :c:type:`akeep` or :c:type:`fkeep`, or allowing them to go out of scope will result in memory leaks. :c:type:`akeep` should only be deallocated after all associated numeric factorizations :c:type:`fkeep` have been freed. ==================== Advanced subroutines ==================== .. c:function:: void spral_ssids_enquire_posdef(const void *akeep, const void *fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform, double *d) Return the diagonal entries of the Cholesky factor. :param akeep: symbolic factorization returned by preceding call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param fkeep: numeric factorization returned by preceding call to :c:func:`spral_ssids_factor()`. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). :param d[n]: returns the diagonal of :math:`L`. d[i-1] stores the entry :math:`L_{ii}`. .. c:function:: void spral_ssids_enquire_indef(const void *akeep, const void *fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform, int *piv_order, double *d) Return the pivot order and/or values of :math:`D` of the Symmetric Indefinite Factorization. :param akeep: symbolic factorization returned by preceding call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param fkeep: numeric factorization returned by preceding call to :c:func:`spral_ssids_factor()`. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). :param piv_order: may be `NULL`; otherwise a length `n` array for the pivot order. On return, :math:`|\,\texttt{piv_order[i]}|` gives the position of variable :math:`i` in the pivot order. :param d: may be `NULL`; otherwise a length `2*n` array for the :math:`2\times2` block diagonal of :math:`D`. `d[2*(i-1)+0]` stores :math:`D_{ii}` and `d[2*(i-1)+1]` stores :math:`D_{(i+1)i}`. .. c:function:: void spral_ssids_alter(const double *d, const void *akeep, void *fkeep, const struct spral_ssids_options *options, struct spral_ssids_inform *inform) Alter the entries of the diagonal factor :math:`D` for a symmetric indefinite factorization. The pivot order remains the same. :param d[2*n]: New entries of :math:`D`. `d[2*(i-1)+0]` stores :math:`D_{ii}` and `d[2*(i-1)+1]` stores :math:`D_{(i+1)i}`. :param akeep: symbolic factorization returned by preceding call to :c:func:`spral_ssids_analyse()` or :c:func:`spral_ssids_analyse_coord()`. :param fkeep: numeric factorization returned by preceding call to :c:func:`spral_ssids_factor()`. :param options: specifies algorithm options to be used (see :c:type:`spral_ssids_options`). :param inform: returns information about the execution of the routine (see :c:type:`spral_ssids_inform`). **Note:** This routine is not compatabile with the option :c:member:`options.presolve=1 `. ============= Derived types ============= .. c:type:: struct spral_ssids_options The derived data type ssids\_options is used to specify the options used within ``SSIDS``. The components, that are automatically given default values in the definition of the type, are: .. c:member:: int array_base Indexing base for arrays. Either 0 (C indexing) or 1 (Fortran indexing). Default is 0. .. c:member:: int print_level Level of printing: +---------------+-------------------------------------------------+ | < 0 | No printing. | +---------------+-------------------------------------------------+ | = 0 (default) | Error and warning messages only. | +---------------+-------------------------------------------------+ | = 1 | As 0, plus basic diagnostic printing. | +---------------+-------------------------------------------------+ | > 1 | As 1, plus some additional diagnostic printing. | +---------------+-------------------------------------------------+ The default is 0. .. c:member:: int unit_diagnostics Fortran unit number for diagnostics printing. Printing is suppressed if <0. The default is 6 (stdout). .. c:member:: int unit_error Fortran unit number for printing of error messages. Printing is suppressed if <0. The default is 6 (stdout). .. c:member:: int unit_warning Fortran unit number for printing of warning messages. Printing is suppressed if <0. The default is 6 (stdout). .. c:member:: int ordering Ordering method to use in analyse phase: +-------------+---------------------------------------------------------+ | 0 | User-supplied ordering is used (`order` argument to | | | :c:func:`spral_ssids_analyse()` or | | | :c:func:`spral_ssids_analyse_coord()`). | +-------------+---------------------------------------------------------+ | 1 (default) | METIS ordering with default settings. | +-------------+---------------------------------------------------------+ | 2 | Matching-based elimination ordering is computed (the | | | Hungarian algorithm is used to identify large | | | off-diagonal entries. A restricted METIS ordering is | | | then used that forces these on to the subdiagonal). | | | | | | **Note:** This option should only be chosen for | | | indefinite systems. A scaling is also computed that may | | | be used in :c:func:`spral_ssids_factor()` (see | | | :c:member:`scaling ` | | | below). | +-------------+---------------------------------------------------------+ The default is 1. .. c:member:: int nemin Supernode amalgamation threshold. Two neighbours in the elimination tree are merged if they both involve fewer than `nemin` eliminations. The default is used if `nemin<1`. The default is 8. .. c:member bool ignore_numa: If true, all CPUs and GPUs are treated as belonging to a single NUMA region. Default is `true`. .. c:member bool use_gpu Use an NVIDIA GPU if present. Default is `true`. .. c:memeber long min_gpu_work Minimum number of flops in subtree before scheduling on GPU. Default is `5e9`. .. c:member: float max_load_inbalance Maximum permissiable load inbalance for leaf subtree allocations. Values less than 1.0 are treated as 1.0. Default is `1.2`. .. c:member:: float gpu_perf_coeff GPU perfromance coefficient. How many times faster a GPU is than CPU at factoring a subtree. Default is `1.0`. .. c:member:: int scaling Scaling algorithm to use: +---------------+-------------------------------------------------------+ | <=0 (default) | No scaling (if ``scale[]`` is not present on call to | | | :c:func:`spral_ssids_factor()`, or user-supplied | | | scaling (if ``scale[]`` is present). | +---------------+-------------------------------------------------------+ | =1 | Compute using weighted bipartite matching via the | | | Hungarian Algorithm (MC64 algorithm). | +---------------+-------------------------------------------------------+ | =2 | Compute using a weighted bipartite matching via the | | | Auction Algorithm (may be lower quality than that | | | computed using the Hungarian Algorithm, but can be | | | considerably faster). | +---------------+-------------------------------------------------------+ | =3 | Use matching-based ordering generated during the | | | analyse phase using :c:member:`options.ordering=2 | | | `. The scaling | | | will be the same as that generated with | | | :c:member:`options.scaling=1 | | | ` | | | if the matrix values have not changed. This option | | | will generate an error if a matching-based ordering | | | was not used during analysis. | +---------------+-------------------------------------------------------+ | >=4 | Compute using the norm-equilibration algorithm of | | | Ruiz (see :doc:`scaling`). | +---------------+-------------------------------------------------------+ The default is 0. .. c:member:: long small_subtree_threshold Maximum number of flops in a subtree treated as a single task. See :ref:`method section `. The default is `4e6`. .. c:member:: int cpu_block_size Block size to use for parallelization of large nodes on CPU resources. Default is `256`. .. c:member:: bool action Continue factorization of singular matrix on discovery of zero pivot if true (a warning is issued), or abort if false. The default is true. .. c:member:: int pivot_method Pivot method to be used on CPU, one of: +-------------+----------------------------------------------------------+ | 0 | Aggressive a posteori pivoting. Cholesky-like | | | communication pattern is used, but a single failed pivot | | | requires restart of node factorization and potential | | | recalculation of all uneliminated entries. | +-------------+----------------------------------------------------------+ | 1 (default) | Block a posteori pivoting. A failed pivot only requires | | | recalculation of entries within its own block column. | +-------------+----------------------------------------------------------+ | 2 | Threshold partial pivoting. Not parallel. | +-------------+----------------------------------------------------------+ Default is `1`. .. c:member:: double small Threshold below which an entry is treated as equivalent to `0.0`. The default is `1e-20`. .. c:member:: double u Relative pivot threshold used in symmetric indefinite case. Values outside of the range :math:`[0,0.5]` are treated as the closest value in that range. The default is `0.01`. .. c:type:: struct spral_ssids_inform Used to return information about the progress and needs of the algorithm. .. c:member:: long cpu_flops Number of flops performed on CPU .. c:member:: int cublas_error CUBLAS error code in the event of a CUBLAS error (0 otherwise). .. c:member:: int cuda_error CUDA error code in the event of a CUDA error (0 otherwise). Note that due to asynchronous execution, CUDA errors may not be reported by the call that caused them. .. c:member:: int flag Exit status of the algorithm (see table below). .. c:member:: long gpu_flops Number of flops performed on GPU .. c:member:: int matrix_dup Number of duplicate entries encountered (if :c:func:`spral_ssids_analyse()` called with check=true, or any call to :c:func:`spral_ssids_analyse_coord()`). .. c:member:: int matrix_missing_diag Number of diagonal entries without an explicit value (if :c:func:`spral_ssids_analyse()` called with check=true, or any call to :c:func:`spral_ssids_analyse_coord()`). .. c:member:: int matrix_outrange Number of out-of-range entries encountered (if :c:func:`spral_ssids_analyse()` called with check=true, or any call to :c:func:`spral_ssids_analyse_coord()`). .. c:member:: int matrix_rank (Estimated) rank (structural after analyse phase, numerical after factorize phase). .. c:member:: int maxdepth Maximum depth of the assembly tree. .. c:member:: int maxfront Maximum front size (without pivoting after analyse phase, with pivoting after factorize phase). .. c:member:: int num_delay Number of delayed pivots. That is, the total number of fully-summed variables that were passed to the father node because of stability considerations. If a variable is passed further up the tree, it will be counted again. .. c:member:: long num_factor Number of entries in :math:`L` (without pivoting after analyse phase, with pivoting after factorize phase). .. c:member:: long num_flops Number of floating-point operations for Cholesky factorization (indefinte needs slightly more). Without pivoting after analyse phase, with pivoting after factorize phase. .. c:member:: int num_neg Number of negative eigenvalues of the matrix :math:`D` after factorize phase. .. c:member:: int num_sup Number of supernodes in assembly tree. .. c:member:: int num_two Number of :math:`2 \times 2` pivots used by the factorization (i.e. in the matrix :math:`D`). .. c:member:: int stat Fortran allocation status parameter in event of allocation error (0 otherwise). +-------------+-------------------------------------------------------------+ | inform.flag | Return status | +=============+=============================================================+ | 0 | Success. | +-------------+-------------------------------------------------------------+ | -1 | Error in sequence of calls (may be caused by failure of a | | | preceding call). | +-------------+-------------------------------------------------------------+ | -2 | n<0 or ne<1. | +-------------+-------------------------------------------------------------+ | -3 | Error in ptr[]. | +-------------+-------------------------------------------------------------+ | -4 | CSC format: All variable indices in one or more columns are | | | out-of-range. | | | | | | Coordinate format: All entries are out-of-range. | +-------------+-------------------------------------------------------------+ | -5 | Matrix is singular and options.action=false | +-------------+-------------------------------------------------------------+ | -6 | Matrix found not to be positive definite. | +-------------+-------------------------------------------------------------+ | -7 | ptr[] and/or row[] not present, but required as | | | :c:func:`spral_ssids_analyse()` was called with check=false.| +-------------+-------------------------------------------------------------+ | -8 | options.ordering out of range, or options.ordering=0 and | | | order parameter not provided or not a valid permutation. | +-------------+-------------------------------------------------------------+ | -9 | options.ordering=-2 but val[] was not supplied. | +-------------+-------------------------------------------------------------+ | -10 | ldx`, that subtree is treated as a single task. References ---------- .. [1] J.D. Hogg, E. Ovtchinnikov and J.A. Scott. (2014). *A sparse symmetric indefinite direct solver for GPU architectures*. ACM Transactions on Mathematical Software 42(1), Article 1, 25 pages. [`DOI: 10.1145/2756548 `_] [`Preprint RAL-P-2014-006 `_] .. [2] J.D. Hogg. (2016). *A new sparse LDLT solver using a posteriori threshold pivoting*. RAL Technical Report. RAL-TR-2016-0xx, to appear.