The Solver in MathKeisan is for the direct solution of sparse linear systems Ax = b, where the matrix A is symmetric. It factorizes A = P L D L^T P^T and then calculates x by forward and back triangular solution. Here, L is sparse unit lower triangular, D is diagonal, and ^T denotes transpose. P is a permutation matrix constructed to reduce fill-in in the sparse factor L. It is calculated by either the Nested Dissection method or the Multiple Minimum Degree method. There is no pivoting for numerical stability during the factorization, but there is a subroutine which may improve the accuracy of the solution by iterative refinement.
User interface information is available from several sources.
Solving a sparse linear system involves three major stages:
ldlt_preprocess ( token, n, colptr, rowind, Lnnz, order )
During this stage, a permutation P is constructed.
Given the original sparse matrix A and the permutation P, the nonzero structure of the factor L is determined through symbolic factorization.
The preprocessing stage also performs dynamic memory allocation, to facilitate numerical factorization and triangular solves performed in subsequent calculations.
ldlt_fact ( token, colptr, rowind, nzvals, flops )
A unit lower triangular matrix L and a diagonal matrix D are constructed, such that P^T A P = L D L^T. This is the most computationally intensive stage.
ldlt_solve ( token, x, rhs, flops )
Once the matrix has been factored and a right-hand side b provided, the linear system is solved by forward and back triangular solution: with A = P L D L^T P^T, solve L y = P^T b, D z = y, L^T w = z, and set x = P w.
Three other routines are provided:
ldlt_getdiag ( token, diag )
ldlt_refine ( token, colptr, rowind, nzvals, rhs, x, newx, flops )
ldlt_free ( token )
The MathKeisan Solver uses either Multiple Minimum Degree (MMD) or Nested Dissection (ND) matrix ordering. The selection is made by passing an appropriate integer argument order to the ldlt_preprocess routine.
If order = 0, a modified MMD routine is used. It extracts supernode information during the ordering process. A supernode is a group of columns in the factor L which share the same sparsity structure. Matrices from physical systems will generally yield a factor rich in supernodes. The supernode information allows the factorization to be carried out efficiently. If order = 1, the original MMD algorithm is used. If order = 2 or order = 3, the node-based or edge-based ND algorithm is used, respectively.
In general, the node-based algorithm (order = 2) provides a better ordering. The time to calculate the ND ordering will generally be larger than that to calculate the MMD ordering, but the ND ordering will generally lead to smaller fill-in, with the total time for ordering + factorization + solve faster than that using MMD.
For order = 1, 2, or 3, no supernode information is collected during ordering. A set of routines due to Liu, Ng, and Peyton is used to identify supernodes and estimate storage requirements for the factor L after matrix ordering has been performed.
The factorization can be organized in several different ways, depending on the pattern by which data is accessed and modified. The most popular are left-looking, right-looking, and multifrontal. A comparison of these different implementations is reported in Rothberg. In MathKeisan, a left-looking algorithm is used.
In dense linear algebra libraries such as LAPACK, algorithms are blocked for optimum performance, allowing calls to level 2 and 3 BLAS with their efficient reuse of data. The MathKeisan Solver uses supernodes to obtain blocked algorithms for sparse systems, with data reuse similar to the dense case and calls to level 2 and 3 BLAS. The use of supernodes also reduces indirect addressing.
Typically, numerical factorization is the most time consuming stage in solving a sparse linear system. The MathKeisan Solver allows the numerical factorization to be done in parallel with OpenMP.
If the environment variable OMP_NUM_THREADS is set to np, then the factorization will run on np threads. If OMP_NUM_THREADS is not set, it will run on mp threads, where mp is the maximum number of processes in the resource group or node.
During the factorization, there is no pivoting for numerical stability. If the matrix is well-conditioned or symmetric positive definite (SPD), the solution will be stable. If the matrix is indefinite or ill-conditioned, the accuracy of the solution may be improved using iterative refinement.
Let the calculated solution be given by x_c = x - e, where x is the true solution and e is the error. Then A e = r, where r is the residual given by r = b - A x_c. The error e can be calculated by solving A e = r.
The iterative refinement subroutine has three steps: compute the residual r = b - A x_c, solve A e = r for the error e, and form the improved solution newx = x_c + e.
|ldlt_preprocess |Reorder matrix and allocate memory for the factorization |
|ldlt_fact |LDL^T factorization using reordering from ldlt_preprocess |
|ldlt_solve |Solve using factorization from ldlt_fact |
|ldlt_getdiag |Return diagonal of factorization from ldlt_fact |
|ldlt_free |Free memory allocated in ldlt_preprocess |