The Solver in MathKeisan is for the direct solution of
sparse linear systems
,
where the matrix
is symmetric.
It factorizes
and then calculates
by forward and
back triangular solution. Here,
is sparse unit lower triangular,
is diagonal, and
denotes transpose.
is a permutation matrix constructed
to reduce fill-in in the sparse factor
.
It is calculated by either the Nested Dissection method, or the Multiple
Minimum Degree method. There is no pivoting for numerical stability during
the factorization, but there is a subroutine which may improve the accuracy
of the solution by iterative refinement.
User interface information is available from several sources:
Solving a sparse linear system
involves three major stages:
ldlt_preprocess ( token, n, colptr, rowind, Lnnz, order )
During this stage, a permutation
is constructed.
Given the original sparse matrix and a permutation
, the nonzero structure of the factor
is determined through
symbolic factorization.
The preprocessing stage also performs dynamic memory allocation, to facilitate numerical factorization and triangular solves performed in subsequent calculations.
ldlt_fact ( token, colptr, rowind, nzvals, flops )
A unit lower triangular matrix
and a diagonal matrix
are
constructed, such that
.
This is the most computationally intensive stage.
ldlt_solve ( token, x, rhs, flops )
Once the matrix has been factored, and a right-hand side
provided, the linear system is solved as follows:
| Solve | |
| Set | |
| Solve | |
| Set | |
Three other routines:
ldlt_getdiag ( token, diag )
ldlt_refine (token, colptr, rowind, nzvals, rhs, x, newx, flops )
ldlt_free ( token )
The MathKeisan Solver uses either Multiple
Minimum Degree (MMD) [4] or
Nested Dissection
(ND) [3] matrix ordering.
The selection is made by passing an appropriate integer argument
order to the ldlt_preprocess subroutine.
When order = 0, a modified MMD routine is used. It extracts
supernode information during the ordering process. A supernode is
a group of columns in the factor L which share the same sparsity
structure. The matrices from physical systems will generally yield a factor
rich in supernodes. The supernode
information allows the factorization to be carried out efficiently.
When order = 1, the original MMD algorithm is used.
When order = 2 or order = 3, the
node-based or edge-based ND algorithm is used, respectively.
In general, the node-based algorithm (order = 2)
provides a better ordering. The time to calculate the ND ordering
will generally be larger than that to calculate the MMD ordering,
but the ND ordering will generally lead to smaller fill-in, with
total time for order + factorization + solve faster than
that using MMD.
When order = 1, 2, or 3, no
supernode information is collected during ordering. A set of routines
developed by
Liu, Ng, and Peyton [6]
are used to identify supernodes and estimate storage requirements for
after matrix ordering has been performed.
The
factorization can be organized in
several different ways, depending on the pattern by which data is accessed
and modified. The most popular ones are
left-looking [7],
right-looking [8], and
multifrontal [5].
A comparison of these different implementations is reported in
Rothberg [8].
In MathKeisan, a left-looking algorithm is used.
In dense linear algebra libraries like LAPACK, algorithms are blocked for optimum performance. This allows for calls to level 2 and 3 BLAS, with their efficient reuse of data. The MathKeisan Solver uses supernodes. This gives blocked algorithms for the sparse systems, with data reuse similar to the dense case, and calls to level 2 and 3 BLAS. The use of supernodes also reduces indirect addressing.
Typically, numerical factorization is the most time consuming stage in solving a sparse linear system. The MathKeisan Solver allows the numerical factorization to be done in parallel with OpenMP.
If the environment variable OMP_NUM_THREADS is set to
np, then the factorization will run on
np threads. If OMP_NUM_THREADS is not
set, it will run on mp threads, where
mp is the maximum number of processes in the resource
group or node.
During the factorization, there is no pivoting for numerical stability. If the matrix is well-conditioned or symmetric positive definite (SPD), the solution will be stable. If the matrix is indefinite or ill-conditioned, the accuracy of the solution may be improved using iterative refinement.
Let the calculated solution
be given by
,
where
is the true solution, and
is the error. Then
, where
is the residual given by
.
The error
can be calculated
by solving
.
The iterative refinement subroutine has three steps:
| Name | Description |
|---|---|
ldlt_preprocess
| Reorder matrix and allocate memory for ldlt_fact.
|
ldlt_fact
| Calculate ldlt_preprocess.
|
ldlt_solve
| Solve using factorization from ldlt_fact.
|
ldlt_refine
| Iterative refinement. |
ldlt_getdiag
| Return diagonal of factorization from ldlt_fact.
|
ldlt_free
| Free memory allocated in ldlt_preprocess.
|