NEC

MathKeisan User's Guide


Solver

Introduction

The Solver in MathKeisan is for the direct solution of sparse linear systems $ Ax=b $, where the matrix $ A $ is symmetric. It factorizes $ LDL^T \leftarrow PAP^T $ and then calculates $ x $ by forward and back triangular solution. Here, $ L $ is sparse unit lower triangular, $ D $ is diagonal, and $ ^T $ denotes transpose. $ P $ is a permutation matrix constructed to reduce fill-in in the sparse factor $ L $. It is calculated by either the Nested Dissection method, or the Multiple Minimum Degree method. There is no pivoting for numerical stability during the factorization, but there is a subroutine which may improve the accuracy of the solution by iterative refinement.

User Interface

User interface information is available from several sources:

Solver Stages

Solving a sparse linear system $ Ax=b $ involves three major stages:

1. Preprocessing

ldlt_preprocess ( token, n, colptr, rowind, Lnnz, order )

During this stage, a permutation $ P $ is constructed.

Given the original sparse matrix and a permutation $ P $, the nonzero structure of the factor $ L $ is determined through symbolic factorization.

The preprocessing stage also performs dynamic memory allocation, to facilitate numerical factorization and triangular solves performed in subsequent calculations.

2. Numerical Factorization

ldlt_fact ( token, colptr, rowind, nzvals, flops )

A unit lower triangular matrix $ L $ and a diagonal matrix $ D $ are constructed, such that $ PAP^T = LDL^T $. This is the most computationally intensive stage.

3. Triangular Solve

ldlt_solve ( token, x, rhs, flops )

Once the matrix has been factored, and a right-hand side $ b $ provided, the linear system is solved as follows:

Solve $ Ly=Pb $ for $ y $.
Set $ y \leftarrow D^{-1}y $.
Solve $ Lx=y $ for $ x $.
Set $ x \leftarrow P^{-T}x $.

Auxilliary Subroutines

Three other routines:

ldlt_getdiag ( token, diag )
ldlt_refine (token, colptr, rowind, nzvals, rhs, x, newx, flops )
ldlt_free ( token )
are provided to extract the diagonal $ D $, refine the solution, and free up all working storage associated with the linear system being solved.

Matrix ordering

The MathKeisan Solver uses either Multiple Minimum Degree (MMD) [4] or Nested Dissection (ND) [3] matrix ordering. The selection is made by passing an appropriate integer argument order to the ldlt_preprocess subroutine.

When order = 0, a modified MMD routine is used. It extracts supernode information during the ordering process. A supernode is a group of columns in the factor L which share the same sparsity structure. The matrices from physical systems will generally yield a factor $ L $ rich in supernodes. The supernode information allows the factorization to be carried out efficiently.

When order = 1, the original MMD algorithm is used.

When order = 2 or order = 3, the node-based or edge-based ND algorithm is used, respectively. In general, the node-based algorithm (order = 2) provides a better ordering. The time to calculate the ND ordering will generally be larger than that to calculate the MMD ordering, but the ND ordering will generally lead to smaller fill-in, with total time for order + factorization + solve faster than that using MMD.

When order = 1, 2, or 3, no supernode information is collected during ordering. A set of routines developed by Liu, Ng, and Peyton [6] are used to identify supernodes and estimate storage requirements for $ L $ after matrix ordering has been performed.

Factorization Algorithm

The $ LDL^T $ factorization can be organized in several different ways, depending on the pattern by which data is accessed and modified. The most popular ones are left-looking [7], right-looking [8], and multifrontal [5]. A comparison of these different implementations is reported in Rothberg [8]. In MathKeisan, a left-looking algorithm is used.

In dense linear algebra libraries like LAPACK, algorithms are blocked for optimum performance. This allows for calls to level 2 and 3 BLAS, with their efficient reuse of data. The MathKeisan Solver uses supernodes. This gives blocked algorithms for the sparse systems, with data reuse similar to the dense case, and calls to level 2 and 3 BLAS. The use of supernodes also reduces indirect addressing.

Parallel Factorization

Typically, numerical factorization is the most time consuming stage in solving a sparse linear system. The MathKeisan Solver allows the numerical factorization to be done in parallel with OpenMP.

If the environment variable OMP_NUM_THREADS is set to np, then the factorization will run on np threads. If OMP_NUM_THREADS is not set, it will run on mp threads, where mp is the maximum number of processes in the resource group or node.

Iterative Refinement

During the factorization, there is no pivoting for numerical stability. If the matrix is well-conditioned or symmetric positive definite (SPD), the solution will be stable. If the matrix is indefinite or ill-conditioned, the accuracy of the solution may be improved using iterative refinement.

Let the calculated solution $ x $ be given by $ x=\hat{x}+\delta x $, where $ \hat{x} $ is the true solution, and $ \delta x $ is the error. Then $ A(\hat{x}+\delta x)=b+\delta b $, where $ \delta b $ is the residual given by $ \delta b=Ax-b $. The error $ \delta x $ can be calculated by solving $ A \delta x = \delta b $.

The iterative refinement subroutine has three steps:

  1. Calculate the residual $ \delta b=Ax-b $. In some implementations, this calculation is performed in extended precision. In MathKeisan, no extended precision is used.
  2. Calculate the error $ \delta x $ by solving $ A \delta x = \delta b $.
  3. Calculate the refined solution $ x \leftarrow x + \delta x $.

Solver Routine List

Name Description
ldlt_preprocess Reorder matrix and allocate memory for ldlt_fact.
ldlt_fact Calculate $ LDL^T $ factorization using reordering from ldlt_preprocess.
ldlt_solve Solve using factorization from ldlt_fact.
ldlt_refine Iterative refinement.
ldlt_getdiag Return diagonal of factorization from ldlt_fact.
ldlt_free Free memory allocated in ldlt_preprocess.

Further Reading

  1. J. W. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, 1997.
  2. N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
  3. G. Karypis and V. Kumar. "A fast and high quality multilevel scheme for partitioning irregular graphs." SIAM Journal on Scientific Computing, 20(1):359-392, 1999.
  4. J. W. Liu. "Modification of the minimum-degree algorithm by multiple elimination." ACM Transactions on Mathematical Software, 11(2):141-153, 1985.
  5. J. W. Liu. "The multifrontal method for sparse matrix solution: Theory and practice." SIAM Review, 34(1):82-109, 1992.
  6. J. W. Liu, E. Ng, and B. Peyton. "On finding supernodes for sparse matrix computations." SIAM J. Matrix Ahal. Apply., 14:242-252, 1993.
  7. E. G. Ng and B. W. Peyton. "Block sparse Cholesky algorithms on advanced uniprocessor computers." SIAM J. SCI. COMPUT., 14:1034-1056, 1993.
  8. E. Rothberg and A. Gupta. "An evaluation of left-looking, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical-memory machines." Int. J. High Speed Computing, 5:537--593, 1993.