January, 2004
NEC Corporation
The aim of MathKeisan is to provide a highly tuned and well-tested collection of Math libraries for NEC high performance computers. This version is for the NEC SX-5, SX-6, and SX-7 vector computers. See www.mathkeisan.com for other versions for NEC's Itanium® Processor Family servers. Unless noted otherwise, all references to MathKeisan in these release notes are to MathKeisan for SX-5, SX-6, SX-7.
The libraries in MathKeisan are listed in Table 1.
Table 1: Libraries in MathKeisan
| name | description |
|---|---|
| BLAS | Basic Linear Algebra Subprograms |
| LAPACK | Linear algebra for high performance computers |
| ScaLAPACK | Scalable Linear Algebra package (contains PBLAS) |
| BLACS | Basic Linear Algebra Communication Subprograms |
| PARBLAS | Shared memory Parallel BLAS |
| CBLAS | C interface to BLAS |
| SBLAS | Sparse BLAS |
| FFT | FFT's with HP's VECLIB interface and CRAY LIBSCI 3.1 interface |
| PARFFT | Parallel FFT's with HP's VECLIB interface and CRAY LIBSCI 3.1 interface |
| METIS | Matrix/Graph ordering and partitioning library |
| ParMETIS | Parallel Matrix/Graph ordering and partition library |
| SOLVER | Direct solver for sparse symmetric systems |
| ARPACK | Solution of large scale eigenvalue problems |
For a list of machines and SUPER/UX revisions compatible with MathKeisan 1.5.0, please follow the "Compatibility" link at www.mathkeisan.com. The following compilers were used to build the MathKeisan libraries:
| Fortran | f90 for SX, Rev.267 |
| C | C++/SX, Rev.061 |
| MPI | MPI/SX r121 |
Table 2: FFT subroutines with increased size of the trigonometric and factorization data argument array
| scfft2d | dzfft2d |
| scfft3d | dzfft3d |
| scfft | dzfft |
| scfftm | dzfftm |
| c1dfft | z1dfft |
| ccfft2d | zzfft2d |
| ccfft3d | zzfft3d |
| ccfft | zzfft |
| ccfftm | zzfftm |
| cfft2d | zfft2d |
| cfft2 | zfft2 |
| cfft3d | zfft3d |
| cfft | zfft |
| crc1ft | zrc1ft |
| src1ft | drc1ft |
| s1dfft | d1dfft |
| src1ft | drc1f |
If you are using the f90 flag -dw (the default), load with the libraries in Table 3. If you are using the f90 flag -ew, load with the libraries in Table 4. Load the libraries in the order given, or use the ld flag -h lib_cyclic. Table 5 gives $(LIBDIR) for the size_t32 and size_t64 libraries on self compile and cross compile machines; these are the default locations. If the libraries are not in these default locations, ask your system administrator where they are.
Table 3: Loading for -dw
| name | load libraries (see Table 5 for $(LIBDIR)) |
|---|---|
| BLAS | -L$(LIBDIR) -lblas |
| LAPACK | -L$(LIBDIR) -llapack -lblas |
| ScaLAPACK | -L$(LIBDIR) -lscalapack -lblacsF90init -lblacs -lblacsF90init -lblas -lmpi |
| BLACS | -L$(LIBDIR) -lblacsF90init -lblacs -lblacsF90init -lmpi |
| PARBLAS | -L$(LIBDIR) -lparblas -Popenmp |
| CBLAS | -L$(LIBDIR) -lcblas -lblas |
| SBLAS | -L$(LIBDIR) -lsblas |
| FFT | -L$(LIBDIR) -lfft |
| PARFFT | -L$(LIBDIR) -lparfft -Popenmp |
| METIS | -L$(LIBDIR) -lmetis_32 |
| | -L$(LIBDIR) -lmetis |
| ParMETIS | -L$(LIBDIR) -lparmetis_32 -lmpi |
| | -L$(LIBDIR) -lparmetis -lmpi |
| SOLVER | -L$(LIBDIR) -lsolver -lmetis -lblas -Popenmp |
| ARPACK | -L$(LIBDIR) -larpack -llapack -lblas |
Table 4: Loading for -ew
| name | load libraries (see Table 5 for $(LIBDIR)) |
|---|---|
| BLAS | -L$(LIBDIR) -lblas_64 |
| LAPACK | -L$(LIBDIR) -llapack_64 -lblas_64 |
| ScaLAPACK | not available |
| BLACS | not available |
| PARBLAS | -L$(LIBDIR) -lparblas_64 -Popenmp |
| CBLAS | not available |
| SBLAS | -L$(LIBDIR) -lsblas_64 |
| FFT | -L$(LIBDIR) -lfft_64 |
| PARFFT | -L$(LIBDIR) -lparfft_64 -Popenmp |
| METIS | -L$(LIBDIR) -lmetis_64 |
| ParMETIS | -L$(LIBDIR) -lparmetis_64 -lmpiw |
| SOLVER | -L$(LIBDIR) -lsolver_64 -lmetis_64 -lblas_64 -Popenmp |
| ARPACK | -L$(LIBDIR) -larpack_64 -llapack_64 -lblas_64 |
Table 5: default locations for $(LIBDIR)
| machine | F90 load flag | C++ load flag | default $(LIBDIR) |
|---|---|---|---|
| self compile | -size_t32 (default) | -Nover2g (default) | $(LIBDIR) = /usr/lib |
| self compile | -size_t64 | -over2g | $(LIBDIR) = /usr/lib/lib64 |
| cross compile | -size_t32 (default) | -Nover2g (default) | $(LIBDIR) = /SX/usr/lib |
| cross compile | -size_t64 | -over2g | $(LIBDIR) = /SX/usr/lib/lib64 |
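As an illustration of how Tables 3, 4, and 5 combine, the following Python sketch assembles a load line from a library name, the f90 flag, the machine type, and the size_t width. The function name `link_line` and the dictionary layout are hypothetical helpers for this example only, and only a subset of the table rows is transcribed.

```python
# Load flags transcribed from Tables 3 (-dw) and 4 (-ew); a subset for illustration.
LINK_FLAGS = {
    "dw": {
        "BLAS": "-lblas",
        "LAPACK": "-llapack -lblas",
        "CBLAS": "-lcblas -lblas",
        "FFT": "-lfft",
        "ARPACK": "-larpack -llapack -lblas",
    },
    "ew": {
        "BLAS": "-lblas_64",
        "LAPACK": "-llapack_64 -lblas_64",
        "CBLAS": None,  # listed as "not available" in Table 4
        "FFT": "-lfft_64",
        "ARPACK": "-larpack_64 -llapack_64 -lblas_64",
    },
}

# Default $(LIBDIR) values transcribed from Table 5.
LIBDIR = {
    ("self", 32): "/usr/lib",
    ("self", 64): "/usr/lib/lib64",
    ("cross", 32): "/SX/usr/lib",
    ("cross", 64): "/SX/usr/lib/lib64",
}


def link_line(library, f90_flag="dw", machine="self", size_t=32):
    """Return the load line for one library (hypothetical helper)."""
    flags = LINK_FLAGS[f90_flag][library]
    if flags is None:
        raise ValueError(f"{library} is not available for -{f90_flag}")
    return f"-L{LIBDIR[(machine, size_t)]} {flags}"


print(link_line("LAPACK"))
print(link_line("ARPACK", f90_flag="ew", machine="cross", size_t=64))
```

If the libraries are installed somewhere other than the default locations, the `LIBDIR` entries would need to be changed accordingly.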
The data types for MathKeisan library files are listed in Tables 6 and 7.
Table 6: Data types for MathKeisan library files
| | Integer and floating point data type | |
|---|---|---|
| name | I32R32+I32R64 | I64R64+I64R64 |
| BLAS | libblas.a | libblas_64.a |
| LAPACK | liblapack.a | liblapack_64.a |
| ScaLAPACK | libscalapack.a | not available |
| BLACS | libblacs.a | not available |
| PARBLAS | libparblas.a | libparblas_64.a |
| CBLAS | libcblas.a | not available |
| SBLAS | libsblas.a | libsblas_64.a |
| FFT | libfft.a | libfft_64.a |
| PARFFT | libparfft.a | libparfft_64.a |
| ARPACK | libarpack.a | libarpack_64.a |
Table 7: Data types for MathKeisan library files
| | Integer and floating point data type | | |
|---|---|---|---|
| name | I32R32 | I32R64 | I64R64 |
| METIS | libmetis_32.a | libmetis.a | libmetis_64.a |
| ParMETIS | libparmetis_32.a | libparmetis.a | libparmetis_64.a |
| SOLVER | not available | libsolver.a | libsolver_64.a |
Files in column I32R32+I32R64 of Table 6 use a 32 bit integer data type (Fortran integer*4). The floating point data type is determined by the first letter of the subroutine or function name, following the usual BLAS and LAPACK naming convention: s for 32 bit real, d for 64 bit real, c for complex with 32 bit real and imaginary parts, and z for complex with 64 bit real and imaginary parts.
Code compiled with the f90 default -dw should be linked to these files.
Files in column I64R64+I64R64 have 64 bit integer and floating point data types. Subroutine and function names still have first letter s, d, c, or z, but the data type is 64 bit for both integer and floating point. Code compiled with the f90 flag -ew should be linked to these files.
In Table 7, files have the data type indicated by the column name; for example, column name I32R32 means 32 bit integer and 32 bit real. If you are compiling with the f90 default flag -dw, link to the I32R32 file if your reals are 32 bit, or link to the I32R64 file if your reals are 64 bit. If you are compiling with the f90 flag -ew, link to the I64R64 libraries.
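The selection rule for the Table 7 libraries can be sketched in Python as follows. The function name `table7_file` and its arguments are hypothetical; the file names are transcribed from Table 7.

```python
# Library files transcribed from Table 7; None marks "not available".
TABLE7 = {
    "METIS": {"I32R32": "libmetis_32.a", "I32R64": "libmetis.a", "I64R64": "libmetis_64.a"},
    "ParMETIS": {"I32R32": "libparmetis_32.a", "I32R64": "libparmetis.a", "I64R64": "libparmetis_64.a"},
    "SOLVER": {"I32R32": None, "I32R64": "libsolver.a", "I64R64": "libsolver_64.a"},
}


def table7_file(library, f90_flag, real_bits=64):
    """Pick the Table 7 file for an f90 flag and real width (hypothetical helper)."""
    if f90_flag == "ew":
        # With -ew both integers and reals are 64 bit, so real_bits is ignored.
        column = "I64R64"
    elif f90_flag == "dw":
        if real_bits not in (32, 64):
            raise ValueError("real_bits must be 32 or 64")
        column = f"I32R{real_bits}"
    else:
        raise ValueError("f90_flag must be 'dw' or 'ew'")
    name = TABLE7[library][column]
    if name is None:
        raise LookupError(f"{library} is not available as {column}")
    return name


print(table7_file("METIS", "dw", real_bits=32))
print(table7_file("SOLVER", "ew"))
```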
Table 8: Symbolic links for SUPER/UX machine
| inst link | $INSTALLD/inst | -> | $INSTALLD/MK1_5_0 |
| lib links | /usr/lib0/libblas.a | -> | $INSTALLD/inst/lib0/libblas.a |
| | /usr/lib0/liblapack.a | -> | $INSTALLD/inst/lib0/liblapack.a |
| | /usr/lib0/lib64/libblas.a | -> | $INSTALLD/inst/lib0/lib64/libblas.a |
| | /usr/lib0/lib64/liblapack.a | -> | $INSTALLD/inst/lib0/lib64/liblapack.a |
| include link | /usr/include/cblas.h | -> | $INSTALLD/inst/include/cblas.h |
| man page link | /usr/share/man/C/mathkeisan | -> | $INSTALLD/inst/man/C |
Table 9: Symbolic links for cross compile machine
| inst link | $INSTALLD/inst | -> | $INSTALLD/MK1_5_0 |
| lib links | /SX/usr/lib0/libblas.a | -> | $INSTALLD/inst/lib0/libblas.a |
| | /SX/usr/lib0/liblapack.a | -> | $INSTALLD/inst/lib0/liblapack.a |
| | /SX/usr/lib0/lib64/libblas.a | -> | $INSTALLD/inst/lib0/lib64/libblas.a |
| | /SX/usr/lib0/lib64/liblapack.a | -> | $INSTALLD/inst/lib0/lib64/liblapack.a |
| include link | /SX/usr/include/cblas.h | -> | $INSTALLD/inst/include/cblas.h |
| man page link | No man page link is created on a non SUPER/UX machine. Instructions are given on setting the environment variable $MANPATH to allow users to access the man pages. | | |
Below is output from running install.sh on a SUPER/UX machine. The "enter" key was pressed at each prompt to get default behavior.
____________________________________________________________
Do you want to install MathKeisan MK1_5_0
(default y: y,n ? )
____________________________________________________________
This install script will do the following:
1. Prompt for a directory in which to install MathKeisan
2. Install all of MathKeisan in this directory by untaring the
file MK1_5_0.tar
3. Set up symbolic links for inst, libraries, include files,
and man pages if you have the write permission on the
required directories
< Press RETURN to continue >
____________________________________________________________
Where should this package <MK1_5_0> be installed ?
(default: /usr/opt/mathkeisan )
____________________________________________________________
Please only continue if you have the required space
198174720 bytes in directory /usr/opt/mathkeisan
Do you want to continue
(default y: y,n ? )
Please wait while MK1_5_0.tar is untared
Tar: blocksize = 20
____________________________________________________________
do you want to create link
inst->MK1_5_0 in directory /usr/opt/mathkeisan
(default y: y,n ? )
____________________________________________________________
do you want to create MAN page link
/usr/share/man/C/mathkeisan->/usr/opt/mathkeisan/inst/man/C
(default y: y,n ? )
____________________________________________________________
do you want to create include file link
/usr/include/cblas.h->/usr/opt/mathkeisan/inst/include/cblas.h
(default y: y,n ? )
____________________________________________________________
do you want to create symbolic links like
/usr/lib0/libblas.a->/usr/opt/mathkeisan/inst/lib0/libblas.a
/usr/lib0/lib64/libblas.a->/usr/opt/mathkeisan/inst/lib0/lib64/libblas.a
for MathKeisan libraries in directories
/usr/opt/mathkeisan/inst/lib0
/usr/opt/mathkeisan/inst/lib0/lib64
(default y: y,n ? )
#--------------------------------------#
| Install is complete |
#--------------------------------------#
1. A log file for this install is in
/usr/opt/mathkeisan/MK1_5_0/doc/install.log
please e-mail a copy of this file to technical@atcc.necsys.com.
It will be used to debug any future problems.
2. An uninstall script is in
/usr/opt/mathkeisan/MK1_5_0/uninstall/uninstall.sh
Do not run this script unless you want to uninstall MathKeisan
Below are notes on each of the libraries in MathKeisan.
The BLAS (Basic Linear Algebra Subprograms) are high quality "building block" routines for performing basic vector and matrix operations. Level 1 BLAS are for vector-vector operations, Level 2 BLAS are for matrix-vector operations, and Level 3 BLAS are for matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they're commonly used in the development of high quality linear algebra software, LAPACK and ScaLAPACK for example.
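To make the three levels concrete, here is a pure-Python reference sketch of one operation from each level: AXPY (Level 1), GEMV (Level 2), and GEMM (Level 3). It illustrates the mathematics only; the function signatures are simplified assumptions and are not the BLAS calling interface, which also takes dimensions, strides, and leading dimensions.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha*A*x + beta*y."""
    return [
        alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
        for row, yi in zip(a, y)
    ]


def gemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): return alpha*A*B + beta*C."""
    b_columns = list(zip(*b))  # columns of B, so each entry is a dot product
    return [
        [
            alpha * sum(aik * bkj for aik, bkj in zip(a_row, b_col)) + beta * cij
            for b_col, cij in zip(b_columns, c_row)
        ]
        for a_row, c_row in zip(a, c)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(axpy(2.0, [1.0, 2.0], [10.0, 20.0]))
print(gemv(1.0, a, [1.0, 1.0], 0.0, [0.0, 0.0]))
print(gemm(1.0, a, b, 0.0, zero))
```

For n-by-n operands the three levels perform on the order of n, n*n, and n*n*n floating point operations respectively, which is why Level 3 operations offer the most work per data item moved.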
The BLAS included in MathKeisan is based on the original version of BLAS, which was developed by J.J. Dongarra (Argonne National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Lab.), D. Kincaid (University of Texas), F.T. Krogh, and C.L. Lawson (Jet Propulsion Lab.).
PARBLAS contains shared memory parallel versions of the BLAS level 2 and level 3 subroutines. The level 1 subroutines are serial (single processor), the same as in BLAS. These subroutines have the same interface as BLAS. The number of parallel threads is specified by calling the OpenMP routine OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.
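When a program is started from a launcher script rather than an interactive shell, the same environment variable can be set for the child process only. The Python sketch below shows the idea; `run_with_threads` is a hypothetical helper, and the child command here is a stand-in that simply echoes the variable, where a real job would run the executable linked against the parallel library.

```python
import os
import subprocess
import sys


def run_with_threads(command, num_threads):
    """Run command with OMP_NUM_THREADS set in its environment (hypothetical helper)."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: prints the value of OMP_NUM_THREADS that it sees.
child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
result = run_with_threads(child, 4)
print(result.stdout.strip())
```

Setting the variable this way leaves the environment of the launching process unchanged.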
The shared memory parallel BLAS included in MathKeisan is based on the original version of BLAS, which was developed by J.J. Dongarra (Argonne National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Lab.), D. Kincaid (University of Texas), F.T. Krogh, and C.L. Lawson (Jet Propulsion Lab.).
CBLAS is a C language interface to the FORTRAN BLAS, a set of subroutines used to perform vector-vector (level 1), matrix-vector (level 2), and matrix-matrix (level 3) operations.
The CBLAS is based on the BLAS Technical Forum reference implementation by K. Teranishi (University of Tennessee) with updates by J. Horner (University of Tennessee). The specification was authored by R. Whaley (University of Tennessee).
Sparse BLAS is a set of subroutines used to perform sparse BLAS operations. The Sparse BLAS are based on ACM Algorithm 692 by D.S. Dodson (Convex), R.G. Grimes, and J.G. Lewis (Boeing).
LAPACK (Linear Algebra PACKage) provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.
LAPACK supersedes LINPACK and EISPACK. On shared memory vector and parallel processors LINPACK and EISPACK are inefficient because their memory access patterns disregard the multi-layered memory hierarchies of the machines, thereby spending too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. Whenever possible, LAPACK calls BLAS (usually level 2 & level 3). Because of the coarse granularity of the level 3 BLAS operations, their use promotes high efficiency.
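The idea of reorganizing a computation around block matrix operations can be sketched with a blocked matrix multiply. This Python example is only an illustration of the loop structure, not LAPACK or BLAS code; the function name `blocked_matmul` and the `block` parameter are assumptions for the example. Each pass of the three outer loops touches one small block of each operand, which is the unit that a tuned Level 3 BLAS kernel would keep in fast memory.

```python
def blocked_matmul(a, b, block=2):
    """Return A*B, computed block by block (illustrative sketch)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # block row of C
        for j0 in range(0, m, block):      # block column of C
            for k0 in range(0, k, block):  # matching blocks of A and B
                # Accumulate the product of one A block and one B block into C.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        total = c[i][j]
                        for p in range(k0, min(k0 + block, k)):
                            total += a[i][p] * b[p][j]
                        c[i][j] = total
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
for row in blocked_matmul(a, b):
    print(row)
```

The result is the same as an unblocked triple loop; only the order in which the data is visited changes.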
The LAPACK included in MathKeisan is based on the original LAPACK version 3.0, developed by the LAPACK project team, which was composed of E. Anderson (University of Tennessee, Knoxville), Z. Bai (University of Kentucky and University of California, Davis), C. Bischof (Institute for Scientific Computing, Technical University Aachen, Germany), S. Blackford (University of Tennessee, Knoxville), J. Demmel (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), A. Greenbaum (University of Washington), S. Hammarling (Numerical Algorithms Group Ltd.), A. McKenney, and D. Sorensen (Rice University).
The BLACS (Basic Linear Algebra Communication Subprograms) are a message-passing library designed for linear algebra. The computational model consists of a one- or two-dimensional process grid, where each process stores pieces of the matrices and vectors. The BLACS include synchronous send/receive routines to communicate a matrix or submatrix from one process to another, to broadcast submatrices to many processes, or to compute global data reductions (sums, maxima and minima). There are also routines to construct, change, or query the process grid. Since several ScaLAPACK algorithms require broadcasts or reductions among different subsets of processes, the BLACS permit a process to be a member of several overlapping or disjoint process grids, each one labeled by a context. In MPI this is called a communicator. The BLACS provide facilities for safe inter-operation of system contexts and BLACS contexts.
The BLACS included in MathKeisan is the original version 1.1 with patch03 written by J.J. Dongarra and R.C. Whaley (University of Tennessee, Knoxville).
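The two-dimensional process grid can be pictured with a small Python sketch that maps process numbers to grid coordinates and lists the processes sharing a grid row or column, which are the subsets over which row and column broadcasts or reductions operate. The sketch assumes a row-major numbering of processes; the function names are hypothetical and are not BLACS routines.

```python
def grid_coords(rank, nprow, npcol):
    """Return (row, col) of a process in an nprow x npcol grid, row-major."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank is outside the process grid")
    return divmod(rank, npcol)


def row_members(row, nprow, npcol):
    """Processes that share grid row `row`."""
    return [row * npcol + col for col in range(npcol)]


def col_members(col, nprow, npcol):
    """Processes that share grid column `col`."""
    return [row * npcol + col for row in range(nprow)]


# A 2 x 3 grid of six processes.
for rank in range(6):
    print(rank, grid_coords(rank, 2, 3))
print(row_members(1, 2, 3))
print(col_members(2, 2, 3))
```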
ScaLAPACK is a library of high-performance linear algebra routines for distributed-memory message passing computers. ScaLAPACK can solve systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorization or estimating condition numbers. Dense and band matrices are provided for, but not general sparse matrices. Similar functionality is provided for real and complex matrices. The name ScaLAPACK is an acronym for Scalable Linear Algebra PACKage, or Scalable LAPACK.
As in LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building block of the ScaLAPACK library is a distributed memory version of the Level 1, 2, and 3 BLAS, called PBLAS (Parallel BLAS). The PBLAS are in turn built on the BLAS for computation on a single node and on a set of Basic Linear Algebra Communication Subprograms (BLACS). PBLAS is an integral part of the ScaLAPACK library and is contained in it.
The ScaLAPACK included in MathKeisan is the original version 1.7 + errata written by L. S. Blackford (University of Tennessee, Knoxville), J. Choi (Soongsil University, Korea), A. Cleary (Lawrence Livermore National Lab.), E. D'Azevedo (Oak Ridge National Lab.), J. Demmel (University of California, Berkeley), I. Dhillon (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), S. Hammarling (Numerical Algorithms Group Ltd.), G. Henry (Intel Corporation), A. Petitet (University of Tennessee, Knoxville), K. Stanley (University of California, Berkeley), D. Walker (University of Wales, Cardiff), and R. C. Whaley (University of Tennessee, Knoxville).
The Fast Fourier Transforms (FFTs) contained in MathKeisan have an interface and functionality equivalent to HP's VECLIB Library and also CRAY's LIBSCI 3.1. There are 1D, 2D, 3D, and simultaneous 1D Complex-Complex FFTs, Real-Complex FFTs, and Complex-Real FFTs.
The FFT libraries were developed internally at NEC.
PARFFT contains OpenMP parallel FFTs with the same functionality as FFT above. The number of parallel threads is specified by calling the OpenMP routine OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.
ARPACK is a collection of Fortran 77 subroutines designed to solve large-scale eigenvalue problems. ARPACK stands for ARnoldi PACKage. It is capable of solving large-scale symmetric (Hermitian), non-symmetric (non-Hermitian), standard, or generalized eigenvalue problems from significant application areas. The ARPACK library is designed to compute a few, say k, eigenvalues with user-specified features such as those of largest real part or largest magnitude using n*O(k) + O(k*k) storage. No auxiliary storage is required. A set of Schur basis vectors for the desired k-dimensional eigenspace is computed which is numerically orthogonal to working precision. Eigenvectors are also available upon request. ARPACK is dependent upon a number of subroutines from LAPACK and BLAS. The performance scales asymptotically to the Level 2 BLAS operation GEMV.
The ARPACK included in MathKeisan is based on the original version written by Rich Lehoucq, Kristi Maschhoff, Danny Sorensen and Chao Yang (Rice University).
METIS is a library for partitioning and ordering matrices/graphs. It is used by SOLVER to order the original matrix to reduce fill-ins in the factored matrix.
The METIS in MathKeisan is the original version 4.0 developed at University of Minnesota and Army HPC research center by George Karypis and Vipin Kumar.
----------------------------------------------------------------

"This software package includes/uses METIS, developed by George Karypis and Vipin Kumar at the University of Minnesota. Additional information about METIS can be found at http://www.cs.umn.edu/~karypis/metis
METIS is Copyright 1997 Regents of the University of Minnesota. Twin Cities. All Rights Reserved."

----------------------------------------------------------------
PARMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs and for computing fill-reducing orderings of sparse matrices.
The PARMETIS in MathKeisan is the original version 2.0 developed at University of Minnesota and Army HPC research center by George Karypis and Vipin Kumar.
SOLVER contains subroutines used to solve sparse symmetric linear systems. It uses the left-looking algorithm to factor a sparse matrix A into A = L D L^T, where L is lower triangular with unit diagonal and D is diagonal. It takes advantage of the supernodal structure of the matrix. The current version uses the METIS library to order the matrix. Both serial and parallel numerical factorization are supported. The number of parallel threads is specified by calling the OpenMP routine OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.
A subroutine mkversion in libblas.a and libblas_64.a outputs MathKeisan version information to standard output. In Fortran use "call mkversion()"; in C use "mkversion_()". In both cases link with f90 to the library libblas.a or libblas_64.a. See also the mkversion man page. Output for MathKeisan 1.5.0 is below.
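The factorization itself can be illustrated on a small dense matrix. The Python sketch below computes L and D column by column, with each column updated from the columns to its left, which is the left-looking order. It is a teaching example under simplifying assumptions: dense storage, no pivoting, no ordering step, and no supernodes. It is not the SOLVER implementation, and the function name `ldlt` is hypothetical.

```python
def ldlt(a):
    """Factor a symmetric matrix as A = L D L^T (dense, no pivoting)."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of the columns to the left.
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            update = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - update) / d[j]
    return l, d


a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
l, d = ldlt(a)
for row in l:
    print(row)
print(d)
```

Multiplying L, diag(D), and the transpose of L back together reproduces A, which is a convenient check when experimenting with the sketch.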
MathKeisan 1.5.0 for SX
BLAS - legacy blas
LAPACK - version 3.0 + UPDATES
ScaLAPACK - version 1.7 + errata
BLACS - version 1.1 + patch 03
METIS - version 4.0
PARMETIS - version 2.0