MathKeisan 1.5.0 for SX Release Notes


January, 2004

NEC Corporation

Introduction

The aim of MathKeisan is to provide a highly tuned and well-tested collection of math libraries for NEC high performance computers. This version is for the NEC SX-5, SX-6, and SX-7 vector computers. See www.mathkeisan.com for other versions for NEC's Itanium® Processor Family servers. Unless noted otherwise, all references to MathKeisan in these release notes are to MathKeisan for SX-5, SX-6, and SX-7.

The libraries in MathKeisan are listed in Table 1.


Table 1: Libraries in MathKeisan

name        description

BLAS        Basic Linear Algebra Subprograms
LAPACK      Linear algebra for high performance computers
ScaLAPACK   Scalable Linear Algebra package (contains PBLAS)
BLACS       Basic Linear Algebra Communication Subprograms
PARBLAS     Shared memory Parallel BLAS
CBLAS       C interface to BLAS
SBLAS       Sparse BLAS
FFT         FFTs with HP's VECLIB interface and CRAY LIBSCI 3.1 interface
PARFFT      Parallel FFTs with HP's VECLIB interface and CRAY LIBSCI 3.1 interface
METIS       Matrix/Graph ordering and partitioning library
ParMETIS    Parallel Matrix/Graph ordering and partitioning library
SOLVER      Direct solver for sparse symmetric systems
ARPACK      Solution of large scale eigenvalue problems

Compatibility

For a list of machines and SUPER/UX revisions compatible with MathKeisan 1.5.0, please follow the "Compatibility" link at www.mathkeisan.com. The following compilers were used to build the MathKeisan libraries:

Fortran   f90 for SX, Rev.267
C         C++/SX, Rev.061
MPI       MPI/SX r121

New in MathKeisan 1.5.0

Loading

If you compile with the f90 flag -dw (the default), load with the libraries in Table 3. If you compile with the f90 flag -ew, load with the libraries in Table 4. Load the libraries in the order given, or use the ld flag -h lib_cyclic. Table 5 gives $(LIBDIR) for the size_t32 and size_t64 libraries on self compile and cross compile machines; these are the default locations. If the libraries are not in these default locations, ask your system administrator where they are.

Table 3: Loading for -dw

name        load libraries (see Table 5 for $(LIBDIR))

BLAS        -L$(LIBDIR) -lblas
LAPACK      -L$(LIBDIR) -llapack -lblas
ScaLAPACK   -L$(LIBDIR) -lscalapack -lblacsF90init -lblacs -lblacsF90init -lblas -lmpi
BLACS       -L$(LIBDIR) -lblacsF90init -lblacs -lblacsF90init -lmpi
PARBLAS     -L$(LIBDIR) -lparblas -Popenmp
CBLAS       -L$(LIBDIR) -lcblas -lblas
SBLAS       -L$(LIBDIR) -lsblas
FFT         -L$(LIBDIR) -lfft
PARFFT      -L$(LIBDIR) -lparfft -Popenmp
METIS       -L$(LIBDIR) -lmetis_32
            -L$(LIBDIR) -lmetis
ParMETIS    -L$(LIBDIR) -lparmetis_32 -lmpi
            -L$(LIBDIR) -lparmetis -lmpi
SOLVER      -L$(LIBDIR) -lsolver -lmetis -lblas -Popenmp
ARPACK      -L$(LIBDIR) -larpack -llapack -lblas

 

Table 4: Loading for -ew

name        load libraries (see Table 5 for $(LIBDIR))

BLAS        -L$(LIBDIR) -lblas_64
LAPACK      -L$(LIBDIR) -llapack_64 -lblas_64
ScaLAPACK   not available
BLACS       not available
PARBLAS     -L$(LIBDIR) -lparblas_64 -Popenmp
CBLAS       not available
SBLAS       -L$(LIBDIR) -lsblas_64
FFT         -L$(LIBDIR) -lfft_64
PARFFT      -L$(LIBDIR) -lparfft_64 -Popenmp
METIS       -L$(LIBDIR) -lmetis_64
ParMETIS    -L$(LIBDIR) -lparmetis_64 -lmpiw
SOLVER      -L$(LIBDIR) -lsolver_64 -lmetis_64 -lblas_64 -Popenmp
ARPACK      -L$(LIBDIR) -larpack_64 -llapack_64 -lblas_64

 

Table 5: default locations for $(LIBDIR)

machine         F90 or C++ load flags                         default $(LIBDIR)
                F90                    C++

self compile    -size_t32 (default)    -Nover2g (default)     $(LIBDIR) = /usr/lib
                -size_t64              -over2g                $(LIBDIR) = /usr/lib/lib64
cross compile   -size_t32 (default)    -Nover2g (default)     $(LIBDIR) = /SX/usr/lib
                -size_t64              -over2g                $(LIBDIR) = /SX/usr/lib/lib64

Data types

The data types for MathKeisan library files are listed in Tables 6 and 7.

 

Table 6: Data types for MathKeisan library files

 

            Integer and floating point data type
name        I32R32+I32R64     I64R64+I64R64

BLAS        libblas.a         libblas_64.a
LAPACK      liblapack.a       liblapack_64.a
ScaLAPACK   libscalapack.a    not available
BLACS       libblacs.a        not available
PARBLAS     libparblas.a      libparblas_64.a
CBLAS       libcblas.a        not available
SBLAS       libsblas.a        libsblas_64.a
FFT         libfft.a          libfft_64.a
PARFFT      libparfft.a       libparfft_64.a
ARPACK      libarpack.a       libarpack_64.a

 

Table 7: Data types for MathKeisan library files

 

            Integer and floating point data type
name        I32R32              I32R64            I64R64

METIS       libmetis_32.a       libmetis.a        libmetis_64.a
ParMETIS    libparmetis_32.a    libparmetis.a     libparmetis_64.a
SOLVER      not available       libsolver.a       libsolver_64.a

 

Files in column I32R32+I32R64 of Table 6 use the 32 bit integer data type (Fortran integer*4). The floating point data type is determined by the first letter of the subroutine or function name as follows:

s   32 bit real
d   64 bit real
c   complex, 32 bit real and imaginary parts
z   complex, 64 bit real and imaginary parts

Code compiled with the f90 default -dw should be linked to these files.

Files in column I64R64+I64R64 have 64 bit integer and floating point data types. Subroutine and function names still begin with s, d, c, or z, but the data type is 64 bit for both integer and floating point. Code compiled with the f90 flag -ew should be linked to these files.

In Table 7, files have the data type indicated by the column name; for example, column I32R32 is for 32 bit integers and 32 bit reals. If you are compiling with the f90 default flag -dw, link to the I32R32 file if your reals are 32 bit, or to the I32R64 file if your reals are 64 bit. If you are compiling with the f90 flag -ew, link to the I64R64 libraries.
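As a rough illustration of the rules above, the following C sketch returns the integer width and the width of one real component for a routine name under each compile mode. The function names mk_real_bits and mk_int_bits are hypothetical and are not part of MathKeisan. The sketch assumes the usual BLAS/LAPACK naming convention, in which s and c denote single precision and d and z denote double precision.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helpers, not part of MathKeisan.
 * ew == 0: libraries for the f90 default -dw (column I32R32+I32R64).
 * ew == 1: libraries for the f90 flag -ew   (column I64R64+I64R64). */

/* Width in bits of the integer data type. */
int mk_int_bits(int ew)
{
    assert(ew == 0 || ew == 1);
    return ew ? 64 : 32;
}

/* Width in bits of one real component for a routine whose name starts
 * with `prefix`; returns 0 if the prefix is not s, d, c, or z. */
int mk_real_bits(char prefix, int ew)
{
    assert(ew == 0 || ew == 1);
    switch (tolower((unsigned char)prefix)) {
    case 's':
    case 'c':
        return ew ? 64 : 32;   /* single precision names (assumed convention) */
    case 'd':
    case 'z':
        return 64;             /* double precision names (assumed convention) */
    default:
        return 0;              /* not a floating point prefix */
    }
}
```

For example, mk_real_bits('s', 0) describes a routine such as sgemm linked from libblas.a, and mk_real_bits('s', 1) the same routine name linked from libblas_64.a.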

Man pages

MathKeisan includes man pages. There is a man page for each library, and the following libraries have man pages for individual subroutines: BLAS, LAPACK, ScaLAPACK, FFT, SOLVER, SBLAS, ARPACK. As an example, to view the man page for BLAS, type "man blas"; to view the man page for the BLAS subroutine dgemm, type "man dgemm".

Installation instructions

The MathKeisan distribution contains three files:
  • tar file
  • install.sh script
  • README
To install MathKeisan, type "install.sh". The script does the following:
  1. Prompts for the directory in which to install MathKeisan. The default for a SUPER/UX machine is $INSTALLD=/usr/opt/mathkeisan/. For a cross compile machine it is $INSTALLD=/SX/opt/mathkeisan.
  2. Extracts the tar file in the install directory. The complete MathKeisan distribution is then in $INSTALLD/MK1_5_0.
  3. If you have write permission, creates the symbolic links in Tables 8 and 9. Any files that are overwritten are backed up.
  4. Creates an uninstall script and a log file.

Table 8: Symbolic links for SUPER/UX machine

inst link       $INSTALLD/inst -> $INSTALLD/MK1_5_0
lib links       /usr/lib0/libblas.a -> $INSTALLD/inst/lib0/libblas.a
                /usr/lib0/liblapack.a -> $INSTALLD/inst/lib0/liblapack.a
                :
                :
                /usr/lib0/lib64/libblas.a -> $INSTALLD/inst/lib0/lib64/libblas.a
                /usr/lib0/lib64/liblapack.a -> $INSTALLD/inst/lib0/lib64/liblapack.a
                :
                :
include link    /usr/include/cblas.h -> $INSTALLD/inst/include/cblas.h
man page link   /usr/share/man/C/mathkeisan -> $INSTALLD/inst/man/C

 

Table 9: Symbolic links for cross compile machine

inst link       $INSTALLD/inst -> $INSTALLD/MK1_5_0
lib links       /SX/usr/lib0/libblas.a -> $INSTALLD/inst/lib0/libblas.a
                /SX/usr/lib0/liblapack.a -> $INSTALLD/inst/lib0/liblapack.a
                :
                :
                /SX/usr/lib0/lib64/libblas.a -> $INSTALLD/inst/lib0/lib64/libblas.a
                /SX/usr/lib0/lib64/liblapack.a -> $INSTALLD/inst/lib0/lib64/liblapack.a
                :
                :
include link    /SX/usr/include/cblas.h -> $INSTALLD/inst/include/cblas.h
man page link   no man page link on a non SUPER/UX machine. Instructions are given on
                setting the environment variable $MANPATH to allow users to access man pages

 

Below is output from running install.sh on a SUPER/UX machine. The "enter" key was pressed at each prompt to get default behavior.

 ____________________________________________________________
 Do you want to install MathKeisan MK1_5_0
 (default y: y,n ? ) 
 ____________________________________________________________
 This install script will do the following:
 1. Prompt for a directory in which to install MathKeisan
 2. Install all of MathKeisan in this directory by untaring the
    file MK1_5_0.tar
 3. Set up symbolic links for inst, libraries, include files,
    and man pages if you have the write permission on the
    required directories
 < Press RETURN to continue >
 
 ____________________________________________________________
 Where should this package <MK1_5_0> be installed ?
 (default: /usr/opt/mathkeisan ) 
 ____________________________________________________________
 Please only continue if you have the required space
      198174720 bytes     in directory     /usr/opt/mathkeisan
 Do you want to continue
 (default y: y,n ? ) 
 Please wait while MK1_5_0.tar is untared
 
 Tar: blocksize = 20
 ____________________________________________________________
 do you want to create link 
      inst->MK1_5_0     in directory     /usr/opt/mathkeisan
 (default y: y,n ? ) 
 ____________________________________________________________
 do you want to create MAN page link
      /usr/share/man/C/mathkeisan->/usr/opt/mathkeisan/inst/man/C
 (default y: y,n ? ) 
 ____________________________________________________________
 do you want to create include file link
      /usr/include/cblas.h->/usr/opt/mathkeisan/inst/include/cblas.h
 (default y: y,n ? ) 
 ____________________________________________________________
 do you want to create symbolic links like
 /usr/lib0/libblas.a->/usr/opt/mathkeisan/inst/lib0/libblas.a
 /usr/lib0/lib64/libblas.a->/usr/opt/mathkeisan/inst/lib0/lib64/libblas.a
 for MathKeisan libraries in directories
      /usr/opt/mathkeisan/inst/lib0
      /usr/opt/mathkeisan/inst/lib0/lib64
 (default y: y,n ? ) 
 
 #--------------------------------------#
 |          Install is complete         |
 #--------------------------------------#
 
  1. A log file for this install is in
       /usr/opt/mathkeisan/MK1_5_0/doc/install.log
     please e-mail a copy of this file to technical@atcc.necsys.com.
     It will be used to debug any future problems.
 
  2. An uninstall script is in
       /usr/opt/mathkeisan/MK1_5_0/uninstall/uninstall.sh
     Do not run this script unless you want to uninstall MathKeisan
 

Product Description

Below are notes on each of the libraries in MathKeisan.

BLAS

The BLAS (Basic Linear Algebra Subprograms) are high quality "building block" routines for performing basic vector and matrix operations.  Level 1 BLAS are for vector-vector operations, Level 2 BLAS are for matrix-vector operations, and Level 3 BLAS are for matrix-matrix operations.  Because the BLAS are efficient, portable, and widely available, they're commonly used in the development of high quality linear algebra software, LAPACK and ScaLAPACK for example.

The BLAS included in MathKeisan is based on the original version of BLAS which was developed by J.J. Dongarra (Argonne National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Lab.), D. Kincaid (University of Texas), F.T. Krogh, C.L. Lawson (Jet Propulsion Lab.).
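The three BLAS levels described above can be sketched in plain C as follows. These are reference loops only, with hypothetical names (ref_axpy, ref_gemv, ref_gemm); they are not the tuned MathKeisan routines and omit the scaling, transpose, and stride arguments of the real BLAS interface. Column-major (Fortran) storage is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loops only; hypothetical names, not the MathKeisan BLAS.
 * Matrices are column-major with leading dimension equal to the
 * number of rows. */

/* Level 1 (vector-vector): y := alpha*x + y */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := A*x + y, where A is m-by-n */
void ref_gemv(size_t m, size_t n, const double *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            y[i] += a[i + j * m] * x[j];
}

/* Level 3 (matrix-matrix): C := A*B + C, where A is m-by-k and B is k-by-n */
void ref_gemm(size_t m, size_t n, size_t k,
              const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t p = 0; p < k; p++)
            for (size_t i = 0; i < m; i++)
                c[i + j * m] += a[i + p * m] * b[p + j * k];
}
```

The loop nest deepens by one at each level, which is why level 3 operations perform the most arithmetic per matrix element loaded.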

PARBLAS

PARBLAS contains shared memory parallel versions of the BLAS level 2 and level 3 subroutines. The level 1 subroutines are serial (single processor), the same as in BLAS. These subroutines have the same interface as BLAS. The number of parallel threads is specified by calling the OMP function OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.

The shared memory parallel BLAS included in MathKeisan is based on the original version of BLAS which was developed by J.J. Dongarra (Argonne National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Lab.), D. Kincaid (University of Texas), F.T. Krogh, C.L. Lawson (Jet Propulsion Lab.).

CBLAS

CBLAS is a C language interface to the FORTRAN BLAS, a set of subroutines used to perform vector-vector (level 1), matrix-vector (level 2), and matrix-matrix (level 3) operations.

The CBLAS is based on the BLAS Technical Forum reference implementation by K. Teranishi (University of Tennessee) with updates by J. Horner (University of Tennessee). The specification was authored by R. Whaley (University of Tennessee).

SBLAS

Sparse BLAS is a set of subroutines used to perform sparse BLAS operations. The Sparse BLAS are based on ACM Algorithm 692 by D.S. Dodson (Convex), R.G. Grimes, and J.G. Lewis (Boeing).

LAPACK

LAPACK (Linear Algebra PACKage) provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

LAPACK supersedes LINPACK and EISPACK. On shared memory vector and parallel processors, LINPACK and EISPACK are inefficient because their memory access patterns disregard the multi-layered memory hierarchies of the machines, so they spend too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. Whenever possible, LAPACK calls BLAS (usually level 2 and level 3). Because of the coarse granularity of the level 3 BLAS operations, their use promotes high efficiency.

The LAPACK included in MathKeisan is based on the original LAPACK version 3.0, which was developed by the LAPACK project team, composed of E. Anderson (University of Tennessee, Knoxville), Z. Bai (University of Kentucky and University of California, Davis), C. Bischof (Institute for Scientific Computing, Technical University Aachen, Germany), S. Blackford (University of Tennessee, Knoxville), J. Demmel (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), A. Greenbaum (University of Washington), S. Hammarling (Numerical Algorithms Group Ltd.), A. McKenney, D. Sorensen (Rice University).
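The idea of organizing work as block matrix operations can be sketched in C as follows. This is only an illustration: the function name blocked_matmul and the choice of block size are hypothetical, and this is not LAPACK or MathKeisan code. In LAPACK the inner block update would be a call to a tuned level 3 BLAS routine rather than explicit loops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not LAPACK or MathKeisan code.
 * Computes C := C + A*B for n-by-n column-major matrices, working on
 * blocks of at most nb-by-nb entries so that each block product can
 * reuse data while it is held in fast memory. */
void blocked_matmul(size_t n, size_t nb,
                    const double *a, const double *b, double *c)
{
    assert(nb > 0);
    for (size_t jj = 0; jj < n; jj += nb) {
        size_t jend = (jj + nb < n) ? jj + nb : n;
        for (size_t pp = 0; pp < n; pp += nb) {
            size_t pend = (pp + nb < n) ? pp + nb : n;
            for (size_t ii = 0; ii < n; ii += nb) {
                size_t iend = (ii + nb < n) ? ii + nb : n;
                /* Block update:
                 * C(ii:iend, jj:jend) += A(ii:iend, pp:pend) * B(pp:pend, jj:jend) */
                for (size_t j = jj; j < jend; j++)
                    for (size_t p = pp; p < pend; p++)
                        for (size_t i = ii; i < iend; i++)
                            c[i + j * n] += a[i + p * n] * b[p + j * n];
            }
        }
    }
}
```

The result is the same as an unblocked triple loop; only the order in which the entries are visited changes.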

BLACS

The BLACS (Basic Linear Algebra Communication Subprograms) are a message-passing library designed for linear algebra. The computational model consists of a one- or two-dimensional process grid, where each process stores pieces of the matrices and vectors. The BLACS include synchronous send/receive routines to communicate a matrix or submatrix from one process to another, to broadcast submatrices to many processes, or to compute global data reductions (sums, maxima and minima). There are also routines to construct, change, or query the process grid. Since several ScaLAPACK algorithms require broadcasts or reductions among different subsets of processes, the BLACS permit a process to be a member of several overlapping or disjoint process grids, each one labeled by a context. In MPI this is called a communicator. The BLACS provide facilities for safe inter-operation of system contexts and BLACS contexts.

The BLACS included in MathKeisan is the original version 1.1 with patch03, written by J.J. Dongarra and R.C. Whaley (University of Tennessee, Knoxville).
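The two-dimensional process grid described above can be pictured with a small C sketch that maps a process number to its grid coordinates and back. The type and function names (grid_coord, rank_to_coord, coord_to_rank) are hypothetical and are not the BLACS API, and row-major numbering of the processes is an assumption made for this example.

```c
#include <assert.h>

/* Hypothetical helpers, not the BLACS API.
 * Processes 0 .. nprow*npcol-1 are laid out row by row in an
 * nprow-by-npcol grid (row-major numbering assumed). */
typedef struct {
    int row;   /* process row, 0 .. nprow-1 */
    int col;   /* process column, 0 .. npcol-1 */
} grid_coord;

/* Grid coordinates of process `rank`. */
grid_coord rank_to_coord(int rank, int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0);
    assert(rank >= 0 && rank < nprow * npcol);
    grid_coord gc = { rank / npcol, rank % npcol };
    return gc;
}

/* Process number at grid coordinates `gc`. */
int coord_to_rank(grid_coord gc, int npcol)
{
    assert(npcol > 0 && gc.row >= 0 && gc.col >= 0 && gc.col < npcol);
    return gc.row * npcol + gc.col;
}
```

In a 2-by-3 grid, for instance, process 5 sits in row 1, column 2. A row broadcast then involves the processes that share a row coordinate, and a column reduction those that share a column coordinate.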

ScaLAPACK

ScaLAPACK is a library of high-performance linear algebra routines for distributed-memory message passing computers.  ScaLAPACK can solve systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems.  ScaLAPACK can also handle many associated computations such as matrix factorization or estimating condition numbers.  Dense and band matrices are provided for, but not general sparse matrices. Similar functionality is provided for real and complex matrices.  The name ScaLAPACK is an acronym for Scalable Linear Algebra PACKage, or Scalable LAPACK.

As in LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building block of the ScaLAPACK library is a distributed memory version of the Level 1, 2, and 3 BLAS, called PBLAS (Parallel BLAS). The PBLAS are in turn built on the BLAS for computation on a single node and on a set of Basic Linear Algebra Communication Subprograms (BLACS). PBLAS is an integral part of the ScaLAPACK library.

The ScaLAPACK included in MathKeisan is the original version 1.7 + errata written by L. S. Blackford (University of Tennessee, Knoxville), J. Choi (Soongsil University, Korea), A. Cleary (Lawrence Livermore National Lab.), E. D'Azevedo (Oak Ridge National Lab.), J. Demmel (University of California, Berkeley), I. Dhillon (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), S. Hammarling (Numerical Algorithms Group Ltd.), G. Henry (Intel Corporation), A. Petitet (University of Tennessee, Knoxville), K. Stanley (University of California, Berkeley), D. Walker (University of Wales, Cardiff), R. C. Whaley (University of Tennessee, Knoxville).

FFT

The Fast Fourier Transforms (FFTs) contained in MathKeisan have an interface and functionality equivalent to HP's VECLIB library and to CRAY's LIBSCI 3.1. There are 1D, 2D, 3D, and simultaneous 1D complex-complex FFTs, real-complex FFTs, and complex-real FFTs.

The FFT libraries were developed internally at NEC.

PARFFT

PARFFT contains OpenMP parallel FFTs with the same functionality as FFT above. The number of parallel threads is specified by calling the OMP function OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.

ARPACK

ARPACK is a collection of Fortran 77 subroutines designed to solve large-scale eigenvalue problems. ARPACK stands for ARnoldi PACKage. It is capable of solving large-scale symmetric (Hermitian), non-symmetric (non-Hermitian), standard, or generalized eigenvalue problems from significant application areas. The ARPACK library is designed to compute a few, say k, eigenvalues with user-specified features, such as those of largest real part or largest magnitude, using n*O(k) + O(k*k) storage. No auxiliary storage is required. A set of Schur basis vectors for the desired k-dimensional eigenspace is computed which is numerically orthogonal to working precision. Eigenvectors are also available upon request. ARPACK depends on a number of subroutines from LAPACK and BLAS. The performance scales asymptotically to the Level 2 BLAS operation GEMV.

The ARPACK included in MathKeisan is based on the original version written by Rich Lehoucq, Kristi Maschhoff, Danny Sorensen and Chao Yang (Rice University).

METIS

METIS is a library for partitioning and ordering matrices/graphs. It is used by SOLVER to order the original matrix to reduce fill-ins in the factored matrix. 

The METIS in MathKeisan is the original version 4.0, developed at the University of Minnesota and the Army HPC Research Center by George Karypis and Vipin Kumar.

 ----------------------------------------------------------------
 "This software package includes/uses METIS, developed by George 
    Karypis and Vipin Kumar at the University of Minnesota. 
    Additional information about METIS can be found at 
    http://www.cs.umn.edu/~karypis/metis
 METIS is Copyright 1997 Regents of the University of Minnesota. 
    Twin Cities. All Rights Reserved.
 ----------------------------------------------------------------
 

ParMETIS

ParMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs and for computing fill-reducing orderings of sparse matrices.

The ParMETIS in MathKeisan is the original version 2.0, developed at the University of Minnesota and the Army HPC Research Center by George Karypis and Vipin Kumar.

SOLVER

SOLVER contains subroutines used to solve sparse symmetric linear systems. It uses the left-looking algorithm to factor a sparse matrix A into A = L D L^T, where L is lower triangular with unit diagonal and D is diagonal. It takes advantage of the supernodal structure of the matrix. The current version uses the METIS library to order the matrix. Both serial and parallel numerical factorization are supported. The number of parallel threads is specified by calling the OMP function OMP_SET_NUM_THREADS(n), where n is the number of parallel threads, or by setting the environment variable OMP_NUM_THREADS. For C shell use 'setenv OMP_NUM_THREADS n'. For Bourne shell, use 'export OMP_NUM_THREADS ; OMP_NUM_THREADS=n'. More information on setting the number of threads is in the FORTRAN90/SX Multitasking User's Guide.
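To make the factorization concrete, here is a minimal dense sketch in C of a left-looking factorization in which the transpose of L appears as the rightmost factor. It is not the SOLVER implementation: it ignores sparsity, supernodes, ordering, and pivoting, and the function name dense_ldlt and its argument layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dense sketch, not the SOLVER implementation.
 * a: n-by-n symmetric matrix, column-major; only the lower triangle is read.
 * l: n-by-n output, column-major, unit lower triangular factor.
 * d: output array of the n diagonal entries of D.
 * Returns 0 on success, -1 if a zero pivot is met. */
int dense_ldlt(size_t n, const double *a, double *l, double *d)
{
    assert(a != NULL && l != NULL && d != NULL);
    for (size_t j = 0; j < n; j++) {
        /* Left-looking: column j is built from the finished columns
         * 0 .. j-1 to its left. */
        double dj = a[j + j * n];
        for (size_t k = 0; k < j; k++)
            dj -= l[j + k * n] * l[j + k * n] * d[k];
        if (dj == 0.0)
            return -1;
        d[j] = dj;

        for (size_t i = 0; i < j; i++)
            l[i + j * n] = 0.0;      /* entries above the diagonal */
        l[j + j * n] = 1.0;          /* unit diagonal */
        for (size_t i = j + 1; i < n; i++) {
            double s = a[i + j * n];
            for (size_t k = 0; k < j; k++)
                s -= l[i + k * n] * l[j + k * n] * d[k];
            l[i + j * n] = s / dj;
        }
    }
    return 0;
}
```

A sparse solver performs the same arithmetic but skips entries known to be zero, which is why a fill-reducing ordering such as the one METIS provides matters.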

Auxiliary Subroutine

A subroutine mkversion in libblas.a and libblas_64.a writes MathKeisan version information to standard output. In Fortran use "call mkversion()"; in C use "mkversion_()". In both cases link with f90 to the library libblas.a or libblas_64.a. See also the mkversion man page. Output for MathKeisan 1.5.0 is below.

  MathKeisan 1.5.0 for SX
  BLAS      - legacy blas
  LAPACK    - version 3.0 + UPDATES
  ScaLAPACK - version 1.7 + errata
  BLACS     - version 1.1 + patch 03
  METIS     - version 4.0
  PARMETIS  - version 2.0