June, 2001
NEC Corporation
The aim of MathKeisan is to provide a highly tuned and well-tested collection of math libraries for the NEC SX-5 series.
The libraries in MathKeisan 1.2.0 are listed in Table 1.
Table 1: Libraries in MathKeisan 1.2.0

  name        description
  BLAS        Basic Linear Algebra Subprograms
  LAPACK      Linear algebra for high performance computers
  ScaLAPACK   Scalable Linear Algebra package (contains PBLAS)
  BLACS       Basic Linear Algebra Communication Subprograms
  PARBLAS     Shared memory Parallel BLAS
  CBLAS       C interface to BLAS
  ARPACK      Solution of large scale eigenvalue problems
  FFT         FFTs with HP's VECLIB interface and CRAY LIBSCI 3.1 interface
  SOLVER      Direct solver for sparse symmetric systems
  METIS       Matrix/Graph ordering and partitioning library
  ParMETIS    Parallel Matrix/Graph ordering and partitioning library
MathKeisan 1.2.0 is compatible with SUPER-UX R11.1 or later.
If you are using the F90 flag -dw (the default), link to the libraries in Table 2. Link in the order given, or use the ld flag -h lib_cyclic.
Table 2: Linking for -dw

  name        link to
  BLAS        -lblas
  LAPACK      -llapack -lblas
  BLACS       -lblacsF90init -lblacs -lblacsF90init -lmpi
  ScaLAPACK   -lscalapack -lblacsF90init -lblacs -lblacsF90init -lblas -lmpi
  PARBLAS     -lparblas
  CBLAS       -lcblas -lblas
  FFT         -lfft
  SOLVER      -lsolver -lmetis -lblas
  METIS       -lmetis_32             (32-bit reals)
              -lmetis                (64-bit reals)
  ParMETIS    -lparmetis_32 -lmpi    (32-bit reals)
              -lparmetis -lmpi       (64-bit reals)
  ARPACK      -larpack -llapack -lblas
If you are using the F90 flag -ew, link to the libraries in Table 3.

Table 3: Linking for -ew

  name        link to
  BLAS        -lblas_64
  LAPACK      -llapack_64 -lblas_64
  BLACS       not available
  ScaLAPACK   not available
  PARBLAS     -lparblas_64
  CBLAS       not available
  FFT         -lfft_64
  SOLVER      -lsolver_64 -lmetis_64 -lblas_64
  METIS       -lmetis_64
  ParMETIS    -lparmetis_64 -lmpiw
  ARPACK      -larpack_64 -llapack_64 -lblas_64
The data types for MathKeisan 1.2.0 library files are listed in Tables 4 and 5.
Table 4: Data types for MathKeisan 1.2.0 library files

              Integer and floating point data type
  name        I32R32+I32R64      I64R64+I64R64
  BLAS        libblas.a          libblas_64.a
  LAPACK      liblapack.a        liblapack_64.a
  ScaLAPACK   libscalapack.a     not available
  BLACS       libblacs.a         not available
  PARBLAS     libparblas.a       libparblas_64.a
  CBLAS       libcblas.a         not available
  ARPACK      libarpack.a        libarpack_64.a
  FFT         libfft.a           libfft_64.a
Table 5: Data types for MathKeisan 1.2.0 library files

              Integer and floating point data type
  name        I32R32             I32R64           I64R64
  METIS       libmetis_32.a      libmetis.a       libmetis_64.a
  ParMETIS    libparmetis_32.a   libparmetis.a    libparmetis_64.a
  SOLVER      not available      libsolver.a      libsolver_64.a
Files in column I32R32+I32R64 of Table 4 are for the 32-bit integer data type (Fortran integer*4). The floating point data type is determined by the first letter of the subroutine or function name as follows:
  s   32-bit real
  d   64-bit real
  c   complex with 32-bit real and imaginary parts
  z   complex with 64-bit real and imaginary parts
Code compiled with the f90 default -dw should be linked to these files.
Files in column I64R64+I64R64 have 64-bit integer and floating point data types. Subroutine and function names still have first letter s, d, c, or z, but all are for the 64-bit integer and floating point data types. Code compiled with the f90 flag -ew should be linked to these files.
In Table 5, files have the data type indicated by the column name; for example, column I32R32 is for 32-bit integers and 32-bit reals. If you are compiling with the f90 default flag -dw, link to the I32R32 file if your reals are 32 bit, or link to the I32R64 file if your reals are 64 bit. If you are compiling with the f90 flag -ew, link to the I64R64 libraries.
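The selection rule for Table 5 can be sketched in C as a small lookup. This is only an illustration for the METIS row; the function name metis_library_for is hypothetical and is not part of MathKeisan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MathKeisan: returns the METIS archive
   named in Table 5 for the given integer and real widths (in bits), or
   NULL when Table 5 has no column for that combination. */
const char *metis_library_for(int int_bits, int real_bits)
{
    assert(int_bits > 0 && real_bits > 0);
    if (int_bits == 32 && real_bits == 32)
        return "libmetis_32.a";   /* column I32R32 */
    if (int_bits == 32 && real_bits == 64)
        return "libmetis.a";      /* column I32R64 */
    if (int_bits == 64 && real_bits == 64)
        return "libmetis_64.a";   /* column I64R64 */
    return NULL;                  /* no such column in Table 5 */
}
```

For example, code compiled with -ew has 64-bit integers and reals, so the lookup returns libmetis_64.a.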
The install script performs the following steps:
1. Prompts for the directories in which to install the lib, include, man page and Users' Guide files, and in which to back up files that will be overwritten
2. Finds the space required, approximately 79MB for lib files and 11MB for man page files
3. Moves files in the lib, include and man page directories that will be overwritten to the backup directory
4. Copies the MathKeisan lib, include, man page, and Users' Guide files to the directories supplied in step 1
5. Creates an uninstall script and log file
Below is a sample install session.

    MathKeisan Installation
    ...
    MathKeisan contains 2084 manpage files
    MathKeisan contains 1 include files
    MathKeisan contains 14 Users Guide files
    MathKeisan contains the following library files:
    libarpack.a libarpack_64.a libblacs.a libblacsCinit.a libblacsF90init.a
    libblas.a libblas_64.a libcblas.a libfft.a libfft_64.a
    liblapack.a liblapack_64.a libmetis.a libmetis_32.a libmetis_64.a
    libparblas.a libparblas_64.a libparmetis.a libparmetis_32.a libparmetis_64.a
    libscalapack.a libsolver.a libsolver_64.a
    Press RETURN to continue

    This install script will do the following:
    1. Copy MathKeisan library, man page, include, Users' Guide and
       Japanese Users' Guide files to the specified directories,
       and give them the correct permission
    2. Any file that is over written will be backed up in the backup directory
    3. An uninstall script is created and placed in the backup directory

    Below are default directories for install and backup:
    libraries              /SX/usr/lib/
    manpages               /SX/usr/man/manl/
    include                /SX/usr/include
    Users' Guide           /SX/usr/opt/MathKeisan/MK1_2_0/UsersGuide
    Japanese Users' Guide  /SX/usr/opt/MathKeisan/MK1_2_0/JUsersGuide
    backup                 /tmp/mathkeisan/MK1_2_0/bkpdir

    At the prompt below say yes and use the default directories,
    or say no and enter a directory name. The directories must exist,
    and you must have write permission. The backup directory must be empty.

    Is directory for installing libraries correct yes/no: yes
    Is directory for installing manpages correct yes/no: yes
    Is directory for installing include files correct yes/no: yes
    Is directory for installing Users' Guide correct yes/no: yes
    Is directory for installing Japanese Users' Guide correct yes/no: yes
    Is directory for backup correct yes/no: no
    enter directory for backup /tmp/bkup

    Below are chosen directories:
    libraries              /SX/usr/lib/
    manpages               /SX/usr/man/manl/
    include files          /SX/usr/include
    Users' Guide           /SX/usr/opt/MathKeisan/MK1_2_0/UsersGuide
    Japanese Users' Guide  /SX/usr/opt/MathKeisan/MK1_2_0/JUsersGuide
    backup                 /tmp/bkup
    Are these directories correct yes/no: yes

    finding space required for install ~~
    finding space required for install ~~
    space required for new library files = 79221886 byte
    space required for new manpage files = 10722326 byte
    space required for new include files = 32428 byte
    space required for new Users' Guide files = 150029 byte
    space required for new Japanese Users' Guide files = 0 byte
    space required for backup of files over written = 0 byte

    ##
    IMPORTANT!! only continue if you have enough space in chosen directories to
      1. install new library, manpage, include, Users' Guide and
         Japanese Users' Guide files
      2. make backup of any file over written
    ##
    Do you want to continue yes/no: yes

    backing up and installing libraries
    backing up and installing man pages !!
    backing up and installing include files
    backing up and installing Users Guide
    backing up and installing Japanese Users Guide
    writing uninstall script

    ##
    Install is complete
    ##
    1. A log file for this install is in
       /tmp/a91anch/MK1_2_0/install.log132594
       please email a copy of this file to technical@atcc.necsyl.com.
       It will be used to debug any future problems. When your log file
       is received, a login and password will be issued so you can access
       MathKeisan patches that are posted at www.mathkeisan.com.
    2. To uninstall, save directory below, and its contents
       /tmp/a91anch/bkup
       An uninstall script is in
       /tmp/a91anch/bkup/uninstall.sh
       Running the uninstall script will delete the lib and man pages
       that this script installed, and it will restore all files over written.
       Do not run the uninstall script unless you want to uninstall MathKeisan
Below are notes on each of the libraries in MathKeisan 1.2.0.
The BLAS (Basic Linear Algebra Subprograms) are high quality "building block" routines for performing basic vector and matrix operations. Level 1 BLAS are for vector-vector operations, Level 2 BLAS are for matrix-vector operations, and Level 3 BLAS are for matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software, LAPACK and ScaLAPACK for example.
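To make the three levels concrete, here is a minimal C sketch with one plain loop nest per level. The function names ref_axpy, ref_gemv and ref_gemm are hypothetical; these are untuned reference loops written for illustration, not the MathKeisan routines and not the BLAS calling interface. Matrices are assumed to be stored column-major, as in Fortran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loops, not MathKeisan code. Matrices are stored
   column-major: element (i,j) of a matrix with m rows is a[i + j*m]. */

/* Level 1 (vector-vector): y = alpha*x + y */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y = A*x, where A is m-by-n */
void ref_gemv(size_t m, size_t n, const double *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            y[i] += a[i + j * m] * x[j];
}

/* Level 3 (matrix-matrix): C = A*B, where A is m-by-k, B is k-by-n */
void ref_gemm(size_t m, size_t n, size_t k,
              const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++)
            c[i + j * m] = 0.0;
        for (size_t p = 0; p < k; p++)
            for (size_t i = 0; i < m; i++)
                c[i + j * m] += a[i + p * m] * b[p + j * k];
    }
}
```

The loop depth grows by one with each level, which is why Level 3 operations perform the most arithmetic per element of data moved.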
The BLAS included in MathKeisan is based on the original version of BLAS, which was developed by J.J. Dongarra (Argonne National Laboratory), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Labs), D. Kincaid (University of Texas), F.T. Krogh (Jet Propulsion Lab), and C.L. Lawson (Jet Propulsion Lab).
PARBLAS contains shared memory parallel versions of the BLAS level 2 and level 3 subroutines. The level 1 subroutines are serial (single processor), the same as in BLAS. These subroutines have the same interface as BLAS. The number of parallel tasks is set in the following order of precedence:
1. The value specified with the RESERVE parallelization directive
2. The value specified with the reserve compile option
3. The value specified with the F_RSVTASK environment variable
4. The value of the MAX parameter of the CPU resource block
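The precedence order above can be sketched as a simple fallback chain. The helper names resolve_task_count and read_rsvtask_env are hypothetical, and the convention that a value of 0 or less means "not specified" is an assumption made for this illustration; the actual selection is done by the system, not by user code.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of MathKeisan. Each argument is the task
   count from one source; a value <= 0 means "not specified" (assumption). */
int resolve_task_count(int directive_value, int compile_option_value,
                       int env_value, int resource_block_max)
{
    /* The CPU resource block MAX is the final fallback, so it must be set. */
    assert(resource_block_max > 0);
    if (directive_value > 0)
        return directive_value;        /* 1. RESERVE directive */
    if (compile_option_value > 0)
        return compile_option_value;   /* 2. reserve compile option */
    if (env_value > 0)
        return env_value;              /* 3. F_RSVTASK environment variable */
    return resource_block_max;         /* 4. MAX of the CPU resource block */
}

/* Parse F_RSVTASK; returns 0 when it is unset or not a positive integer. */
int read_rsvtask_env(void)
{
    const char *text = getenv("F_RSVTASK");
    char *end = NULL;
    long value;

    if (text == NULL)
        return 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value <= 0 || value > INT_MAX)
        return 0;
    return (int)value;
}
```

With no directive and no compile option, a call such as resolve_task_count(0, 0, read_rsvtask_env(), 16) returns the F_RSVTASK value if it is set and 16 otherwise.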
The shared memory parallel BLAS included in MathKeisan is based on the original version of BLAS, which was developed by J.J. Dongarra (Argonne National Laboratory), J. Du Croz (Numerical Algorithms Group Ltd.), I. S. Duff (AERE Harwell), S. Hammarling (Numerical Algorithms Group Ltd.), R. J. Hanson (Sandia National Labs), D. Kincaid (University of Texas), F.T. Krogh (Jet Propulsion Lab), and C.L. Lawson (Jet Propulsion Lab).
CBLAS is a C language interface to the Fortran BLAS, a set of subroutines used to perform vector-vector (level 1), matrix-vector (level 2), and matrix-matrix (level 3) operations.
The CBLAS is based on the BLAS Technical Forum reference implementation by K. Teranishi (University of Tennessee) with updates by J. Horner (University of Tennessee). The specification was authored by R. Whaley (University of Tennessee).
LAPACK (Linear Algebra PACKage) provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.
LAPACK supersedes LINPACK and EISPACK. On shared memory vector and parallel processors, LINPACK and EISPACK are inefficient because their memory access patterns disregard the multilayered memory hierarchies of the machines, so they spend too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. Whenever possible, LAPACK calls BLAS (usually level 2 and level 3). Because of the coarse granularity of the level 3 BLAS operations, their use promotes high efficiency.
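The idea of working on blocks rather than single elements can be illustrated with a blocked matrix multiplication. The sketch below is not LAPACK or MathKeisan code; the function name blocked_matmul and the block size parameter nb are hypothetical, and the matrices are assumed square and column-major. Each nb-by-nb block is reused for many operations once it has been brought close to the processor, which is the effect the block algorithms aim for.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Hypothetical illustration: C += A*B for n-by-n column-major matrices,
   processed in nb-by-nb blocks. The caller must initialize C. */
void blocked_matmul(size_t n, size_t nb,
                    const double *a, const double *b, double *c)
{
    assert(nb > 0);
    for (size_t jj = 0; jj < n; jj += nb) {
        size_t jend = min_size(jj + nb, n);
        for (size_t pp = 0; pp < n; pp += nb) {
            size_t pend = min_size(pp + nb, n);
            for (size_t ii = 0; ii < n; ii += nb) {
                size_t iend = min_size(ii + nb, n);
                /* Multiply one block of A by one block of B and
                   accumulate into the matching block of C. */
                for (size_t j = jj; j < jend; j++)
                    for (size_t p = pp; p < pend; p++)
                        for (size_t i = ii; i < iend; i++)
                            c[i + j * n] += a[i + p * n] * b[p + j * n];
            }
        }
    }
}
```

In a tuned library the innermost block product would be a single level 3 BLAS call rather than three explicit loops.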
The LAPACK included in MathKeisan is based on the original LAPACK version 3.0, which was developed by the LAPACK project team, composed of E. Anderson (University of Tennessee, Knoxville), Z. Bai (University of Kentucky and University of California, Davis), C. Bischof (Institute for Scientific Computing, Technical University Aachen, Germany), S. Blackford (University of Tennessee, Knoxville), J. Demmel (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), J. Du Croz (Numerical Algorithms Group Ltd.), A. Greenbaum (University of Washington), S. Hammarling (Numerical Algorithms Group Ltd.), A. McKenney, and D. Sorensen (Rice University).
The BLACS (Basic Linear Algebra Communication Subprograms) are a message-passing library designed for linear algebra. The computational model consists of a one- or two-dimensional process grid, where each process stores pieces of the matrices and vectors. The BLACS include synchronous send/receive routines to communicate a matrix or submatrix from one process to another, to broadcast submatrices to many processes, or to compute global data reductions (sums, maxima and minima). There are also routines to construct, change, or query the process grid. Since several ScaLAPACK algorithms require broadcasts or reductions among different subsets of processes, the BLACS permit a process to be a member of several overlapping or disjoint process grids, each one labeled by a context. In MPI this is called a communicator. The BLACS provide facilities for safe interoperation of system contexts and BLACS contexts.
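As an illustration of the two-dimensional process grid, the sketch below maps a process number to grid coordinates. The type grid_coord and both function names are hypothetical, and the row-major and column-major numberings are assumptions chosen for this example; the text above does not specify how BLACS numbers the processes in a grid.

```c
#include <assert.h>

/* Hypothetical illustration, not the BLACS interface. */
typedef struct {
    int row;
    int col;
} grid_coord;

/* Row-major numbering: processes fill the first grid row, then the next. */
grid_coord grid_coord_row_major(int rank, int nprow, int npcol)
{
    grid_coord c;
    assert(nprow > 0 && npcol > 0 && rank >= 0 && rank < nprow * npcol);
    c.row = rank / npcol;
    c.col = rank % npcol;
    return c;
}

/* Column-major numbering: processes fill the first grid column first. */
grid_coord grid_coord_col_major(int rank, int nprow, int npcol)
{
    grid_coord c;
    assert(nprow > 0 && npcol > 0 && rank >= 0 && rank < nprow * npcol);
    c.row = rank % nprow;
    c.col = rank / nprow;
    return c;
}
```

On a 2-by-3 grid, process 4 lands at row 1, column 1 under row-major numbering and at row 0, column 2 under column-major numbering.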
The BLACS included in MathKeisan is the original version 1.1 with patch03, written by J.J. Dongarra and R.C. Whaley (University of Tennessee, Knoxville).
The PBLAS library is an integral part of the ScaLAPACK library. PBLAS is a parallel set of BLAS that performs message passing and whose interface is similar to that of the BLAS. PBLAS aims to provide a distributed memory standard, just as the BLAS have provided a shared memory standard.
ScaLAPACK is a library of high-performance linear algebra routines for distributed-memory message-passing computers. ScaLAPACK can solve systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorization or estimating condition numbers. Dense and band matrices are provided for, but not general sparse matrices. Similar functionality is provided for real and complex matrices. The name ScaLAPACK is an acronym for Scalable Linear Algebra PACKage, or Scalable LAPACK.
As in LAPACK, the ScaLAPACK routines are based on block-partitioned algorithms in order to minimize the frequency of data movement between different levels of the memory hierarchy. The fundamental building block of the ScaLAPACK library is a distributed memory version of the Level 1, 2, and 3 BLAS, called PBLAS (Parallel BLAS). The PBLAS are in turn built on the BLAS for computation on a single node and on a set of Basic Linear Algebra Communication Subprograms (BLACS).
The ScaLAPACK included in MathKeisan is the original version 1.6, written by L. S. Blackford (University of Tennessee, Knoxville), J. Choi (Soongsil University, Korea), A. Cleary (Lawrence Livermore National Laboratory), E. D'Azevedo (Oak Ridge National Laboratory), J. Demmel (University of California, Berkeley), I. Dhillon (University of California, Berkeley), J. Dongarra (University of Tennessee, Knoxville, and Oak Ridge National Lab.), S. Hammarling (Numerical Algorithms Group Ltd.), G. Henry (Intel Corporation), A. Petitet (University of Tennessee, Knoxville), K. Stanley (University of California, Berkeley), D. Walker (University of Wales, Cardiff), and R. C. Whaley (University of Tennessee, Knoxville).
The Fast Fourier Transforms (FFTs) contained in MathKeisan 1.1.0 have equivalent interface and functionality to HP's VECLIB Library and also CRAY's LIBSCI 3.1. There are 1D, 2D, 3D and simultaneous 1D Complex-Complex FFTs, Real-Complex FFTs and Complex-Real FFTs.
The FFT libraries were developed internally at NEC.
ARPACK is a collection of Fortran 77 subroutines designed to solve large-scale eigenvalue problems. ARPACK stands for ARnoldi PACKage. It is capable of solving large-scale symmetric (Hermitian), nonsymmetric (non-Hermitian), standard, or generalized eigenvalue problems from significant application areas. The ARPACK library is designed to compute a few, say k, eigenvalues with user-specified features, such as those of largest real part or largest magnitude, using n*O(k) + O(k*k) storage. No auxiliary storage is required. A set of Schur basis vectors for the desired k-dimensional eigenspace is computed which is numerically orthogonal to working precision. Eigenvectors are also available upon request. ARPACK is dependent upon a number of subroutines from LAPACK and BLAS. The performance scales asymptotically to the Level 2 BLAS operation GEMV.
The ARPACK included in MathKeisan is based on the original version written by Rich Lehoucq, Kristi Maschhoff, Danny Sorensen and Chao Yang (Rice University).
METIS is a library for partitioning and ordering matrices/graphs. It is used by SOLVER to order the original matrix to reduce fill-in in the factored matrix.
The METIS in MathKeisan is the original version 4.0, developed at the University of Minnesota and the Army HPC Research Center by George Karypis and Vipin Kumar.
PARMETIS is an MPI-based parallel library that implements a variety of algorithms for partitioning unstructured graphs and for computing fill-reducing orderings of sparse matrices.
The PARMETIS in MathKeisan is the original version 2.0, developed at the University of Minnesota and the Army HPC Research Center by George Karypis and Vipin Kumar.
SOLVER contains subroutines used to solve sparse symmetric linear systems. It uses the left-looking algorithm to factor a sparse matrix A into A = L D L^T, where L is lower triangular with unit diagonal and D is diagonal. It takes advantage of the supernodal structure of the matrix. The current version uses the METIS library to order the matrix. Both serial and parallel numerical factorization are supported.
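The factorization A = L D L^T can be illustrated on a small dense matrix. The sketch below is not the SOLVER code or its interface: the function name ldlt_left_looking is hypothetical, and it works on a dense column-major matrix without pivoting, ordering or supernodes, so it only shows the arithmetic of a left-looking column sweep, in which column j is computed from the already finished columns to its left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dense illustration, not the SOLVER implementation.
   Factors a symmetric n-by-n matrix A (column-major; only the lower
   triangle is read) as A = L*D*L^T. On return, l holds the unit lower
   triangular factor (n*n, column-major) and d the n diagonal entries of D.
   Returns 0 on success, -1 if a zero pivot is met (no pivoting is done). */
int ldlt_left_looking(size_t n, const double *a, double *l, double *d)
{
    assert(a != NULL && l != NULL && d != NULL);
    for (size_t j = 0; j < n; j++) {
        /* Start column j of L as a column of the identity. */
        for (size_t i = 0; i < n; i++)
            l[i + j * n] = (i == j) ? 1.0 : 0.0;

        /* Diagonal entry: subtract contributions of columns 0..j-1. */
        double dj = a[j + j * n];
        for (size_t k = 0; k < j; k++)
            dj -= l[j + k * n] * l[j + k * n] * d[k];
        if (dj == 0.0)
            return -1;
        d[j] = dj;

        /* Entries below the diagonal, again using columns to the left. */
        for (size_t i = j + 1; i < n; i++) {
            double s = a[i + j * n];
            for (size_t k = 0; k < j; k++)
                s -= l[i + k * n] * l[j + k * n] * d[k];
            l[i + j * n] = s / dj;
        }
    }
    return 0;
}
```

A sparse solver performs the same updates but skips the entries that are structurally zero, which is where the ordering computed by METIS matters.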
The Solver libraries were developed internally at NEC.
A subroutine mkversion in libblas.a outputs MathKeisan version information to standard output. In Fortran use "call mkversion()", in C use "mkversion_()". In both cases link with f90 to the library libblas.a. See also the mkversion man page. Output for MathKeisan 1.2.0 is below.
    MathKeisan 1.2.0
    BLAS       legacy blas
    LAPACK     version 3.0
    ScaLAPACK  version 1.6
    BLACS      version 1.1 + patch 03
    METIS      version 4.0
    PARMETIS   version 2.0
If you have any questions or feedback for MathKeisan's developers, please send email to support@hpce.nec.com.
To report a bug, please use NEC's PSR system.