Date: Sat, 04 Jan 2003 07:36:27 +0100
From: Florian Barth
Subject: Re: Linux cluster Amber7 sander mpi error -- Null communicator, IOT

Chris Switzer wrote:

> Amber 7 sander newly compiled on a linux cluster gives the following
> error:
>
> 1 - MPI_COMM_RANK : Null communicator
> [1] Aborting program !
> [1] Aborting program!
> Process aborting...
> IOT Trap
> 0 - MPI_COMM_RANK : Null communicator
> [0] Aborting program !
> [0] Aborting program!
> Process aborting...
> IOT Trap
> -------------------
>
> System:
> Linux cluster 2.4.9-31smp
> Mpich-1.2.1..7b
> -------------------

From the MPICH version, I guess that you are using the Myrinet GM interface.
Make sure that you link the GM library in the Amber MACHINE file (the
-L$GM_LIBDIR -l$GM_LIB entries in LOADLIB below). I have pretty much the
same setup, and it runs fine at my site. Below is the first part of my
MACHINE file for parallel sander:

setenv AMBERHOME /home/flb/amber7
setenv MPICH_HOME /usr/local/mpich-1.2.1..7b

setenv MPICH_INCLUDE $MPICH_HOME/include
setenv MPICH_LIBDIR $MPICH_HOME/lib
setenv MPICH_LIB mpich
setenv GM_LIBDIR /usr/local/gm-1.5.1/lib
setenv GM_LIB gm
#
setenv MACHINE "linux/FreeBSD PC"
setenv MACH Linux
setenv MACHINEFLAGS " -DMPI"

# CPP is the cpp for this machine
setenv CPP "/lib/cpp -traditional -I$MPICH_INCLUDE"

# SYSDIR is the name of the system-specific source directory relative to
# src/*/
setenv SYSDIR Machines/standard

# LOADER/LINKER:
setenv LOAD "g77 -O3 -march=athlon-mp"
setenv LOADCC "gcc -O3 -march=athlon-mp"
setenv LOADLIB "-lm -L$MPICH_LIBDIR -l$MPICH_LIB -L$GM_LIBDIR -l$GM_LIB"

setenv G77_COMPAT "-fno-globals -ff90 -funix-intrinsics-hide"
setenv OPT "-O3 -march=athlon-mp -malign-double -ffast-math"

........

>
> Makefile used to create the Amber7 sander giving above error:
> An altered version of "Machine.g77_mpich". Amber7 would only finish
> compiling when the "g77" references in Machine.g77_mpich were changed
> to "mpif77" per a reflector e-mail.
> -------------------
>
> It is noteworthy that when Amber7 is compiled non-parallel using
> Machine.g77 without any alterations, sander runs fine.
> -------------------
> -------------------
>
> Some additional notes...
> Compiling behavior with other machine files:
>
> Attempted compiling with Machine.g77_mpich unaltered gives the
> following type of error:
> g77 -c -g _nxtsec_.f
> ../Compile LOAD -o new2oldparm new2oldparm.o nxtsec.o
> g77 -O6 -o new2oldparm new2oldparm.o nxtsec.o -lm
> -L/usr/local/mpich-1.2.4..8/lib -lmpich
> /usr/local/mpich-1.2.4..8/lib/libmpich.a(gmpi_regcache.o): In function
> `gmpi_regcache_init':
> gmpi_regcache.o(.text+0x1e): undefined reference to `gm_hash_hash_ptr'
> gmpi_regcache.o(.text+0x23): undefined reference to `gm_hash_compare_ptrs'
> etc....
> make[1]: *** [new2oldparm] Error 1
> make[1]: Leaving directory `/home/switzer/amber7/src/lib'
> make: *** [install] Error 2

new2oldparm is not a parallel application, so it does not need to be
linked against MPICH at all. Change into the $AMBERHOME/src/sander
directory and compile only sander for MPI.
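Why the GM library has to appear in LOADLIB, and after MPICH: a
traditional Unix linker scans static archives once, left to right, so a
library can only resolve references from archives listed before it. Here
is a minimal Python sketch (not part of Amber or MPICH) that checks a link
line under that model. DEPENDS and missing_link_deps are hypothetical
names; the only dependency encoded is the one visible in the link error
quoted above, libmpich.a needing gm_* symbols.

```python
import shlex

# Hypothetical dependency table for illustration: each library maps to the
# libraries whose symbols it needs. An MPICH built for Myrinet references
# gm_* symbols (e.g. gm_hash_hash_ptr), so "mpich" depends on "gm".
DEPENDS = {"mpich": ["gm"]}


def missing_link_deps(link_line, depends=DEPENDS):
    """Return (lib, dep) pairs where dep does not appear after lib.

    Models a single left-to-right scan of static archives: a library can
    only satisfy references from archives listed before it.
    """
    # Keep only the -lNAME arguments, in command-line order.
    libs = [tok[2:] for tok in shlex.split(link_line) if tok.startswith("-l")]
    problems = []
    for i, lib in enumerate(libs):
        for dep in depends.get(lib, []):
            if dep not in libs[i + 1:]:
                problems.append((lib, dep))
    return problems


# The unaltered Machine.g77_mpich link line from the error report:
failing = ("g77 -O6 -o new2oldparm new2oldparm.o nxtsec.o -lm "
           "-L/usr/local/mpich-1.2.4..8/lib -lmpich")
print(missing_link_deps(failing))  # [('mpich', 'gm')]

# The LOADLIB order from my MACHINE file, with GM_LIB expanded to gm:
fixed = "g77 -o sander sander.o -lm -L$MPICH_LIBDIR -lmpich -L$GM_LIBDIR -lgm"
print(missing_link_deps(fixed))    # []
```

The real linker is of course the authority here; the sketch only shows
why "-lmpich" alone fails and why "-lgm" has to follow it.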

>
>
> I have pgf77. Attempted compilation with Machine.pgf77_mpi gives
> errors of the following sort:
> SYSLIB=`../sysdir lib` ; ../Compile LOAD -o sander \
> sander.o ....etc..... ../blas/blas.a ../lib/nxtsec.o $SYSLIB;
> pgf77 -o sander sander.o cshf.o ....etc..... decomp.o
> ../lapack/lapack.a ../blas/blas.a ../lib/nxtsec.o
> /home/switzer/amber7/src/Machines/standard/sys.a -lm
> sander.o: In function `trajene':
> _sander_.f:1084: undefined reference to `mpi_init__'
> _sander_.f:1085: undefined reference to `mpi_comm_rank__'
> _sander_.f:1086: undefined reference to `mpi_comm_size__'
> _sander_.f:1282: undefined reference to `mpi_bcast__'
> ....etc....
> new_time.o(.text+0x2f46): undefined reference to `mpi_send__'
> new_time.o(.text+0x32dd): undefined reference to `mpi_recv__'
> make[1]: *** [sander] Error 1
> make[1]: Leaving directory `/home/switzer/amber7/src/sander'
> make: *** [install] Error 2
> -------------------

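For this you first need an MPICH version compiled with pgf77. In the
MACHINE file you then also need to refer to the pgf libraries, and LOADLIB
must include the MPICH library: the pgf77 link line above ends with
sys.a -lm and has no -lmpich at all, which is why the mpi_* calls are
unresolved.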
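The double underscore in mpi_init__ in the errors above comes from Fortran
external-name decoration, which differs between compilers; this is one
reason the MPICH library and sander should be built with the same Fortran
compiler. A minimal Python sketch of the default conventions as I
understand them: g77 lowercases the name, appends an underscore, and
appends a second one if the name already contains an underscore; pgf77
appends only one unless -Msecond_underscore is given. The function names
g77_symbol and pgf77_symbol are hypothetical and only model the behaviour.

```python
# Hypothetical helpers illustrating default Fortran external-name
# decoration; they model compiler behaviour, they do not call the compilers.

def g77_symbol(name):
    """g77 default: lowercase, append "_", and append a second "_"
    when the name already contains an underscore (-fsecond-underscore)."""
    sym = name.lower() + "_"
    if "_" in name:
        sym += "_"
    return sym


def pgf77_symbol(name, second_underscore=False):
    """pgf77 default: lowercase and append a single "_"; the
    -Msecond_underscore option adds the extra one for names with "_"."""
    sym = name.lower() + "_"
    if second_underscore and "_" in name:
        sym += "_"
    return sym


for routine in ("MPI_INIT", "MPI_COMM_RANK"):
    print(routine, g77_symbol(routine), pgf77_symbol(routine))
# MPI_INIT mpi_init__ mpi_init_
# MPI_COMM_RANK mpi_comm_rank__ mpi_comm_rank_
```

So an MPICH built with g77 exports mpi_init__, while code compiled with a
default pgf77 looks for mpi_init_; the objects and the library only match
when both sides use the same convention.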

Regards

Florian Barth