HelixS2S: Benchmark on GPU HD7970 / LAMMPS

Test system: AMD FX-8150 @ 4.086 GHz / XFX HD7970 GHz Edition GPU / MPICH 3.0.1 / Open64 4.5.2 (AMD) / APP SDK 2.8 / LAMMPS 20130121 (compiled with -O3)

Results for Rhodopsin

On the FX-8150
1 core: 45.9495 seconds
4 cores: 11.94 seconds
8 cores: 7.06722 seconds

On the HD7970 GHz Edition GPU
1 proc: 4.8 seconds
2 procs: 3.67 seconds
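
To put the Rhodopsin timings in perspective, here is a minimal sketch that derives speedup and parallel efficiency from the numbers listed above. The variable and function names are illustrative only, and the GPU figures are compared against the CPU runs on the assumption that all runs used the same input and step count, which the post does not state explicitly.

```python
# Wall-clock times in seconds, copied from the Rhodopsin results above.
cpu_times = {1: 45.9495, 4: 11.94, 8: 7.06722}  # FX-8150 cores -> seconds
gpu_times = {1: 4.8, 2: 3.67}                   # MPI procs driving the HD7970 -> seconds


def speedup(baseline, t):
    """Return how many times faster a run of t seconds is than the baseline."""
    return baseline / t


def cpu_scaling(times):
    """Map core count -> (speedup vs. 1 core, parallel efficiency)."""
    base = times[1]
    return {n: (speedup(base, t), speedup(base, t) / n) for n, t in times.items()}


# CPU scaling relative to a single core.
for cores, (s, eff) in sorted(cpu_scaling(cpu_times).items()):
    print(f"CPU {cores} core(s): speedup {s:.2f}x, efficiency {eff:.0%}")

# GPU runs relative to the single-core and the 8-core CPU runs.
for procs, t in sorted(gpu_times.items()):
    vs_1 = speedup(cpu_times[1], t)
    vs_8 = speedup(cpu_times[8], t)
    print(f"GPU {procs} proc(s): {vs_1:.2f}x vs 1 core, {vs_8:.2f}x vs 8 cores")
```

With these figures, the 8-core run reaches roughly 6.5x over a single core, and the GPU with 2 procs comes out a little under 2x faster than all 8 CPU cores.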

For comparison, results with the FirePro 3D V8800 and Tesla C2050:
https://sites.google.com/site/akohlmey/news-and-announcements/gpuacceleratedlammpsonamdgpus

Results for EAM

1 proc
mpirun -n 1 ./lmp_mpich -sf gpu -c off -v g 1 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.gpu
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.

Setting up run ...
Memory usage per processor = 426.905 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1600     -7249920            0   -6826360.6    18704.149
      50    780.81547   -7031660.9            0   -6824959.8    52291.364
     100    798.21786   -7036295.3            0   -6824987.4    51479.467
Loop time of 27.893 on 1 procs for 100 steps with 2048000 atoms

2 procs
mpirun -n 2 ./lmp_mpich -sf gpu -c off -v g 1 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.gpu
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Initializing GPU 0 on core 1...Done.

Setting up run ...
Memory usage per processor = 225.757 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1600     -7249920            0   -6826360.6    18704.149
      50    780.81547   -7031660.9            0   -6824959.8    52291.364
     100    798.21786   -7036295.3            0   -6824987.4    51479.467
Loop time of 22.4219 on 2 procs for 100 steps with 2048000 atoms

1 proc, GeForce GTX 580 (USER-CUDA)
mpirun -n 1 ./lmp_cuda -sf cuda -v g 2 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.cuda

LAMMPS (21 Jan 2013)
# Using LAMMPS_CUDA
USER-CUDA mode is enabled (lammps.cpp:393)
# CUDA: Activate GPU
# Using device 0: GeForce GTX 580
Lattice spacing in x,y,z = 3.615 3.615 3.615
Created orthogonal box = (0 0 0) to (289.2 289.2 289.2)
1 by 1 by 1 MPI processor grid
Created 2048000 atoms
# CUDA: VerletCuda::setup: Allocate memory on device for maximum of 2050000 atoms...
# CUDA: Using precision: Global: 8 X: 8 V: 8 F: 8 PPPM: 8
Setting up run ...
# CUDA: VerletCuda::setup: Upload data...
# CUDA: Total Device Memory useage post setup: 1314.117188 MB
Memory usage per processor = 416.017 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1600     -7249920            0   -6826360.6    18704.149
      50    780.81547   -7031660.9            0   -6824959.8    52291.364
     100    798.21786   -7036295.3            0   -6824987.4    51479.467
Loop time of 20.0248 on 1 procs for 100 steps with 2048000 atoms
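
The three EAM runs can be compared by pulling the numbers out of the "Loop time of ..." lines in the logs above. The following is a rough sketch: the regular expression matches the line format exactly as it appears in these logs, and the run labels and function name are made up for illustration. Note that the GTX 580 run uses a different LAMMPS package (USER-CUDA rather than the GPU package), so the comparison is indicative only.

```python
import re

# Matches the summary line as printed in the logs above.
LOOP_RE = re.compile(
    r"Loop time of ([\d.]+) on (\d+) procs for (\d+) steps with (\d+) atoms"
)


def parse_loop_line(line):
    """Extract timing data from a LAMMPS 'Loop time of ...' summary line."""
    m = LOOP_RE.search(line)
    if m is None:
        raise ValueError(f"not a loop-time line: {line!r}")
    seconds = float(m.group(1))
    procs, steps, atoms = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return {
        "seconds": seconds,
        "procs": procs,
        "steps": steps,
        "atoms": atoms,
        # Throughput: atoms advanced by one timestep, per wall-clock second.
        "atom_steps_per_sec": atoms * steps / seconds,
    }


# Summary lines copied from the three EAM logs; the labels are illustrative.
runs = {
    "HD7970, 1 proc": "Loop time of 27.893 on 1 procs for 100 steps with 2048000 atoms",
    "HD7970, 2 procs": "Loop time of 22.4219 on 2 procs for 100 steps with 2048000 atoms",
    "GTX 580, 1 proc": "Loop time of 20.0248 on 1 procs for 100 steps with 2048000 atoms",
}

results = {label: parse_loop_line(line) for label, line in runs.items()}
baseline = results["HD7970, 1 proc"]["seconds"]

for label, r in results.items():
    rate = r["atom_steps_per_sec"] / 1e6
    rel = baseline / r["seconds"]
    print(f"{label}: {rate:.2f} M atom-steps/s, {rel:.2f}x vs HD7970 1 proc")
```

By this measure, going from 1 to 2 procs on the HD7970 gives about a 1.24x improvement, and the GTX 580 run is about 1.39x faster than the single-proc HD7970 run.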

Friday, January 25, 2013
