AMD FX-8150 @ 4.086 GHz / GPU: XFX HD7970 GHz Edition / MPICH 3.0.1 /
Open64 4.5.2 (AMD) / APP SDK 2.8 / LAMMPS 20130121 (compiled with -O3)
Results for Rhodopsin
On the FX-8150
Single core: 45.9495 seconds
4 cores: 11.94 seconds
8 cores: 7.06722 seconds
On the GPU HD7970 GHz Edition
1 proc: 4.8 seconds
2 procs: 3.67 seconds
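The Rhodopsin timings above are easier to compare as speedups. Here is a small Python sketch that does the arithmetic; the numbers are copied from the results above, and the function names are only illustrative. Parallel efficiency is taken as speedup divided by the number of cores, which is one common convention and not something the benchmark itself reports.

```python
# Wall-clock times in seconds, copied from the Rhodopsin results above.
cpu_times = {1: 45.9495, 4: 11.94, 8: 7.06722}  # FX-8150: cores -> seconds
gpu_times = {1: 4.8, 2: 3.67}                   # HD7970: MPI procs -> seconds


def speedup(t_ref, t):
    """How many times faster a run taking t is than the reference t_ref."""
    return t_ref / t


def efficiency(t_ref, n, t):
    """Speedup divided by core count (1.0 would be perfect scaling)."""
    return speedup(t_ref, t) / n


t_single = cpu_times[1]

for n, t in sorted(cpu_times.items()):
    print(f"CPU {n} core(s): speedup {speedup(t_single, t):.2f}x, "
          f"efficiency {efficiency(t_single, n, t):.0%}")

for n, t in sorted(gpu_times.items()):
    print(f"GPU {n} proc(s): {speedup(t_single, t):.2f}x vs 1 core, "
          f"{speedup(cpu_times[8], t):.2f}x vs 8 cores")
```

Running it gives the CPU scaling from 1 to 8 cores and shows how each GPU run compares with both the single-core and the 8-core CPU run.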
Results with a FirePro3D V8800 and a Tesla C2050, for comparison:
https://sites.google.com/site/akohlmey/news-and-announcements/gpuacceleratedlammpsonamdgpus
Results for EAM
1 proc
mpirun -n 1 ./lmp_mpich -sf gpu -c off -v g 1 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.gpu
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Setting up run ...
Memory usage per processor = 426.905 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1600 -7249920 0 -6826360.6 18704.149
50 780.81547 -7031660.9 0 -6824959.8 52291.364
100 798.21786 -7036295.3 0 -6824987.4 51479.467
Loop time of 27.893 on 1 procs for 100 steps with 2048000 atoms
2 procs
mpirun -n 2 ./lmp_mpich -sf gpu -c off -v g 1 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.gpu
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Initializing GPU 0 on core 1...Done.
Setting up run ...
Memory usage per processor = 225.757 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1600 -7249920 0 -6826360.6 18704.149
50 780.81547 -7031660.9 0 -6824959.8 52291.364
100 798.21786 -7036295.3 0 -6824987.4 51479.467
Loop time of 22.4219 on 2 procs for 100 steps with 2048000 atoms
USER-CUDA on a GeForce GTX 580, 1 proc
mpirun -n 1 ./lmp_cuda -sf cuda -v g 2 -v x 80 -v y 80 -v z 80 -v t 100 < in.eam.cuda
LAMMPS (21 Jan 2013)
# Using LAMMPS_CUDA
USER-CUDA mode is enabled (lammps.cpp:393)
# CUDA: Activate GPU
# Using device 0: GeForce GTX 580
Lattice spacing in x,y,z = 3.615 3.615 3.615
Created orthogonal box = (0 0 0) to (289.2 289.2 289.2)
1 by 1 by 1 MPI processor grid
Created 2048000 atoms
# CUDA: VerletCuda::setup: Allocate memory on device for maximum of 2050000 atoms...
# CUDA: Using precision: Global: 8 X: 8 V: 8 F: 8 PPPM: 8
Setting up run ...
# CUDA: VerletCuda::setup: Upload data...
# CUDA: Total Device Memory useage post setup: 1314.117188 MB
Memory usage per processor = 416.017 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1600 -7249920 0 -6826360.6 18704.149
50 780.81547 -7031660.9 0 -6824959.8 52291.364
100 798.21786 -7036295.3 0 -6824987.4 51479.467
Loop time of 20.0248 on 1 procs for 100 steps with 2048000 atoms
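To put the three EAM runs on a common scale, each "Loop time" line can be converted into atom-steps per second. A rough Python sketch follows, assuming the log lines have exactly the format shown in the output above. The run labels and helper names are made up for illustration and are not part of LAMMPS.

```python
import re

# Matches lines such as:
#   Loop time of 27.893 on 1 procs for 100 steps with 2048000 atoms
LOOP_RE = re.compile(
    r"Loop time of (?P<secs>[\d.]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)


def parse_loop_line(line):
    """Return the timing fields of a 'Loop time' line, or None if no match."""
    m = LOOP_RE.search(line)
    if m is None:
        return None
    return {
        "seconds": float(m.group("secs")),
        "procs": int(m.group("procs")),
        "steps": int(m.group("steps")),
        "atoms": int(m.group("atoms")),
    }


def atom_steps_per_second(rec):
    """Throughput: atoms times timesteps, divided by the loop time."""
    return rec["atoms"] * rec["steps"] / rec["seconds"]


# 'Loop time' lines copied from the three EAM runs above; labels are ours.
runs = {
    "HD7970, 1 proc": "Loop time of 27.893 on 1 procs for 100 steps with 2048000 atoms",
    "HD7970, 2 procs": "Loop time of 22.4219 on 2 procs for 100 steps with 2048000 atoms",
    "GTX 580, 1 proc": "Loop time of 20.0248 on 1 procs for 100 steps with 2048000 atoms",
}

for label, line in runs.items():
    rec = parse_loop_line(line)
    rate = atom_steps_per_second(rec)
    print(f"{label}: {rate / 1e6:.2f} million atom-steps/s")
```

Since all three runs use the same system size and step count, the ranking is the same as for the raw loop times. The throughput figure is mainly useful when comparing against runs with a different number of atoms or steps.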