A simulation of more than 2 million atoms using the EAM potential on an XFX graphics card (AMD HD7970 GHz Edition, 3 GB), run entirely in double precision using LAMMPS with the GPU package.
(double precision / LAMMPS / AMD APP SDK 2.7 / fglrx 9.01.8 / Slackware64-current)
--------------------------------------------------------------------------
RESULTS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
On the HD7970 GPU at 1.05 GHz
--------------------------------------------------------------------------
--------------------------------------------------------------------------
- Using GPGPU acceleration for eam:
- with 1 proc(s) per device.
- with OpenCL Parameters for: GENERIC_OCL
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
GPU 1: AMD FX(tm)-8150 Eight-Core Processor , 8 cores, 3.6 GHZ (Double Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.
Setting up run ...
Memory usage per processor = 422.869 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1600 -7159296 0 -6741031.1 18704.149
50 780.71972 -6943740.2 0 -6739647.9 52292.948
100 798.21531 -6948340.9 0 -6739675 51478.838
Loop time of 29.6983 on 1 procs for 100 steps with 2022400 atoms
--------------------------------------------------------------------------
On the FX-8150 CPU, overclocked to 4.086 GHz
--------------------------------------------------------------------------
2 by 2 by 2 MPI processor grid
Created 2022400 atoms
Setting up run ...
Memory usage per processor = 96.6497 Mbytes
Step Temp E_pair E_mol TotEng Press
0 1600 -7159296 0 -6741031.1 18704.149
50 780.71972 -6943740.2 0 -6739647.9 52292.948
100 798.21531 -6948340.9 0 -6739675 51478.838
Loop time of 93.0408 on 8 procs for 100 steps with 2022400 atoms
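
To put the two "Loop time" lines side by side, here is a minimal Python sketch that parses them and computes the GPU-over-CPU speedup and the throughput in atom-steps per second. The helper names (parse_loop_line, atom_steps_per_second) are hypothetical and not part of LAMMPS; the regular expression only assumes the line format shown in the logs above. With these numbers the GPU run comes out roughly 3.1x faster than the 8-process CPU run.

```python
import re

# Matches the summary line LAMMPS printed in the logs above, e.g.
# "Loop time of 29.6983 on 1 procs for 100 steps with 2022400 atoms"
LOOP_RE = re.compile(
    r"Loop time of (?P<secs>[\d.]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)


def parse_loop_line(line):
    """Extract wall time, process count, steps and atoms from a 'Loop time' line."""
    match = LOOP_RE.search(line)
    if match is None:
        raise ValueError("not a LAMMPS 'Loop time' line: %r" % line)
    return {
        "secs": float(match.group("secs")),
        "procs": int(match.group("procs")),
        "steps": int(match.group("steps")),
        "atoms": int(match.group("atoms")),
    }


def atom_steps_per_second(run):
    """Throughput: atoms advanced one timestep per second of wall time."""
    return run["atoms"] * run["steps"] / run["secs"]


# The two lines are copied verbatim from the logs in this post.
gpu = parse_loop_line("Loop time of 29.6983 on 1 procs for 100 steps with 2022400 atoms")
cpu = parse_loop_line("Loop time of 93.0408 on 8 procs for 100 steps with 2022400 atoms")

speedup = cpu["secs"] / gpu["secs"]
print("GPU speedup over 8 CPU processes: %.2fx" % speedup)
print("GPU throughput: %.3e atom-steps/s" % atom_steps_per_second(gpu))
print("CPU throughput: %.3e atom-steps/s" % atom_steps_per_second(cpu))
```

Note that this compares one GPU against all eight CPU processes, so it is a whole-device comparison and not a per-core one.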
--------------------------------------------------------------------------
AMD APP SDK 2.7 PATCH
--------------------------------------------------------------------------
--- ./make/openclsdkdefs.mk-orig 2012-05-14 04:43:34.000000000 -0300
+++ ./make/openclsdkdefs.mk 2012-11-30 20:41:05.275285082 -0200
@@ -230,7 +230,7 @@
ifdef MINGW
LDFLAGS += -L/usr/X11R6/lib
else
- LDFLAGS += -lpthread -ldl -L/usr/X11R6/lib
+ LDFLAGS += -lpthread -ldl -L/usr/X11R6/lib64 -lX11
endif
LD_LIBDIR_FLAG := -L
LD_SHARED_FLAG := -shared
--- ./samples/opencl/SDKUtil/include/SDKCommon.hpp-orig 2012-05-14 04:43:34.000000000 -0300
+++ ./samples/opencl/SDKUtil/include/SDKCommon.hpp 2012-11-30 14:38:11.671185385 -0200
@@ -18,6 +18,7 @@
#include
#include
#include
+#include
#include
--------------------------------------------------------------------------
GERYON - ocl_timer.h
--------------------------------------------------------------------------
+#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#include "ocl_macros.h"
#include "ocl_device.h"
For these results, I used -O2. Compiling with -O3 can save a few more seconds.