Sunday, December 2, 2012

LAMMPS on the AMD HD7970

A simulation with more than 2 million atoms using the EAM potential on an XFX graphics card (AMD HD7970 GHz Edition with 3 GB), run entirely in double precision using LAMMPS with the GPU package.
(double precision/LAMMPS/AMD SDK APP 2.7/fglrx 9.01.8/Slackware64-current)


--------------------------------------------------------------------------
RESULTS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
On the HD7970 GPU at 1.05 GHz
--------------------------------------------------------------------------
--------------------------------------------------------------------------
- Using GPGPU acceleration for eam:
-  with 1 proc(s) per device.
-  with OpenCL Parameters for: GENERIC_OCL
--------------------------------------------------------------------------
GPU 0: Tahiti, 256 cores, 2.9 GB, 1.1 GHZ (Double Precision)
GPU 1: AMD FX(tm)-8150 Eight-Core Processor           , 8 cores, 3.6 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.

Setting up run ...
Memory usage per processor = 422.869 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1600     -7159296            0   -6741031.1    18704.149
      50    780.71972   -6943740.2            0   -6739647.9    52292.948
     100    798.21531   -6948340.9            0     -6739675    51478.838
Loop time of 29.6983 on 1 procs for 100 steps with 2022400 atoms

--------------------------------------------------------------------------
On the FX-8150 CPU overclocked to 4.086 GHz
--------------------------------------------------------------------------

  2 by 2 by 2 MPI processor grid
Created 2022400 atoms
Setting up run ...
Memory usage per processor = 96.6497 Mbytes
Step Temp E_pair E_mol TotEng Press
       0         1600     -7159296            0   -6741031.1    18704.149
      50    780.71972   -6943740.2            0   -6739647.9    52292.948
     100    798.21531   -6948340.9            0     -6739675    51478.838
Loop time of 93.0408 on 8 procs for 100 steps with 2022400 atoms
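
To put the two runs side by side, the short Python sketch below parses the two "Loop time" lines reported above and derives the speedup and the throughput in atom-steps per second. The helper names (`parse_loop_line`, `atom_steps_per_second`) are made up for this illustration and are not part of LAMMPS; the regular expression assumes the exact "Loop time of ... on ... procs for ... steps with ... atoms" wording shown in these logs.

```python
import re

# Loop-time lines copied verbatim from the GPU and CPU runs above.
GPU_LINE = "Loop time of 29.6983 on 1 procs for 100 steps with 2022400 atoms"
CPU_LINE = "Loop time of 93.0408 on 8 procs for 100 steps with 2022400 atoms"

# Assumption: the log line always follows this exact wording.
LOOP_RE = re.compile(
    r"Loop time of (?P<seconds>[\d.]+) on (?P<procs>\d+) procs "
    r"for (?P<steps>\d+) steps with (?P<atoms>\d+) atoms"
)


def parse_loop_line(line):
    """Extract wall time, process count, steps and atoms from one log line."""
    match = LOOP_RE.search(line)
    if match is None:
        raise ValueError(f"not a loop-time line: {line!r}")
    return {
        "seconds": float(match.group("seconds")),
        "procs": int(match.group("procs")),
        "steps": int(match.group("steps")),
        "atoms": int(match.group("atoms")),
    }


def atom_steps_per_second(run):
    """Throughput: number of atoms advanced by one step per wall-clock second."""
    return run["atoms"] * run["steps"] / run["seconds"]


gpu = parse_loop_line(GPU_LINE)
cpu = parse_loop_line(CPU_LINE)

# Ratio of wall times: how much faster the GPU run finished the same 100 steps.
speedup = cpu["seconds"] / gpu["seconds"]

print(f"GPU run: {gpu['seconds'] / gpu['steps']:.3f} s/step")
print(f"CPU run: {cpu['seconds'] / cpu['steps']:.3f} s/step")
print(f"Speedup of the GPU run over 8 MPI processes: {speedup:.2f}x")
print(f"GPU throughput: {atom_steps_per_second(gpu) / 1e6:.1f} M atom-steps/s")
print(f"CPU throughput: {atom_steps_per_second(cpu) / 1e6:.1f} M atom-steps/s")
```

With the numbers above, the GPU run finishes the same 100 steps roughly 3.1 times faster than the eight MPI processes on the overclocked FX-8150, and the thermodynamic output of both runs is identical.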


--------------------------------------------------------------------------
AMD SDK APP 2.7 PATCH
--------------------------------------------------------------------------
--- ./make/openclsdkdefs.mk-orig        2012-05-14 04:43:34.000000000 -0300
+++ ./make/openclsdkdefs.mk     2012-11-30 20:41:05.275285082 -0200
@@ -230,7 +230,7 @@
 ifdef MINGW
   LDFLAGS           += -L/usr/X11R6/lib
 else
-  LDFLAGS           += -lpthread -ldl -L/usr/X11R6/lib
+  LDFLAGS           += -lpthread -ldl -L/usr/X11R6/lib64 -lX11
 endif
   LD_LIBDIR_FLAG    := -L
   LD_SHARED_FLAG    := -shared  
--- ./samples/opencl/SDKUtil/include/SDKCommon.hpp-orig 2012-05-14 04:43:34.000000000 -0300
+++ ./samples/opencl/SDKUtil/include/SDKCommon.hpp      2012-11-30 14:38:11.671185385 -0200
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 

--------------------------------------------------------------------------
GERYON - ocl_timer.h
--------------------------------------------------------------------------
+#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
 #include "ocl_macros.h"
 #include "ocl_device.h"


One comment:

  1. For these results I compiled with -O2. Compiling with -O3 can shave off a few more seconds.
