OpenACC: �� GPU � ��

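In 2011 the OpenACC standard was announced, developed by CRAY, CAPS and PGI together with NVIDIA. It offers an alternative to programming accelerators directly in CUDA and OpenCL.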

�� , �� , �� , �� , �� . �� , �� GPU �� . �� ? �� OpenMP. �� (�� ) � �� , �� 5 ��, �� . �� ? �� , �� (� �� CUDA �� GPU �� ) � �� .

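Like its predecessors (PGI Accelerator and CAPS HMPP), OpenACC is available for C and Fortran. In C, a directive is written as a #pragma followed by the word "acc", then the directive name and any clauses. There are 3 main directives: parallel, kernels and data.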


�� :

#include <openacc.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
  int n = 100;
  float a[n][n];
  float b[n][n];
  float c[n][n];
  /* Fill the input matrices on the host. */
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      a[i][j] = i + j;
      b[i][j] = 100 + 2 * i;
    }
  /* Offload the multiplication c = a * b to the accelerator. */
#pragma acc kernels loop independent
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      c[i][j] = 0.0f;
      for (int k = 0; k < n; k++)
        c[i][j] += a[i][k] * b[k][j];
    }
  /* The arrays live on the stack, so there is nothing to free. */
  printf("c[0][0] = %f\n", c[0][0]);
  return 0;
} // main

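Here the code runs on the CPU as usual up to line 15. The kernels directive that follows tells the compiler to turn the loop nest after it into code for the accelerator. It is combined with the loop directive, which applies to the loop immediately following it. The loop directive accepts clauses that describe the iterations, for example: independent (the iterations do not depend on one another) and seq (the loop is executed sequentially).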
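The two loop clauses named above can also be placed on each loop individually. The following is a minimal sketch, not the article's listing: the function name matmul_acc, the flat row-major layout and the local accumulator are assumptions made for illustration. A compiler without OpenACC support ignores the pragmas and runs the loops on the host.

```c
#include <assert.h>

/* Multiply two n x n row-major matrices: c = a * b.
   matmul_acc is a hypothetical name used only for this sketch. */
void matmul_acc(int n, const float *a, const float *b, float *c)
{
    assert(n > 0);
#pragma acc kernels copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
        /* independent: iterations over i do not depend on each other. */
#pragma acc loop independent
        for (int i = 0; i < n; i++) {
            /* independent: the same holds for the iterations over j. */
#pragma acc loop independent
            for (int j = 0; j < n; j++) {
                float sum = 0.0f;
                /* seq: the accumulation over k runs sequentially. */
#pragma acc loop seq
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
}
```

Accumulating into a local variable rather than into c[i][j] keeps the dependence inside the sequential loop, so the two outer loops remain free of loop-carried dependences.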

�� PGI:

pgcc -Minfo=accel -acc -ta=nvidia -o e:\1.exe e:\2.c
main:
     16, Generating copyout(c[0:100][0:100])
         Generating copyin(a[0:100][0:100])
         Generating copyin(b[0:100][0:100])
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     17, Loop is parallelizable
     18, Loop is parallelizable
     20, Loop carried reuse of 'c' prevents parallelization
         Inner sequential loop scheduled on accelerator
         Accelerator kernel generated
         17, #pragma acc loop gang /* blockIdx.y */
         18, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
         20, CC 1.0 : 17 registers; 68 shared, 4 constant, 0 local memory bytes
             CC 2.0 : 19 registers; 0 shared, 84 constant, 0 local memory bytes

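The -acc flag enables OpenACC support, and -Minfo=accel makes the compiler report what it generated for the accelerator. The report shows that in function main the compiler did the following:

16, arranged for arrays b and a to be copied in from the CPU (and c to be copied back);
17 and 18, mapped the two outer loops onto the dimensions of a kernel (in CUDA terms), that is, onto blocks and threads;
20, left the innermost loop over k sequential inside each thread.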
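A compiler that is not asked to honor OpenACC simply ignores the pragmas, so it can be useful to check from within the program whether the directives were enabled. The sketch below relies on the _OPENACC macro, which the OpenACC specification defines when directives are active, and on acc_get_num_devices from openacc.h; the function names openacc_version and accelerator_count are hypothetical.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Returns the value of _OPENACC (a yyyymm version number),
   or 0 when the directives are being ignored. */
long openacc_version(void)
{
#ifdef _OPENACC
    return (long)_OPENACC;
#else
    return 0L;
#endif
}

/* Returns the number of non-host accelerator devices visible to the
   runtime, or 0 when compiled without OpenACC support. */
int accelerator_count(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}
```

A result of 0 from openacc_version() suggests the program was built without the flag that enables OpenACC and is running entirely on the host.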

�� (�� ) � �� , ��, ��, � �� , �� occupancy, �� NVIDIA. �� , �� .

��

�� :

�� parallel �� . ��, �� , �� GPU, �� .
�� kernels ? �� parallel, �� , �� __device__ ��.
�� loop �� . �� .
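To contrast the directives listed above, here is the same loop written once with parallel and once with kernels. This is a sketch under assumptions: the function names and the saxpy operation are chosen for illustration only. With parallel loop the programmer asserts that the loop may be run in parallel; with kernels the compiler analyzes the region itself, which is why the pointers are marked restrict to rule out aliasing.

```c
/* parallel: the programmer states that the iterations may run in parallel.
   saxpy_parallel is a hypothetical name. */
void saxpy_parallel(int n, float alpha, const float *x, float *y)
{
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* kernels: the compiler decides whether and how to parallelize the loop.
   restrict tells it that x and y do not overlap. */
void saxpy_kernels(int n, float alpha,
                   const float *restrict x, float *restrict y)
{
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }
}
```

Both functions compute the same result; they differ only in who is responsible for proving that the loop is safe to parallelize.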

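Data movement between the host and the accelerator is controlled with data clauses. Arrays are given as sections of the form a[start:length], where a is the array name, start is the index of the first element, and length is the number of elements to place on the GPU (in Fortran the notation differs: instead of length one gives end, the index of the last element). The clauses can be used with kernels, parallel and a data region. The main ones are:

copy - copies the data to the GPU on entry to the region and back to the host on exit.
copyin - copies the data to the GPU on entry only; nothing is copied back.
copyout - allocates memory on the GPU and copies the result back to the host on exit; nothing is copied in.
create - only allocates memory on the GPU; no data is copied in either direction.
present - states that the data is already on the GPU, so nothing is allocated or copied. This is useful inside a data region that has already placed the data on the GPU.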
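The clauses above can be sketched together in a single data region. The example is illustrative only: the function names double_plus_one and scale_section and the arithmetic are assumptions. The second function shows an array section whose start is not zero.

```c
#include <stdlib.h>

/* out[i] = 2 * in[i] + 1, computed in two steps that share a scratch array.
   Returns 0 on success, -1 on invalid size or allocation failure. */
int double_plus_one(int n, const float *in, float *out)
{
    if (n <= 0)
        return -1;
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* in: host to device only; tmp: device-only scratch;
       out: device to host only. */
#pragma acc data copyin(in[0:n]) create(tmp[0:n]) copyout(out[0:n])
    {
        /* present: the arrays are already on the device, no transfer here. */
#pragma acc parallel loop present(in[0:n], tmp[0:n])
        for (int i = 0; i < n; i++)
            tmp[i] = 2.0f * in[i];

#pragma acc parallel loop present(tmp[0:n], out[0:n])
        for (int i = 0; i < n; i++)
            out[i] = tmp[i] + 1.0f;
    }

    free(tmp);
    return 0;
}

/* Scale only a[start] .. a[start + length - 1]; the section a[start:length]
   is the only part of the array that is transferred. */
void scale_section(float *a, int start, int length, float factor)
{
#pragma acc parallel loop copy(a[start:length])
    for (int i = start; i < start + length; i++)
        a[i] *= factor;
}
```

Keeping both loops inside one data region means the intermediate array never travels between host and device, which is the main reason to prefer a data region over per-loop clauses.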

��

��, �� OpenACC, �� , �� OpenMP (�� ) ? �� http://openacc.org �� OpenMP. ��, �� , �� . � �� ? �� , �� . � ��, CAPS HMPP �� NVIDIA, �� Intel MIC � �� AMD FirePro.

�� , �� . �� : �� , �� ? �� OpenACC �� . �� , �� -�� . �� ? ��: �� , �� , �� NVIDIA.

� �� , �� OpenACC � �� -�� GPU � �� . � �� , ��- � �� . �� -�� OpenACC ? �� , �� CUDA �� OpenCL.

��: �� , �� ?��-�� ?, �� ?APPLIED PARALLEL COMPUTING? E&R Center
