Original document: http://www.arcetri.astro.it/irlab/instr/spectralgpu/doc/spegpu_progress_1.pdf
The software spectrometer SpectralGpu
Advancement Report 1
C. Baffa, E. Giani, G. Comoretto

The SpectralGpu project aims to develop a software spectrometer for the radio wavelength range, using the computing power of recently available GPU video boards. This memo reports the first progress of the project up to 11/2012.

1 Functional Design

The SpectralGpu spectrometer gets its input from the digital receiver by means of a dedicated 10 Gbit Ethernet line. The receiver output data rate is tunable from 0.5 to 255 MS/s (10^6 samples/second) and the native format for each channel is a complex 8+8 bit signed integer (real and imaginary parts as signed chars). The desired spectral resolution is from 1 to 10 m/s. At a wavelength of 1 cm this translates to about 1 kHz. With a design band of 100 MHz, this requirement translates into a 32-64K Fourier transform.
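The arithmetic behind these figures can be checked with a short sketch. It is only an illustration: the function names are hypothetical, the velocity-to-frequency conversion uses the non-relativistic relation df = dv / lambda, and the channel width assumes a plain complex FFT spanning the whole 100 MHz band with no windowing or overlap.

```python
def doppler_resolution_hz(velocity_m_s, wavelength_m):
    """Frequency resolution equivalent to a velocity resolution: df = dv / lambda."""
    return velocity_m_s / wavelength_m


def channel_width_hz(bandwidth_hz, fft_length):
    """Channel spacing of a complex FFT of fft_length points over bandwidth_hz."""
    return bandwidth_hz / fft_length


def input_rate_gbit_s(sample_rate_ms_s, bytes_per_sample=2):
    """Link load for complex 8+8 bit samples (2 bytes each), in Gbit/s."""
    return sample_rate_ms_s * 1.0e6 * bytes_per_sample * 8 / 1.0e9


if __name__ == "__main__":
    wavelength = 0.01  # 1 cm, as in the text
    for dv in (1.0, 10.0):
        df = doppler_resolution_hz(dv, wavelength)
        print(f"{dv:4.0f} m/s at 1 cm -> {df:7.1f} Hz")
    for n in (32 * 1024, 64 * 1024):
        print(f"{n:6d}-point FFT over 100 MHz -> {channel_width_hz(100e6, n):7.1f} Hz per channel")
    print(f"255 MS/s input -> {input_rate_gbit_s(255):.2f} Gbit/s on the 10 Gbit link")
```

Under these assumptions the 1 to 10 m/s range corresponds to 100 Hz to 1 kHz, and the highest receiver rate loads the Ethernet line at well under half its nominal capacity.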

2 Improvements

We obtained a substantial improvement in overall speed. We tried many different approaches, with mixed degrees of success. We present here some of the work done.

New hardware. We obtained a newer ATI board, the Radeon HD 7970. This board offers better characteristics than our previous top-of-the-line one. In particular the standard core frequency rises from 800 to 950 MHz and the number of cores rises from 1792 to 2048.

GPU memory transfer. We investigated different approaches to the data transfer from the CPU memory space to the GPU engine. While we get some marginal improvements on slower cards, they have not produced decisive benefits so far.

FFT buffer exchange. We no longer copy the partial FFT output buffer into the input one. Instead we re-spawn the partial FFT kernel with exchanged arguments, so we do not launch the Swapper kernel. This approach gives a smaller speed boost than expected, as rescheduling a kernel with different arguments is a costly process. High-end boards get a smaller bonus, probably due to their better coupling with the host CPU, but the speed-up can be estimated at up to 10-20%.

Single precision. We performed some tests on realistic simulations of radio data. We verified that, having as source 8+8 bit complex data and with all GPU internal computations in single precision, we get rounding errors well below data noise. So we will use only single precision arithmetic inside the GPU.
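The FFT buffer exchange described above is a form of ping-pong buffering. The following host-side sketch contrasts the two strategies in plain Python; it is not the SpectralGpu code. The names partial_pass, run_with_copy and run_ping_pong are hypothetical, and the pass body is placeholder arithmetic standing in for a partial FFT kernel.

```python
def partial_pass(src, dst, stage):
    """Stand-in for one partial-FFT kernel launch: reads src, writes dst.

    The arithmetic is a placeholder, not a real FFT butterfly.
    """
    for i, value in enumerate(src):
        dst[i] = value + stage


def run_with_copy(data, n_stages):
    """Old scheme: after every pass, copy the output back into the input."""
    inp = list(data)
    out = [0] * len(inp)
    copies = 0
    for stage in range(n_stages):
        partial_pass(inp, out, stage)
        inp[:] = out  # the extra copy step that the buffer exchange removes
        copies += 1
    return inp, copies


def run_ping_pong(data, n_stages):
    """New scheme: relaunch the pass with the two buffers exchanged."""
    a = list(data)
    b = [0] * len(a)
    for stage in range(n_stages):
        partial_pass(a, b, stage)
        a, b = b, a  # swap roles only; no element is copied
    return a, 0


if __name__ == "__main__":
    # Five passes would match a radix-8 transform of length 32K (8**5 == 32768).
    print(run_with_copy([1, 2, 3], 5))
    print(run_ping_pong([1, 2, 3], 5))
```

Both functions return the same data. The second one trades the copy for a change of kernel arguments at every pass, which is the rescheduling cost the text mentions.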

Convolution improvements. We tested several rearrangements of the operations in the convolution step. So far none of them improves on our present approach.


Native trigonometric functions. The decision to use only single precision arithmetic permitted us to test the ATI native trigonometric functions. We verified that the use of these functions gives an error on the result well below input noise, while giving a 10-15% speed improvement.
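A rough feel for the margins involved in the precision checks can be obtained on the host. The sketch below is an assumption-laden illustration, not the test the authors ran: it rounds FFT twiddle factors to single precision with the standard struct module and compares the worst rounding error with the quantization noise of an 8-bit signed sample, used here as a conservative proxy for the input noise. Reduced-accuracy GPU functions (for instance OpenCL's native_sin and native_cos, whose accuracy is implementation-defined) can be worse than plain float32 rounding and have to be measured on the device itself.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_twiddle_error(n):
    """Largest float32 rounding error over the n twiddle factors exp(-2*pi*i*k/n)."""
    worst = 0.0
    for k in range(n):
        angle = -2.0 * math.pi * k / n
        for exact in (math.cos(angle), math.sin(angle)):
            worst = max(worst, abs(to_float32(exact) - exact))
    return worst


def quantization_rms(bits=8):
    """RMS quantization noise of a signed integer sample, full scale set to 1.0."""
    lsb = 1.0 / 2 ** (bits - 1)
    return lsb / math.sqrt(12.0)


if __name__ == "__main__":
    err = max_twiddle_error(32 * 1024)
    noise = quantization_rms(8)
    print(f"worst float32 twiddle error : {err:.3e}")
    print(f"8-bit quantization noise rms: {noise:.3e}")
    print(f"ratio noise / error         : {noise / err:.1e}")
```

The comparison covers a single coefficient only; errors accumulate over the passes of a long transform, so the margin for a full 32K FFT is smaller than this ratio suggests.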

3 Performances

We set up a test system to evaluate the performance of different boards in a uniform environment. We chose a Mint LTS 13 distribution and installed on this 32-bit system the ATI SDK 2.4 and the ATI Driver 12.6 - Catalyst 8.98. The CPU is an Intel Core i5-2400 at 3.10 GHz.

For the SpectralGpu program we also used some parameter values suggested by previous experience. The local work size is fixed to 64. This is probably sub-optimal on smaller boards. We chose an FFT length of 32K. With this length we can use the more efficient radix-8 version of the algorithm. It is possible that a mixed-radix FFT kernel can give better performance for other lengths. For the HD 6950 board, we needed to set the environment variables GPU_MAX_HEAP_SIZE to 50 and GPU_MAX_ALLOC_PERCENT to 90.

In the following tables we list the best results for each GPU and for the different numbers of simultaneous buffers in use. We use either the mean processing time per input sample in ns, or the resulting sampling speed in units of 10^6 samples/s (MS/s).

ATI GPU HD 7970 3GB
# buffers   Read speed (GB/s)   FFT time (ns)   GPU speed (MS/s)   Total speed (MS/s)
256         2.69                0.47            590                433
512         3.55                0.48            710                590
1024        3.85                0.48            780                705
1536        4.28                0.48            805                755
2048        4.86                0.48            805                770

ATI GPU HD 7950 3GB
# buffers   Read speed (GB/s)   FFT time (ns)   GPU speed (MS/s)   Total speed (MS/s)
256         2.69                0.58            550                410
512         3.52                0.58            655                550
1024        3.96                0.58            710                650
1536        4.24                0.58            730                680
2048        4.42                0.58            745                700

ATI GPU HD 6950 1GB
# buffers   Read speed (GB/s)   FFT time (ns)   GPU speed (MS/s)   Total speed (MS/s)
256         4.38                0.87            460                400
512         5.03                0.88            490                450
1024        5.58                0.88            515                480

ATI GPU HD 7770 1GB
# buffers   Read speed (GB/s)   FFT time (ns)   GPU speed (MS/s)   Total speed (MS/s)
256         2.65                1.80            260                220
512         3.50                1.80            280                260
1024        3.98                1.80            290                280
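The two units used in the tables are reciprocal: a processing time of t ns per sample corresponds to 1000 / t MS/s. The sketch below performs this conversion and checks which measured configurations keep up with the 255 MS/s maximum receiver rate quoted in Section 1. The total speed figures are transcribed from the tables in this section; the function and variable names are illustrative only.

```python
RECEIVER_MAX_MS_S = 255.0  # maximum receiver output rate from Section 1

# Total speed in MS/s, indexed by board and number of buffers (from the tables).
TOTAL_SPEED_MS_S = {
    "HD 7970 3GB": {256: 433, 512: 590, 1024: 705, 1536: 755, 2048: 770},
    "HD 7950 3GB": {256: 410, 512: 550, 1024: 650, 1536: 680, 2048: 700},
    "HD 6950 1GB": {256: 400, 512: 450, 1024: 480},
    "HD 7770 1GB": {256: 220, 512: 260, 1024: 280},
}


def ns_per_sample_to_ms_s(t_ns):
    """Convert a processing time per sample (ns) into a sampling speed (MS/s)."""
    return 1.0e3 / t_ns


def ms_s_to_ns_per_sample(speed_ms_s):
    """Convert a sampling speed (MS/s) into a processing time per sample (ns)."""
    return 1.0e3 / speed_ms_s


def too_slow(required_ms_s=RECEIVER_MAX_MS_S):
    """List the (board, buffers) pairs whose total speed is below the required rate."""
    return [
        (board, buffers)
        for board, speeds in TOTAL_SPEED_MS_S.items()
        for buffers, speed in speeds.items()
        if speed < required_ms_s
    ]


if __name__ == "__main__":
    print(f"770 MS/s is {ms_s_to_ns_per_sample(770):.2f} ns per sample")
    print("Below 255 MS/s:", too_slow())
```

With the tabulated values, only the HD 7770 with 256 buffers falls below the maximum receiver rate.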

Version 1.0, December 2012.