
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA
Linux Release Notes
Version 2.3
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

On some Linux releases, a GRUB bug in the handling of upper memory and a
default vmalloc area that is too small on 32-bit systems may make it
necessary to pass the following settings to the bootloader:

vmalloc=256MB, uppermem=524288

Example grub.conf entry:

title Red Hat Desktop (2.6.9-42.ELsmp)
        root (hd0,0)
        uppermem 524288
        kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/1 rhgb quiet vmalloc=256MB pci=nommconf
        initrd /initrd-2.6.9-42.ELsmp.img

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

Hardware Support
o See http://www.nvidia.com/object/cuda_learn_products.html

Platform Support
o Continued OS support
- RHEL 4.x, 5.x
- Fedora 10
- SLED 10 SP2
- Ubuntu 8.10
o Additional OS support
- Ubuntu 9.04
- SUSE Linux 11.1
o Eliminated OS support
- Fedora 9
- Ubuntu 8.04
- OpenSUSE Linux 11.0

CUFFT Features
o Performance enhancements
o Double precision
- CUFFT now supports double-precision transforms, with types and
functions analogous to the existing single-precision versions.
Similarly, the "cufftType" enumeration (used in calls like
cufftPlan1d) has been expanded to include double-precision identifiers:

Precision:   Single          Double
Type:        cufftReal       cufftDoubleReal
Type:        cufftComplex    cufftDoubleComplex

cufftType:   CUFFT_R2C       CUFFT_D2Z
cufftType:   CUFFT_C2R       CUFFT_Z2D
cufftType:   CUFFT_C2C       CUFFT_Z2Z

Function:    cufftExecC2C    cufftExecZ2Z
Function:    cufftExecR2C    cufftExecD2Z
Function:    cufftExecC2R    cufftExecZ2D

- The double-precision versions are invoked in the same manner as the
single-precision ones, with the arguments changed from the single- to
the double-precision types. See "cufft.h" for the exact definitions of
the above.

CUDA-GDB Features
o Available now on all supported Linux platforms
o Included in the toolkit installer

Cross-Compilation Support
o Support for compiling 32-bit applications on 64-bit hosts.

Double Handling by the Compiler
o When a PTX file with an sm version prior to sm_13 contains double
precision instructions, ptxas now emits a warning that the double
precision instructions are demoted to single precision. The new ptxas
option --suppress-double-demote-warning suppresses this warning.

--------------------------------------------------------------------------------
Major Bug Fixes
--------------------------------------------------------------------------------

C++ Support for Device Emulation
o Support is restored for using C++ code in device emulation mode

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

o GPU enumeration order on multi-GPU systems is non-deterministic and
may change with this or future releases. Users should make sure to
enumerate all CUDA-capable GPUs in the system and select the most
appropriate one(s) to use.
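
The selection step can be sketched in plain C as follows. This is only an
illustration under stated assumptions: the gpu_info struct and the
pick_device function are hypothetical names, and the ranking (highest
compute capability first, then largest memory) is one possible policy, not
one prescribed by these notes. In a real program the fields would be
filled from cudaGetDeviceProperties() for every index below the count
returned by cudaGetDeviceCount().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one device. In a real program these values
   would come from the major, minor and totalGlobalMem fields reported
   by cudaGetDeviceProperties(). */
typedef struct {
    int    major;      /* compute capability, major part */
    int    minor;      /* compute capability, minor part */
    size_t total_mem;  /* global memory in bytes */
} gpu_info;

/* Return the index of the preferred device, or -1 if there is none.
   Assumed policy: highest compute capability wins, ties are broken by
   the larger memory size, remaining ties keep the earlier index. */
int pick_device(const gpu_info *gpus, int count)
{
    int best = -1;

    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        if (best < 0) {
            best = i;
            continue;
        }
        const gpu_info *a = &gpus[i];
        const gpu_info *b = &gpus[best];
        if (a->major > b->major ||
            (a->major == b->major && a->minor > b->minor) ||
            (a->major == b->major && a->minor == b->minor &&
             a->total_mem > b->total_mem)) {
            best = i;
        }
    }
    return best;
}
```

Because the choice depends only on the reported properties and not on the
position in the list, the same kind of device is picked even if the
enumeration order changes between releases.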

o Individual GPU program launches are limited to a run time
of less than 5 seconds on a GPU with a display attached.
Exceeding this time limit causes a launch failure reported
through the CUDA driver or the CUDA runtime. GPUs without
a display attached are not subject to the 5-second run time
restriction. For this reason it is recommended that CUDA be
run on a GPU that is NOT attached to an X display.

o In order to run CUDA applications, the CUDA module must be
loaded and the entries in /dev created. This may be achieved
by initializing X Windows, or by creating a script to load the
kernel module and create the entries.

An example script (to be run at boot time):

#!/bin/bash

# Load the NVIDIA kernel module.
modprobe nvidia

if [ "$?" -eq 0 ]; then

    # Count the number of NVIDIA controllers found.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

    # Create one device node per controller: /dev/nvidia0 ... /dev/nvidiaN.
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done

    # Create the control device node.
    mknod -m 666 /dev/nvidiactl c 195 255

else
    # The module could not be loaded.
    exit 1
fi

o When compiling with GCC, special care must be taken for structs that
contain 64-bit integers. This is because GCC aligns long longs
to a 4-byte boundary by default, while NVCC aligns long longs
to an 8-byte boundary by default. Thus, when using GCC directly to
compile a file that contains such a struct or union, users must pass the
-malign-double
option to GCC. When using NVCC, this option is automatically
passed to GCC.
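
The layout difference described above can be checked with offsetof. The
following is a minimal sketch, not part of the toolkit: the struct names
and helper functions are hypothetical, and the explicit aligned attribute
is a GCC extension shown here only as an assumption about how a layout
can be pinned down independently of compiler flags. These notes
themselves recommend -malign-double.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct with a 64-bit member. The offset of "value" is
   4 or 8 depending on the target and on flags such as -malign-double. */
struct sample {
    int       tag;
    long long value;
};

/* Same fields, with 8-byte alignment of the 64-bit member requested
   explicitly through a GCC attribute. */
struct sample_aligned {
    int       tag;
    long long value __attribute__((aligned(8)));
};

/* Offset of the 64-bit member under the compiler's default rules. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}

/* Offset of the 64-bit member when the alignment is forced. */
size_t sample_aligned_value_offset(void)
{
    return offsetof(struct sample_aligned, value);
}

/* Abort if the forced layout is not the 8-byte-aligned one. */
void check_aligned_layout(void)
{
    assert(sample_aligned_value_offset() == 8);
    assert(sizeof(struct sample_aligned) == 16);
}
```

Comparing sample_value_offset() in a host build against the value 8 is a
quick way to see whether a given GCC invocation matches the 8-byte
alignment that NVCC uses.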

o cudaThreadExit() may not be called implicitly on host thread exit.
Until this issue is resolved, developers should call cudaThreadExit()
explicitly.

o For maximum performance when using multiple byte sizes to access the
same data, coalesce adjacent loads and stores when possible rather
than using a union or individual byte accesses. Accessing the data via
a union may result in the compiler reserving extra memory for the object,
and accessing the data as individual bytes may result in non-coalesced
accesses. This will be improved in a future compiler release.
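
The idea of reading data once at the widest size and then extracting the
smaller pieces in registers can be illustrated on the host in plain C.
This is a sketch under stated assumptions: byte_of and sum_bytes are
hypothetical helpers, and the code only shows the access pattern (one
32-bit load followed by shifts) rather than measuring GPU behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract byte i (0 = least significant) from a word that has already
   been loaded, using a shift instead of a separate byte access. */
static inline uint8_t byte_of(uint32_t word, unsigned i)
{
    assert(i < 4);
    return (uint8_t)(word >> (8u * i));
}

/* Sum all bytes of an array of packed 32-bit words. Each element is
   read with a single 32-bit load; no union and no per-byte loads. */
uint32_t sum_bytes(const uint32_t *packed, size_t n)
{
    uint32_t total = 0;

    for (size_t k = 0; k < n; ++k) {
        uint32_t w = packed[k];  /* one wide load per element */
        total += (uint32_t)byte_of(w, 0) + byte_of(w, 1) +
                 byte_of(w, 2) + byte_of(w, 3);
    }
    return total;
}
```

In device code the same pattern would apply to each thread's element, so
that adjacent threads issue adjacent 32-bit accesses.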

o OpenGL interoperability
- OpenGL cannot access a buffer that is currently
*mapped*. If the buffer is registered but not mapped, OpenGL can do any
requested operations on the buffer.
- Deleting a buffer while it is mapped for CUDA results in undefined behavior.
- Attempting to map or unmap a buffer while the bound context differs from
the one that was current when the buffer was registered will generally
result in a program error and should thus be avoided.
- Interoperability will use a software path on SLI.
- Interoperability will use a software path if monitors are attached to
multiple GPUs and a single desktop spans more than one GPU
(i.e. X11 Xinerama).

o Interrupting or killing an application (for example with Ctrl-C, which
sends SIGINT, or with SIGKILL) while it is running a kernel on the GPU may
not result in a clean shutdown of the process, as the kernel may continue
running on the GPU for a long time afterwards. In such cases, a system
restart may be necessary before running further CUDA or graphics
applications.

--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files are licensed under the terms of the GPL.
Current and previously released versions are available via anonymous ftp at
download.nvidia.com in the CUDAOpen64 directory.

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

07/2009 - Version 2.3
06/2009 - Version 2.3 Beta
05/2009 - Version 2.2
03/2009 - Version 2.2 Beta
11/2008 - Version 2.1 Beta
06/2008 - Version 2.0
11/2007 - Version 1.1
06/2007 - Version 1.0
06/2007 - Version 0.9
02/2007 - Version 0.8 - Initial public Beta

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

For more information and help with CUDA, please visit
http://www.nvidia.com/cuda