Документ взят из кэша поисковой машины. Адрес оригинального документа : http://www.naic.edu/~phil/hardware/nvidia/doc/cudasdk_release_notes_linux.txt
Дата изменения: Thu Oct 8 19:45:57 2009
Дата индексирования: Tue Nov 24 16:12:55 2009
Кодировка:

Поисковые слова: п п п п п п п п п п п п п п п п п п п п п п п п п п п п п п п р п р п р п р п р п р п р п
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA GPU Computing Software Development Kit
Release Notes
Version 2.3 for Linux
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Please, also refer to the release notes of version 2.3 of the CUDA Toolkit,
installed by the CUDA Toolkit installer.

--------------------------------------------------------------------------------
TABLE OF CONTENTS
--------------------------------------------------------------------------------
I. Quick Start Installation Instructions
II. Detailed Installation Instructions
III. Creating Your Own CUDA Program
IV. Known Issues
V. Frequently Asked Questions
VI. Change Log
VII. Linux Distributions Supported
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
I. Quick Start Instructions
--------------------------------------------------------------------------------

Note: The default installation folder is "~/NVIDIA_GPU_Computing_SDK"

For more detailed instructions, see section II below.

0. Install the NVIDIA Linux display driver by executing the file

a. For 32-bit linux distributions use:
cudadriver_2.3_linux_32_190.09.run

b. For 64-bit linux distributions use:
cudadriver_2.3_linux_64_190.09.run

For information on installing NVIDIA Linux display drivers, please refer to
the NVIDIA Accelerated Linux Driver Set README and Installation Guide:
http://us.download.nvidia.com/XFree86/Linux-x86/1.0-9755/README/index.html

1. Install version 2.3 of the NVIDIA CUDA Toolkit by executing the file
cudatoolkit_2.3_linux_*.run where * corresponds to your Linux distribution

Add the CUDA binaries and lib path to your PATH and LD_LIBRARY_PATH
environment variables.

2. Install version 2.3 of the NVIDIA GPU Computing SDK by executing the file
cudasdk_2.3_linux.run

The installer will prompt you to enter an installation path for the SDK or
accept the default. We will refer to the path you choose as
SDK_INSTALL_PATH.

3. Build the SDK project examples.

cd /C
make

4. Run the examples (32-bit or 64-bit Linux)

cd /C/bin/linux/release
matrixmul

(or any of the other executables in that directory)

See the next section for more details on installing, building, and running
SDK samples.

--------------------------------------------------------------------------------
II. Detailed Installation Instructions
--------------------------------------------------------------------------------

Note: The default installation folder is "~/NVIDIA_GPU_Computing_SDK"

This package consists of a ".run" file. This is a self-extracting archive that
decompresses its contents to a temporary folder and then installs the contents
to a path that you specify. The archive is:

cudasdk_2.3-linux_*.run: NVIDIA GPU Computing SDK Installer

In addition, a NVIDIA Linux Display driver is needed to run CUDA code on an
NVIDIA GPU. CUDA 2.3 requires version 190 or newer version of the linux
NVIDIA Display Driver. Please see the NVIDIA CUDA Toolkit 2.3 release notes
for more details.

For information on installing NVIDIA Linux display drivers, please refer to
the NVIDIA Accelerated Linux Driver Set README and Installation Guide:
http://us.download.nvidia.com/XFree86/Linux-x86/1.0-9755/README/index.html

1. Install version 2.3 of the NVIDIA CUDA Toolkit by executing the file
cudatoolkit_2.3_linux_*.run where * corresponds to your Linux distribution

To install, run the cudatoolkit_2.3_linux_*.run script. You will be prompted
for the path to where you want to put the CUDA files. In the following we will
call this path . It is recommended that you run the
installer as root and use the default install path (/usr/local).

Make sure that you add the location of the CUDA binaries (such as nvcc) to
your PATH environment variable and the location of the CUDA libraries
(such as libcuda.so) to your LD_LIBRARY_PATH environment variable.

In the bash shell, one way to do this is to add the following lines to the
file ~/.bash_profile from your home directory.

a. For 32-bit operating systems use the following paths
PATH=$PATH:/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib

b. For 64-bit operating systems use the following paths
PATH=$PATH:/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib64

Then to export the environment variables add this to the profile configuration
export PATH
export LD_LIBRARY_PATH

2. Install the NVIDIA GPU Computing SDK by executing the file cudasdk_2.3_linux_*.run

To install, run the cudasdk_2.3_linux.run script. You will
be prompted for the path to where you want to put the CUDA SDK. You can
regard the CUDA SDK as user code (it is a set of examples), and therefore
the default installation is in the current user's home directory
(~/NVIDIA_GPU_Computing_SDK). You must either accept the default or specify a path
to which the user has write permissions.

We will refer to the path you choose as SDK_INSTALL_PATH below.

3. Build the SDK project examples.
a. cd /C
b. Build:
- release configuration by typing "make".
- debug configuration by typing "make dbg=1".
- emurelease configuration by typing "make emu=1".
- emudebug configuration by typing "make emu=1 dbg=1".

Running make at the top level first builds libcutil, a utility library used
by the SDK examples (libcutil is simply for convenience -- it is not a part
of CUDA and is not required for your own CUDA programs). Make then builds
each of the projects in the SDK.

NOTES:
- The release and debug configurations require a CUDA-capable GPU to run
properly (see Appendix A.1 of the CUDA Programming Guide for a complete
list of CUDA-capable GPUs).
- The emurelease and emudebug configurations run in device emulation mode,
and therefore do not require a CUDA-capable GPU to run properly.
- You can build an individual sample by typing "make"
(or "make emu=1", etc.) in that sample's project directory. For example:

cd /C/src/matrixMul
make emu=1

And then execute the sample with:
/C/bin/linux/emurelease/matrixMul

- To build just libcutil, type "make" (or "make dbg=1") in the "common"
subdirectory:

cd /C/common
make

4. Run the examples from the release, debug, emurelease, or emudebug
directories located in

/C/bin/linux/[release|debug|emurelease|emudebug].


--------------------------------------------------------------------------------
III. Creating Your Own CUDA Program
--------------------------------------------------------------------------------

Note: The default installation folder is "~/NVIDIA_GPU_Computing_SDK"

Creating a new CUDA Program using the NVIDIA GPU Computing SDK infrastructure is easy.
We have provided a "template" project that you can copy and modify to suit your
needs. Just follow these steps:

1. Copy the template project

cd /C/src
cp -r template

2. Edit the filenames of the project to suit your needs

mv template.cu myproject.cu
mv template_kernel.cu myproject_kernel.cu
mv template_gold.cpp myproject_gold.cpp

3. Edit the Makefile and source files. Just search and replace all occurences
of "template" with "myproject".

4. Build the project

make

You can build a debug version with "make dbg=1", an emulation version with
"make emu=1", and a debug emulation with "make dbg=1 emu=1".

5. Run the program

../../C/bin/linux/release/myproject

(It should print "Test PASSED")

6. Now modify the code to perform the computation you require. See the
CUDA Programming Guide for details of programming in CUDA.


--------------------------------------------------------------------------------
IV. Known Issues
--------------------------------------------------------------------------------

Note: Please see the CUDA Toolkit release notes for additional issues.

1. "ld: cannot find -lglut". On some linux installations (notably default RHEL
4 update 3 installations), building the simpleGL example (and other examples
that use OpenGL) can result in a linking error like the following.

/usr/bin/ld: cannot find -lglut

Typically this is because the SDK makefiles look for libglut.so and not for
variants of it (like libglut.so.3). To confirm this is the problem, simply
run the following command.

ls /usr/lib | grep glut

You should see the following (or similar) output.

lrwxrwxrwx 1 root root 16 Jan 9 14:06 libglut.so.3 -> libglut.so.3.8.0
-rwxr-xr-x 1 root root 164584 Aug 14 2004 libglut.so.3.8.0

If you have libglut.so.3 in /usr/lib, simply run the following command
as root.

ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so

If you do NOT have libglut.so.3 then you can check whether the glut package
is installed on your RHEL system with the following command.

rpm -qa | grep glut

You should see "freeglut-2.2.2-14" or similar in the output. If not, you
or your system administrator should install the package "freeglut-2.2.2-14".
Refer to the Red Hat and/or rpm documentation for instructions.

If you have libglut.so.3 but you do not have write access to /usr/lib, you
can also fix the problem by creating the soft link in a directory to which
you have write permissions and then add that directory to the libarary
search path (-L) in the Makefile.

2. In code sample alignedTypes, the following aligned type does not provide
maximum throughput because of a compiler bug:
typedef struct __align__(16) {
unsigned int r, g, b;
} RGB32;
The workaround is to use the following type instead:
typedef struct __align__(16) {
unsigned int r, g, b, a;
} RGBA32;
as illustrated in the sample.

3. Some Linux distributions may not include the GLU library. You can download this
from this search engine, matching the correct Linux distribution.

http://fr.rpmfind.net/linux/rpm2html/search.php?query=libGLU.so.1&submit=Search+...

4. (SLED11) SUSE Linux Enterprise Edition 11 is missing:
"libGLU", "libX11" "libXi", "libXm"

This particular version of SUSE Linux 11 does not have the proper symbolic links for the following libraries:

a. libGLU
ls /usr/lib | grep GLU

libGLU.so.1
libGLU.so.1.3.0370300

To create the proper symbolic links:

ln -s /usr/lib/libGLU.so /usr/lib/libGLU.so.1

b. libX11
ls /usr/lib | grep X11


libX11.so.6
libX11.so.6.2.0

To create the proper symbolic links:

ln -s /usr/lib/libX11.so.6
/usr/lib/libX11.so


c. libXi
ls /usr/lib | grep Xi

libXi.so.6
libXi.so.6.0.0

To create the proper symbolic links:

ln -s /usr/lib/libXi.so.6 /usr/lib/libXi.so

d. libXm
ls /usr/lib | grep Xm

libXm.so.6
libXm.so.6.0.0

To create the proper symbolic links:

ln -s /usr/lib/libXm.so.6 /usr/lib/libXm.so

--------------------------------------------------------------------------------
V. Frequently Asked Questions
--------------------------------------------------------------------------------

The Official CUDA FAQ is available online on the NVIDIA CUDA Forums:
http://forums.nvidia.com/index.php?showtopic=84440

Note: Please also see the CUDA Toolkit release notes for additional Frequently
Asked Questions.

--------------------------------------------------------------------------------
VI. Change Log
--------------------------------------------------------------------------------

Release 2.3 Beta
* Added PTXJIT sample
- New CUDA SDK sample that illustrates how to use cuModuleLoadDataEx
- Loads a PTX source file from memory instead of file.

Release 2.2.1
* Updated common.mk file to remove -m32 when generating CUBIN output
* Support for PTX output has been added to common.mk
* CUDA Driver API samples: simpleTextureDrv, matrixMulDrv, and threadMigration have been updated to reflect changes:

- Previously when compiling these CUDA SDK samples, gcc would generate a
compilation error when building on a 64-bit Linux OS if the 32-bit glibc
compatibility libraries were not previously installed. This SDK release
addresses this problem. The CUDA Driver API samples have been modified
and solve this problem by casting device pointers correctly before
being passed to CUDA kernels.
- When setting parameters for CUDA kernel functions, the address offset
calculation is now properly aligned so that CUDA code and applications
will be compatible on 32-bit and 64-bit Linux platforms.
- The new CUDA Driver API samples by default build CUDA kernels with the
output as PTX instead of CUBIN. The CUDA Driver API samples now use
PTXJIT to load the CUDA kernels and launch them.
* Added sample pitchLinearTexture that shows how to texture from pitch linear
memory

Release 2.2 Final
* Added Mandelbrot (Julia Set), deviceQueryDrv, radixSort, SobolQRNG, threadFenceReduction
* CUDA Event Blocking Stream Synchronization (CU_CTX_BLOCKING_SYNC) is now supported on Linux
* New CUDA 2.2 capabilities:
- supports zero-mem copy (GT200, MCP79)
* simpleZeroCopy SDK sample
- supports OS allocated pinned memory (write combined memory). Test this by:
> bandwidthTest -memory=PINNED -wc

Release 2.1 Final
* CUDA samples that use OpenGL interop now call cudaGLSetGLDevice after the GL context is created.
This ensures that OpenGL/CUDA interop gets the best possible performance possible.

Release 2.1 Beta
* Added CUDA smokeParticles (volumetric particle shadows samples)
* Note: added cutil_inline.h for CUDA functions as an alternative to using the
cutil.h macro definitions
* For CUDA samples that use the Driver API, you must install the Linux 32-bit
compatibility (glibc) binaries on Linux 64-bit Platforms. See Known issues in about
section IV on how to do this.

Release 2.0 Beta2
* Added simpleVoteIntrinsics (requires GT200)

Release 2.0 Beta
* Updated to the 2.0 CUDA Toolkit
* CUT_DEVICE_INIT macro modified to take command line arguments. All samples now
support specifying the CUDA device to run on from the command line (“-device=n”).
* deviceQuery sample: Updated to query number of multiprocessors and overlap
flag.
* fluidsD3D sample: Renamed to fluidsD3D9 and updated to the new Direct3D
interoperability API.
* multiGPU sample: Renamed to simpleMultiGPU.
* reduction, MonteCarlo, and binomialOptions samples: updated with optional
double precision support for upcoming hardware.
* simpleAtomics sample: Renamed to simpleAtomicIntrinsics.
* simpleD3D sample: Renamed to simpleD3D9 and updated to the new Direct3D
interoperability API.
* 7 new code samples:
dct8x8, quasirandomGenerator, recursiveGaussian, simpleD3D9Texture,
simpleTexture3D, threadMigration, and volumeRender

Release 1.1
* Updated to the 1.1 CUDA Toolkit
* Fixed several bugs in common/common.mk
* Removed isInteropSupported() from cutil: OpenGL interop now works on multi-GPU
systems
* MonteCarlo sample: Improved performance. Previously it was very fast for
large numbers of paths and options, now it is also very fast for
small- and medium-sized runs.
* Transpose sample: updated kernel to use 2D shared memory array for clarity,
and optimized bank conflicts.
* 15 new code samples:
asyncAPI, cudaOpenMP, eigenvalues, fastWalshTransform, histogram256,
lineOfSight, Mandelbrot, marchingCubes, MonteCarloMultiGPU, nbody, oceanFFT,
particles, reduction, simpleAtomics, and simpleStreams

Release 1.0
* Updated to the 1.0 CUDA Toolkit.
* Added 4 new code samples: convolutionTexture, convolutionFFT2D,
histogram64, and SobelFilter.
* All graphics interop samples now call the cutil library function
isInteropSupported(), which returns false on machines with multiple CUDA GPUs,
currently (see above).
* When compiling in DEBUG mode, CU_SAFE_CALL() now calls cuCtxSynchronize() and
CUDA_SAFE_CALL() and CUDA_CHECK_ERROR() now call cudaThreadSynchronize() in
order to return meaningful errors. This means that performance might suffer in
DEBUG mode.

Release 0.9
* Updated to version 0.9 of the CUDA Toolkit.
* Added 7 new code samples: MersenneTwister, MonteCarlo, imageDenoising,
simpleTemplates, deviceQuery, alignedTypes, and convolutionSeparable
* Removed 3 old code samples: convolution, vectorLoads and loadUByte. Convolution
is replaced by convolutionSeparable and vectorLoads and loadUByte are
replaced by alignedTypes.

Release 0.8.1 beta
* Standardized project and file naming conventions. Several project names
changed as a result.
* cppIntegration output now matches the other samples ("Test PASSED").
* Modified transpose16 sample to transpose arbitrary matrices efficiently, and
renamed it to transpose.
* Added 11 new code samples: bandwidthTest, binomialOptions, BlackScholes,
boxFilter, convolution, dxtc, fluidsGL, multiGPU, postProcessGL,
simpleTextureDrv, and vectorLoads.
* Changed object file output directory to reside within each project's
directory rather than in the shared bin directory. This prevents linker
errors caused by different projects having files with identical names.
* Added "make clobber" mode to makefiles to support deleting project obj
directories.
* Consolidated scan_naive, scan_workefficient, and scan_best into a single
"scan" sample that runs all three scan kernels and compares performance.

Release 0.8 beta prerelease 2
* Linux CUDA SDK now uses a self extracting installer (.run)
* Linux CUDA SDK samples now build correctly if CUDA Toolkit is installed to a
path other than /usr/local
* Added matrixmul_drv sample to Linux SDK
* Added CUBINFILES mode to project Makefiles
* simpleCUFFT and simpleCUBLAS samples now work in emulation mode (make emu=1).

Release 0.8 beta prerelease 1
* Second limited release (RHEL 4 update 3 only)
* Several new samples, better release notes

Release 0.2 alpha 2
* First limited release (RHEL 3 update 4 only)


--------------------------------------------------------------------------------
VII. Linux Platform Distributions Supported
--------------------------------------------------------------------------------

OS Platform Support support from 2.2
* Linux distros 32 & 64:
RHEL-4.x (4.7),
RHEL-5.x (5.3),
SLED-10-SP2,
Fedora10,
Ubuntu-8.10,
OpenSUSE 11.1
o gcc 3.4, gcc 4

OS Platform Support added to 2.3
* SLED11
* Ubuntu-9.04

OS Platforms not officially supported in 2.3
* Fedora 9,
Ubuntu-8.04,
OpenSUSE-11.0