The 2.6 kernel makes extensive use of per-CPU data - arrays containing one
object for each processor on the system. Per-CPU variables are not suitable for
every task, but, in situations where they can be used, they do offer a
couple of advantages:
- Per-CPU variables have fewer locking requirements since they are
(normally) only accessed by a single processor. There is nothing
other than convention that keeps processors from digging around in
other processors' per-CPU data, however, so the programmer must remain
aware of what is going on.
- Nothing destroys cache performance as quickly as accessing the same
data from multiple processors. Restricting each processor to its own
area eliminates cache line bouncing and improves performance.
Examples of per-CPU data in the 2.6 kernel include lists of buffer heads,
lists of hot and cold pages, various kernel and networking statistics
(which are occasionally summed together into the full system values), timer
queues, and so on. There are currently no drivers using per-CPU values,
but some applications (e.g. networking statistics for high-bandwidth
adapters) might benefit from their use.
The normal way of creating per-CPU variables at compile time is with this
macro (defined in <linux/percpu.h>):
DEFINE_PER_CPU(type, name);
This sort of definition will create name, which will hold one
object of the given type for each processor on the system. If the
variables are to be exported to modules, use:
EXPORT_PER_CPU_SYMBOL(name);
EXPORT_PER_CPU_SYMBOL_GPL(name);
If you need to link to a per-CPU variable defined elsewhere, a similar
macro may be used:
DECLARE_PER_CPU(type, name);
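As a purely illustrative sketch (the variable name here is made up), the
pieces fit together like this, with the definition and export in one file
and the declaration in any other file which uses the variable:

/* In the file owning the variable */
DEFINE_PER_CPU(int, my_counter);
EXPORT_PER_CPU_SYMBOL(my_counter);

/* In any other file (or module) using it */
DECLARE_PER_CPU(int, my_counter);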
A variable defined in this way is actually an array of values. To get at a
particular processor's value, the per_cpu() macro may be used; it
works as an lvalue, so code like the following works:
DEFINE_PER_CPU(int, mypcint);
per_cpu(mypcint, smp_processor_id()) = 0;
The above code can be dangerous, however. Accessing per-CPU variables can
often be done without locking, since each processor has its own private
area to work in. The 2.6 kernel is preemptible, however, and that adds a
couple of challenges. Since kernel code can be preempted, it is possible
to encounter race conditions with other kernel threads running on the same
processor. Also, accessing a per-CPU variable requires knowing which
processor you are running on; it would not do to be preempted and moved to
a different CPU between looking up the processor ID and accessing a per-CPU
variable.
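As an illustration (a hypothetical sketch, not code from the kernel),
consider what can happen when preemption is left enabled:

int cpu = smp_processor_id();  /* a preemption here can move this thread */
per_cpu(mypcint, cpu)++;       /* ...so this may touch another CPU's data */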
For both of the above reasons, kernel preemption must usually be disabled
when working with per-CPU data. The usual way of doing this is with the
get_cpu_var() and put_cpu_var() macros: get_cpu_var() returns the running
processor's version of the variable with preemption disabled, and
put_cpu_var() re-enables preemption afterward. get_cpu_var()
works as an lvalue, so it can be assigned to, have its address taken, etc.
Perhaps the simplest example of the use of these macros can be found in
net/socket.c:
get_cpu_var(sockets_in_use)++;
put_cpu_var(sockets_in_use);
Of course, since preemption is disabled between the calls, the code should
take care not to sleep. Note that there is no version of these macros
for access to another CPU's data; cross-processor access to per-CPU data
requires explicit locking arrangements.
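One common pattern - summing a per-CPU counter into a single total, as is
done with the kernel statistics mentioned above - can often do without
locking because an approximate result is acceptable. A sketch (reusing the
mypcint variable from above) might look like:

int total = 0;
int cpu;

/* Read every processor's count; the total is approximate, since other
   processors may be updating their values as we go. */
for_each_online_cpu(cpu)
    total += per_cpu(mypcint, cpu);

Code which needs an exact answer, however, is back to making explicit
locking arrangements.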
It is also possible to allocate per-CPU variables dynamically. Simply use
these functions:
void *alloc_percpu(type);
void free_percpu(const void *);
alloc_percpu() will allocate one object (of the given type) for
each CPU on the system; the allocated storage will be zeroed before being
returned to the caller.
There is another set of macros which may be used to access per-CPU data
obtained with alloc_percpu(). At the lowest level, you may use:
per_cpu_ptr(void *ptr, int cpu)
which returns (without any concurrency control) a pointer to the per-CPU
data for the given cpu. For access to the local processor's data,
with preemption disabled, use:
get_cpu_ptr(ptr)
put_cpu_ptr(ptr)
The usual proviso applies: do not sleep between the two calls.
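To pull the dynamic interface together: a driver keeping one statistics
structure per CPU might use a sketch along these lines (all of the names
here are hypothetical):

#include <linux/percpu.h>

struct my_stats {
    unsigned long packets;
};

static struct my_stats *my_stats;  /* one instance per CPU */

static int my_init(void)
{
    my_stats = alloc_percpu(struct my_stats);
    if (!my_stats)
        return -ENOMEM;
    return 0;
}

static void my_count_packet(void)
{
    /* get_cpu_ptr() disables preemption while the local copy is used */
    struct my_stats *stats = get_cpu_ptr(my_stats);
    stats->packets++;
    put_cpu_ptr(my_stats);
}

static void my_exit(void)
{
    free_percpu(my_stats);
}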