One of the key "enterprise" features added to the 2.6 kernel is
asynchronous I/O (AIO). The AIO facility allows user processes to initiate
multiple I/O operations without waiting for any of them to complete; the
status of the operations can then be retrieved at some later time. Block
and network drivers are already fully asynchronous, and thus there is
nothing special that needs to be done to them to support the new
asynchronous operations. Character drivers, however, have a synchronous
API, and will not support AIO without some additional work. For most char
drivers, there is little benefit to be gained from AIO support. In a few
rare cases, however, it may be beneficial to make AIO available to your
users.
AIO file operations
The first step in supporting AIO (beyond including
<linux/aio.h>) is the implementation of three new methods
which have been added to the file_operations structure:
    ssize_t (*aio_read) (struct kiocb *iocb, char __user *buffer,
                         size_t count, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const char __user *buffer,
                          size_t count, loff_t pos);
    int (*aio_fsync) (struct kiocb *, int datasync);
For most drivers, the real work will be in the implementation of
aio_read() and aio_write(). These functions are
analogous to the standard read() and write() methods,
with a couple of changes: the file parameter has been replaced
with an I/O control block (iocb), and they (usually) need not complete
the requested operations immediately. The iocb argument can usually be treated
as an opaque cookie used by the AIO subsystem; if you need the
struct file pointer for this file descriptor, however, you
can find it as iocb->ki_filp.
The aio_ operations can be synchronous. One
obvious example is when the requested operation can be completed without
blocking. If the operation is complete before aio_read() or
aio_write() returns, the return value should be the usual status
or error code. So, the following aio_read() method, while being
pointless, is entirely correct:
    ssize_t my_aio_read(struct kiocb *iocb, char __user *buffer,
                        size_t count, loff_t pos)
    {
            /* Hand the request straight to the synchronous read() method. */
            return my_read(iocb->ki_filp, buffer, count, &pos);
    }
In some cases, synchronous behavior may actually be required. The
so-called "synchronous iocb's" allow the AIO subsystem to be used
synchronously when need be. The macro:
is_sync_kiocb(struct kiocb *iocb)
will return a true value if the request must be handled synchronously.
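As a rough illustration of that check, here is a user-space model of an aio_read() method which falls back to the synchronous path when asked to. It is only a sketch: the struct kiocb and struct file definitions, the sync_flag field, the my_loff_t type, the my_read() helper, and the numeric value given for EIOCBQUEUED are all stand-ins assumed for this example so that it compiles outside the kernel. The kernel's own is_sync_kiocb() is implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* User-space stand-ins (assumptions, not the kernel's definitions). */
#define EIOCBQUEUED 529              /* assumed value of the kernel constant */
#define __user                       /* kernel annotation, empty here */
typedef long long my_loff_t;         /* stands in for the kernel's loff_t */

struct file { int unused; };
struct kiocb {
    struct file *ki_filp;
    int sync_flag;                   /* hypothetical field for this model only */
};
#define is_sync_kiocb(iocb) ((iocb)->sync_flag)

static const char my_data[] = "hello";   /* pretend device contents */

/* Hypothetical synchronous read() method: copies data, returns the count. */
static ssize_t my_read(struct file *filp, char __user *buffer,
                       size_t count, my_loff_t *pos)
{
    size_t avail;

    (void)filp;
    if (*pos < 0 || (size_t)*pos >= sizeof(my_data) - 1)
        return 0;                                /* end of "device" */
    avail = sizeof(my_data) - 1 - (size_t)*pos;
    if (count > avail)
        count = avail;
    memcpy(buffer, my_data + *pos, count);       /* copy_to_user() in a driver */
    *pos += (my_loff_t)count;
    return (ssize_t)count;
}

static ssize_t my_aio_read(struct kiocb *iocb, char __user *buffer,
                           size_t count, my_loff_t pos)
{
    assert(iocb != NULL);

    /* Synchronous IOCB: finish the work now and return the real status. */
    if (is_sync_kiocb(iocb))
        return my_read(iocb->ki_filp, buffer, count, &pos);

    /* Otherwise a real driver would start and queue the operation here
     * (omitted in this sketch) and report that it is in progress. */
    return -EIOCBQUEUED;
}
```

In this model a synchronous request returns the byte count directly, while any other request returns -EIOCBQUEUED and would have to be completed later.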
In most cases, though, it is assumed that the I/O request will not be
satisfied immediately by aio_read() or aio_write(). In
this case, those functions should do whatever is required to get the
operation started, then return -EIOCBQUEUED. Note that any work
that must be done within the user process's context must be done before
returning; you will not have access to that context later. In order to
access the user buffer, you will probably need to either set up a DMA
mapping or turn the buffer pointer into a series of
struct page pointers before returning.
Bear in mind also that there can be multiple asynchronous I/O requests active at
any given time. A driver which implements AIO will have to include proper
locking (and, probably, queueing) to keep these requests from interfering
with each other.
When the I/O operation completes, you must inform the AIO subsystem of the
fact by calling aio_complete():
int aio_complete(struct kiocb *iocb, long res, long res2);
Here, iocb is, of course, the IOCB you were given when the request
was initiated. res is the usual result of an I/O operation: the
number of bytes transferred, or a negative error code. res2 is a
second status value which will be returned to the user; currently (2.6.0-test9),
callers of aio_complete() within the kernel always set
res2 to zero. aio_complete() can be safely called
in an interrupt handler. Once you have called aio_complete(), you
no longer own the IOCB or the user buffer, and should not touch them again.
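The queue-then-complete pattern described above might be modeled as follows. This is a user-space sketch, not driver code: struct kiocb and aio_complete() are mock versions which merely record the result, and the pending ring, my_interrupt(), my_loff_t, and the value of EIOCBQUEUED are assumptions made for the example. A real driver would also protect the queue with a lock that is safe to take in interrupt context, and would have pinned or mapped the user buffer before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* User-space stand-ins (assumptions, not the kernel's definitions). */
#define EIOCBQUEUED 529              /* assumed value of the kernel constant */
#define __user                       /* kernel annotation, empty here */
typedef long long my_loff_t;         /* stands in for the kernel's loff_t */

struct kiocb {
    long res, res2;                  /* mock: values passed to aio_complete() */
    int completed;                   /* mock: set once the request is done */
};

/* Mock aio_complete(): records the result instead of notifying user space. */
static int aio_complete(struct kiocb *iocb, long res, long res2)
{
    assert(iocb != NULL && !iocb->completed);
    iocb->res = res;
    iocb->res2 = res2;
    iocb->completed = 1;
    return 0;
}

/* Hypothetical per-device queue of outstanding requests (a small ring). */
#define MAX_PENDING 4
struct my_request {
    struct kiocb *iocb;
    size_t count;
};
static struct my_request pending[MAX_PENDING];
static unsigned int head, tail;      /* head: oldest request; tail: next slot */

static ssize_t my_aio_read(struct kiocb *iocb, char __user *buffer,
                           size_t count, my_loff_t pos)
{
    (void)buffer;                    /* a driver would map or pin this now */
    (void)pos;
    if (tail - head == MAX_PENDING)
        return -EAGAIN;              /* queue full: fail without queueing */
    pending[tail % MAX_PENDING].iocb = iocb;
    pending[tail % MAX_PENDING].count = count;
    tail++;
    return -EIOCBQUEUED;             /* started; completion comes later */
}

/* Stand-in for the interrupt handler: the oldest transfer has finished. */
static void my_interrupt(void)
{
    struct my_request req;

    if (head == tail)
        return;                      /* nothing outstanding */
    req = pending[head % MAX_PENDING];
    head++;
    /* Report the byte count; res2 is zero, as in-kernel callers do. */
    aio_complete(req.iocb, (long)req.count, 0);
    /* The IOCB is no longer ours and must not be touched after this. */
}
```

Requests complete in the order they were queued in this model; a driver whose hardware completes out of order would need a way to match each completion to its IOCB.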
The aio_fsync() method serves the same purpose as the
fsync() method: it ensures that all pending data are
written to disk. As a general rule, device drivers will not need to
implement aio_fsync().
Cancellation
The design of the AIO subsystem includes the ability to cancel outstanding
operations. Cancellation may occur as the result of a specific user-mode
request, or during the cleanup of a process which has exited. It is worth
noting that, as of 2.6.0-test9, no code in the kernel actually performs
cancellation. So cancellation may not work properly, and the interface
could change in the process of making it work. That said, here is how the
interface looks today.
A driver which implements cancellation needs to implement a function for
that purpose:
int my_aio_cancel(struct kiocb *iocb, struct io_event *event);
A pointer to this function can be stored into any IOCB which can be
cancelled:
iocb->ki_cancel = my_aio_cancel;
Should the operation be cancelled, your cancellation function will be
called with pointers to the IOCB and an io_event structure. If it
is possible to cancel (or successfully complete) the operation prior to
returning from the cancellation function, that function should store the
result of the operation into the res and res2 fields of the
io_event structure and return zero. A non-zero return value from
the cancellation function indicates that cancellation was not possible.
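A cancellation function following these rules could be modeled as shown here. Given the caveats above, treat it as a sketch only: the struct definitions are cut-down user-space mocks, the queued and in_hardware fields and my_queue_request() are hypothetical, and the choice of -ECANCELED as the stored result and -EAGAIN as the failure return are assumptions rather than anything the interface requires.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* User-space stand-ins (assumptions, not the kernel's definitions). */
struct io_event {
    long res, res2;                  /* only the fields used in this sketch */
};
struct kiocb {
    int (*ki_cancel)(struct kiocb *, struct io_event *);
    int queued;                      /* hypothetical: still on the driver queue */
    int in_hardware;                 /* hypothetical: device has begun the transfer */
};

static int my_aio_cancel(struct kiocb *iocb, struct io_event *event)
{
    assert(iocb != NULL && event != NULL);

    /* Once the device owns the transfer, this model cannot stop it. */
    if (iocb->in_hardware)
        return -EAGAIN;              /* non-zero: cancellation not possible */

    /* Still waiting in the queue: drop it and report the outcome. */
    iocb->queued = 0;
    event->res = -ECANCELED;         /* assumed result code for this sketch */
    event->res2 = 0;
    return 0;
}

/* Hypothetical helper run when a request is accepted by the driver. */
static void my_queue_request(struct kiocb *iocb)
{
    iocb->queued = 1;
    iocb->in_hardware = 0;
    iocb->ki_cancel = my_aio_cancel; /* mark this IOCB as cancellable */
}
```

Here a request which has not yet reached the hardware is cancelled and its result filled in, while one already in flight is refused and left to complete normally.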