The kiobuf abstraction was introduced in 2.3 as a low-level way of
representing I/O buffers. Its primary use, perhaps, was to represent
zero-copy I/O operations going directly to or from user space. A number of
problems were found with the kiobuf interface, however; among
other things, it forced large I/O operations to be broken down into small
chunks, and it was seen as a heavyweight data structure. So, in 2.5.43,
kiobufs were removed from the kernel.
This article looks at how to port drivers which used the kiobuf
interface in 2.4. We'll proceed on the assumption that the real feature of
interest was direct access to user space; there wasn't much motivation to
use a kiobuf otherwise.
Zero-copy block I/O
The 2.6 kernel has a well-developed direct I/O capability for block
devices. So, in general, it will not be necessary for block driver writers
to do anything to implement direct I/O themselves. It all "just works."
Should you have a need to perform zero-copy block operations, it's worth
noting the presence of a useful helper function:
    struct bio *bio_map_user(struct block_device *bdev,
                             unsigned long uaddr,
                             unsigned int len,
                             int write_to_vm);
This function will return a BIO describing a direct operation to the given
block device bdev. The parameters uaddr and len
describe the user-space buffer to be transferred; callers must check the
returned BIO, however, since the area actually mapped might be smaller than
what was requested. The write_to_vm flag should be set if the operation
will change memory - that is, if it is a read-from-disk operation. The
returned BIO (which can be NULL - check it) is ready for submission to the
appropriate device driver.
When the operation is complete, undo the mapping with:
void bio_unmap_user(struct bio *bio, int write_to_vm);
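As a minimal sketch of how these two helpers fit together, a block driver
performing a zero-copy write from a user buffer might do something like the
following. The function name and the choice of target sector are
illustrative only, and completion handling is omitted for brevity:

```c
/* Sketch only: error paths abbreviated, completion waiting omitted.
 * direct_write_example() is a hypothetical name, not a kernel API. */
static int direct_write_example(struct block_device *bdev,
                                unsigned long uaddr, unsigned int len)
{
	struct bio *bio;

	/* write_to_vm is 0: data flows from user memory to the device. */
	bio = bio_map_user(bdev, uaddr, len, 0);
	if (!bio)
		return -ENOMEM;
	if (bio->bi_size < len) {
		/* Less was mapped than requested; give up (or loop). */
		bio_unmap_user(bio, 0);
		return -EFAULT;
	}
	bio->bi_sector = 0;	/* illustrative target sector */
	submit_bio(WRITE, bio);
	/* ... wait for the I/O to complete, then: */
	bio_unmap_user(bio, 0);
	return 0;
}
```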
Mapping user-space pages
If you have a char driver which needs direct user-space access (a
high-performance streaming tape driver, say), then you'll want to map
user-space pages yourself. The modern equivalent of
map_user_kiobuf() is a function called get_user_pages():
    int get_user_pages(struct task_struct *task,
                       struct mm_struct *mm,
                       unsigned long start,
                       int len,
                       int write,
                       int force,
                       struct page **pages,
                       struct vm_area_struct **vmas);
task is the process performing the mapping; the primary purpose of
this argument is to say who gets charged for page faults incurred while
mapping the pages. This parameter is almost always passed as
current. The memory management structure for the user's address
space is passed in the mm parameter; it is usually
current->mm. Note that get_user_pages() expects that
the caller will have a read lock on mm->mmap_sem.
The start and len parameters describe the user buffer to
be mapped; len is in pages. If
the memory will be written to, write should be non-zero. The
force flag forces read or write access, even if the current page
protection would otherwise not allow that access. The pages array
(which should be big enough to hold len entries) will be filled
with pointers to the page structures for the user pages. If
vmas is non-NULL, it will be filled with a pointer to the
vm_area_struct structure containing each page.
The return value is the number of pages actually mapped, or a negative
error code if something goes wrong. Assuming things worked, the user pages
will be present (and locked) in memory, and can be accessed by way of the
struct page pointers. Be aware, of course, that some or all of
the pages could be in high memory.
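Since a page returned by get_user_pages() may lack a permanent kernel
mapping, it should be accessed through kmap(). As a brief sketch (the
function name here is hypothetical, and the buffer is assumed to hold at
least PAGE_SIZE bytes):

```c
/* Sketch: copy the contents of one mapped user page into a driver
 * buffer.  kmap() creates a temporary kernel mapping, which works
 * for both low- and high-memory pages. */
static void copy_user_page_example(struct page *page, char *buf)
{
	char *kaddr = kmap(page);	/* kernel virtual address */

	memcpy(buf, kaddr, PAGE_SIZE);
	kunmap(page);			/* release the mapping */
}
```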
There is no equivalent put_user_pages() function, so callers of
get_user_pages() must perform the cleanup themselves. There are
two things that need to be done: marking of modified pages, and releasing
them from the page cache. If your device modified the user pages, the
virtual memory subsystem may not know about it, and may fail to write the
pages to permanent storage (or swap). That, of course, could lead to data
corruption and grumpy users. The way to avoid this problem is to call:
SetPageDirty(struct page *page);
for each page in the mapping. Current (2.6.3) kernel code checks to ensure
that pages are not reserved first with code like:
    if (!PageReserved(page))
        SetPageDirty(page);
But pages mapped from user space should not, normally, be marked reserved
in the first place.
Finally, every mapped page must be released from the page cache, or it will
stay there forever; simply pass each page structure to:
void page_cache_release(struct page *page);
After you have released the page, of course, you should not access it
again.
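Putting the pieces together, the mapping, dirtying, and release steps might
look roughly like this. This is a sketch under the assumptions above: the
device-I/O step is a placeholder, MAX_PAGES is a hypothetical bound chosen
by the driver, and error handling is minimal:

```c
/* Sketch only: map a user buffer, "do I/O" on it, then clean up.
 * MAX_PAGES is a hypothetical driver-chosen limit. */
#define MAX_PAGES 32

static int user_io_example(unsigned long uaddr, int npages, int write)
{
	struct page *pages[MAX_PAGES];
	int i, res;

	if (npages > MAX_PAGES)
		return -EINVAL;

	/* get_user_pages() requires mmap_sem held for read. */
	down_read(&current->mm->mmap_sem);
	res = get_user_pages(current, current->mm, uaddr, npages,
			     write, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);
	if (res <= 0)
		return res ? res : -EFAULT;

	/* ... perform the device I/O on pages[0..res-1] here ... */

	for (i = 0; i < res; i++) {
		if (write && !PageReserved(pages[i]))
			SetPageDirty(pages[i]); /* VM must write it back */
		page_cache_release(pages[i]);
	}
	return res;
}
```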
For a good example of how to use get_user_pages() in a char
driver, see the definition of sgl_map_user_pages() in
drivers/scsi/st.c.