New SP2 System Calls
Copyright © 1996-1997 Mark Russinovich
last updated January 2, 1997

Two New System Calls in SP2

Introduction

With NT 4.0 Service Pack 2, Microsoft quietly introduced two new Win32 calls, with corresponding support in NT 4.0's kernel-mode components (NTOSKRNL.EXE). These two calls, ReadFileScatter() and WriteFileGather(), give Win32 programs the ability to read data from an uncached file into a discontiguous buffer and, conversely, to write data to an uncached file from a discontiguous buffer. However, no component shipped as part of SP2 references the new calls. So what are they doing there, and what could have prompted Microsoft to suddenly augment the Win32 API in the second service pack to a major operating system release?

The answer highlights the tight integration between the operating systems and applications groups at Microsoft: the performance of a major Microsoft application, SQL Server, is significantly enhanced by using these new services. The calls are very useful for applications, such as SQL Server, that perform their own data caching. It is widely known that in the markets these products occupy, a performance edge held for only a short period of time can be an enormous competitive advantage, and it is clear that Microsoft has given itself just such an advantage. In this page I hope to level the playing field and to inform developers of potentially useful APIs.

Microsoft did document these functions on December 6, but they were placed in a Knowledge Base article, "Performance Enhancements for SQL Server Under Windows NT 4.0", a title that does nothing to reveal that the text documents two new Win32 functions.
This document is divided as follows:

* Win32 API
  This section describes the two new Win32 API functions in a manner similar to the Win32 SDK.
* NT API
  This section describes the three new NT API functions for use from a device driver, in a manner similar to the Windows NT DDK.
* Remarks
  I wrap up with some notes and observations on using the new APIs.
* Example
  The sample program included demonstrates the use of ReadFileScatter.

The Win32 API

See the remarks section (at the bottom of this document) for information on how to call these functions.

ReadFileScatter

The ReadFileScatter function reads data from a non-cached file into a possibly discontiguous buffer. The buffer is described by an array of page-aligned virtual addresses, with each address specifying the location of one page of the buffer. SP2 supports only the asynchronous form of this call, where an overlapped I/O structure is passed as a parameter. The file must be located on a local drive.

    BOOL ReadFileScatter(
        HANDLE hFile,                 // handle of file to read
        LARGE_INTEGER lpBuffer[],     // array describing buffer
        DWORD nNumberOfBytesToRead,   // number of bytes to read
        LPDWORD lpNumberOfBytesRead,  // address of number of bytes read
        LPOVERLAPPED lpOverlapped     // address of structure for data
    );

Parameters

hFile
  Identifies the file to be read. The file handle must have been opened with GENERIC_READ access and with both the FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING flags.

lpBuffer[]
  A large-integer-aligned array of virtual addresses that describes the buffer being read into. Each entry in the array represents one page of the buffer and must be a page-aligned address. The array must contain one entry for every page needed to hold nNumberOfBytesToRead bytes.

nNumberOfBytesToRead
  Specifies the number of bytes to be read from the file. This must be a multiple of the system page size (e.g. 4096, 8192, etc.).
lpNumberOfBytesRead
  In the final implementation, this parameter will be used in the same way as the corresponding parameter of ReadFile. In the SP2 implementation, this parameter must be NULL, or the function will fail.

lpOverlapped
  Points to an OVERLAPPED structure. Unlike for ReadFile, in the SP2 implementation of ReadFileScatter this parameter is required, and consequently hFile must have been created with FILE_FLAG_OVERLAPPED. See the Win32 SDK or Visual C Help for a description of this data structure.

Return Values

If the function succeeds, the return value is nonzero. If the function fails, the return value is zero. To get extended error information, call GetLastError.

Remarks

This function's API is constructed to mirror ReadFile's. However, corners were cut in the SP2 implementation. Specifically, in ReadFile lpOverlapped is an optional parameter, and its absence results in synchronous behavior of the call. In SP2's ReadFileScatter, lpOverlapped is mandatory and the synchronous behavior has not been implemented. To use ReadFileScatter, the caller must create an event, passed in the hEvent field of the OVERLAPPED structure, that will be signaled when the operation completes. At that time, GetOverlappedResult should be called to obtain the return status.

WriteFileGather

WriteFileGather has a prototype identical to ReadFileScatter's, and its SP2 implementation suffers from the same limitations.

The NT API

Whereas a Win32 program can obtain the addresses of ReadFileScatter and WriteFileGather, the related NT API functions are currently not accessible to device driver writers. Microsoft must release an updated NTOSKRNL.EXE that exports the functions and an NTOSKRNL.LIB that allows for linking to them. See the remarks section at the bottom for a hack that makes them accessible in an unconventional manner.
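Both the Win32 calls above and the NT calls below take the buffer as an array of 64-bit entries, one page-aligned virtual address per page. The following is only a minimal, portable sketch of how such an array might be validated and built. The helper names `scatter_page_count` and `scatter_build_array` are hypothetical, and `uint64_t` stands in for the `LARGE_INTEGER` entries. The extra zero entry at the end is an assumption taken from later Microsoft documentation, which asks for a terminating NULL element; this document does not mention it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: number of page entries needed for a transfer of
   nbytes, or 0 if nbytes is zero or not a multiple of the page size. */
size_t scatter_page_count(size_t nbytes, size_t page_size)
{
    /* Precondition: page sizes are nonzero powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (nbytes == 0 || nbytes % page_size != 0)
        return 0;
    return nbytes / page_size;
}

/* Hypothetical helper: build the array of 64-bit entries from a list of
   page pointers. Returns a malloc'd array of count + 1 entries (the last
   one is zero as a terminator, see the assumption in the lead-in), or
   NULL if any pointer is null or not page-aligned. Caller frees it. */
uint64_t *scatter_build_array(void *const pages[], size_t count,
                              size_t page_size)
{
    uint64_t *entries;
    size_t i;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (count == 0)
        return NULL;

    entries = calloc(count + 1, sizeof *entries);  /* zero-filled */
    if (entries == NULL)
        return NULL;

    for (i = 0; i < count; i++) {
        uintptr_t addr = (uintptr_t)pages[i];
        if (addr == 0 || addr % page_size != 0) {
            free(entries);
            return NULL;
        }
        /* On 32-bit NT the upper 32 bits of each entry stay zero. */
        entries[i] = (uint64_t)addr;
    }
    return entries;
}
```

Checking alignment and length up front is a design choice for the sketch: it turns a malformed request into an early NULL return instead of a failed I/O call.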
NtReadFileScatter

    NTSTATUS NtReadFileScatter(
        IN HANDLE FileHandle,
        IN HANDLE Event,                      /* optional */
        IN PIO_APC_ROUTINE UserApcRoutine,    /* optional */
        IN PVOID UserApcContext,              /* optional */
        OUT PIO_STATUS_BLOCK UserIosb,        /* optional */
        IN LARGE_INTEGER BufferDescription[],
        IN ULONG BufferLength,
        IN PLARGE_INTEGER ByteOffset,
        IN PULONG Key                         /* optional */
    );

Parameters

FileHandle
  Handle to a file that has been opened for GENERIC_READ access and no intermediate buffering. The file must be located on a local drive.

Event
  Handle to an event that will be signaled when the operation has completed.

UserApcRoutine
  Points to an APC routine that will be called when the operation completes. If this parameter is non-null, then Event must be null, and vice versa.

UserApcContext
  If an APC routine is specified, this parameter is passed to the routine when the operation completes.

UserIosb
  Pointer to an I/O status structure that will be filled in when the operation completes.

BufferDescription
  An array of large integers that describes the buffer being read into. The array must be large integer aligned. Each entry is a large integer virtual address that specifies one page of the buffer. Note that the high DWORD of each large integer is ignored by SP2, but is present for future evolution into 64-bit NT.

BufferLength
  The number of bytes to be read. This value must be an integral multiple of the system page size.

ByteOffset
  Large integer offset in the file at which the read operation will begin.

Key
  Passed directly to the file system, which may use it to control the order of I/O operations. No NT file system currently honors the key parameter.

Return Value

NtReadFileScatter returns the NTSTATUS value that a lower level driver sets in the I/O status block for the request.

Comments

While the SP2 Win32 interface to this function (ReadFileScatter) is not completely implemented, NtReadFileScatter is fully implemented.
The function is similar to NtReadFile, but performs the additional job of creating an MDL that describes the buffer specified in BufferDescription. The MDL is prepared using a new support function, MmProbeAndLockSelectedPages, which is described below.

NtWriteFileGather

NtWriteFileGather has the same interface as NtReadFileScatter.

MmProbeAndLockSelectedPages

    VOID MmProbeAndLockSelectedPages(
        IN OUT PMDL MemoryDescriptorList,
        IN LARGE_INTEGER PageList[],
        IN KPROCESSOR_MODE AccessMode,
        IN LOCK_OPERATION Operation
    );

Probes the pages specified in PageList, making them physically resident and updating MemoryDescriptorList to describe them. This function has been added to support NtReadFileScatter and NtWriteFileGather.

Parameters

MemoryDescriptorList
  Points to an MDL that supplies a virtual address, byte offset, and length. The physical page portion of the MDL is updated when the pages are locked in memory.

PageList
  An array of large integer virtual addresses. Each virtual address must be page aligned. SP2's implementation ignores the upper 32 bits of the addresses, which are present to ease the forward transition to 64-bit NT.

AccessMode
  Specifies the access mode in which to probe the arguments, either KernelMode or UserMode.

Operation
  Specifies the type of operation for which the caller wants the access rights probed and the pages locked: one of IoReadAccess, IoWriteAccess, or IoModifyAccess.

This function is a wrapper around MmProbeAndLockPages, and simply builds an MDL that describes each page specified in PageList.

Remarks

Although SP2 does not ship with a Kernel32 .LIB file for linking to the new functions, ReadFileScatter and WriteFileGather can still be used.
To obtain their entry points from a Win32 application, use the function GetProcAddress as shown below:

    ReadFileScatter = (void *) GetProcAddress(GetModuleHandle( "Kernel32.dll" ), "ReadFileScatter" );

A hack that would make the functions accessible from a device driver would be to write a stub for each function, consisting of a system call trap with the appropriate system call number. This is left as an exercise for the reader.

In reviewing the code for ReadFileScatter and WriteFileGather, it is interesting to note that they are only partially implemented. It's clear that while the NT operating system group fleshed out the NT kernel implementations of the functions, the Win32 team rushed to get the new calls into SP2 and sacrificed a full implementation in order to do so.

If you have an application that could benefit from these APIs, please let me know.

----------------------------------------------------------------------------
Download Scatter Example Plus Source (20KB)
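As a rough illustration of the Win32 calling sequence described in this document (GetProcAddress, a handle opened with FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING, a NULL lpNumberOfBytesRead, and an event in the OVERLAPPED structure followed by GetOverlappedResult), here is a minimal sketch. It is not the downloadable sample. The function name `scatter_read_demo`, the typedef `SCATTER_FN`, and the return codes are hypothetical. The typedef follows the SP2 prototype given above; later SDK headers declare the array parameter as FILE_SEGMENT_ELEMENT and, as an assumption carried over from that later documentation, the array here ends with a zero entry. The non-Windows branch exists only so the file compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* SP2-era prototype as given in this document. */
typedef BOOL (WINAPI *SCATTER_FN)(HANDLE, LARGE_INTEGER[], DWORD,
                                  LPDWORD, LPOVERLAPPED);

/* Hypothetical demo: scatter-read the first two pages of `path` into two
   separately allocated pages. Returns 0 on success, nonzero on failure. */
int scatter_read_demo(const char *path)
{
    SYSTEM_INFO si;
    LARGE_INTEGER segs[3];
    OVERLAPPED ov;
    DWORD done = 0;
    HANDLE file;
    HMODULE k32;
    SCATTER_FN scatter;
    void *p0, *p1;
    int rc = 0;

    assert(path != NULL);

    /* Resolve the entry point at run time; no import library is needed. */
    k32 = GetModuleHandleA("kernel32.dll");
    scatter = k32 ? (SCATTER_FN)GetProcAddress(k32, "ReadFileScatter") : NULL;
    if (scatter == NULL)
        return 1;

    file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING,
                       FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 2;

    /* VirtualAlloc returns page-aligned memory; two separate allocations
       give a discontiguous two-page buffer. */
    GetSystemInfo(&si);
    p0 = VirtualAlloc(NULL, si.dwPageSize, MEM_COMMIT | MEM_RESERVE,
                      PAGE_READWRITE);
    p1 = VirtualAlloc(NULL, si.dwPageSize, MEM_COMMIT | MEM_RESERVE,
                      PAGE_READWRITE);
    ZeroMemory(&ov, sizeof ov);               /* read from file offset 0 */
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    if (p0 == NULL || p1 == NULL || ov.hEvent == NULL) {
        rc = 3;
    } else {
        segs[0].QuadPart = (LONGLONG)(ULONG_PTR)p0;
        segs[1].QuadPart = (LONGLONG)(ULONG_PTR)p1;
        segs[2].QuadPart = 0;                 /* assumed terminator */

        /* lpNumberOfBytesRead must be NULL; completion is asynchronous. */
        if (!scatter(file, segs, 2 * si.dwPageSize, NULL, &ov) &&
            GetLastError() != ERROR_IO_PENDING)
            rc = 4;
        else if (!GetOverlappedResult(file, &ov, &done, TRUE))
            rc = 5;                           /* waits on ov.hEvent */
    }

    if (ov.hEvent != NULL) CloseHandle(ov.hEvent);
    if (p0 != NULL) VirtualFree(p0, 0, MEM_RELEASE);
    if (p1 != NULL) VirtualFree(p1, 0, MEM_RELEASE);
    CloseHandle(file);
    return rc;
}
#else
/* Not Windows: nothing to call, so report failure. */
int scatter_read_demo(const char *path)
{
    assert(path != NULL);
    return -1;
}
#endif
```

A caller would pass the path of a file at least two pages long on a local drive and treat any nonzero return as failure; a real program would go on to use the two pages before freeing them.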