Unraveling the Mysteries of Writing a Winsock 2 Layered Service Provider. -- MSJ, May 1999


Wei Hua, Jim Ohlund, Barry Butterklee

Extending base transport functionality using a layered transport service provider is a very powerful idea. Layered service providers implement only higher-level custom communication functions and rely on an existing underlying base provider for data exchange with a remote endpoint.

This article assumes you're familiar with C++, Winsock, and overlapped I/O.

Code for this article: layered.exe (86KB)
Wei Hua and Barry Butterklee are developer support engineers for Microsoft Technical Support, specializing in network programming. Jim Ohlund is a software test engineer for the Microsoft Proxy Server team.

One of the most interesting yet least understood features of Winsock 2 is the service provider interface (SPI). Unlike the well-known Winsock 2 API, which is covered by numerous books, documentation, and samples, the Winsock 2 SPI is relatively unexplored. The Winsock 2 SPI is implemented by network transport service providers and namespace resolution service providers. You can use the Winsock 2 SPI to extend an existing transport service provider by implementing a layered service provider (LSP). For example, quality of service (QOS) on Windows® 98 and Windows 2000 is implemented as an LSP over the TCP/IP protocol stack. Another use for an LSP would be to develop specialized URL filtering software that prevents Web browsers from accessing certain sites, regardless of the browser installed on a desktop.

    Winsock 2 was introduced with the release of Windows NT® 4.0; it is also available on Windows 95 (as an add-on package), Windows 98, and Windows 2000. The specification for the Winsock 2 SPI is available either from the Platform SDK or ftp://ftp.microsoft.com/bussys/Winsock/Winsock2. The only sample code is a "layered" sample from the Microsoft® Platform SDK.

    Let's unravel the mystery of the Winsock 2 SPI by taking an inside look at the layered sample. First, we'll go over the background you'll need; then we'll examine the layered sample and how you can extend it.
Background Information

    Winsock 2 adheres to the Windows Open Service Architecture (WOSA) model (see Figure 1). In effect, the WOSA architecture allows third-party service providers to be plugged in without the need for application developers to rewrite their code and without the need to replace the Winsock 2 DLL, ws2_32.dll.

Figure 1 Winsock 2 Architecture


The Winsock 2 SPI allows you to develop two different types of service providers: transport and namespace. Transport providers (commonly referred to as protocol stacks) are services that supply functions to set up connections, transfer data, and exercise flow control, error control, and so on. Namespace providers are services that associate the addressing attributes of a network protocol with one or more human-friendly names and enable protocol-independent name resolution. The SPI also allows you to develop two types of transport service providers: base and layered service providers.

Base service providers implement the actual details of a transport protocol: setting up connections, transferring data, and exercising flow control and error control. Layered service providers implement only higher-level custom communication functions and rely on an existing underlying base provider for the actual data exchange with a remote endpoint (see Figure 2). For example, you could implement a security manager or a bandwidth manager on top of a base TCP/IP stack. Chaining providers can be accomplished as long as the providers at both the upper and lower edges support the Winsock 2 SPI. The MSDN™ Platform SDK layered sample is a generic layered service provider that lies on top of all available base providers once it's installed.

Figure 2 Protocols

Winsock 2 does not currently allow layering namespace providers. Therefore, you can implement a new namespace provider using the Winsock 2 SPI, but you cannot change or extend the naming, registration, and querying behavior of an existing namespace provider.

We'll only look at SPI functions for developing a layered transport service provider, because base transport providers and namespace providers are typically available from operating system vendors and transport stack vendors. Extending base transport functionality using a layered transport service provider is a very powerful idea. But it should be noted that only applications using the Winsock interface can take advantage of the added functionality. Programs using other networking interfaces are not affected. When we use the term "service provider" here, we are actually referring to the transport service provider, which could be a layered transport service provider or a base transport service provider. We use "LSP" to refer specifically to a layered transport service provider.

The Winsock 2 SPI uses the function prefix naming convention shown in Figure 3.

When you write a service provider, only WSPStartup and NSPStartup are exported as entry points. WSPStartup and NSPStartup actually expose the additional functions that comprise a service provider via a special function dispatch table passed as a function parameter.

Winsock 2 LSPs are implemented as standard Windows DLLs with a single exported entry function, WSPStartup. All other transport SPI functions are made accessible to ws2_32.dll or to a layered provider higher in the chain via the LSP's dispatch table, the lpProcTable parameter in the WSPStartup function. The LSP is given ws2_32.dll's upcall dispatch table via the UpcallTable parameter to WSPStartup. For function prototypes, #defines, and type definitions, please see WS2spi.h.

    The WSPPROC_TABLE structure (see Figure 4) defines the functions the LSP has to implement; pointers to these functions are used to fill in the lpProcTable parameter in the WSPStartup call.

int WSPStartup (
    WORD wVersionRequested,
    LPWSPDATAW lpWSPData,
    LPWSAPROTOCOL_INFOW lpProtocolInfo,
    WSPUPCALLTABLE UpcallTable,
    LPWSPPROC_TABLE lpProcTable
);

    While it may seem that your LSP has to implement a lot of functions, in many situations you can simply relay the call to the corresponding function of the service provider immediately below in the protocol chain.
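The relay idea can be sketched in a few lines. This is only an illustration under stated assumptions: ProcTable, LayerRecv, LayerSend, LayerStartup, and g_next are hypothetical stand-ins so the sketch compiles without the Windows SDK; the real WSPPROC_TABLE in WS2spi.h has many more entries and Winsock-specific signatures.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the SPI dispatch table. The real WSPPROC_TABLE in WS2spi.h
// has many more entries, each with its own Winsock-specific signature.
struct ProcTable {
    int (*Recv)(int socket, char* buf, std::size_t len, int* err);
    int (*Send)(int socket, const char* buf, std::size_t len, int* err);
};

// Dispatch table of the provider immediately below this layer.
static ProcTable g_next = {nullptr, nullptr};

// Layered entry points that add nothing of their own: they relay the call.
static int LayerRecv(int socket, char* buf, std::size_t len, int* err) {
    return g_next.Recv(socket, buf, len, err);
}

static int LayerSend(int socket, const char* buf, std::size_t len, int* err) {
    return g_next.Send(socket, buf, len, err);
}

// Roughly what a startup routine does with the two tables: remember the
// lower provider's functions and hand this layer's functions to the caller.
void LayerStartup(const ProcTable& lower, ProcTable* ours) {
    assert(lower.Recv != nullptr && lower.Send != nullptr);
    g_next = lower;
    ours->Recv = LayerRecv;
    ours->Send = LayerSend;
}
```

A layer that wants to add behavior to one call replaces just that one entry with a function that does its extra work before or after relaying; every other entry stays a plain pass-through.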

    The WSPUPCALLTABLE structure (see Figure 5) defines the upcall functions from ws2_32.dll that the LSP can call into; they're obtained from the UpcallTable parameter in WSPStartup. WSPUPCALLTABLE is a fixed-length structure. If new upcall functions are added to ws2_32.dll, it is necessary to call GetProcAddress in an LSP's WSPStartup to obtain function pointers to the new upcall functions. Currently only one such additional upcall function exists: WPUCompleteOverlappedRequest.
LSP Chaining

    LSPs and base providers are strung together to form a protocol chain. A WSAPROTOCOL_INFOW structure is used to refer to the protocol chain as a whole. It describes the order in which the layered providers are joined. We'll explain the details of this structure later.

    The DLL implementing an LSP is either loaded by another LSP or directly by ws2_32.dll, depending upon its position in the protocol chain. If an LSP is not at the top of the protocol chain, the LSP immediately above it in the chain loads it; otherwise, it is loaded by ws2_32.dll. (In the layered sample, the loading of the lower chain provider is done at the first WSPSocket call.) After loading an LSP, WSPStartup for that LSP must be called. The UpcallTable, ultimately retrieved from ws2_32.dll, must be passed down to that LSP (immediately lower in the protocol chain) and that LSP's lpProcTable must then be retrieved.

    When an LSP calls WSPStartup of the next lower provider in the protocol chain, if the next lower layer is another layered provider, the chain's WSAPROTOCOL_INFOW structure must be passed in the WSPStartup call. When the next lower layer is a base protocol (signifying the end of the chain), the chain's WSAPROTOCOL_INFOW structure is no longer propagated downward. Instead, you must retrieve the base provider's WSAPROTOCOL_INFOW structure and pass it to the base provider's WSPStartup call. Thus, the base provider has no notion of being involved in a protocol chain. The same rules apply to specialty functions (such as WSPStringToAddress and WSPAddressToString) when they pass the WSAPROTOCOL_INFOW structure to the lower chain call. Special care must be taken to ensure that the dwProviderReserved field in the WSAPROTOCOL_INFOW structure is always propagated downward, even if the next lower layer is a base provider. This is because dwProviderReserved is generated by WSPDuplicateSocket to store context information for a duplicated socket, including the duplicated handle used by WSPSocket to reconstruct the duplicated socket handle.
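The choice of which structure to pass down can be sketched as follows. ProtoInfo is a deliberately simplified, hypothetical stand-in for WSAPROTOCOL_INFOW that carries only the fields discussed above, and NextInChain and InfoForLowerProvider are illustrative names, not part of the SPI or the layered sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for WSAPROTOCOL_INFOW: only the fields discussed here.
struct ProtoInfo {
    std::uint32_t catalogEntryId;             // plays the role of dwCatalogEntryId
    std::uint32_t providerReserved;           // plays the role of dwProviderReserved
    std::vector<std::uint32_t> chainEntries;  // plays the role of ProtocolChain.ChainEntries
};

// Index of the entry that follows myLayerId in the chain, or -1 if this
// layer is not in the chain or is its last entry.
int NextInChain(const ProtoInfo& chain, std::uint32_t myLayerId) {
    for (std::size_t i = 0; i + 1 < chain.chainEntries.size(); ++i)
        if (chain.chainEntries[i] == myLayerId) return static_cast<int>(i + 1);
    return -1;
}

// Chooses the structure to hand to the next lower provider's startup call.
// baseEntry is the base provider's own catalog entry, looked up by the caller.
ProtoInfo InfoForLowerProvider(const ProtoInfo& chain, std::uint32_t myLayerId,
                               const ProtoInfo& baseEntry) {
    int next = NextInChain(chain, myLayerId);
    assert(next >= 0);
    bool nextIsBase =
        static_cast<std::size_t>(next) + 1 == chain.chainEntries.size();
    // Another layered provider gets the chain entry; a base provider gets
    // its own entry and never sees the chain.
    ProtoInfo out = nextIsBase ? baseEntry : chain;
    // The reserved field travels downward in both cases.
    out.providerReserved = chain.providerReserved;
    return out;
}
```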

    It should be noted that the way an LSP's SPI functions are called is the same regardless of where the LSP is located in the protocol chain. As a result, for the sake of simplicity, throughout this article we assume the immediate client of our LSP is ws2_32.dll, and the immediate lower chain provider is a base provider.

    It also should be noted that a service provider's WSPStartup can be called multiple times. Furthermore, ws2_32.dll will call WSPCleanup once for each invocation of WSPStartup that it makes. Therefore, each service provider should implement a per-process reference counter that is incremented in WSPStartup and decremented in WSPCleanup. When the counter reaches zero, the service provider must prepare itself to be unloaded from memory.
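A minimal sketch of such a per-process counter follows. StartupCounter and LeaveResult are hypothetical names, and std::mutex is used only to keep the sketch portable; an LSP on Windows might well guard the counter with a critical section instead.

```cpp
#include <cassert>
#include <mutex>

enum class LeaveResult { NotStarted, StillInUse, LastReference };

// Per-process startup/cleanup bookkeeping for a provider DLL.
class StartupCounter {
public:
    // Call on every WSPStartup. Returns true only for the first caller,
    // which is the moment to do one-time initialization.
    bool Enter() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ++count_ == 1;
    }

    // Call on every WSPCleanup. LastReference tells the provider to get
    // ready to be unloaded; NotStarted flags a cleanup without a startup.
    LeaveResult Leave() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return LeaveResult::NotStarted;
        --count_;
        assert(count_ >= 0);
        return count_ == 0 ? LeaveResult::LastReference : LeaveResult::StillInUse;
    }

private:
    std::mutex mutex_;
    int count_ = 0;
};
```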
Mapping Between Winsock 2 API and SPI Functions

    When a developer calls a Winsock 2 API, ws2_32.dll eventually calls a corresponding Winsock 2 SPI function to carry out the requested functionality using a specific service provider (note the exceptions in the following paragraphs). For example, the select API maps to the WSPSelect SPI, the connect and WSAConnect APIs both map to the WSPConnect SPI, and the accept and WSAAccept APIs both map to the WSPAccept SPI. But not all Winsock APIs have a corresponding SPI.

    Support functions like htonl, htons, ntohl, and ntohs are implemented only within ws2_32.dll, and are not passed down to service providers. The same holds true for the WSA versions of these functions.

    Conversion functions such as inet_addr and inet_ntoa are implemented only within ws2_32.dll.

    All the TCP/IP-specific name conversion and resolution functions in Winsock 1.1, such as getXbyY, WSAAsyncGetXByY and WSACancelAsyncRequest, as well as gethostname, are implemented within ws2_32.dll.

    Winsock service provider enumeration and the blocking hook related functions are implemented within ws2_32.dll; thus WSAEnumProtocols, WSAIsBlocking, WSASetBlockingHook, and WSAUnhookBlockingHook do not appear as SPI functions.

    Since error codes are returned along with SPI functions, equivalents of WSAGetLastError and WSASetLastError are not needed in the SPI.

    The event object manipulation and wait functions, including WSACreateEvent, WSACloseEvent, WSASetEvent, WSAResetEvent, and WSAWaitForMultipleEvents are mapped directly to native Windows operating system services, and are not present in the SPI.
LSP Socket Creation and IFS Handles

    There are three types of socket handles: socket handles returned by base providers to the LSP, socket handles returned to ws2_32.dll by the LSP, and socket handles in user applications returned by ws2_32.dll.

    The ws2_32.dll keeps an association list of socket handles retrieved from an LSP and the socket handles returned to user applications. An LSP should operate in a similar fashion and keep an association list of the socket handles retrieved from the base provider and the socket handles returned to ws2_32.dll. This allows an LSP to find the corresponding lower layer socket given a socket handle for this layer and ensures proper closure of all base socket handles when an LSP is unloaded.
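One way to picture the LSP's association list is a small map keyed by the handle the LSP returned to ws2_32.dll. This is a sketch under assumptions: Handle stands in for SOCKET, SocketMap is a hypothetical class, and a real LSP would also need locking around it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Handle = std::uintptr_t;  // stand-in for SOCKET

// Associates each handle given to ws2_32.dll with the base provider's handle.
class SocketMap {
public:
    void Add(Handle layered, Handle base) { toBase_[layered] = base; }

    // Finds the lower-layer socket for one of this layer's handles.
    bool Lookup(Handle layered, Handle* base) const {
        assert(base != nullptr);
        auto it = toBase_.find(layered);
        if (it == toBase_.end()) return false;
        *base = it->second;
        return true;
    }

    // Forgets a socket, for example when it is closed. Returns false if unknown.
    bool Remove(Handle layered) { return toBase_.erase(layered) == 1; }

    // Base handles still open, so they can all be closed when the LSP unloads.
    std::vector<Handle> RemainingBaseHandles() const {
        std::vector<Handle> out;
        for (const auto& entry : toBase_) out.push_back(entry.second);
        return out;
    }

private:
    std::map<Handle, Handle> toBase_;
};
```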

    Before looking into which function an LSP should use to generate a handle to return to ws2_32.dll, we need to first take a look at Installable File System (IFS) handles. When a socket has an IFS handle, it can be used in file I/O functions to perform Winsock recv and send calls. On Windows NT, IFS handles can be added to I/O completion ports (IOCP) to achieve scalability. Providers with IFS handles indicate this via the XP1_IFS_HANDLES attribute bit in the WSAPROTOCOL_INFOW structure. All Microsoft base providers implement sockets as IFS handles. An LSP cannot create socket handles that are true IFS handles since you can't implement an IFS in an LSP. However, the socket handles returned to ws2_32.dll by calling either WPUCreateSocketHandle or WPUModifyIFSHandle can be used in file I/O calls.

    An LSP gets a base provider's socket handle by calling the base provider's WSPSocket. If a base provider proposes an IFS handle, an LSP can call WPUModifyIFSHandle to generate a modified handle to return to ws2_32.dll. The modified handle is indistinguishable from the proposed handle as far as the operating system is concerned. In fact, the LSP can simply choose to use only the modified handle in all of its internal processing. The catch is that the LSP cannot post-process overlapped I/O using WSPSend (WriteFile), WSPSendTo, WSPRecv (ReadFile), WSPRecvFrom, or WSPIoctl. To allow additional processing in the LSP after an overlapped I/O completes in any of the above calls, or if the base provider handle is not an IFS handle, the LSP has to call WPUCreateSocketHandle to generate a handle to return to ws2_32.dll.

    As a convenience to LSP implementors, one of the input parameters supplied by an LSP in WPUCreateSocketHandle is a DWORD context value. The Winsock 2 DLL associates this context value with the allocated socket handle and allows the LSP to retrieve the context value at any time via the WPUQuerySocketHandleContext call. A typical use for this context value would be to store a pointer to an LSP-maintained data structure containing the base provider socket handle. The socket handles created with WPUCreateSocketHandle are also indistinguishable from true file system handles. However, any layered service provider that uses this technique should still identify itself as a non-IFS provider by not setting the XP1_IFS_HANDLES flag in the provider information structure. An application can use this flag to determine whether it should use file system calls.

    If ReadFile and WriteFile are used on such providers, ws2_32.dll will have to perform additional parameter marshaling and user/kernel mode transitions. To be exact, such providers have unavoidable additional mode transitions even in the case of WSASend and WSARecv, although ReadFile and WriteFile will have more of them. The layered sample allows pre- and post-processing of all Winsock I/O calls. Therefore, it uses the WPUCreateSocketHandle to create a socket handle to return to ws2_32.dll.

    Up to Windows NT 4.0 SP3, you can only use the WPUCreateSocketHandle function from the context of an administrator account. This problem prevents large-scale deployment of an LSP on Windows NT. Windows 95 and Windows 98 do not have this restriction, as they don't employ the Windows NT security model. Windows NT 4.0 SP4 has addressed the WPUCreateSocketHandle problem. This is really good news to LSP developers.
LSP Socket I/O

    There are three basic I/O models in Winsock 2: blocking, nonblocking, and overlapped. I/O operations occurring on IOCPs by nature use overlapped I/O. We'll go over each I/O model using WSPRecv; the details are analogous for WSPSend.

    Blocking I/O is the simplest form in Winsock 2. Any I/O operation with a blocking socket will not return until the operation has completed. Therefore, any thread can only execute one I/O operation at a time. When the LSP WSPRecv is called in blocking fashion by ws2_32.dll, the lpOverlapped parameter will be NULL. The LSP only needs to relay the call to the base provider WSPRecv call. The LSP's WSPRecv will only return when the base WSPRecv completes.

    Even though blocking I/O is easy to implement, you still have to consider backward compatibility with Winsock 1.1 blocking hooks. The WSASetBlockingHook and WSACancelBlockingCall calls are removed from the Winsock 2 API specification. WSPCancelBlockingCall can still be called by ws2_32.dll if a Winsock 1.1 application calls the WSASetBlockingHook and WSACancelBlockingCall functions. In the LSP, you can simply relay the WSPCancelBlockingCall call to the base provider's call. If you're implementing a base provider and a blocking call is in progress, then you must implement a mechanism to call the WPUQueryBlockingCallback function periodically.
Nonblocking I/O

    If a socket is in nonblocking mode, any I/O operation must either complete immediately or return the error code WSAEWOULDBLOCK, indicating that the operation cannot be finished right away. In the latter case, a mechanism is needed to discover when it is appropriate to try the operation again. A set of network events has been defined for this purpose; they can be polled or waited on using WSPSelect or be registered for asynchronous delivery by calling WSPAsyncSelect or WSPEventSelect.

    In the case of WSPSelect, three fd_sets (readfds, writefds, and exceptfds) are passed down from ws2_32.dll. The LSP's WSPSelect needs to find the base provider sockets by calling WPUQuerySocketHandleContext and create its own read, write, and except fd_sets for the base sockets. It then calls the base provider's WSPSelect. When the base WSPSelect returns, the LSP has to translate the base sockets' fd_sets back into the originally received fd_sets. As a consequence of the architecture of Winsock 2 service providers, select in user applications can only correctly handle sockets from the same service provider in a single call.
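The two translations around the base provider's select call can be sketched with ordinary containers. Everything here is a stand-in chosen for portability: HandleSet plays the role of fd_set, HandleMap plays the role of the context lookup, baseSelect is a hypothetical callable in place of the base provider's WSPSelect, and only one of the three sets is shown because the other two are handled the same way.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>

using Handle = unsigned long;                // stand-in for SOCKET
using HandleSet = std::set<Handle>;          // stand-in for fd_set
using HandleMap = std::map<Handle, Handle>;  // layered handle -> base handle

// Builds the base-provider set for one incoming set.
// Fails if a handle does not belong to this layer.
bool ToBaseSet(const HandleMap& toBase, const HandleSet& upper, HandleSet* lower) {
    lower->clear();
    for (Handle h : upper) {
        auto it = toBase.find(h);
        if (it == toBase.end()) return false;
        lower->insert(it->second);
    }
    return true;
}

// Keeps in `upper` only the layered handles whose base handle is still set.
void FromBaseSet(const HandleMap& toBase, const HandleSet& readyBase, HandleSet* upper) {
    for (auto it = upper->begin(); it != upper->end();) {
        auto m = toBase.find(*it);
        if (m == toBase.end() || readyBase.count(m->second) == 0)
            it = upper->erase(it);
        else
            ++it;
    }
}

// Translate down, call the lower select, translate back up.
// Returns the number of ready handles, or a negative value on failure.
int LayerSelect(const HandleMap& toBase, HandleSet* readfds,
                const std::function<int(HandleSet*)>& baseSelect) {
    assert(readfds != nullptr);
    HandleSet lower;
    if (!ToBaseSet(toBase, *readfds, &lower)) return -1;
    int rc = baseSelect(&lower);
    if (rc < 0) return rc;
    FromBaseSet(toBase, lower, readfds);
    return static_cast<int>(readfds->size());
}
```

The failure path for an unknown handle also shows why a single select call cannot mix sockets from different providers: the layer has no mapping for handles it did not create.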

    The LSP WSPEventSelect implementation is actually trivial. As the user application event object is passed down by ws2_32.dll, the LSP only needs to translate the ws2_32.dll socket handle to the corresponding base socket handle and relay the call to the base provider's WSPEventSelect. When the expected I/O can be invoked, the base provider directly signals the event object, and user applications are then notified of the arrival of the network events.

    The LSP WSPAsyncSelect implementation is a little more involved. Since an LSP can't rely on the user application's message pump to pump socket notification messages, typically the LSP has to create another window and have a dedicated worker thread to pump messages. In the layered sample, when WSPAsyncSelect is first called, a worker thread along with a hidden window is created and the user application's window handle and notification message are stored. After translating the ws2_32.dll socket handle to the base provider handle, the new hidden window and a new message are used to call the base provider's WSPAsyncSelect. When the new message is received indicating the expected I/O is available, the window procedure of the hidden window sends the user application's notification message to the user application's window using WPUPostMessage.

    There is actually another practical reason that your LSP should use an intermediate window. When you [...] Windows 9x, due to the way wsock2.vxd and winsock.dll are implemented, an LSP will not get the notification message if the user application's window is passed to the base WSPAsyncSelect directly.
Overlapped I/O

    All Winsock 2 transport providers must support overlapped I/O. Sockets capable of overlapped I/O in an LSP are created via the WSPSocket function with the WSA_FLAG_OVERLAPPED flag set, and follow the overlapped I/O model established in Win32®. Overlapped I/O is the most complicated of the three Winsock 2 I/O models.

    The ws2_32.dll keeps track of all overlapped I/O in a user application, including those occurring on an IOCP. When the user application initiates an overlapped WSARecv, from an LSP perspective, its WSPRecv is called by ws2_32.dll and the lpOverlapped parameter is not NULL. After socket handle translation, the LSP needs to call the base provider's WSPRecv and must get notified when the base provider's WSPRecv completes. In turn, the LSP needs to send a completion notification to ws2_32.dll. In other words, the LSP has to deal with two levels of completion notifications: one from the base provider to the LSP and the other from the LSP to ws2_32.dll. It must do all of this without blocking the calling thread.

    Let's examine these two levels of completion notifications in more detail. The first is to issue an overlapped I/O on a base provider and to detect its completion. When an LSP initiates overlapped I/O on a base provider, it has two methods to manage completion of the I/O request: signaling an LSP-specified event object or invoking an LSP-specified completion routine. In both cases, a data structure, WSAOVERLAPPED, is associated with the overlapped operation. The WSAOVERLAPPED structure may be used by the LSP as a place to store a "handle" to the results (such as the number of bytes transferred, updated flags, and error codes) of the overlapped operation. To obtain these results the LSP must invoke the base provider's WSPGetOverlappedResult, passing in a pointer to the WSAOVERLAPPED structure.

    If event-based completion is selected for an overlapped I/O request, the base provider's WSPGetOverlappedResult routine may be used by the LSP to either poll or wait for completion of the overlapped operation at the base provider. An LSP may also use other methods (such as WSAWaitForMultipleEvents) to wait until the corresponding event object has been signaled.

    If a completion routine is specified for an overlapped I/O request, only the polling option of WSPGetOverlappedResult is available. Since a Winsock 2 completion routine is invoked via an asynchronous procedure call (APC) mechanism, the LSP needs to put the thread initiating the I/O into an alertable wait state to allow the completion routine to be invoked after the base provider overlapped I/O completes. Therefore, the LSP needs to call a wait function (such as WSAWaitForMultipleEvents or SleepEx) and set the fAlertable parameter to TRUE in the wait call. Once completion has been indicated, the LSP may invoke the base provider's WSPGetOverlappedResult with the expectation that the call will complete immediately.

    Figure 6 summarizes the completion semantics for an overlapped socket showing the various combinations of the lpOverlapped, hEvent, and lpCompletionRoutine parameters in WSPRecv.
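Since the figure itself is not reproduced in this text, here is a small sketch of the decision it describes, written from our reading of the WSPRecv documentation and best treated as an assumption rather than a transcription of Figure 6. The names Completion, OverlappedArgs, and Classify are hypothetical.

```cpp
#include <cassert>

// How the completion of one WSPRecv call is reported.
enum class Completion {
    Synchronous,            // lpOverlapped is NULL: behaves like non-overlapped I/O
    NoWinsockNotification,  // overlapped, but neither an event nor a routine was given
    SignalEvent,            // overlapped, hEvent is signaled on completion
    QueueRoutine            // overlapped, completion routine is scheduled (hEvent ignored)
};

// Which of the three parameters the caller actually supplied.
struct OverlappedArgs {
    bool hasOverlapped;  // lpOverlapped != NULL
    bool hasEvent;       // lpOverlapped->hEvent != NULL
    bool hasRoutine;     // lpCompletionRoutine != NULL
};

Completion Classify(const OverlappedArgs& a) {
    if (!a.hasOverlapped) return Completion::Synchronous;
    if (a.hasRoutine) return Completion::QueueRoutine;  // takes precedence over hEvent
    if (a.hasEvent) return Completion::SignalEvent;
    return Completion::NoWinsockNotification;
}
```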

    On Windows NT, there is actually one more mechanism an LSP could use to get notified of a base provider's overlapped I/O completion. An LSP could add base provider socket handles to an IOCP, initiate an overlapped I/O at the base provider, and then call GetQueuedCompletionStatus in a worker thread to retrieve the completion notification of the base provider's overlapped I/O.

    Let's look at the second level of completion notification, in which the LSP notifies ws2_32.dll of an overlapped I/O completion. Depending on how an LSP's implementation of WSPRecv is invoked by ws2_32.dll, the LSP could use either WPUCompleteOverlappedRequest or WPUQueueApc.

    WPUCompleteOverlappedRequest is a new function added to Winsock 2 SPI Revision 2.2.2 to support IFS handles. This function can only be used when the user application doesn't use a completion routine for its overlapped I/O. WPUCompleteOverlappedRequest simply lets the LSP notify ws2_32.dll that an overlapped I/O has completed; ws2_32.dll will use one of the above completion mechanisms to notify the user application of the completion. WPUCompleteOverlappedRequest is not supported in the Winsock 2 implementation on Windows NT 4.0 SP3 and earlier releases. Windows 95 with the latest Winsock 2 add-on, Windows 98, Windows NT 4.0 SP4, and Windows 2000 support this function in ws2_32.dll.

    For those earlier releases of Windows NT, an LSP may call SetEvent to signal the event handle in the user application-specified overlapped structure. There is a certain amount of overhead incurred when an LSP calls WPUCompleteOverlappedRequest to support IFS handles. If IFS handle support is not a requirement, it's more efficient to call SetEvent in the LSP to directly signal the completion of an overlapped I/O to the user application.

    If a user application specifies a completion routine for an overlapped I/O, ws2_32.dll calls the LSP's WSPRecv SPI and passes down the user-specified completion routine in the lpCompletionRoutine parameter. It is then the LSP's responsibility to arrange for invocation of this completion routine when the underlying I/O has completed. Since the completion routine must be executed in the context of the same thread that initiated the overlapped operation, it cannot be invoked directly from the LSP. An LSP needs to arrange for a completion routine to be executed in the proper thread context by calling WPUQueueApc. This function can be called from any process and thread context, even a context different from the thread and process used to initiate the overlapped operation.

    WPUQueueApc takes as input parameters a pointer to a WSATHREADID structure, a pointer to an APC function to be invoked, and a 32-bit context value that is subsequently passed to the APC function. LSPs are always supplied with a pointer to the proper WSATHREADID structure via the lpThreadId parameter to the overlapped function. The LSP should store the WSATHREADID structure locally and supply a pointer to this copy of the WSATHREADID structure as an input parameter to WPUQueueApc. Once the WPUQueueApc function returns, the provider can dispose of its copy of the WSATHREADID.

    The procedure WPUQueueApc simply queues sufficient information to call the indicated APC function with the given parameters, but does not call it directly. When the target thread enters an alertable wait state, this information is dequeued and a call is made to the APC function in that target thread and process context. Under some circumstances, an LSP may need to initiate and complete overlapped operations from within an internal worker thread. In this case, a WSATHREADID would not be available from an incoming function call. The SPI provides an upcall, WPUOpenCurrentThread, to obtain a WSATHREADID for the current thread. When this WSATHREADID is no longer needed, its resources should be returned by calling WPUCloseThread.

    As mentioned, the WSAOVERLAPPED structure provides a communication medium between the initiation of an overlapped I/O operation and its subsequent completion. The WSAOVERLAPPED structure is designed to be compatible with the Win32 OVERLAPPED structure (see Figure 7).

    Since the layered sample is not an IFS provider per se, it uses a WSAOVERLAPPED structure to carry the state information about an overlapped I/O that occurred. The Internal member is set to WSS_OPERATION_IN_PROGRESS when a base overlapped I/O is invoked. When the overlapped I/O completes and no user completion function is used, the Internal member is set with the value in the Offset member. The Offset member is used to report the error code of the base overlapped I/O. The OffsetHigh member is used to carry the flags of the Winsock I/O (lpFlags parameter in WSPRecv). The InternalHigh member is used to report the number of bytes transferred. When a user completion function is used, everything else is the same, except the Internal member is set to the function pointer of the user completion function and is submitted as a context for WPUQueueApc.
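The field usage just described can be sketched like this. OverlappedState mirrors only the field names of WSAOVERLAPPED, kOperationInProgress is a stand-in constant for WSS_OPERATION_IN_PROGRESS, and MarkStarted and MarkCompleted are hypothetical helpers, not functions from the layered sample.

```cpp
#include <cassert>
#include <cstdint>

// Field names follow WSAOVERLAPPED; the hEvent member is omitted here.
struct OverlappedState {
    std::uintptr_t Internal;
    std::uintptr_t InternalHigh;
    std::uint32_t Offset;
    std::uint32_t OffsetHigh;
};

// Stand-in for WSS_OPERATION_IN_PROGRESS (assumed value, for illustration).
const std::uintptr_t kOperationInProgress = 0x103;

// Called when the overlapped I/O is issued on the base provider.
void MarkStarted(OverlappedState* ov) {
    assert(ov != nullptr);
    ov->Internal = kOperationInProgress;
}

// Called when the base provider's I/O has completed.
// userRoutine is 0 when the application supplied no completion routine.
void MarkCompleted(OverlappedState* ov, std::uint32_t error, std::uint32_t bytes,
                   std::uint32_t flags, std::uintptr_t userRoutine) {
    assert(ov != nullptr);
    ov->Offset = error;        // error code of the base overlapped I/O
    ov->OffsetHigh = flags;    // flags of the Winsock I/O (lpFlags)
    ov->InternalHigh = bytes;  // number of bytes transferred
    // Without a routine, Internal takes the value in Offset; with one, it
    // carries the routine's address as the context for the queued APC.
    ov->Internal = userRoutine != 0 ? userRoutine : ov->Offset;
}
```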

    Winsock 2 supports a debug/trace mechanism that allows developers to trace Winsock 2 procedure calls, procedure returns, parameter values, and return values. Parameter values and returns can be altered on procedure-call or procedure-return. The debug/trace dt_dll.dll exposes two functions, WSAPreApiNotify and WSAPostApiNotify. The basic idea of implementing Winsock 2 debug/trace in an LSP is that when a WSP function is entered, the dt_dll.dll's WSAPreApiNotify function is the first call made, and when the WSP function exits, the WSAPostApiNotify function is the last call made. A detailed description of Winsock 2 debug and trace facilities can be found in the dbgspec.doc in the Platform SDK sample, DT_DLL.
Installing a Layered Transport Service Provider

    Now that you understand how an LSP is implemented, let's look at how to build an LSP installation program. The installation program simply configures your LSP in the Winsock 2 system configuration database, which is a catalog of all installed service providers. The configuration database lets Winsock 2 know that your service provider exists and defines the type of service you are providing. Winsock 2 uses the database to determine what transport service providers it needs to load when a Winsock application creates a socket. The ws2_32.dll searches the database for the first provider that matches socket input parameters of the socket or WSASocket API calls, such as address family, type of socket, and protocol. Once an appropriate matching entry is found, ws2_32.dll loads the appropriate service provider DLL that is specified in the catalog.

    You need four functions to successfully install and manage a service provider entry into the database. Each function begins with the WSC prefix:
WSCEnumProtocols
WSCInstallProvider
WSCWriteProviderOrder
WSCDeInstallProvider

    These functions query and manipulate the database using a WSAPROTOCOL_INFOW structure (see Figure 8). For installation of an LSP, you're primarily concerned with the ProviderId, dwCatalogEntryId, and ProtocolChain fields. The ProviderId field is a globally unique identifier that allows you to define and install your provider on any system. The dwCatalogEntryId field simply identifies each WSAPROTOCOL_INFOW catalog entry structure in the database. The ProtocolChain field determines if a WSAPROTOCOL_INFOW structure is a catalog entry for a base provider, layered provider or provider protocol chain. The ProtocolChain field is a WSAPROTOCOLCHAIN structure:

typedef struct {
    int ChainLen;                            /* the length of the chain,           */
                                             /* length = 0 means layered provider, */
                                             /* length = 1 means base provider,    */
                                             /* length > 1 means protocol chain    */
    DWORD ChainEntries[MAX_PROTOCOL_CHAIN];  /* a list of dwCatalogEntryIds        */
} WSAPROTOCOLCHAIN, FAR * LPWSAPROTOCOLCHAIN;

The ChainLen field determines if a catalog entry represents a base provider, layered provider, or defines a protocol chain. A protocol chain is a catalog entry that defines how you position your layered provider as a service provider between Winsock and other service providers (see Figure 2). A ChainLen field equaling zero indicates a layered provider, 1 indicates a base provider, and anything greater than 1 indicates a protocol chain. Layered and base providers have only a single catalog entry per provider in the database.

    The final field, ChainEntries, is an array of catalog IDs used for describing the order to load service providers in protocol chain catalog entries. When Winsock searches the catalog for an appropriate service provider during socket creation, it only looks for protocol chain and base provider catalog entries. Layered provider catalog entries are ignored by ws2_32.dll, and exist only to link up a protocol chain for protocol chain catalog entries.
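To make the lookup rule concrete, here is a sketch of classifying entries by ChainLen and finding the first usable one. CatalogEntry is a hypothetical, much reduced stand-in for WSAPROTOCOL_INFOW, and the matching is simplified on purpose: it compares the three values directly and treats a requested protocol of 0 as "any", which is our assumption rather than the exact algorithm in ws2_32.dll.

```cpp
#include <cassert>
#include <vector>

enum class EntryKind { Layered, Base, Chain };

// Simplified stand-in for a catalog entry (the real one is WSAPROTOCOL_INFOW).
struct CatalogEntry {
    unsigned id;        // plays the role of dwCatalogEntryId
    int chainLen;       // plays the role of ProtocolChain.ChainLen
    int addressFamily;
    int socketType;
    int protocol;
};

EntryKind KindOf(const CatalogEntry& e) {
    assert(e.chainLen >= 0);
    if (e.chainLen == 0) return EntryKind::Layered;
    if (e.chainLen == 1) return EntryKind::Base;
    return EntryKind::Chain;
}

// Returns the first entry, in catalog order, that is a base provider or a
// protocol chain and matches the request, or nullptr if there is none.
const CatalogEntry* FindProvider(const std::vector<CatalogEntry>& catalog,
                                 int af, int type, int protocol) {
    for (const CatalogEntry& e : catalog) {
        if (KindOf(e) == EntryKind::Layered) continue;  // never chosen directly
        if (e.addressFamily != af || e.socketType != type) continue;
        if (protocol != 0 && e.protocol != protocol) continue;
        return &e;
    }
    return nullptr;
}
```

Because the search stops at the first match, the position of a protocol chain entry in the catalog decides whether applications ever reach the layered provider.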

    When you install an LSP over a service provider, you need to create two WSAPROTOCOL_INFOW catalog entry structures, one to represent your layered provider and the other to represent a protocol chain linking your layered provider to a base provider. These two structures are typically initialized with properties of an existing service provider's WSAPROTOCOL_INFOW catalog entry structure, which can be retrieved by calling WSCEnumProtocols. After initialization, you need to install your layered provider catalog entry using WSCInstallProvider, then retrieve the catalog ID assigned to this structure after installation. The catalog entry then may be used in setting up a protocol chain catalog entry linking your layered provider to another provider. WSCInstallProvider is called to install the chained provider (see Figure 9).

    Notice in Figure 9 that the PFL_HIDDEN flag is specified in the WSAPROTOCOL_INFOW structure for the layered provider. This flag ensures that WSAEnumProtocols does not include the catalog entry for the layered provider in its returned buffer.

    The LSP is now technically installed on the system. Consider how Winsock 2 searches the database for service providers. Most Winsock applications decide what protocol they plan to use through the parameters of a call to the socket or WSASocket API. For example, if you create a socket using the address family AF_INET and the type SOCK_STREAM, Winsock 2 will search for the first available TCP/IP protocol chain or base provider catalog entry in the database that provides this functionality. When you install a protocol chain for the layered provider using WSCInstallProvider, the catalog entry automatically becomes the last entry in the database. To make the new chain the default TCP/IP provider, you must reorder the providers in the database and place the protocol chain catalog entry ahead of other TCP/IP providers by calling WSCWriteProviderOrder. You can view how providers are installed and ordered in the catalog by executing the sporder.exe utility that can be found in the Platform SDK. It is important to have sporder.dll in the path, or sporder.exe will fail.
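The reordering step amounts to moving one catalog ID to the front of the list of IDs while keeping everything else in its existing order. The sketch below shows only that array manipulation on a plain std::vector; the helper name moveToFront is hypothetical, and the actual call to WSCWriteProviderOrder, which takes the reordered array of catalog IDs and its length, is left out so the example stays portable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Move one catalog ID to the front, preserving the relative order of the
// remaining IDs. Returns false and leaves the vector untouched if the ID
// is not in the list.
inline bool moveToFront(std::vector<std::uint32_t>& order, std::uint32_t id) {
    auto it = std::find(order.begin(), order.end(), id);
    if (it == order.end()) return false;
    // Rotating [begin, it + 1) so that *it comes first shifts the
    // earlier elements one slot to the right.
    std::rotate(order.begin(), it, it + 1);
    assert(order.front() == id);
    return true;
}
```

In an installer, the vector would be filled with the dwCatalogEntryId values in their current catalog order, and the result would then be written back with WSCWriteProviderOrder.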

    Figure 10 shows the Winsock 2 configuration after installing the layered sample on a typical computer. The entry LAYERED_PROVIDER shown here represents the layered provider catalog entry, and Layered MSAFD Tcpip [TCP/IP] represents a protocol chain linking the layered provider to the base provider MSAFD Tcpip [TCP/IP].

Figure 10 Winsock 2 Configuration


As the number of shipping LSPs grows, it's possible that an installation program may install an LSP on a system with previously installed LSPs. The installation program needs to decide whether to insert its LSP into existing protocol chains or create new chains by layering only on base providers. You already know how to install an LSP over base providers. To insert an LSP into an existing protocol chain, the installation program needs to use the WSC functions to do the following:

Install the layered provider to get its catalog ID.
Modify the chain provider's WSAPROTOCOL_INFOW by incrementing ProtocolChain.ChainLen and inserting the catalog ID into the intended position in ProtocolChain.ChainEntries.
Remove the existing chain and install the modified chain.
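The second step, editing the chain itself, can be sketched as follows. This is only an illustration of the array manipulation, using a simplified stand-in for the ProtocolChain member of WSAPROTOCOL_INFOW; the function name insertIntoChain and the catalog IDs in the usage note are made up, and removing and reinstalling the chain through the WSC functions is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for WSAPROTOCOLCHAIN (assumption: not the real type).
constexpr int kMaxProtocolChain = 7;  // value of MAX_PROTOCOL_CHAIN in winsock2.h

struct ProtocolChain {
    int ChainLen;
    std::uint32_t ChainEntries[kMaxProtocolChain];
};

// Insert lspId at the given position of an existing protocol chain.
// Position 0 is the entry closest to ws2_32.dll; the last entry is the
// base provider. Returns false if the entry is not a chain, the chain is
// already full, or the position is out of range.
inline bool insertIntoChain(ProtocolChain& chain, int position,
                            std::uint32_t lspId) {
    assert(chain.ChainLen >= 0);
    if (chain.ChainLen < 2 || chain.ChainLen >= kMaxProtocolChain) return false;
    if (position < 0 || position > chain.ChainLen) return false;
    // Shift the tail down by one slot to open a gap at 'position'.
    for (int i = chain.ChainLen; i > position; --i) {
        chain.ChainEntries[i] = chain.ChainEntries[i - 1];
    }
    chain.ChainEntries[position] = lspId;
    ++chain.ChainLen;
    return true;
}
```

For a chain holding {1015, 1001}, inserting ID 1020 at position 1 yields {1015, 1020, 1001}: the new LSP sits below the existing layered provider and above the base provider.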

Managing Protocol Chain Order

LSPs have tremendous potential for value-added networking services. But the current Winsock 2 specification does not answer an important question: where should an LSP be inserted in the protocol chain if another one is already installed? For example, if you need to install a data encryption LSP on a system that already has a URL filtering LSP, it is obvious that the data encryption LSP needs to be inserted below the filtering LSP in the existing protocol chain. But the LSP installation program has no way to find out what type of service the existing LSP provides, and therefore does not know the proper position at which to insert itself into the chain. This is not a big concern for a controlled networking environment where administrators decide what LSPs to install and in what order. But the widespread success of LSPs is effectively prohibited because the only safe installation is to layer the LSP directly over a base provider and make the new chain the default provider for the protocol. Such an approach guarantees the service of the new LSP, but removes the existing LSP as the default provider in the chain.

Another related issue not addressed in the Winsock 2 specification is how existing LSPs can protect themselves from changes in the chaining or be notified when they occur. This issue is not as grave as the first one. In practice, if an LSP protocol chain is not to be modified, the LSP developer could hardcode the chain order within the LSP, and in the installation program install the LSP as a base provider by specifying 1 in the ProtocolChain.ChainLen member of the LSP's WSAPROTOCOL_INFOW structure.

A Walk Through the Layered Sample

Now, let's put everything together and explore the layered sample found in the Platform SDK. Although this sample may look huge, it implements a complete Winsock 2 LSP that you can readily extend.

To build the layered sample, you can simply run the nmake tool on the makefile. The resulting lsp.dll is the layered LSP, and the inst_lsp.exe is the installation executable. Copy the lsp.dll to your Windows system(32) directory and run inst_lsp.exe. Running inst_lsp.exe again removes the layered sample from the Winsock 2 provider catalog and thus uninstalls layered.

    There are several versions of the layered sample. The one discussed here is the latest, which is planned to ship in the Platform SDK release for the Windows 2000 official release. It can be used on Windows 95 with the latest Winsock 2 for Windows 95 add-on installed, Windows 98, and Windows NT 4.0 SP4. It can't be used on Windows NT 4.0 SP3 and earlier releases of Windows NT because the WPUCompleteOverlappedRequest function is not implemented in those releases. To use layered on Windows NT 4.0 releases before SP4, you will need to get the old layered sample from the March 1997 Platform SDK, which uses SetEvent to signal the event handle from the user app-specified overlapped structure. Figure 11 lists the files in the layered sample.
LIST_ENTRY and SINGLE_LIST_ENTRY

    Two list structures are used in layered: a doubly linked list defined in LLIST.H and a singly linked list defined in the NTDEF.H from the Platform SDK:

typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;

typedef struct _SINGLE_LIST_ENTRY {
    struct _SINGLE_LIST_ENTRY *Next;
} SINGLE_LIST_ENTRY, *PSINGLE_LIST_ENTRY;

LIST_ENTRY is used to store all protocol catalog items, all outstanding I/O on a particular socket, and all outstanding sockets. SINGLE_LIST_ENTRY is used to store a list of preallocated INTERNALOVERLAPPEDSTRUCT structures.

    To allow a structure to be linked by LIST_ENTRY, the structure has to have the following form:

// typedef struct _FOO
// {
//     LIST_ENTRY FooListEntry;
//     • • •
//
// } FOO, * PFOO;

    Given a pointer to a member FooListEntry, the CONTAINING_RECORD macro gives you back the pointer to the hosting FOO structure.

#define CONTAINING_RECORD(address, type, field) ((type FAR *)( \
                          (PCHAR)(address) - \
                          (PCHAR)(&((type *)0)->field)))

Once you understand the CONTAINING_RECORD macro, you should have no problem understanding the rest in LLIST.H.
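If the pointer arithmetic in the macro looks opaque, the following portable sketch may help. It reproduces the same idea with the standard offsetof macro instead of the null-pointer cast, and pairs it with a small circular doubly linked list. The types ListEntry and Foo and the helper functions are illustrative stand-ins, not the definitions from LLIST.H.

```cpp
#include <cassert>
#include <cstddef>

struct ListEntry {
    ListEntry* Flink;  // next entry
    ListEntry* Blink;  // previous entry
};

struct Foo {
    int id;
    ListEntry link;  // the arithmetic works wherever the link member sits
};

// Same idea as CONTAINING_RECORD: subtract the member's offset from the
// member's address to recover the address of the hosting structure.
inline Foo* fooFromLink(ListEntry* entry) {
    assert(entry != nullptr);
    return reinterpret_cast<Foo*>(
        reinterpret_cast<char*>(entry) - offsetof(Foo, link));
}

// An empty circular list is a head that points at itself.
inline void initHead(ListEntry* head) { head->Flink = head->Blink = head; }

inline void insertTail(ListEntry* head, ListEntry* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Walk the links and recover each hosting Foo along the way.
inline int sumIds(ListEntry* head) {
    int total = 0;
    for (ListEntry* e = head->Flink; e != head; e = e->Flink) {
        total += fooFromLink(e)->id;
    }
    return total;
}
```

The list code never needs to know about Foo; only the code that walks the list converts a link back to its host structure.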
DT_DLL SPI Function Tracing

    The layered sample allows you to perform debug tracing of all SPI functions by setting up a Winsock 2 debug/trace DLL. You can use the MSDN Platform SDK sample DT_DLL with the layered sample. You need to rename the file dt_dll.dll to mydt_dll.dll. Every WSP function in the layered sample calls two special macros named PREAPINOTIFY and POSTAPINOTIFY to hook WSAPreApiNotify and WSAPostApiNotify in mydt_dll.dll. A DEBUG_TRACING flag in the makefile enables this tracing. If DEBUG_TRACING is present, the PREAPINOTIFY macro is mapped to WSAPreApiNotify, and the POSTAPINOTIFY macro is mapped to WSAPostApiNotify; otherwise, both macros are mapped to no-ops.

    The DTHookInitialize function in DT_HOOK.CPP loads mydt_dll.dll and gets its WSAPreApiNotify and WSAPostApiNotify function entries. The DT hook functions are called in all SPI functions at this layer (implemented in SPI.CPP) and at the base provider layer (implemented in DPROVIDE.H). The DTHookShutdown function unloads mydt_dll.dll. The DllMain function in DLLMAIN.CPP calls DTHookInitialize in DLL_PROCESS_ATTACH and DTHookShutdown in DLL_PROCESS_DETACH.
PROTO_CATALOG_ITEM and DCATALOG

    The first WSPStartup call in SPI.CPP creates a global DCATALOG object, gProviderCatalog, and calls Initialize in DCATALOG.CPP, which uses WSCEnumProtocols to fill gProviderCatalog->m_protocol_list with PROTO_CATALOG_ITEM objects for all installed providers. The gProviderCatalog->m_local_item member points to layered's own protocol catalog item. The gProviderCatalog->FindNextProviderInChain function loads the next provider in the chain; if the next provider is a base provider, FindNextProviderInChain also returns the PPROTO_CATALOG_ITEM for the base provider. Each time WSPStartup is called, the variable gStartupCount is incremented. Each time WSPCleanup in SPI.CPP is called, gStartupCount is decremented. When gStartupCount reaches zero, WSPCleanup deletes the gProviderCatalog object.
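The startup counting described above follows a familiar pattern: the first caller creates the shared state and the last caller tears it down. Here is a minimal sketch of that pattern. The class StartupState and the empty Catalog type are hypothetical stand-ins, and the mutex is an assumption added for illustration; the text does not say how the sample serializes these calls.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

struct Catalog {};  // hypothetical stand-in for the DCATALOG object

class StartupState {
public:
    // WSPStartup path: the first caller creates the shared catalog.
    void startup() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_++ == 0) catalog_.reset(new Catalog());
    }

    // WSPCleanup path: the last caller destroys the catalog.
    // Returns false for an unbalanced call.
    bool cleanup() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        if (--count_ == 0) catalog_.reset();
        // Invariant: the catalog exists exactly while the count is nonzero.
        assert((count_ == 0) == (catalog_ == nullptr));
        return true;
    }

    int count() const { return count_; }
    bool hasCatalog() const { return catalog_ != nullptr; }

private:
    std::mutex mutex_;  // assumption: serializes startup and cleanup
    int count_ = 0;
    std::unique_ptr<Catalog> catalog_;
};
```

The same counter can gate every other piece of global state, so that one place decides when the provider is truly idle.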
DSOCKET

    The DSOCKET object in DSOCKET.CPP is responsible for storing socket operation modes and establishing an association between a socket handle from the base provider and the socket handle exposed to ws2_32.dll. The static member m_socket_list in the DSOCKET class contains a global list of all outstanding DSOCKET objects. It's initialized in the static function DSOCKET::DSocketClassInitialize, which is called in the first WSPStartup call. The m_provider_socket member in a DSOCKET object is the socket handle exposed by the base provider. The m_socket_handle member is the socket handle exposed to ws2_32.dll and created by the WPUCreateSocketHandle call. The rest of the members in a DSOCKET object store the contexts for different I/O models. For overlapped I/O on Windows NT, m_completion_context contains the completion key for the IOCP. For WSPAsyncSelect, m_async_events is the bitmask of network events in which the user application has registered interest, m_async_window is the user application's window handle, which will receive the network notification window messages, and m_async_message is the user application's async network notification window message.

    There isn't any context in DSOCKET to reflect WSPEventSelect because the call is simply passed down to the base provider's WSPEventSelect.

    When an SPI function in the layered sample is called by ws2_32.dll and a socket handle is passed to the function, the socket handle is the one previously returned to ws2_32.dll from WPUCreateSocketHandle. When a WPUCreateSocketHandle call is made, it stores the corresponding DSOCKET object pointer as the context. So when given a socket handle from ws2_32.dll, the layered sample calls WPUQuerySocketHandleContext to retrieve the original DSOCKET object. The base provider's socket handle retrieved from the DSOCKET object is used in calling the corresponding base SPI function.
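The handle translation described above can be modeled in a few lines. The sketch below replaces ws2_32.dll's bookkeeping with a small in-memory table, so that create and query play the roles of WPUCreateSocketHandle and WPUQuerySocketHandleContext. The names UpperHandleTable, SocketContext, and toProviderSocket are hypothetical, and the real upcalls have different signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Handle = std::uintptr_t;

// Plays the role of a DSOCKET object: it remembers the base provider's handle.
struct SocketContext {
    Handle providerSocket;
};

// Hypothetical stand-in for the handle/context association kept by ws2_32.dll.
class UpperHandleTable {
public:
    // Like WPUCreateSocketHandle: mint a new upper handle and store the context.
    Handle create(SocketContext* ctx) {
        assert(ctx != nullptr);
        Handle h = next_++;
        contexts_[h] = ctx;
        return h;
    }

    // Like WPUQuerySocketHandleContext: recover the context for a handle.
    SocketContext* query(Handle h) const {
        auto it = contexts_.find(h);
        return it == contexts_.end() ? nullptr : it->second;
    }

private:
    Handle next_ = 0x1000;
    std::unordered_map<Handle, SocketContext*> contexts_;
};

// Translate the handle seen by ws2_32.dll into the base provider's handle.
// Returns 0 for an unknown handle.
inline Handle toProviderSocket(const UpperHandleTable& table, Handle upper) {
    SocketContext* ctx = table.query(upper);
    return ctx ? ctx->providerSocket : 0;
}
```

Every SPI entry point that receives a socket would begin with a translation like toProviderSocket before forwarding the call to the base provider.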
DPROVIDER

    The DPROVIDER object contains all SPI function entries of a provider. The layered sample creates a DPROVIDER object, implemented in DPROVIDE.H and DPROVIDE.CPP, for the base provider. The FindNextProviderInChain and LoadProvider functions in DCATALOG can load a DPROVIDER object for the base provider. The base provider object is loaded on the first of any of the following calls: WSPSocket, WSPAddressToString, or WSPStringToAddress in SPI.CPP.

    The DPROVIDE.CPP file also implements a mechanism to support Microsoft extensions to Winsock 2. When WSPIoctl in SPI.CPP is called with the SIO_GET_EXTENSION_FUNCTION_POINTER control code, DPROVIDER's InterceptExtensions function is called. The WSPTransmitFile and WSPAcceptEx function pointers in SPI.CPP are returned instead of the base TransmitFile and AcceptEx function pointers. The WSPTransmitFile and WSPAcceptEx functions in SPI.CPP do a socket handle translation and call the base provider's TransmitFile and AcceptEx functions.

    The extension function GetAcceptExSockAddrs is implemented a little differently. Since GetAcceptExSockAddrs doesn't involve a socket handle, and therefore no socket handle translation is needed, the layered sample does not intercept this call. The base provider's GetAcceptExSockAddrs function pointer is not modified in InterceptExtensions and is passed directly up in WSPIoctl(SIO_GET_EXTENSION_FUNCTION_POINTER). The last Microsoft Winsock 2 extension function, WSARecvEx, is mapped into WSARecv by mswsock.dll, so it's not handled by InterceptExtensions.

    These Microsoft extension functions are implemented only on Windows NT, not on Windows 9x. To add your own extension functions, simply modify the WSPIoctl function in SPI.CPP to return your extension function pointers in response to SIO_GET_EXTENSION_FUNCTION_POINTER.
DBUFFERMANAGER

    The first time WSPStartup is called in the SPI.CPP file, a global DBUFFERMANAGER object named gBufferManager is created. Every time WSPStartup is called, the gStartupCount variable is incremented. When the last WSPCleanup call is made (when gStartupCount reaches zero), the gBufferManager object is deleted. When an I/O function is invoked, gBufferManager->AllocBuffer creates an internal buffer based on the user buffer. The layered sample then uses the internal buffer to invoke the corresponding base provider's I/O function. Upon completion, gBufferManager->CopyBuffer copies the data in the internal buffer back into the original user buffer, and gBufferManager->FreeBuffer is called to free the internal buffer. In the layered sample, AllocBuffer and CopyBuffer simply use the same user buffer pointer as the internal buffer pointer, and FreeBuffer is a no-op. Your LSP can override these functions to easily intercept and modify the data stream involved in SPI I/O calls.
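To see how these three hooks give an LSP a place to touch the data stream, consider the following sketch of a receive path. The method names come from the text, but the signatures, the virtual-function design, and the classes PassThroughBuffers and UppercasingBuffers are assumptions made for illustration. The base provider is simulated by a memcpy from a string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Pass-through behavior: the internal buffer is the user buffer itself,
// so copying and freeing have nothing to do.
class PassThroughBuffers {
public:
    virtual ~PassThroughBuffers() = default;
    virtual char* AllocBuffer(char* user, std::size_t /*len*/) { return user; }
    virtual void CopyBuffer(const char* /*internal*/, char* /*user*/,
                            std::size_t /*len*/) {}
    virtual void FreeBuffer(char* /*internal*/) {}
};

// A hypothetical override that receives into a private buffer and
// uppercases the data on its way back to the application.
class UppercasingBuffers : public PassThroughBuffers {
public:
    char* AllocBuffer(char* /*user*/, std::size_t len) override {
        return new char[len];
    }
    void CopyBuffer(const char* internal, char* user,
                    std::size_t len) override {
        for (std::size_t i = 0; i < len; ++i) {
            user[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(internal[i])));
        }
    }
    void FreeBuffer(char* internal) override { delete[] internal; }
};

// Simulated receive: 'wire' stands in for data the base provider delivers.
inline std::size_t layeredRecv(PassThroughBuffers& mgr, char* user,
                               std::size_t len, const char* wire) {
    assert(user != nullptr && wire != nullptr);
    char* internal = mgr.AllocBuffer(user, len);
    std::size_t n = std::min(len, std::strlen(wire));
    std::memcpy(internal, wire, n);     // the "base provider" fills the buffer
    mgr.CopyBuffer(internal, user, n);  // hand the data back to the caller
    mgr.FreeBuffer(internal);
    return n;
}
```

With the pass-through manager the caller sees the bytes unchanged; with the uppercasing manager the same call path delivers transformed data, which is the kind of hook an encryption or filtering LSP would need.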
DASYNCWINDOW

    When the WSPAsyncSelect function in SPI.CPP is called by ws2_32.dll, the GetAsyncWindow function in SPI.CPP is called to retrieve a global DASYNCWINDOW object named gAsyncWindow. The first time GetAsyncWindow is called, gAsyncWindow is created and its Initialize function is called to create a worker thread in m_async_thread. In its thread procedure, AsyncThreadProc, a hidden window (m_async_window) is created and a message pump is run. If GetAsyncWindow is called again, the cached gAsyncWindow is returned. When AsyncThreadProc is entered, LoadLibrary is called again on lsp.dll to add a system load count on lsp.dll, and on exiting the thread procedure, FreeLibraryAndExitThread is called to decrement the system load count on lsp.dll. The layered load count gStartupCount is not affected by these two calls. The extra load count matters because when ws2_32.dll calls the last WSPCleanup, it will attempt to unload the provider. With an extra load count on lsp.dll, the DLL is not unloaded prematurely, before the thread exits.

    After GetAsyncWindow returns, the Socket->RegisterAsyncOperation function is called to store the user application's async window, async notification message, and the network event bitmask in the Socket object. Then the gAsyncWindow->RegisterSocket function is called to invoke the base provider's WSPAsyncSelect function using the hidden window, a new message named WM_SELECT_MESSAGE, and the network event bitmask that the user application registers interest in.

    The Socket->RegisterAsyncOperation function does the following:

Socket->m_async_window  = hWnd;   // user app's async window
Socket->m_async_message = wMsg;   // user app's async message
Socket->m_async_events  = lEvent; // user app's async events

The gAsyncWindow->RegisterSocket function calls the base provider's WSPAsyncSelect using gAsyncWindow->m_async_window, WM_SELECT_MESSAGE, and lEvent.

    Note that the m_async_window member variable in the Socket object is the user app's window, and the m_async_message member variable in the Socket object is the user app's asynchronous Winsock notification message. The m_async_window in gAsyncWindow is the hidden window that uses WM_SELECT_MESSAGE as the async message for the base provider's asynchronous Winsock notification message.

    When the window procedure of gAsyncWindow->m_async_window receives a WM_SELECT_MESSAGE message, the Socket->SignalAsyncEvents function is called, which in turn calls WPUPostMessage to post the original user app's Socket->m_async_message to the original user app's Socket->m_async_window.
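The routing through the hidden window can be simulated without any windows at all. In the sketch below, an outbox vector stands in for WPUPostMessage, and the class HiddenWindowRouter and its members are hypothetical names. One detail is an assumption beyond the text: the forwarded notification carries the handle that ws2_32.dll knows, not the base provider's handle, since the application has never seen the latter.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Window = std::uintptr_t;
using Socket = std::uintptr_t;

// One message "posted" to the application; stands in for WPUPostMessage.
struct Posted {
    Window window;
    unsigned message;
    Socket socket;
    long event;
};

// Per-socket registration, as stored by RegisterAsyncOperation.
struct AsyncContext {
    Window userWindow;     // user app's async window
    unsigned userMessage;  // user app's async message
    long userEvents;       // event bitmask the app registered for
    Socket upperHandle;    // handle known to ws2_32.dll (assumption)
};

class HiddenWindowRouter {
public:
    // WSPAsyncSelect path: remember the registration, keyed by the base
    // provider's socket, which is what the hidden window will be told about.
    void registerSocket(Socket providerSocket, const AsyncContext& ctx) {
        assert(ctx.userWindow != 0);
        contexts_[providerSocket] = ctx;
    }

    // Hidden window path: translate the internal select message into the
    // message the application asked for. Returns false if nothing was posted.
    bool onSelectMessage(Socket providerSocket, long event,
                         std::vector<Posted>& outbox) const {
        auto it = contexts_.find(providerSocket);
        if (it == contexts_.end()) return false;
        const AsyncContext& ctx = it->second;
        if ((ctx.userEvents & event) == 0) return false;
        outbox.push_back({ctx.userWindow, ctx.userMessage,
                          ctx.upperHandle, event});
        return true;
    }

private:
    std::unordered_map<Socket, AsyncContext> contexts_;
};
```

The application never learns that a second window and a second message number sit between it and the base provider.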

DOVERLAPPEDSTRUCTMGR and INTERNALOVERLAPPEDSTRUCT

    The extended overlapped structure, INTERNALOVERLAPPEDSTRUCT, includes all the context information for an overlapped I/O operation, including the type of the I/O, buffers, completion routine, socket handles, and so on. When the first overlapped I/O operation is invoked, a global DWORKERTHREAD object named gWorkerThread is created. The Initialize function of gWorkerThread creates a global DOVERLAPPEDSTRUCTMGR object named gOverlappedManager. The gOverlappedManager's Initialize function preallocates a predefined number (OUTSTANDINGOVERLAPPEDSTRUCTS) of INTERNALOVERLAPPEDSTRUCT structures in gBufferManager->m_overlapped_struct_block. OUTSTANDINGOVERLAPPEDSTRUCTS is defined as 1000 in DOVERLAP.H. In other words, if your LSP expects to have more than a thousand overlapped I/O operations outstanding at any moment, you need to increase the value of OUTSTANDINGOVERLAPPEDSTRUCTS in DOVERLAP.H. The gWorkerThread's destructor deletes gBufferManager.
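A preallocated block threaded onto a singly linked free list is a common way to hand out fixed-size structures cheaply, and it is consistent with the SINGLE_LIST_ENTRY usage mentioned earlier. The sketch below shows one way such a pool could work; the types OverlappedSlot and OverlappedPool are hypothetical, the slot carries only a token payload, and no locking is shown, so this version is only safe on a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SingleListEntry {
    SingleListEntry* Next;
};

// Stand-in for INTERNALOVERLAPPEDSTRUCT. The link is the first member, so a
// pointer to the link and a pointer to the slot refer to the same address.
struct OverlappedSlot {
    SingleListEntry link;
    int operationType;
};

class OverlappedPool {
public:
    // Preallocate 'count' slots in one block and push them all on the free list.
    explicit OverlappedPool(std::size_t count) : block_(count) {
        for (auto& slot : block_) push(&slot);
    }
    OverlappedPool(const OverlappedPool&) = delete;
    OverlappedPool& operator=(const OverlappedPool&) = delete;

    // Pop a free slot, or return nullptr when the pool is exhausted.
    OverlappedSlot* allocate() {
        if (free_ == nullptr) return nullptr;
        SingleListEntry* entry = free_;
        free_ = entry->Next;
        return reinterpret_cast<OverlappedSlot*>(entry);
    }

    // Return a slot to the pool once its I/O has completed.
    void release(OverlappedSlot* slot) {
        assert(slot != nullptr);
        push(slot);
    }

private:
    void push(OverlappedSlot* slot) {
        slot->link.Next = free_;
        free_ = &slot->link;
    }

    std::vector<OverlappedSlot> block_;  // the preallocated block
    SingleListEntry* free_ = nullptr;    // head of the free list
};
```

When allocate returns nullptr, the provider has more operations outstanding than the pool was sized for, which is the situation the OUTSTANDINGOVERLAPPEDSTRUCTS limit describes.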

DWORKERTHREAD

    When the first overlapped I/O operation is invoked, a global DWORKERTHREAD object, gWorkerThread, is created. In its Initialize call, if the IOCP creation succeeds, which implies the platform is Windows NT, one worker thread is created. If the IOCP creation fails, which implies the platform is Windows 9x, a semaphore is created, and the number of worker threads created equals the number of CPUs available in the system.

    On Windows NT, when ws2_32.dll calls a layered sample SPI I/O function in an overlapped fashion, the provider socket handle is added to the IOCP. Then the base provider overlapped call is made and the worker thread waits for completion using the GetQueuedCompletionStatus function. When GetQueuedCompletionStatus returns, the OverlappedCompletionProc function is called to notify ws2_32.dll of the completion. OverlappedCompletionProc calls WPUCompleteOverlappedRequest if the client doesn't provide a completion function; otherwise, WPUQueueApc is used.

    On Windows 9x, when ws2_32.dll calls a layered SPI I/O function in an overlapped fashion, the overlapped structure is added to a queue (gWorkerThread->m_overlapped_operation_queue) and the semaphore count is incremented. The worker threads wait on the semaphore using WaitForSingleObjectEx. Once the wait call returns, the next overlapped structure in the queue is retrieved and InitiateOverlappedOperation is called, which provides OverlappedCompletionProc as the completion routine and initiates the base provider's overlapped operation. The WaitForSingleObjectEx function that waits on the semaphore also serves another purpose. It uses TRUE as the Alertable parameter to put the thread into an alertable wait, so whenever a previously posted overlapped I/O operation completes, the OverlappedCompletionProc is called. Again, OverlappedCompletionProc calls WPUCompleteOverlappedRequest if the ws2_32.dll doesn't provide a completion function; otherwise, WPUQueueApc is used.

    Note that on entering the worker thread procedure WorkerThreadProc, LoadLibrary is called on lsp.dll to add a system load count on lsp.dll, and on exit from WorkerThreadProc, FreeLibraryAndExitThread is called to decrement the system load count on lsp.dll. The gStartupCount is not affected by either of these two calls. Again, this is to make sure that lsp.dll is still loaded when the worker thread does its cleanup on exit, even if ws2_32.dll has called the last WSPCleanup and attempted to unload lsp.dll.
Blocking Hook

    The WSASetBlockingHook and WSACancelBlockingCall functions were removed from the Winsock 2 API specification. However, the WSPCancelBlockingCall function can still be called by ws2_32.dll if the application in whose context it is loaded uses the Winsock 1.1 interface.

    In the layered sample DllMain(DLL_PROCESS_ATTACH), a thread local storage (TLS) index is allocated by the TlsAlloc function. This TLS index is used to store the base provider object so that the base provider's WSPCancelBlockingCall can be located and called from this layer's WSPCancelBlockingCall. Two macros, SetBlockingProvider and GetBlockingProvider in SPI.CPP, store the base provider object in the TLS slot and retrieve it. Using the WSPRecv function as an example, SPI.CPP makes the following calls for the blocking WSPRecv:

SetBlockingProvider(Provider);
Provider->WSPRecv(…);
SetBlockingProvider(NULL);

    The following SPI functions implement a blocking call as shown above: WSPAccept, WSPAcceptEx, WSPConnect, WSPRecv, WSPRecvFrom, WSPSend, and WSPSendTo.

    As the blocking actually takes place at the base provider, the layered sample doesn't have to call WPUQueryBlockingCallback.
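The set, call, clear sequence shown earlier maps naturally onto a scope guard. The following sketch swaps the TlsAlloc slot for a C++ thread_local variable, which is a different mechanism with the same per-thread effect, and wraps the two SetBlockingProvider calls in a constructor and destructor. The Provider type, the BlockingScope class, and layeredCancelBlockingCall are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the base provider object.
struct Provider {
    int cancelCalls = 0;
    int CancelBlockingCall() {
        ++cancelCalls;
        return 0;
    }
};

// Per-thread record of the provider currently inside a blocking call.
// (The sample uses a TlsAlloc slot; thread_local is a portable substitute.)
thread_local Provider* tBlockingProvider = nullptr;

// Scope guard equivalent to SetBlockingProvider(p) on entry and
// SetBlockingProvider(NULL) on exit, including early returns.
class BlockingScope {
public:
    explicit BlockingScope(Provider* p) {
        assert(p != nullptr);
        tBlockingProvider = p;
    }
    ~BlockingScope() { tBlockingProvider = nullptr; }
    BlockingScope(const BlockingScope&) = delete;
    BlockingScope& operator=(const BlockingScope&) = delete;
};

// This layer's cancel routine: forward to whichever provider is blocking on
// the current thread. Returns -1 if no blocking call is in progress.
inline int layeredCancelBlockingCall() {
    Provider* p = tBlockingProvider;
    if (p == nullptr) return -1;
    return p->CancelBlockingCall();
}
```

A blocking entry point would construct a BlockingScope just before forwarding the call to the base provider, so the slot is cleared on every exit path.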
WSPCleanup

    WSPCleanup in SPI.CPP simply decrements the gStartupCount variable and does nothing else if gStartupCount is still greater than zero. When gStartupCount reaches zero, if either gAsyncWindow or gWorkerThread is not NULL, its Destroy function is called, which eventually causes the corresponding worker thread to exit. Remember that when a worker thread is entered, an additional LoadLibrary is called, and on exiting the worker thread a matching FreeLibraryAndExitThread is called. ws2_32.dll also releases its own reference to lsp.dll after it makes the last WSPCleanup call to your LSP. It doesn't matter which release happens last: the worker thread cleanup code is guaranteed to run while lsp.dll is still loaded, and whichever call drops the last reference unloads lsp.dll. The last WSPCleanup also calls DSOCKET::DSocketClassCleanup to clean up the socket list, deletes the catalog list gProviderCatalog, and deletes the overlapped structure list gBufferManager.

    In previous versions of the layered sample, the program freed an overlapped structure without checking whether the underlying overlapped I/O had completed, which may create a crash situation. In the layered sample that's available today, the sequence of the cleanup is as follows:
Close the worker threads.
Close socket handles so that no more I/O can be posted.
Delete the overlapped structure if the corresponding I/O completes. (Windows NT uses the HasOverlappedIoCompleted macro; Windows 9x waits another 100 milliseconds.)

Putting It All Together

    Now that you have learned all the pieces in the layered sample, it's time for you to put everything together. Study a typical lsp.dll's life cycle:
The lsp.dll is loaded in DllMain(DLL_PROCESS_ATTACH)
WSPSocket is called
Various types of SPI function calls are made on the socket
WSPCloseSocket is called
WSPCleanup is called
The lsp.dll is unloaded in DllMain(DLL_PROCESS_DETACH)

    With an understanding of all the pieces of the layered sample, you will find the sample isn't hard to follow at all. You will likely agree with us that all the C++ objects in the layered sample clearly and cleanly implement every specific SPI requirement for a functional LSP. So it's easy to extend those objects to implement your LSP rather than reinvent the wheel.

    You will probably find that implementing an LSP is no longer a daunting and time-consuming task if you simply extend the layered sample. As the release of Windows NT 4.0 SP4 corrects the WPUCreateSocketHandle function bug, we expect there will be many commercial LSPs coming to the market. With your creative ideas to develop service providers for Winsock 2, you may begin exploring these opportunities now!

For related information see:
Writing Windows NT Server Applications in MFC Using I/O Completion Ports at http://msdn.microsoft.com/library/techart/msdn_servrapp.htm.
Also check http://msdn.microsoft.com/developer/default.htm for daily updates on developer programs, resources and events.

From the May 1999 issue of Microsoft Systems Journal
