Path: news.mitre.org!blanket.mitre.org!philabs!newsjunkie.ans.net!newsfeeds.ans.net!news-was.dfn.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news-peer.sprintlink.net!news-backup-west.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!199.232.56.18!news.ultranet.com!not-for-mail
From: "Jim McCollum" <jimmc@ultranet.com>
Newsgroups: comp.os.ms-windows.programmer.nt.kernel-mode
Subject: KeInsertQueueDpc bug on SMP systems?
Date: Thu, 15 Jan 1998 12:43:00 -0500
Organization: UltraNet Communications, Inc.   http://www.ultranet.com/
Lines: 67
Message-ID: <69lhm3$q2r$1@decius.ultra.net>
NNTP-Posting-Host: 146.115.154.11
X-Complaints-To: abuse@ultra.net
X-Ultra-Time: 15 Jan 1998 17:40:19 GMT
X-Newsreader: Microsoft Outlook Express 4.71.1712.3
X-MimeOLE: Produced By Microsoft MimeOLE V4.71.1712.3

I've been getting IRQL_NOT_LESS_OR_EQUAL bugchecks out of the NT kernel
that, after lots of debugging, appear to me to be caused by a bug in
KeInsertQueueDpc.

Analysis of the crash shows that the DPC queue, the header of which is
located in the processor control region (PCR), is corrupted. The nature of
the corruption is that a DPC has a forward link on the DPC queue in one
processor's PCR while the backward link is linked to the PCR DPC queue of
another processor.

I spent a great deal of time tracking this down, including unassembling the
code in KeInsertQueueDpc, and I've managed to convince myself that
KeInsertQueueDpc is not SMP safe. Here's what happens. Two threads running
on separate processors make simultaneous calls to KeInsertQueueDpc
specifying the same DPC object. Because the DPC is not targeted to a
specific processor, each thread attempts to queue the object to the local
processor's DPC queue. KeInsertQueueDpc performs the following steps (this
is slightly simplified):

1) Raises IRQL to HIGH_LEVEL (31).
2) Acquires a spinlock which is located in the PCR (hence, each thread is
acquiring a *different* spinlock).
3) Manipulates the flink/blink in the DPC object to place it on the local
processor's DPC queue.
4) Requests a software interrupt (to force processing of the DPC queues).
5) Releases the spinlock acquired in (2).
6) Restores IRQL.
7) Returns to the caller.

Because the spinlock acquired in step (2) is located in the PCR, each
thread acquires a different spinlock. They then proceed to step 3,
simultaneously attempting to manipulate the list entry in the DPC object,
corrupting it. Both threads then release their respective spinlocks, lower
IRQL and exit KeInsertQueueDpc, leaving behind corrupted processor DPC
queues. When NT later attempts to retire the DPC and remove it from the
queue, it stumbles over the corrupted links and crashes. My driver is
nowhere near the stack, but its DPC object is always implicated in the
resulting corrupted DPC queues.
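
The interleaving described above can be replayed deterministically in user
mode. The following is only a sketch under stated assumptions: list_entry
is a stand-in for the kernel's list entry, the two list heads stand in for
the per-processor DPC queues, and the store order inside
replay_interleaved_insert is one hypothetical unlucky schedule, not the
actual instruction sequence of KeInsertQueueDpc. All names are invented for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* User-mode stand-in for a doubly linked kernel list entry. */
typedef struct list_entry {
    struct list_entry *flink;   /* forward link  */
    struct list_entry *blink;   /* backward link */
} list_entry;

static void init_list_head(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

/*
 * Replays, on one thread, a hypothetical interleaving of two tail inserts
 * of the SAME entry onto two DIFFERENT queues. Each "CPU" holds only its
 * own queue lock, so nothing orders the stores to dpc->flink / dpc->blink.
 */
static void replay_interleaved_insert(list_entry *head0, list_entry *head1,
                                      list_entry *dpc)
{
    assert(head0 != head1);

    list_entry *tail0 = head0->blink;   /* CPU 0 reads its queue tail     */
    list_entry *tail1 = head1->blink;   /* CPU 1 reads its queue tail     */

    dpc->flink = head0;                 /* CPU 0 sets the forward link    */
    dpc->flink = head1;                 /* CPU 1 overwrites it            */
    dpc->blink = tail1;                 /* CPU 1 sets the backward link   */
    dpc->blink = tail0;                 /* CPU 0 overwrites it            */

    tail0->flink = dpc;                 /* CPU 0 links the entry in       */
    head0->blink = dpc;
    tail1->flink = dpc;                 /* CPU 1 links the entry in       */
    head1->blink = dpc;
}

/*
 * Walks forward from head and checks that every hop's blink points at the
 * previous node. Returns true only if the walk returns to head within
 * max_hops hops.
 */
static bool list_is_consistent(const list_entry *head, int max_hops)
{
    const list_entry *prev = head;
    const list_entry *cur = head->flink;

    for (int i = 0; i < max_hops; i++) {
        if (cur->blink != prev)
            return false;
        if (cur == head)
            return true;
        prev = cur;
        cur = cur->flink;
    }
    return false;
}
```

After the replay, both queue heads point at the entry, but the entry's
forward link refers to one queue and its backward link to the other, so
list_is_consistent fails for both queues. That is the shape of corruption
seen in the crash dumps.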

After staring at this code, it became clear that while the spinlock in step
(2) above protects the DPC queue in the PCR, the DPC object itself is not
protected. I am able to work around the crash by associating a spinlock with
my DPC object and acquiring it before calling KeInsertQueueDpc. This
prevents multiple threads from going through KeInsertQueueDpc simultaneously
for the same DPC object, and the crash disappears.
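
The shape of this workaround can be sketched in user mode. This is a model
under stated assumptions, not driver code: model_dpc, its queued flag, and
model_insert_queue_dpc are hypothetical stand-ins for the DPC object and
for KeInsertQueueDpc, and a C11 atomic_flag plays the role of the
per-object spinlock. In a real driver the wrapper would presumably use
KeAcquireSpinLock and KeReleaseSpinLock around the KeInsertQueueDpc call.
The model does not cover removal of the DPC from the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical model of a DPC object plus the added per-object lock. */
typedef struct model_dpc {
    list_entry  link;     /* queue linkage inside the object             */
    bool        queued;   /* models "already on some processor's queue"  */
    atomic_flag guard;    /* the workaround: one lock per DPC object     */
} model_dpc;

static void init_list_head(list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

static void init_model_dpc(model_dpc *d)
{
    d->link.flink = NULL;
    d->link.blink = NULL;
    d->queued = false;
    atomic_flag_clear(&d->guard);
}

/*
 * Stand-in for the unprotected insert: the check of `queued` and the link
 * updates are not atomic with respect to another caller using the same
 * object on a different queue.
 */
static bool model_insert_queue_dpc(list_entry *cpu_queue, model_dpc *d)
{
    assert(cpu_queue != NULL && d != NULL);

    if (d->queued)
        return false;               /* already queued: nothing to do */

    list_entry *tail = cpu_queue->blink;
    d->link.flink = cpu_queue;
    d->link.blink = tail;
    tail->flink = &d->link;
    cpu_queue->blink = &d->link;
    d->queued = true;
    return true;
}

/*
 * The workaround: take the per-object lock first, so only one caller at a
 * time can run the insert path for this particular DPC object.
 */
static bool insert_queue_dpc_serialized(list_entry *cpu_queue, model_dpc *d)
{
    while (atomic_flag_test_and_set_explicit(&d->guard, memory_order_acquire))
        ;                           /* spin until the object lock is free */

    bool inserted = model_insert_queue_dpc(cpu_queue, d);

    atomic_flag_clear_explicit(&d->guard, memory_order_release);
    return inserted;
}
```

With the object lock held, a second caller sees the object as already
queued and leaves its own queue untouched, instead of rewriting links that
another processor's queue depends on. The lock belongs to the DPC object
rather than to a processor, which is the difference from the spinlock in
step (2).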

I distilled the code from my driver and wrote a small driver with only a few
routines which will crash an SMP system in a matter of minutes, if not
seconds. All this driver does is start up a bunch of threads, which do
nothing but sit in a loop and periodically call KeInsertQueueDpc, specifying
a dummy DPC routine. I then applied the above workaround and sure enough the
crashes went away.

Has anyone else seen these crashes? I'm running NT 4.0 with Service Pack 3.
While I am able to prevent these crashes in my driver with the
workaround, I'm concerned that other NT drivers may unknowingly be
susceptible to this problem.

I'll include the source code for the driver which will crash an SMP system
in a reply to this entry.

Thanks,
Jim McCollum
Marathon Technologies Corporation