From: SMTP%"RELAY-INFO-VAX@CRVAX.SRI.COM" 13-DEC-1993 15:39:58.47
To: EVERHART
CC:
Subj: Desc. of V6.0 np-pool allocator (was Upgrading to VMS V6.0)

X-Newsgroups: comp.os.vms
Subject: Desc. of V6.0 np-pool allocator (was Upgrading to VMS V6.0)
Message-ID: <1993Dec10.163454.23298@peavax.mlo.dec.com>
From: claborn@vmsdev.enet.dec.com
Date: Fri, 10 Dec 1993 16:33:13 GMT
Sender: usenet@peavax.mlo.dec.com (USENET News System)
Organization: Digital Equipment Corporation
Keywords: pool, V6.0
Lines: 71
To: Info-VAX@CRVAX.SRI.COM
X-Gateway-Source-Info: USENET

Based on some previous postings to the V6.0 stream, there appears to be
some confusion regarding the structure of non-paged pool in V6.0 and in
all versions of OpenVMS for AXP. Here's a summary (the Guide to
Performance Management has more details):

Basically, the four previously distinct virtual regions of non-paged
pool (SRP, IRP, LRP and variable) are now combined into one virtual
region, defined in the usual fashion by the parameters NPAGEDYN and
NPAGEVIR. Internal to the allocator are 80 (not one) lookaside lists
that span a contiguous allocation range from 1 to 5120 bytes. In other
words, all allocation requests of 5120 bytes or less can potentially be
satisfied very quickly from a lookaside list. These lists are completely
self-tuning; they adapt automatically to the varying demands of your
workload. There are no knobs to spin to tune them manually, and we were
able to eliminate 10 SYSGEN parameters.

In order to investigate various allocation algorithms, we built a
trace-driven simulator that allowed us (using previously collected trace
files from various workloads) to measure things like list hit rate and
memory consumption in a completely repeatable fashion. The simulator
showed us that the old scheme (SRPs, IRPs, LRPs and variable) achieved
list hit rates between 60% and 92%, with the remaining requests
satisfied by trolling through variable pool. For all the traces we took,
the new allocator consistently reached a list hit rate of 99% within the
first 5 minutes of simulated execution time and averaged 99.8% over the
whole trace.

It turns out that the old allocator's hit rate (and hence performance)
is so low with recent traces because a significant number of requests
are for packet sizes that fall between the upper bound for IRPs and the
lower bound for LRPs. File control blocks and lots of DECnet/OSI packets
fall into this no-man's land. You could change LRPMIN to grab these, but
at a fairly severe memory penalty. Also, in configurations running FDDI,
the required packets are *much* larger than Ethernet packets (typically
what LRPs are configured to handle), so all these big packets must be
allocated from variable pool; very painful.

The new allocator merely populates the appropriate list on the fly. It
does this by deallocating packets by size: when a request is made, the
appropriate list is checked (if the request size is 5120 bytes or less)
and a packet is removed if one is available. If not, the allocation is
handled by variable pool. When the packet is returned, its size is
checked, and if it is 5120 bytes or less, it is hung onto the
corresponding list. By this simple algorithm, high-use lists quickly
become populated to meet the demand of the workload; low-use lists
don't. In the background, a thread gently returns packets from the lists
back to variable pool in order to reduce memory consumption. It does
this by periodically (once every 30 seconds) returning one and only one
packet from each list that holds two or more packets. High-use lists
never even feel this reclamation; low-use lists, over time, get their
memory returned to variable pool for general consumption.
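To make the algorithm concrete, here is a rough C sketch of the idea.
This is not the actual OpenVMS code: all names are invented, the 64-byte
list granularity is merely inferred from the figures above (80 lists
spanning 1 to 5120 bytes, and 5120 / 80 = 64), and the synchronization
the real allocator needs is omitted.

    #include <stddef.h>
    #include <stdlib.h>

    #define NLISTS        80    /* one lookaside list per size class   */
    #define GRANULARITY   64    /* inferred: 5120 / 80 = 64 bytes      */
    #define MAX_LOOKASIDE (NLISTS * GRANULARITY)    /* 5120 bytes      */

    /* A free packet's first word is reused as a singly-linked link.   */
    typedef struct free_pkt {
        struct free_pkt *next;
    } free_pkt;

    static free_pkt *lookaside[NLISTS];  /* list heads, initially empty */

    /* Map a request size to its list index: round up to the next
     * GRANULARITY multiple, then convert to a zero-based index.       */
    static int list_index(size_t size)
    {
        return (int)((size + GRANULARITY - 1) / GRANULARITY) - 1;
    }

    /* Allocation: try the matching lookaside list first (a "list
     * hit"); otherwise fall back to general variable pool, modeled
     * here by malloc().                                               */
    void *pool_alloc(size_t size)
    {
        if (size != 0 && size <= MAX_LOOKASIDE) {
            int i = list_index(size);
            if (lookaside[i] != NULL) {
                free_pkt *p = lookaside[i];
                lookaside[i] = p->next;     /* list hit: O(1) pop      */
                return p;
            }
        }
        /* List miss or oversized request: use variable pool, never
         * handing out less than the space a list link needs.          */
        return malloc(size < sizeof(free_pkt) ? sizeof(free_pkt) : size);
    }

    /* Deallocation by size: small packets are hung on the matching
     * list, which is how high-use lists populate themselves.          */
    void pool_free(void *pkt, size_t size)
    {
        if (size != 0 && size <= MAX_LOOKASIDE) {
            int i = list_index(size);
            free_pkt *p = pkt;
            p->next = lookaside[i];         /* O(1) push               */
            lookaside[i] = p;
        } else {
            free(pkt);                      /* back to variable pool   */
        }
    }

    /* Background reclamation, run periodically (every 30 seconds in
     * the scheme described above): return one and only one packet
     * from each list holding two or more, so idle lists drain while
     * busy lists are never disturbed.                                 */
    void pool_reclaim(void)
    {
        for (int i = 0; i < NLISTS; i++) {
            free_pkt *p = lookaside[i];
            if (p != NULL && p->next != NULL) {  /* two or more        */
                lookaside[i] = p->next;
                free(p);                    /* return to variable pool */
            }
        }
    }

Again, this is just the shape of the algorithm: a hit is a constant-time
pop from the matching list, a miss falls through to the variable-pool
search, and the reclamation pass trims at most one packet per list per
cycle, which is why high-use lists never feel it.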
There are two versions of the allocator: a fast, streamlined MIN version
(no pool checking, no statistics maintenance), which is loaded when the
SYSGEN parameter POOLCHECK = 0 (the default); and a MON version that
does pool checking and statistics maintenance, loaded when POOLCHECK <>
0. When the MON version is loaded, you can get an idea of what your
workload's request histogram looks like from the SDA command:

    SDA> SHOW POOL /STATISTICS

There's another SDA command available when the MON version is loaded:

    SDA> SHOW POOL /RING_BUFFER

This displays a log of the last 512 requests (both allocations and
deallocations) to non-paged pool and is useful in system / driver code
debugging when looking for a memory leak or corruption (trust me on this
one ;^)). These two commands are inactive when the MIN version is
loaded, as there is no statistics maintenance.

Well... this turned out to be a bit longer than planned, but I hope some
of you will find the information useful.

Regards, George

George H. Claborn  (V6.0 memory mgt. project leader)
OpenVMS Engineering
Digital Equipment Corporation
110 Spitbrook Road
Nashua, New Hampshire 03062  USA