                                CHAPTER 1

                       XQP MECHANISM IMPLEMENTATION


  The basic flow of requests into and out of the XQP is described below.


  1.1  Mapping The Code Into P1 And Allocating Impure Storage

  The F11BXQP.EXE image, which contains only pure  code,  is  mapped  at
  process  creation  time into P1 space.  There is a routine XQPMERGE in
  the SYS facility module PROCSTRT which knows how to do  this.   It  is
  functionally  equivalent  to  a  LIB$P1MERGE  call,  but is optimized;
  SYSINIT has already looked up the image and set up a permanent  global
  section  for  it.   All that happens in XQPMERGE is to map it.  If the
  sysgen parameter ACP_XQP_RES is set, SYSINIT has wired the  code  into
  physical memory and global valid page faults are also avoided.

  Once the code has been mapped, XQPMERGE jumps to  the  lowest  address
  mapped,  the initialization routine.  The module DISPATCH is linked to
  be the first module in the image and the code  there  does  a  CMKRNL,
  specifying  the  INIT_FCP  routine.  This routine does an EXPREG in P1
  space to get the impure storage area allocated, including space for  a
  private  kernel  stack.   It  then locks down the areas for the kernel
  stack and the parts of the impure area  referenced  at  elevated  IPL,
  fabricates  a  channel  for use by the XQP, and initializes some queue
  headers and a handful of other locations in the impure area.  It  also
  notes   the   address   of   the  impure  area  in  the  process  cell
  CTL$GL_F11BXQP.  In addition, the very first process  so  merged  when
  booting  the  system (the SYSINIT process) creates a permanent mailbox
  (ACP$BADBLOCK_MBX, channel MBX_CHAN) for  talking  to  the  bad  block
  scanner.

  [Alternatively, XQPMERGE (the F11X module) can be called to take a
  given XQP (logical name TESTXQP), force-map it into P1 space, and
  jump to its initialization entry point.]

  Now the file system is ready for business.


  1.2  Getting To The XQP From QIO

  All file system functions are QIOs.  They  start  off  life  with  QIO
  pre-processing  in  the SYS module SYSQIOREQ.  An IRP is allocated and
  is the basic  argument  block  passed  to  the  file  system  for  all
  functions.   This  IRP  is  first  processed  by  the  file system FDT
  routines, which eventually get the request to the XQP, if necessary.


  1.2.1  FDT Routine Processing

  The FDT routines for file system  functions  are  in  the  SYS  module
  SYSACPFDT  (which  also  handles  the  system  ACPs).   These routines
  perform various setup and initialization functions.  The IRP is queued
  (sent) to the XQP for processing by these routines.


  1.2.1.1  Access  and  Create - These  functions   are   performed   by
  ACP$ACCESS.   The steps are as follows.  Check for a file already open
  (CCB$L_WIND non-zero).  Check that  JIB$W_FILCNT  won't  be  exceeded.
  Build  the  XQP  packet.  Check for dismount.  Decrement JIB$W_FILCNT.
  Interlock the channel (increment CCB$L_WIND).  Queue the XQP packet.


  1.2.1.2  De-access - This function is performed by ACP$DEACCESS.   The
  steps are as follows.  Check for a file open (CCB$L_WIND > 0 implies a
  process section (can't be de-accessed in this  way),  <  0  implies  a
  window  pointer, 0 implies no file).  Build the XQP packet.  Interlock
  the channel (increment CCB$L_WIND).   Update  the  volume  transaction
  count.   If this is the only activity on the channel (CCB$W_IOC is 1),
  the XQP packet is queued.  If other activity exists  on  the  channel,
  CCB$W_IOC  is  decremented  to  account  for  this  de-access, but the
  de-access IRP is placed on CCB$L_DIRP so as to be performed  when  the
  channel goes idle.
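
  The two decisions described above can be modelled in a few lines.
  This is only a sketch: the function names are hypothetical, and
  CCB$L_WIND is treated as a signed 32-bit value, on the assumption
  that a window pointer reads as negative because it is a system space
  address with the high bit set.

```python
def classify_window(ccb_l_wind: int) -> str:
    """Interpret a signed CCB$L_WIND value as the text describes."""
    if ccb_l_wind > 0:
        return "process section"   # cannot be de-accessed this way
    if ccb_l_wind < 0:
        return "window pointer"    # a file is open on the channel
    return "no file"


def deaccess_disposition(ccb_w_ioc: int) -> str:
    """Decide what happens to the de-access IRP, given CCB$W_IOC."""
    if ccb_w_ioc == 1:
        return "queue to XQP"      # this request is the only activity
    # Other I/O is outstanding: park the IRP until the channel is idle.
    return "defer on CCB$L_DIRP"
```

  For example, deaccess_disposition(3) yields "defer on CCB$L_DIRP",
  which corresponds to the IRP being held until the channel goes idle.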

  The de-access function is special  if  the  WCB  indicates  a  non-FCP
  window  (WCB$B_ACCESS  bit  WCB$M_NOTFCP  set)  or  a shareable window
  (WCB$B_ACCESS bit WCB$M_SHRWCB set).  If the window is  shareable  but
  is  an  FCP window, the reference count (WCB$W_REFCNT) is decremented.
  If the count is non-zero, we simply clear CCB$L_WIND to remove us as a
  user  of  the  window.   If  the count is zero, then the XQP de-access
  packet is queued for processing.  The  processing  of  it  will  clear
  CCB$L_WIND.   If the window is not a FCP window and not shareable, the
  "file" is closed by clearing CCB$L_WIND and incrementing JIB$W_FILCNT.
  If  the  window is both a non-FCP and a shared window, WCB$W_REFCNT is
  decremented and CCB$L_WIND is cleared to remove our reference to it.
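
  The four combinations of the NOTFCP and SHRWCB bits can be
  summarized as a small decision function.  This is a sketch under
  stated assumptions:  the Window class and the action strings are
  hypothetical stand-ins for the WCB fields and the operations named
  above, not the actual code in SYSACPFDT.

```python
from dataclasses import dataclass


@dataclass
class Window:
    notfcp: bool      # WCB$M_NOTFCP set in WCB$B_ACCESS
    shrwcb: bool      # WCB$M_SHRWCB set in WCB$B_ACCESS
    refcnt: int = 1   # WCB$W_REFCNT


def special_deaccess(w: Window) -> list[str]:
    """Return the actions taken for a given kind of window."""
    actions = []
    if w.shrwcb:
        w.refcnt -= 1
        actions.append("decrement WCB$W_REFCNT")
        if w.notfcp or w.refcnt != 0:
            # Just drop our reference to the window.
            actions.append("clear CCB$L_WIND")
        else:
            # Last user of a shared FCP window: the XQP finishes the job.
            actions.append("queue XQP de-access packet")
    elif w.notfcp:
        actions += ["clear CCB$L_WIND", "increment JIB$W_FILCNT"]
    else:
        actions.append("normal de-access path")
    return actions
```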


  1.2.1.3  Modify  and  Delete - These  functions   are   performed   by
  ACP$MODIFY.  The XQP packet is built, we check for volume mounted, and
  the XQP packet is queued.


  1.2.1.4  Mount - The mount function is  provided  by  ACP$MOUNT.   The
  steps  are  as  follows.   Build  the  XQP  packet.   Check  for MOUNT
  privilege.  Check  for  a  mountable  device  (UCB$V_MOUNTING  set  in
  UCB$W_STS).  Update the volume transaction count.  Queue the XQP
  packet.


  1.2.1.5  Read  and  Write - The  read  function  is   implemented   in
  ACP$READBLK, write in ACP$WRITEBLK.  The read function checks for read
  access to the file (WCB$B_ACCESS  bit  WCB$V_READ  set).   Read  check
  enabled  is  checked at this time.  The write function checks for file
  write access (WCB$B_ACCESS bit WCB$V_WRITE set).  Write data check  is
  checked  at this time.  Common processing includes checking the access
  to the user's buffer and mapping the virtual block number  range  into
  logical  blocks.   If  the  mapping (at least partially) succeeds, the
  request (now converted to a physical request) is queued to the driver.
  If  the  mapping  fails,  the  request must be queued to the XQP for a
  window turn.  (Non-FCP devices return SS$_ENDOFFILE.)  If  this  is  a
  write  function  and  WCB$V_WRITE_TURN  is set in WCB$W_ACON, then the
  mapping always fails, and the write request is sent to the XQP.   This
  is done for direct writes to directories, and INDEXF, BITMAP and QUOTA
  files.

  Erase functions are coded  as  special  write  block  functions.   The
  difference  is  that  a  user  buffer  is  not involved.  If the erase
  pattern is zero (almost always true), a  pre-allocated  zero  area  is
  used.   (This  pre-allocated  area  consists  of a single zero page of
  memory, but with a page of page table entries  each  pointing  to  it,
  thereby giving an erase buffer of up to 127 pages.)  If the erase
  pattern is non-zero, a page is allocated to hold the pattern
  (replicated from the four bytes) and a page of page table entries is
  allocated to map over it.
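
  The pseudo buffer idea can be illustrated as follows.  This is a
  sketch only:  the 512 byte page size is an assumption (the VAX page
  size), the function names are hypothetical, and repeating a bytes
  object merely imitates what many page table entries pointing at one
  physical page look like to the transfer.

```python
PAGE_BYTES = 512        # assumption: VAX page size
MAX_ERASE_PAGES = 127   # limit quoted in the text


def build_pattern_page(pattern: bytes) -> bytes:
    """Fill one page by replicating the four byte erase pattern."""
    if len(pattern) != 4:
        raise ValueError("erase pattern must be four bytes")
    return pattern * (PAGE_BYTES // 4)


def erase_buffer_view(page: bytes, pages: int) -> bytes:
    """What the transfer sees: the same single page, repeated."""
    if not 1 <= pages <= MAX_ERASE_PAGES:
        raise ValueError("erase buffer is limited to 127 pages")
    return page * pages
```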

  Assuming that the virtual to logical mapping succeeded, the request is
  queued to the driver.  Eventually, the driver will process the request
  and the I/O will  complete.   When  the  I/O  post  interrupt  reaches
  IOC$IOPOST  (SYS module IOCIOPOST), the VBN and count specified by the
  user are checked against the  updated  processed  count.   The  buffer
  virtual  address  is  also updated.  (Note that for an erase function,
  the address is not updated since the buffer is a  pseudo  buffer.)  If
  the  byte count remaining is non-zero (indicating that the request did
  not map completely, could not be  performed  as  a  single  contiguous
  transfer,  or that the request exceeded the capacity of a single I/O),
  the remaining request is re-mapped and re-queued to  the  driver.   If
  the  mapping  fails,  the  request must be queued to the XQP.  In this
  case, the XQP queuing is done by IOC$WAKACP instead  of  EXE$QIOACPPKT
  (described below).

  Virtual to logical mapping is done by IOC$MAPVBLK (in the  SYS  module
  IOSUBRAMS).  It walks down the WCB list associated with the request to
  locate mapping pointers that locate  the  desired  VBN.   The  routine
  returns  the  starting  LBN  and  number  of bytes that can be mapped.
  Also, the UCB corresponding to the volume holding this extent  (for  a
  volume set) is returned.  On a total mapping failure, the original UCB
  (WCB$L_ORGUCB) is returned.
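
  A simplified model of the virtual to logical mapping follows.  It is
  a sketch, not the IOC$MAPVBLK algorithm:  the extent list stands in
  for the mapping pointers of a single window, VBNs are assumed to
  start at 1, the block size is assumed to be 512 bytes, and mapping
  is assumed to stop at the end of the extent that holds the starting
  VBN.  The name map_vblk and the tuple layout are hypothetical.

```python
BLOCK_BYTES = 512  # assumption: disk block size


def map_vblk(extents, vbn, byte_count):
    """Map a virtual block range onto logical blocks.

    extents: list of (block_count, start_lbn) pairs in file order.
    Returns (start_lbn, mapped_bytes), or None on total mapping failure.
    """
    first_vbn = 1
    for count, start_lbn in extents:
        if first_vbn <= vbn < first_vbn + count:
            offset = vbn - first_vbn
            blocks_left = count - offset
            # A partial mapping is still a success; the caller re-maps
            # the remainder after the first piece completes.
            mapped = min(byte_count, blocks_left * BLOCK_BYTES)
            return start_lbn + offset, mapped
        first_vbn += count
    return None
```

  A None result corresponds to the case in which the request has to
  be queued to the XQP for a window turn.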


  1.2.1.6  XQP Packet Building - The XQP packet is built by the internal
  routine  BUILDACPBUF.   This  routine  allocates the space for the XQP
  packet (address placed  in  IRP$L_SVAPTE).   The  COMPLX,  FILACP  and
  VIRTUAL  bits  are set in IRP$W_STS.  Accounting for buffer byte count
  quota is done here.  The user arguments to the  QIO  (FIB,  etc.)  are
  checked for access.  Descriptors for each user argument are created in
  the XQP packet.  The number of descriptors is  placed  in  IRP$L_BCNT.
  Finally, CCB$L_UCB is placed into IRP$L_MEDIA.


  1.2.1.7  Volume Status - The FDT routines ensure that the  volume  has
  the correct state for the request.  The dismount check makes sure
  that the volume isn't being dismounted (DEV$V_DMT set in
  UCB$L_DEVCHAR) and then performs the mounted check.  The mounted
  check verifies that the device is mounted (DEV$V_MNT set in
  UCB$L_DEVCHAR), that the device is not a member of a shadow set, that
  the device is not in the dismount state (UCB$V_DISMOUNT set in
  UCB$W_STS) and that the volume is not mounted foreign.

  Once the volume checks (if any) pass, the volume transaction count  is
  incremented  (VCB$W_TRANS).   This  is  normally  done  for the volume
  describing the desired UCB, but will be done to the  UCB  on  which  a
  file  is open if the WCB so indicates.  The IRP$L_UCB field is updated
  to this value.  The IRP$L_MEDIA field is updated to this  UCB  if  the
  device is not spooled.


  1.2.2  XQP Packet Processing

  XQP packets (the IRP with the added XQP buffer descriptors) are queued
  to  the  XQP  by IOC$WAKACP (SYS module IOCIOPOST) or by EXE$QIOACPPKT
  (SYS module SYSQIOREQ).  In the case of the queuing of an  XQP  packet
  by  the  FDT  routines (within the context of the requesting process),
  EXE$QIOACPPKT will generate  a  kernel  mode  AST  specifying  as  the
  routine  the value found in F11B$L_DISPATCH.  For the case of I/O post
  processing, in which we are probably not in the context of the  target
  process,  a  special  kernel mode AST is queued to the process with an
  AST routine address of EXE$QXQPPKT (SYS module SYSQIOREQ) and  an  AST
  parameter  of  the  IRP.  EXE$QXQPPKT then performs the queuing of the
  kernel mode AST to the XQP dispatcher.

  To avoid the necessity of allocating an AST control block  (ACB),  the
  CDRP extension to the IRP is used as an ACB.  There is a very definite
  assumption that this area is always there and  not  used  by  anything
  else  for  virtual  I/O  functions.  This area is normally used by the
  disk class driver when processing disk I/O requests.


  1.3  Initial Setup For F11BXQP Code Execution

  When the kernel AST specified above begins execution, we  are  finally
  executing  code  in  the  F11BXQP  image.   The  routine called is the
  DISPATCH routine.  The argument to this routine (the AST parameter) is
  the IRP.  This routine queues the IRP onto a per-process queue.

  If there are no other requests being processed, the normal  case,  the
  routine  enables  the  special  XQP  channel  for  use by stuffing the
  CCB$B_AMOD field so it appears to be a  normal  kernel  mode  channel.
  When  the  XQP  is  not  using it, the CCB$B_AMOD field contains a -1,
  which makes  it  inaccessible  to  anyone  at  any  mode  because  the
  privilege   check   for  channels  in  IOC$VERIFYCHAN  does  a  signed
  comparison  against  access  mode.   The  system   run-down   routine,
  EXE$RUNDWN,  in the SYS module SYSRUNDWN, also does signed comparisons
  against access  mode  to  determine  if  a  given  channel  should  be
  de-assigned,  and  the negative access mode in the special XQP channel
  when the XQP is not actively processing a  request  prevents  it  from
  being de-assigned.
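
  The effect of the signed comparison can be seen in a small model.
  The exact form of the check in IOC$VERIFYCHAN is not given above, so
  the comparison used here (the requester's mode must not exceed the
  stored mode), the mode numbering and the function name are all
  assumptions made for illustration.

```python
import ctypes

KERNEL, EXEC, SUPER, USER = 0, 1, 2, 3  # assumed mode numbering


def channel_accessible(requester_mode: int, ccb_b_amod: int) -> bool:
    """Model a signed byte comparison against the channel access mode."""
    amod = ctypes.c_int8(ccb_b_amod).value  # reinterpret the byte as signed
    return requester_mode <= amod
```

  With the byte set to -1 (0xFF), channel_accessible is false for
  every mode from kernel through user.  Had the comparison been
  unsigned, 0xFF would compare as 255 and the channel would be open
  to every mode instead.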

  At this point, the PCB$B_DPC cell is incremented.   This  is  done  to
  prevent  process  deletion  while  any  file  system  request is being
  processed.  The EXE$DELPRC routine, in the SYS module SYSDELPRC, waits
  for  this  cell  to  be  zero  before actually proceeding with process
  deletion.  It waits at IPL 0 to allow kernel ASTs to be delivered,  so
  that  pending file system requests can in fact complete.  Similar code
  in the process  suspension  service  prevents  a  process  from  being
  suspended  until  pending  file  system  requests  are completed.  The
  reason for blocking process suspension while file system requests  are
  active is that random synchronization locks could be held indefinitely
  in that case, potentially locking up the entire cluster.   The  reason
  for  blocking  process deletion while processing a file system request
  is  to  minimize  problems  potentially  caused  by   half   completed
  operations.

  The DISPATCH routine then saves the current kernel  stack  limits  and
  current frame pointer (FP) prior to setting the kernel stack limits to
  be the private XQP kernel stack and setting the stack pointer  to  the
  base  of  the  private  XQP  kernel stack.  It also sets up a register
  (R10, called "BASE") to point to the  impure  area  (the  contents  of
  CTL$GL_F11BXQP).  All XQP routines run assuming that R10 points to the
  XQP_QUEUE variable in the XQP storage area.  DISPATCH then  calls  the
  DISPATCHER  routine,  and from now on we are on the private XQP kernel
  stack.

  The DISPATCHER routine de-queues the IRP from the queue we just  stuck
  it  on  above and proceeds to call the appropriate routines to execute
  the desired function.  After completing the request,  it  attempts  to
  de-queue another request and process it.  That is how pending requests
  are eventually processed.


  1.4  What Happens When The XQP Needs To Wait For Something

  To stall in the caller's mode, the file system  dismisses  the  kernel
  AST  it  is in when it needs to stall for either I/O or a lock request
  that is queued.  A completion AST resumes  the  thread  of  execution.
  The initial entry into the XQP was also via an AST, so that the entire
  XQP operation is  done  at  AST  level.   XQP  activity  is  generally
  asynchronous  with  respect  to  normal process operation.  The XQP is
  itself a serial function, though.

  Two routines in the DISPATCH module are  used  to  accomplish  stalls.
  Immediately  after  a  QIO  or ENQ request is queued for which the XQP
  must stall, the  WAIT_FOR_AST  routine  is  called.   This  saves  the
  current frame pointer, makes the XQP channel inaccessible as described
  above, restores the previous stack limits and frame pointer and does a
  RET  to dismiss the AST.  Because the frame pointer has been restored,
  the RET picks up where we left off on the original kernel stack.   PMS
  metering is stopped for the duration of the stall.

  The QIO or ENQ request that was queued specifies the impure
  pointer  as  the AST parameter, and the routine CONTINUE_THREAD as the
  AST routine.  When the AST arrives at the CONTINUE_THREAD routine, the
  impure  pointer  is  restored from the AST parameter, the stack limits
  are pointed back to the XQP stack, the  saved  XQP  frame  pointer  is
  restored,  the XQP channel is made accessible again, PMS monitoring is
  restarted, and we RET from that  routine,  which  returns  us  to  the
  caller of the WAIT_FOR_AST routine.
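
  The stall and resume pattern resembles a coroutine, and a generator
  gives a rough analogy.  This is a sketch only:  the yield stands in
  for WAIT_FOR_AST dismissing the AST, and send() stands in for the
  completion AST arriving at CONTINUE_THREAD.  Nothing here models
  stack limits, frame pointers or the channel, and the names are
  hypothetical.

```python
def xqp_request(log):
    """One file system request that stalls twice before finishing."""
    log.append("start request")
    status = yield "QIO queued"        # like WAIT_FOR_AST: give up control
    log.append(f"I/O done: {status}")
    status = yield "ENQ queued"
    log.append(f"lock granted: {status}")
    log.append("request complete")


def run(log):
    """Drive the request the way completion ASTs would."""
    thread = xqp_request(log)
    pending = next(thread)             # the initial kernel AST enters the XQP
    while True:
        log.append(f"stalled on {pending}")
        try:
            pending = thread.send("ok")  # like CONTINUE_THREAD resuming it
        except StopIteration:
            break
    return log
```

  Between a yield and the following send(), the caller is free to do
  other work, which mirrors the process continuing to run while the
  XQP thread is stalled.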


  1.5  Finishing Up XQP Processing

  After all the actual work has been done, e.g., a  new  file  has  been
  created  and  all  buffers  written  back  to disk, or a file has been
  accessed,  or  whatever,  the  IO_DONE  routine  is  called  from  the
  DISPATCHER   routine.   IO_DONE  moves  USER_STATUS  into  IRP$L_MEDIA
  (actually a quadword), decrements the transaction count for  the  VCB,
  clears  the name string descriptor length in the complex buffer packet
  to prevent write back of the name, copies the local FIB back into  the
  complex  buffer  packet,  if  any, sets IRP$L_BCNT to ABD$C_ATTRIB for
  non-read functions so that the attributes don't get  written  back  to
  the user's buffer, and calls the CHECK_DISMOUNT routine.

  With the XQP, we are  already  in  the  correct  process  context,  so
  instead  of  issuing an IOPOST interrupt, this routine calls (via JSB)
  the special entry point IOC$BUFPOST in the SYS module IOCIOPOST  which
  executes the same code (resetting PCB quotas, setting up what would be
  the special kernel  mode  AST  completion  routine  (BUFPOST  for  XQP
  functions  requiring  a  complex  buffer (all except window turns) and
  buffer I/O, DIRPOST for direct I/O)) otherwise executed by the  IOPOST
  software interrupt.  After coming back to IO_DONE from that, the event
  flag is posted, and then another JSB executes the special  kernel  AST
  code.   BUFPOST  copies  the IRP described buffers (FIB, etc.) back to
  the user buffers, de-allocates the  complex  buffer,  and  flows  into
  DIRPOST.   DIRPOST updates PHD quotas, decrements the channel activity
  count, sends a de-access request to the XQP if the activity count goes
  to  zero  and  CCB$L_DIRP indicates de-access pending, writes the user
  IOSB, sets the event flag, queues the user AST and ends by freeing the
  I/O packet.

  The IOPOST software interrupt is signaled  instead  if  the  IRP$L_PID
  field  is  negative,  indicating  special  post processing.  It is not
  known if this feature is used.

  Finally, the DISPATCHER routine does a RET which gets us back into the
  DISPATCH  routine  (where  the initial kernel AST came to in the first
  place).  It makes the XQP channel inaccessible again,  decrements  the
  PCB$B_DPC  cell  to  allow  process  deletion  and  suspension  again,
  restores the original kernel stack limits and frame pointer, and  does
  a RET.


  1.6  Request Dispatching

  The next request to be performed is  obtained  by  GET_REQUEST.   This
  routine also initializes the impure area.  Initialization consists of:
  starting  PMS   monitoring,   zeroing   the   impure   area,   setting
  USER_STATUS[0]  to 1, setting the BFR_LIST queue heads to empty lists,
  setting   the   value   for   PACKET   (current   IRP),   CURRENT_UCB,
  CURRENT_WINDOW   (if  the  low  bit  is  set  (no  real  window)  then
  CURRENT_WINDOW is 0), CURRENT_FIB, CURRENT_UCB and  PRIMARY_FCB  if  a
  window   exists   (window   doesn't   exist   for  access/create/mount
  functions),   ORB,   CURRENT_VCB,   CURRENT_RVT,   CURRENT_RVN,    and
  IO_CCB$L_UCB  (from  CURRENT_UCB),  clearing  the  byte  count for the
  window block descriptor (to prevent I/O  completion  from  writing  it
  back),  setting  the  spoolfile cleanup flag (if IRP$L_MEDIA not equal
  IRP$L_UCB), copying the ARB into LOCAL_ARB, setting SYSPRV flag in ARB
  if  appropriate,  also  VOLOWNER and GROUPOWNER cleanup flags, setting
  SYSPRV cleanup flag if SYSPRV, BYPASS or READALL privileges set.


  Returning to the main flow of DISPATCHER, the XQP$_FILE_NAME  (message
  buffer)  is  cleared  to  avoid  possible  confusion.   Other  message
  variables are set (FUNC_DESC and SUB1_FUNC_DESC).  A  minimum  set  of
  buffers  are obtained (GET_REQD_BFR_CREDITS) as described under buffer
  management.  The main function is then dispatched.  The readpblk,
  writepblk, acpcontrol and mount functions are done directly.  All
  others must first check the activity block lock (which blocks all XQP
  activity on the volume).  The create/access decision is made here.
  After the operation has completed, NOTIFY_USER and PERFORM_AUDIT are
  done (because of the perturbations that the FID_TO_SPEC call in
  PERFORM_AUDIT will have).  Cleanup is done:  if status indicates
  success, then a normal cleanup is done; any error invokes ERR_CLEANUP.
  This is repeated, trying for a successful cleanup for a very large but
  finite number of tries.  UNLOCK_XQP releases all XQP locks.  PMS
  monitoring is ended.  IO_DONE is called as described above.  The
  activity block lock is released (refer to the block lock later) if
  taken (BLOCK_CHECK set).


  1.7  CHECK_DISMOUNT

  CHECK_DISMOUNT (in  CHKDMO)  performs  deferred  dismount  processing.
  It walks down the UCB list for the volume (or volume set) and
  performs a dismount on each one for which DEV$V_DMT is set and the
  transaction count is 1 (us).

  Dismounting starts by setting the V_DISMOUNT bit under the  IODB  lock
  to  prevent  other  people from trying to start I/O on the volume.  An
  IO$_UNLOAD/IO$_AVAILABLE function is issued.  For shared devices,  get
  the  value  block  for  the  device  lock.   Clear  the  high  bit  of
  UCB$W_DIRSEQ to warn RMS of the volume  dismount  (refer  to  the  RMS
  directory  cache).   Decrement UCB$W_REFC.  Decrement AQB$W_MNTCNT, if
  zero, remove from the AQB list.  De-allocate  all  FCBs,  ACLs,  WCBs,
  de-queue  access  locks (by forcing FCB$W_REFCNT to 0).  De-queue FID,
  extent cache locks, de-allocate cache.   De-queue  quota  cache  lock,
  de-allocate  quota  cache.   De-queue volume lock (VOLLKID).  De-queue
  shadow lock.  For volume sets, clear the  RVT  list  entry,  decrement
  RVT$W_REFC,  if zero de-queue structure lock, de-queue block lock (and
  clear BLOCK_CHECK so DISPATCHER won't try to release the block  lock),
  de-allocate  RVT.  For single volumes, just de-queue the block lock as
  for a volume  set.   De-allocate  VCB.   If  the  device  lock  exists
  (promoted  when getting value block above) demote to CR (keep at EX if
  allocated to someone).  Post de-allocation on dismount  (performed  by
  IOC$DALLOC_DMT in the SYS module IOSUBPAGD).  De-allocate buffer cache
  and AQB.


  1.8  Volume Activity Blocking

  The ability to cause the file system to stall requests cluster-wide is
  necessary to allow storage and index BITMAP, plus quota cache rebuilds
  while a volume is active and in use.   Any  changes  that  potentially
  modify those structures must be blocked.

  The REBUILD module in the MOUNT facility is used by MOUNT, SET  VOLUME
  /REBUILD,  and  the  DISKQUOTA  utility to rebuild the BITMAPs and the
  QUOTA.SYS file.  ANALYZE /DISK /REPAIR  performs  a  similar  function
  (plus  others)  with  a different chunk of code.  Both of them use the
  ACPCONTROL  LOCK_VOL  control  function  to  prevent  file   creation,
  deletion, extension, and truncation activity.

  The LOCK_VOL function stalls activity with help from a system blocking
  routine  on  a  special  lock called the activity blocking lock (block
  lock).  Its format is

        F11B$b<volset id>

  where <volset id> is a 12 byte unique volume or volume set  identifier
  as discussed under serialization of conflicting activity.

  In the VCB (or RVT for a volume set) there is an activity count field,
  VCB$W_ACTIVITY  or RVT$W_ACTIVITY, and a field to store the lock ID of
  the blocking lock, either VCB$L_BLOCKID or RVT$L_BLOCKID.  A  non-zero
  BLOCKID  and an odd (low bit set) ACTIVITY count means that everything
  is normal and can proceed.  The START_REQUEST routine is  called  from
  the  DISPATCHER  routine to check this before anyone starts taking out
  serialization locks or  doing  anything  interesting.   If  the  above
  conditions   are   true,  START_REQUEST  simply  adds  2  to  ACTIVITY
  (preserving its oddity) and returns.  The  FINISH_REQUEST  routine  is
  called  from DISPATCHER when a request is completed, and subtracts the
  2 out again.

  The F11B$b lock in this situation is a system owned lock  held  in  CR
  mode  with  XQP$BLOCK_ROUTINE  (in  the  SYS  module SYSACPFDT) as the
  blocking routine address and the VCB address as the parameter.

  A process that comes along and does a LOCK_VOL function will call  the
  routine  TAKE_BLOCK_LOCK.   This  will  queue  an EX mode lock for the
  F11B$b lock.  This is of course incompatible with the system owned  CR
  lock,  and  on every node that holds that lock (normally all the nodes
  that  have  the  volume  mounted),  the   lock   manager   calls   the
  XQP$BLOCK_ROUTINE  entry  point  at IPL$_SYNCH with the VCB address in
  R1.  That routine in turn, gets itself pointed to either  the  VCB  or
  RVT  ACTIVITY count, as appropriate.  It then decrements ACTIVITY, now
  making the field even (it was odd).  If that causes it to go to  zero,
  the  volume  is  now idle, and further activity is blocked.  If so, it
  queues an AST to the swapper process, with the lock id of  the  F11B$b
  lock  as the AST parameter, after clearing the BLOCKID field, with the
  AST entry point of XQP$DEQBLOCKER (also in SYSACPFDT).  (The ACB  used
  in  this  process  is  contained  in the VCB/RVT.) XQP$DEQBLOCKER then
  actually  de-queues  the  system  owned  CR  mode  F11B$b  lock.    If
  XQP$BLOCK_ROUTINE  did not see ACTIVITY go to zero, it simply RSBs and
  the process calling FINISH_REQUEST  that  sees  it  go  to  zero  will
  de-queue  the lock.  Once all nodes release their locks, the volume is
  idle and TAKE_BLOCK_LOCK will get its EX mode lock and  proceed.   The
  START_REQUEST and FINISH_REQUEST routines make their tests and changes
  at IPL$_SYNCH to interlock with the blocking routine correctly.
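
  The ACTIVITY count protocol can be modelled as a small state
  machine.  This is a sketch under stated assumptions:  it runs single
  threaded (standing in for the IPL$_SYNCH interlock), the lock ID
  value is made up, and the de-queue of the system owned lock is
  reduced to recording the ID in a list.  The class and method names
  are hypothetical.

```python
class Volume:
    """Models the ACTIVITY and BLOCKID fields of a VCB or RVT."""

    def __init__(self):
        self.activity = 1       # odd: armed, no requests in progress
        self.blockid = 0x1234   # non-zero: CR lock held (made-up ID)
        self.dequeued = []      # lock IDs handed off for de-queuing

    def _dequeue_block_lock(self):
        self.dequeued.append(self.blockid)
        self.blockid = 0

    def start_request(self) -> bool:
        if self.blockid != 0 and self.activity & 1:
            self.activity += 2  # stays odd
            return True
        return False            # the caller would go to BLOCK_WAIT

    def finish_request(self):
        self.activity -= 2
        if self.activity == 0:  # only reachable once a blocker arrived
            self._dequeue_block_lock()

    def block_routine(self):
        self.activity -= 1      # odd -> even: new requests now stall
        if self.activity == 0:  # the volume was already idle
            self._dequeue_block_lock()
```

  Because each request adds or subtracts 2, the count can only reach
  zero after the blocking routine has removed the low bit, so the last
  request to finish is the one that releases the lock.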

  Once the ACTIVITY field is even (low bit clear), subsequent callers to
  START_REQUEST  will  not  add  to it, but call the BLOCK_WAIT routine.
  This will queue for the F11B$b lock in PW  mode,  which  will  not  be
  granted until the process that queued for it in EX mode de-queues that
  lock by doing an ACPCONTROL  UNLK_VOL  function.   When  the  EX  mode
  F11B$b lock is de-queued, the first waiting process to get the PW lock
  will convert that lock to the CR mode system owned lock  and  set  the
  low bit of the ACTIVITY count, thus allowing things to proceed again.
  In fact, the F11B$b lock is initially armed in  this  fashion  by  the
  first  function  that calls START_REQUEST after the volume is mounted.
  Other users on the same node  waiting  for  the  lock  will  find  the
  blocked value non-zero, and so they will de-queue their (redundant)
  version of the lock.

  Another thing that BLOCK_WAIT will do is return the buffer credits  it
  had  already  acquired  while  waiting  for  the  blocking  lock to be
  granted.  This is to avoid having the CACHE_SERVER (discussed  further
  on)  not  able  to flush caches due to lack of available buffers which
  would cause a deadlock.

  Note that the process holding the EX F11B$b  lock  (via  the  LOCK_VOL
  function)  and  cache flushes are allowed to proceed, because they are
  necessary for the rebuild to work.   File  creates,  etc.,  are  still
  prevented via the VCB$V_NOALLOC flag, however.

  Allowing the process holding the block lock  to  proceed  is  done  by
  virtue  of  the  fact  that  the  process  has  a  non-zero  value for
  BLOCK_LOCKID (one of the non-impure XQP variables).   DISPATCHER  will
  not  ask  for  the  block  lock before invoking a function.  QUOTAUTIL
  requires the process to hold the block lock to perform  a  modify  use
  function.  All acpcontrol functions avoid the request for the block
  lock in DISPATCHER.  ADD_QUOTA is an  exception,  though.   The  block
  lock  is  obtained in ACPCONTROL.  BLOCK_LOCKID is also checked in the
  lock/unlock acpcontrol functions.  This has the disadvantage,  though,
  that  effectively  only  one  volume  can  be locked at a time.  Also,
  failure of the process to unlock the volume prevents the  volume  from
  ever being unlocked.


  1.9  Secondary Operations

  Various functions require so-called secondary operations.  As an
  example,  bad  block  processing is done in secondary context to avoid
  confusing the main function.  This secondary context  is  provided  by
  saving the CONTEXT_START;CONTEXT_SIZE area into CONTEXT_SAVE and
  later restoring it from there.  This is done by the SAVE_CONTEXT and
  RESTORE_CONTEXT routines in GETREQ.  The secondary save area allows
  for only one secondary operation nested within the primary.  Various
  restrictions apply to secondary context.

  The secondary context area is used for:  operating upon the BADLOG file
  (SCAN_BADLOG  and  DEALLOCATE_BAD),  marking for deletion a file being
  removed/superseded during a file creation  (CREATE),  opening  a  file
  from  which  attributes  are  being propagated (CREATE), extending the
  index file (EXTEND_INDEX),  opening  a  file  to  determine  placement
  (GET_LOC), extending or compressing a directory (SHUFFLE_DIR).

  The primary context must be restored when done.   Note,  though,  that
  ERR_CLEANUP  will  detect if we were in secondary context and clean up
  secondary  context  first,  and  then  move  onto   primary   context.
  Secondary  context  may  leave  around  buffers waiting to be written.
  However,  secondary  context  must  release  all  serialization  locks
  obtained  in  secondary  context,  and  must  therefore  write out any
  buffers protected by these locks (refer to serialization  of  activity
  and buffer management).  Also, any unrecorded blocks (refer to cleanup
  processing) must be recorded before leaving secondary context.
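
  The single level save area can be sketched as follows.  The field
  names inside the context are hypothetical placeholders, and raising
  an error on a second save is an assumption about how the one level
  limit might be enforced.  The text says only that one secondary
  operation may be nested.

```python
class ImpureArea:
    """Models the context region and its one-deep save area."""

    def __init__(self):
        # Hypothetical stand-ins for the CONTEXT_START region.
        self.context = {"current_fib": None, "primary_fcb": None}
        self.context_save = None    # CONTEXT_SAVE: empty in primary context

    def save_context(self):
        if self.context_save is not None:
            raise RuntimeError("only one secondary operation may be nested")
        self.context_save = dict(self.context)

    def restore_context(self):
        if self.context_save is None:
            raise RuntimeError("not in secondary context")
        self.context = self.context_save
        self.context_save = None

    def in_secondary(self) -> bool:
        return self.context_save is not None
```

  A cleanup routine in this model could test in_secondary() to decide
  whether secondary context has to be unwound before the primary.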


                                CHAPTER 2

                  SERIALIZATION OF CONFLICTING ACTIVITY


  The procedure-based XQP design requires explicit serialization of file
  system processing.  The distributed lock manager is the mechanism used
  to do this.


  2.1  Naming Of Serialization Lock Resources

  Everything of interest on a disk volume is a file.  Files  are  files.
  The  allocation  storage  BITMAP  is  a  file.  Directories are files.
  Every file has at least one file header.   Every  file  header  has  a
  unique  number  which identifies it.  All the file headers are in fact
  contained in a file called INDEXF.SYS, and knowing what a  given  file
  number  is,  you  can  calculate  what  block  within  the INDEXF file
  contains its file header.  The file header  for  the  INDEXF  file  is
  itself one of the blocks within the INDEXF file.

  Given that everything the file system wants to do has something to  do
  with a file or files, serializing operations by taking out locks based
  on those file numbers is the natural thing to do.

  For example, say you want to rewrite the record attributes (which  are
  contained  in  the file header) of file number 47 to update the end of
  file information (which is part of the record attributes).  First  you
  take  out  an  exclusive  lock  whose name is based on "47" that has a
  parent lock associated with the specific volume.  You  then  read  the
  file  header from the disk into a buffer, modify the record attributes
  in the buffer, and write the changed file header back to  disk.   When
  done, you release the exclusive lock "47".  Someone else trying to make
  the same update at the same time follows the same steps.   First  they
  try  to  obtain an exclusive lock on "47", but you already have it, so
  they wait until you are completely done.   When  you  finally  release
  your lock, they are granted the lock, read the header, etc.

  Locks that are used to serialize processing in this manner  are  known
  as serialization locks.
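
  The read-modify-write sequence above can be sketched as follows.  This
  is only an illustration: threading.Lock stands in for  the  distributed
  lock  manager,  and  the names serialization_lock, update_eof, and the
  dictionary "disk" are hypothetical, not XQP routines.

```python
import threading
from collections import defaultdict

# Hypothetical stand-in for the lock manager: one exclusive lock per
# (volume, file number) resource.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()

def serialization_lock(volume, file_number):
    """Return the exclusive lock object for a file number on a volume."""
    with _locks_guard:
        return _locks[(volume, file_number)]

# Toy "disk": file number -> file header contents.
disk = {47: {"eof_block": 0}}

def update_eof(volume, file_number, delta):
    with serialization_lock(volume, file_number):  # wait until granted
        header = dict(disk[file_number])           # read header into a buffer
        header["eof_block"] += delta               # modify the buffer
        disk[file_number] = header                 # write the header back
    # The lock is released here and the next waiter is granted it.

threads = [threading.Thread(target=update_eof, args=("VOL1", 47, 1))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

  Because each updater holds the lock for the whole read, modify,  write
  sequence, no update is lost.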




  Each of the file serialization locks has a parent lock associated with
  the volume.  We need to pick a name for the parent lock that is unique
  for any given  volume  or  volume  set,  and  can  also  be  generated
  identically  from  any  node in the cluster that mounts a given shared
  volume.  The volume label was chosen  somewhat  arbitrarily  over  the
  allocation class form of the device name.  At 12 bytes, it is slightly
  shorter than the possible 15 byte device name, so  that  gives  it  an
  edge.  It is also under the control of the file system.

  In conjunction with the device name, it is used by  MOUNT  to  enforce
  the  requirement that only one volume with a given name can be mounted
  shared at any given time in the cluster.

  Any number of volumes with  the  same  volume  label  can  be  mounted
  privately.   In  that  case,  a  combination of the node name plus the
  address of the UCB for the device forms a name guaranteed to be unique
  throughout the cluster.

  The resource name used is the character string  "F11B$v"  followed  by
  either  the  volume  label  if  mounted  shared,  or the node name-UCB
  address combination if mounted privately.  That is,

        F11B$v<volume identifier>

  The 12 byte volume identifier part of this lock name is  generated  by
  the   routine   GET_VOLUME_LOCK_NAME  in  the  MOUNT  facility  module
  CLUSTRMNT when the volume is mounted.  These 12 bytes  are  stored  in
  the  VCB field VCB$T_VOLCKNAM for subsequent use by MOUNT and the XQP.
  This volume identifier is available using  the  DEVLOCKNAM  item  code
  with  the  $GETDVI  system  service.  That item code actually returns a
  16 byte field.  One of the extra bytes is used to distinguish the shared
  from  privately  mounted  cases  to  preclude  any possible name space
  collisions.  The file system avoids this by using  the  node  specific
  lock  as  a parent for privately mounted volumes.  The node lock id is
  contained in the exec  cell  EXE$GL_SYSID_LOCK.   This  also  has  the
  advantage of not requiring cluster traffic to determine master-ship of
  the lock.

  The volume lock itself is initially acquired in PW mode by  the  MOUNT
  routine  GET_VOLUME_LOCK,  also  in  the CLUSTRMNT module.  When MOUNT
  processing is essentially complete, the volume lock is converted to  a
  system  owned lock in CR mode by the STORE_CONTEXT routine in the same
  module.  This lock remains granted in that mode until  the  volume  is
  dismounted  and it is de-queued by the CHECK_DISMOUNT routine when the
  transaction count on the volume  becomes  idle  after  the  volume  is
  marked   for   dismount.    Its   lock  ID  is  stored  in  the  field
  VCB$L_VOLLKID.

  This volume lock, then, remains permanently granted  as  long  as  the
  volume  is mounted, and is used as the parent lock for the file number
  serialization locks discussed above.  A similar lock is taken  out  on
  the  volume set name to provide a parent lock for operations on volume
  sets.  Its lock ID is stored in the RVT$L_STRUCLKID field.

  File  number  serialization  locks  are  taken  out  in  the   routine
  SERIAL_FILE.   Either  the  volume  lock  or  the  volume  set lock is
  specified as the parent lock, depending on whether the file  is  on  a
  single  volume  or a volume set.  The resource name used for file number
  serialization locks is the string "F11B$s" followed by 3 bytes of file
  number   plus  1  byte  of  relative  volume  number.   This  uniquely
  identifies  any  given  file  on  a  volume  set  (or  single  volume,
  obviously).
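
  As a rough sketch, the two resource  names  could  be  assembled  like
  this.   The function names are hypothetical.  The text does not state
  the byte order of the file number or how a short volume identifier is
  padded;  low-byte-first  order  and  blank  padding are assumed here
  purely for illustration.

```python
def volume_lock_name(volume_id: bytes) -> bytes:
    """'F11B$v' followed by the 12 byte volume identifier."""
    if len(volume_id) > 12:
        raise ValueError("volume identifier is at most 12 bytes")
    # Assumption: short identifiers are blank padded to 12 bytes.
    return b"F11B$v" + volume_id.ljust(12, b" ")

def file_lock_name(file_number: int, rvn: int) -> bytes:
    """'F11B$s' followed by 3 bytes of file number and 1 byte of RVN."""
    if not 0 <= file_number < 1 << 24:
        raise ValueError("file number must fit in 3 bytes")
    if not 0 <= rvn < 1 << 8:
        raise ValueError("relative volume number must fit in 1 byte")
    # Assumption: the file number is stored low byte first.
    return b"F11B$s" + file_number.to_bytes(3, "little") + bytes([rvn])
```

  Since the file lock is a child of the volume (or volume set) lock, the
  same  file  number  on  two different volumes yields the same F11B$s
  name under two different parents.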


  2.2  Serialization Strategy

  On any given file system function, any or all of  the  following  disk
  volume structures may be referenced:

        o  A directory file - including both the directory header and/or
           the  directory  file data blocks.  This is the file specified
           by the FIB$W_DID, or directory ID, and may be used to look up
           the target file in an ACCESS function, or to make a directory
           entry in a CREATE operation.

        o  The target file header and its  possible  extension  headers.
           This  is  the  file  you  wish  to do something with, such as
           ACCESS it, write attributes, extend, truncate, etc.

        o  The  storage  and  index  file  bitmaps,  and  the  QUOTA.SYS
           diskquota  file.   The  storage  bitmap is involved when free
           storage is mapped to a file being extended, or returned  from
           a  file being truncated or deleted.  The index file bitmap is
           touched when a new file or extension header is being created,
           or  headers  are  being deleted.  The QUOTA.SYS file reflects
           allowed and current diskquota usages.

  For most operations, these structures are always accessed, if at  all,
  in the following order:

       1.  The directory file is looked at to do a directory lookup,  if
           any.

       2.  The target file header and its extension headers, if any, are
           looked  at.   The  file  header  is  calculated from its file
           number, which is either the result of  the  directory  lookup
           above, or explicitly specified by FIB$W_FID.

       3.  Storage allocation changes, either extension, truncation,  or
           deletion  of  the  target  file, including quota checking, if
           enabled.




  2.3  Serialization On Specific Files

  Serialization locks for the directory and target files are handled  by
  the SERIAL_FILE routine, which takes the file ID as input and extracts
  the file number and relative volume number  to  construct  the  F11B$s
  lock  resource  name  and  then  take  the  lock out.  The SERIAL_FILE
  routine returns an index into the vector LB_LOCKID, which keeps  track
  of  the lock ID of the lock granted.  This index is stored in the cell
  DIR_LCKINDX for the directory serialization lock, and  into  the  cell
  PRIM_LCKINDX for the target, or primary file.

  In order to minimize both the number of  locking  operations  and  the
  number  of locks required to perform a given operation, it was decided
  to not serialize access to extension headers with a separate lock, but
  rather   to   serialize   access  to  all  extension  headers  with  a
  serialization lock on the primary header.  In normal  operations  this
  works  out  nicely  because  one  always goes after the primary header
  first and follows a link from each header to the next header.

  Serialization locks on the directory and  primary  file  are  normally
  held  until  the  completion  of  the entire operation.  All locks are
  released in the routine UNLOCK_XQP.  This routine is called  from  the
  DISPATCHER routine.


  2.4  Serialization Of Volume Changes

  Operations  that  involve  the  storage  or  index  file  bitmaps  are
  serialized  with  an  F11B$v  lock.   This  lock  is  taken out by the
  ALLOCATION_LOCK routine, which is  very  similar  to  the  SERIAL_FILE
  routine.   The  lock  ID  of  this allocation lock is always stored in
  element 0 of the LB_LOCKID vector.

  Note, though, that manipulating the headers of the storage  or  bitmap
  files  requires taking out the serialization lock on the file, as well
  as the allocation lock.


  2.5  Deadlock Considerations

  The file system is designed  to  be  deadlock  free.   By  assuming  a
  hierarchical  directory  and  file  structure, taking out locks in the
  above order results in a deadlock free system.
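
  The ordering rule can be illustrated with a small checker.   This  is
  a  sketch only; Rank and OrderedLockSet are hypothetical names, and the
  XQP itself does not necessarily contain such a check.

```python
from enum import IntEnum

class Rank(IntEnum):
    """Order in which the structures are normally locked."""
    DIRECTORY = 1
    TARGET_FILE = 2
    ALLOCATION = 3

class OrderedLockSet:
    """Tracks held locks and rejects acquisitions that go backwards."""

    def __init__(self):
        self.held = []

    def acquire(self, rank):
        if self.held and rank < max(self.held):
            raise RuntimeError(
                f"{rank.name} requested while {max(self.held).name} is held")
        self.held.append(rank)

    def release(self, rank):
        self.held.remove(rank)
```

  If every process acquires locks in increasing rank, no cycle of waiters
  can form.  Releasing a higher ranked lock before requesting a lower
  ranked one keeps the rule intact.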

  Certain operations, such as creating a new file, must access files  in
  a  different  order.   Specifically, the allocation lock must be taken
  out first to determine what the file  number  of  the  to  be  created
  primary  file  is,  and then the directory entry can be made.  In this
  case, deadlock is avoided by releasing the allocation  lock  prior  to
  acquiring  the serialization lock on the new file header.  In general,
  the allocation lock is always released before  acquiring  a  new  file
  number serialization lock.  The ALLOCATION_UNLOCK routine does this.

  It is considered okay to  hold  the  serialization  lock  on  a  newly
  created  file and then go after the directory serialization lock (even
  though it violates the ordering rule above) on the theory that if this
  is  a  new  file,  nobody  should be able to find it in the directory.
  Note that deleting a file removes the directory entry first,  so  that
  even  if  a system crashes while deleting a file, if anything is gone,
  it will be the  directory  entry,  so  that  helps  also.   You  could
  probably  go  out  of  your way to construct a directory with dangling
  entries, and then try to force a deadlock by having one  process  look
  up  the  non-existent files while another is creating them and causing
  new directory entries to be made, but the odds seem very low that this
  is a real problem.


  2.6  Internal Serialization Checks

  To enforce the requirement that an appropriate serialization  lock  is
  held  when  a given header is read from disk, there is another vector,
  using the lock index returned from SERIAL_FILE, called LB_BASIS, which
  contains  the  lock  basis,  or  file  number  + RVN field, of a given
  serialization lock.  The CURR_LCKINDX field  contains  the  last  lock
  index  returned  by  SERIAL_FILE.   The READ_HEADER routine uses these
  bits of information,  in  addition  to  looking  at  the  header  just
  returned  from  the READ_BLOCK routine (discussed later), to determine
  if in fact the correct serialization lock is held for the header  just
  requested.   The READ_HEADER routine is used by all code in the XQP to
  read a header.

  For example, if a user specifies an extension header by  file  ID  and
  attempts  a  DELETE  function on the extension header, the MARK_DELETE
  routine  will  first  serialize  on  the  given  file  ID  by  calling
  SERIAL_FILE,  which  sets  the CURR_LCKINDX and LB_BASIS fields up, as
  noted above.  It will  then  call  READ_HEADER  to  actually  get  the
  header.  READ_HEADER, however, will note that  it  has  an  extension
  header  (based  on the FH2$W_SEG_NUM field in the header) and that the
  lockbasis for the serialization lock held does not match  the  primary
  header   for   that  file  (determined  from  the  FH2$W_BK_FIDNUM  and
  FH2$B_BK_FIDNMX  fields).   It  will  therefore   exit   with   an
  "SS$_NOSUCHFILE"  status,  thereby making direct access to extension
  headers impossible.

  There is an exception to this, however.  BACKUP and DUMP, to name two,
  will  perform  an  ACCESS function explicitly on extension headers for
  the purposes of getting the extension header with  a  read  attributes
  list  that returns the complete file header, because they want to know
  exactly what is in all the extension headers of  a  given  file.   The
  ACCESS  routine  works together with the READ_HEADER routine to handle
  that case.  To perform an ACCESS function on an extension header,  the
  ACCESS  routine  will  first  take the serialization lock on the given
  file ID as if it were a primary header - it has no choice  because  it
  cannot tell yet.  Then it calls READ_HEADER with an extra argument, an
  optional  output  from  READ_HEADER.    The   extra   argument   tells
  READ_HEADER  to  not  simply  return SS$_NOSUCHFILE if it encounters a
  lockbasis mismatch, but rather to return what the real  lockbasis  for
  that  extension  header  is,  derived from the primary header backlink
  field noted above.  The ACCESS routine  then  releases  the  incorrect
  serialization  lock  it had acquired, gets the right one based on what
  READ_HEADER told it, and tries again.   It  must  actually  retry  the
  READ_HEADER  again,  of  course,  because  until  it  has  the correct
  serialization lock, what that header actually is could change out from
  underneath it.
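
  The check and the retry described above can be sketched  as  follows.
  This  is  a  simplified  model, not the actual routines: the Header
  class, the "headers" table, and the report_basis argument are
  hypothetical, and the back link is reduced to a single file number.

```python
from dataclasses import dataclass

SS_NORMAL = "SS$_NORMAL"
SS_NOSUCHFILE = "SS$_NOSUCHFILE"

@dataclass
class Header:
    file_number: int
    seg_num: int     # 0 for a primary header, nonzero for an extension
    back_link: int   # for an extension header: file number of the primary

# File 10 is a primary header; file 11 is its extension header.
headers = {10: Header(10, 0, 10), 11: Header(11, 1, 10)}

def read_header(file_number, lock_basis, report_basis=False):
    """Return (status, header, real lock basis or None)."""
    hdr = headers[file_number]
    real_basis = hdr.file_number if hdr.seg_num == 0 else hdr.back_link
    if real_basis == lock_basis:
        return SS_NORMAL, hdr, real_basis
    # Wrong lock held: optionally report which basis would be correct.
    return SS_NOSUCHFILE, None, real_basis if report_basis else None

def access(file_number):
    """Return (header, list of lock bases tried)."""
    lock_basis = file_number          # serialize as if it were a primary
    tried = [lock_basis]
    while True:
        status, hdr, real_basis = read_header(
            file_number, lock_basis, report_basis=True)
        if status == SS_NORMAL:
            return hdr, tried
        lock_basis = real_basis       # drop the wrong lock, take the right one
        tried.append(lock_basis)      # then re-read the header
```

  Calling access(11) first serializes on 11, is told the real basis  is
  10, and succeeds on the second read.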


  2.7  Serializing Access To Shared Data Structures

  Besides serializing access to file  headers,  the  F11B$s  locks  also
  serialize  access to the File Control Block, or FCB.  This is a shared
  structure, and it must not be changed by some other process while  the
  process  is  unscheduled.   The serialization lock works fine for this
  once you've found the FCB corresponding to the file being operated on.
  Access  to  buffers associated with a given file is also serialized by
  the F11B$s lock.  This is not the same as locating the buffers in  the
  cache, of course, and those mechanisms are discussed later.

  The FCBs are in a doubly linked list off the VCB.  To scan that  list,
  the  XQP  raises  IPL  to  SCHED  to  prevent rescheduling while it is
  scanning.  There are currently no consistency bugchecks within the XQP
  to  validate  that  an  appropriate  serialization  lock  is held when
  referencing an FCB.

  The file extent and file number caches (pointed to by  VCB$L_VCA)  are
  similarly serialized by the F11B$v allocation lock.  That  is,  access
  to  those  shared structures by multiple processes is controlled by the
  allocation lock.


  2.8  CURR_LCKINDX versus PRIM_LCKINDX versus DIR_LCKINDX

  DIR_LCKINDX records the index  into  the  lock  arrays  (LB_BASIS  and
  LB_LOCKID)  for  the  parent directory of the operation.  It is not in
  the context area saved for sub-operations.  PRIM_LCKINDX is the  index
  corresponding to the lock on the primary file header.  CURR_LCKINDX is
  set by SERIAL_FILE to record the last index returned  by  SERIAL_FILE.
  Calling  RELEASE_SERIAL_LOCK  with  a lock index equal to CURR_LCKINDX
  will zero CURR_LCKINDX.  Zero (the allocation lock)  is  not  a  valid
  value for these lock index variables.

  READ_BLOCK always uses  the  CURR_LCKINDX  when  reading  random  file
  headers and blocks.  READ_HEADER also uses the CURR_LCKINDX value when
  checking for correct lockbasis.

  These locks are not normally  released  until  request  cleanup  time.
  However,  those  who  make  such  a  lock in secondary context, or who
  operate on the index file or such in  primary  context  (moving  EOF),
  must release their lock separately.  DEALLOCATE_BAD  and  MARK_DELETE
  perform explicit writes  of  their  modified  buffers  and  explicitly
  release the lock (clearing PRIM_LCKINDX also).

  Normally,  PRIM_LCKINDX  has   the   same   value   as   CURR_LCKINDX.
  PRIM_LCKINDX  is  normally not itself referenced, although ERR_CLEANUP
  forces CURR_LCKINDX to equal PRIM_LCKINDX.  There are several cases in
  which PRIM_LCKINDX and CURR_LCKINDX are not related.

  In SEARCH_QUOTA, if it is necessary to serialize on the quota file (to
  rebuild  stale  FCBs  for  it),  the  CURR_LCKINDX value must be saved
  during the rebuild since it will refer to the quota file.  Quota  file
  operations (QUOTA_FILE_OP) run with the quota file serialization  lock
  (as well as the allocation lock) using CURR_LCKINDX.   When  advancing
  the  index file EOF (not currently needed), CURR_LCKINDX will refer to
  the index file serialization lock itself  during  the  header  write.
  Likewise,  while re-mapping the index file, CURR_LCKINDX will refer to
  the index file serialization lock.

  In PROPAGATE_ATTR (in CREATE), attributes are being  copied  from  one
  file  to  another.   In  this  routine, executed in secondary context,
  PRIM_LCKINDX points to  the  file  from  which  attributes  are  being
  copied.   CURR_LCKINDX  is saved across the OPEN_FILE call and is kept
  pointing to the target file, in case its headers must be re-read (when
  we go to find their buffers to write in attributes).

  Entering the CREATE function, a serialization lock may be held from  a
  previous  ACCESS  attempt  (if this was a create-if access) and so any
  PRIM_LCKINDX lock is released.  (ACCESS, like most  routines,  doesn't
  clean up after itself.)

  The DELETE_FILE routine, when purging the buffers  for  the  extension
  headers,  fabricates a serialization lock on the extension header file
  ID as a basis for purging the buffers.  This lock requires saving  the
  value of CURR_LCKINDX.

  In DIR_ACCESS,  CURR_LCKINDX  is  saved  while  DIR_LCKINDX  is  being
  established (in a call to SERIAL_FILE).

  FID_TO_SPEC releases the PRIM_LCKINDX lock  to  avoid  synchronization
  deadlocks  with processes walking down the hierarchy toward this file.
  The reference count is incremented on the  FCB,  though,  to  keep  it
  alive.  CURR_LCKINDX will refer to the various directories in the back
  link chain.  PRIM_LCKINDX is re-determined when we return to the  file
  after the search.




  READ_WRITEVB will obtain a serialization lock (CURR_LCKINDX) on a file
  id  when  it  determines  that a process is trying to directly write a
  file header.

  SHUFFLE_DIR resets (in secondary context) CURR_LCKINDX to  DIR_LCKINDX
  so that READ_BLOCK will work.

  The lockbasis corresponding to PRIM_LCKINDX can be wrong if we try  to
  access  directly  an  extension  file  header.   The  code  to correct
  PRIM_LCKINDX (to get the correct lockbasis and lock) is in ACCESS.


                                   2-8


                                CHAPTER 3

                          XQP I/O BUFFER CACHING


  The file system manages its I/O buffers as an LRU cache.   The  intent
  is  to retain, in memory, the buffers corresponding to the disk blocks
  the file system has most recently referenced, and thus avoid  actually
  moving  the data from disk after it has been read from disk once.  All
  XQP I/O is performed to the buffers in the cache.

  There are two major problems faced by the XQP I/O buffer cache.

       1.  Providing a shared, system wide cache  in  a  multi-threaded,
           procedure based environment.

       2.  Cluster wide validation/invalidation of buffer contents.


  3.1  Shared, System-wide I/O Buffer Cache

  Each node maintains a system-wide I/O buffer cache.  The  contents  of
  the  buffers  are copies of the corresponding disk blocks.  This first
  section discusses the management of this cache on a single node.   The
  next  section  discusses the mechanisms used to validate these buffers
  against operations performed by other nodes in a cluster.


  3.1.1  Allocation And Initialization Of I/O Buffer Cache

  A system wide (single node) I/O buffer cache is used by the XQP.

  The buffers are allocated from  paged  pool.   This  is  done  by  the
  SETUP_BLOCKCACHE  routine in the MOUNT module STACP.  MOUNT qualifiers
  are used to control buffer cache creation.  By  default,  all  mounted
  volumes  share the same buffer cache that is allocated when the system
  disk is  mounted  during  the  boot  process.   The  number  of  pages
  allocated  for  each  of  the pools described below is taken from the
  active values of the SYSGEN parameters  ACP_MAPCACHE  (storage  bitmap
  pool),   ACP_DIRCACHE   (directory   and   quota  file  data  blocks),
  ACP_HDRCACHE (file headers and index file bitmap), and  ACP_DINDXCACHE
  (directory index cache).

  A separate I/O buffer cache can be  specified  by  use  of  the  MOUNT
  qualifier  /PROCESSOR=UNIQUE.   A  specific  I/O  buffer  cache can be
  specified  by  using  the  /PROCESSOR=SAME:mntdev   qualifier,   where
  "mntdev" is the name of an already mounted device.

  If an attempt is made to allocate a separate cache, but the allocation
  fails  (lack of enough contiguous space in paged pool), MOUNT will try
  to allocate a minimal size cache instead.  If the minimal  size  cache
  can  be allocated, the REDCACHE (reduced cache) message will be issued
  and the volume will be mounted successfully.   If  the  minimal  cache
  allocation attempt fails, you get an error.


  3.1.2  Finding The I/O Buffer Cache

  The cache for a  given  mounted  device  is  found  by  following  the
  UCB$L_VCB  pointer  to the VCB, then the VCB$L_AQB pointer to the AQB,
  and finally the AQB$L_BUFCACHE pointer to the cache header.  There  is
  a  single  AQB for each buffer cache.  However, multiple VCBs may (and
  usually do) point to a single AQB.
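
  The pointer chain can be pictured with a small  model.   The  classes
  below  are  hypothetical stand-ins that carry only the one field named
  in the text; the real structures contain much more.

```python
from dataclasses import dataclass

@dataclass
class CacheHeader:
    name: str

@dataclass
class AQB:
    bufcache: CacheHeader   # stands in for AQB$L_BUFCACHE

@dataclass
class VCB:
    aqb: AQB                # stands in for VCB$L_AQB

@dataclass
class UCB:
    vcb: VCB                # stands in for UCB$L_VCB

def find_cache(ucb: UCB) -> CacheHeader:
    """Follow UCB -> VCB -> AQB -> cache header."""
    return ucb.vcb.aqb.bufcache

# Two mounted volumes sharing one AQB, and therefore one cache.
shared_aqb = AQB(CacheHeader("default cache"))
disk_a = UCB(VCB(shared_aqb))
disk_b = UCB(VCB(shared_aqb))
```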


  3.1.3  Layout Of The I/O Buffer Cache

  The I/O buffer cache itself consists of a fixed overhead  area  (F11BC
  structure), a variable size buffer descriptor array (BFRD structures),
  a variable size lock descriptor array (BFRL  structures),  a  variable
  size buffer LBN hash table, a variable size lock basis hash table, and
  finally, an array of page aligned I/O buffers.  Each area performs the
  following functions.

       1.  The fixed overhead area contains  pointers  to  the  variable
           areas that follow and their sizes.  It also contains a number
           of queue headers discussed later.

       2.  The next area is an array of buffer  descriptors,  or  BFRDs.
           These  describe what disk block a given buffer belongs to (by
           LBN and UCB address), whether it  is  valid  or  modified  or
           being  used,  and  what type of buffer it is.  It also has an
           index to its associated BFRL, discussed later.

       3.  The BFRLs describe the locks associated with the  buffers  in
           the cache.  They are discussed further on.

       4.  The buffer LBN hash table is an array of  word  indices  into
           the  BFRD  array.   It reduces the amount of time required to
           search the cache to determine if a given LBN  is  already  in
           the  cache  over  what a linear search of all the descriptors
           would involve.

           The hash function is a modulo function using the desired  LBN
           and  the  size  of  the  hash  table in words.  Overflows are
           handled by chaining through the BFRDs.

       5.  The lock basis hash table  serves  a  similar  function.   It
           allows a relatively quick search of the BFRLs to determine if
           one already exists for a given lock basis.  This is discussed
           later.
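
  A minimal sketch of the LBN hash lookup follows.   The  class  names,
  the  chain  field,  and  the choice to insert at the head of a chain
  are assumptions made for illustration; only the modulo hash  and  the
  chaining through the descriptors come from the description above.

```python
class BFRD:
    """Buffer descriptor: which disk block a buffer holds."""

    def __init__(self, ucb, lbn):
        self.ucb = ucb
        self.lbn = lbn
        self.next = None   # index of the next BFRD on the same hash chain

class LbnHash:
    def __init__(self, table_size):
        self.table = [None] * table_size   # each slot: index into bfrds
        self.bfrds = []

    def insert(self, ucb, lbn):
        index = len(self.bfrds)
        bfrd = BFRD(ucb, lbn)
        slot = lbn % len(self.table)       # modulo hash on the LBN
        bfrd.next = self.table[slot]       # overflow: chain through BFRDs
        self.table[slot] = index
        self.bfrds.append(bfrd)
        return index

    def find(self, ucb, lbn):
        index = self.table[lbn % len(self.table)]
        while index is not None:
            bfrd = self.bfrds[index]
            if bfrd.ucb == ucb and bfrd.lbn == lbn:
                return index
            index = bfrd.next
        return None
```

  Only the descriptors on one chain are examined, rather than every
  descriptor in the array.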

  There are pointers in the fixed overhead area to  the  variable  areas
  that  follow it.  There are as many BFRDs and BFRLs as buffers, so the
  size of those areas is directly proportional to the number of  buffers
  in the cache.  The buffer LBN and lock basis hash areas have a minimum
  of one word each per buffer.  The minimum size requirements  are  thus
  calculated  by  the SETUP_BLOCKCACHE routine with an extra page thrown
  in so we will have enough  room  to  always  page  align  the  buffers
  themselves  regardless  of  where  the  space is actually allocated in
  paged pool.  Any extra room between the lock descriptors  (BFRLs)  and
  the start of the I/O buffers is split up between the two hash tables.

  The entire overhead area and  the  buffers  themselves  are  currently
  allocated  as  a  single contiguous chunk of paged pool.  However, the
  implementation  allows  for  the  descriptor  area  to  be   allocated
  separately  from  the  buffers.   The  total overhead area is about 10
  percent of the size of the buffers themselves.


  3.1.4  Segregation Of Buffers Into Pools

  The XQP divides all buffers in the cache into 3 pools for purposes  of
  LRU replacement.  They are:

       1.  Storage bitmap blocks and the Storage  Control  Block  (SCB).
           These   are   all   the   data  blocks  mapped  by  the  file
           [0,0]BITMAP.SYS.

       2.  Directory data blocks and data blocks of  the  [0,0]QUOTA.SYS
           file.  This is the only pool that performs multi-block reads.

       3.  File headers and index file bitmap  blocks.   These  are  all
           data blocks mapped by the [0,0]INDEXF.SYS file.

  In addition there is a fourth pool of  pages  used  by  the  directory
  index  caching  mechanism.   These  pages are not I/O buffers, but are
  managed by the  buffer  caching  routines  because  they  provide  the
  necessary cluster validation.  This will be discussed later.




  3.1.5  Buffer Replacement

  The replacement algorithm for buffers is Least  Recently  Used  (LRU).
  When  the  desired disk block cannot be found in the cache, the oldest
  buffer is tossed out and replaced with the desired block.

  This is accomplished by linking all BFRDs for  a  given  pool  onto  a
  queue  header  for  that  pool.   F11BC$Q_POOL_LRU is a vector of four
  queue headers for that purpose.

  Since the buffer manager can release a buffer at any time (only if you
  ask  it  to  read  something,  of  course),  it  is possible for local
  variables (and globals such as FILE_HEADER) to no longer point to  the
  buffer  desired.   If  it  is necessary to read a set of blocks, it will
  then be necessary to ask again for the original block afterward.
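
  The replacement policy for one pool can be  sketched  as  below.   The
  LruPool class is hypothetical and uses an OrderedDict in place of the
  BFRD queue; it only illustrates why a caller may have to ask again for
  a block it read earlier.

```python
from collections import OrderedDict

class LruPool:
    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.blocks = OrderedDict()   # lbn -> contents, oldest first
        self.disk_reads = 0

    def read_block(self, lbn):
        if lbn in self.blocks:
            self.blocks.move_to_end(lbn)      # now the most recently used
            return self.blocks[lbn]
        if len(self.blocks) >= self.nbuffers:
            self.blocks.popitem(last=False)   # toss the oldest buffer
        self.disk_reads += 1                  # simulated disk read
        self.blocks[lbn] = f"block {lbn}"
        return self.blocks[lbn]
```

  With two buffers, reading blocks 1, 2, 1, 3 replaces block 2, so a later
  request for block 2 must go back to disk.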


  3.1.6  Serialization Of Cache Manipulation

  Changing the state of the cache descriptors in the overhead area  must
  be  done  atomically,  i.e., any process needing to use it must always
  see a consistent picture.  Searching or manipulating  the  cache  must
  therefore be serialized.  This only needs to be done for the processes
  on a given node, however, not across  an  entire  cluster.   The  lock
  manager is therefore not required in this case, and we can perform the
  function faster without it in this restricted case.

  There are two routines, SERIAL_CACHE and RELEASE_CACHE,  that  acquire
  and  release  the  cache  interlock.  These routines are called by the
  other routines in the RDBLOK module.  SERIAL_CACHE queues the IRP  for
  the  current  function  onto the queue header of the AQB.  If it is at
  the head of the queue, it returns from that routine and its caller may
  proceed.   If  it  is  not  at the head of the queue, the process puts
  itself to sleep until it is at the head of the queue.

  The RELEASE_CACHE routine removes the IRP from the head of the  queue.
  If  another  element  remains, i.e., is now at the head, RELEASE_CACHE
  queues an AST to that process so that it will proceed.

  The CDRP area of the IRP is used as an ACB, just like it  is  used  to
  start the whole AST thread in the first place, discussed already in an
  earlier section.

  The cache serialization interlock is only  held  while  searching  the
  cache  or  changing  the state of the buffer descriptors.  It is never
  held when the XQP must stall  for  I/O,  or  anything  else  for  that
  matter.
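
  The interlock behaves like a first-in, first-out queue whose head  is
  the  current  owner.   A  sketch, with a hypothetical CacheInterlock
  class and a "woken" list standing in for the AST delivery:

```python
from collections import deque

class CacheInterlock:
    def __init__(self):
        self.queue = deque()   # stands in for the queue header in the AQB
        self.woken = []        # requests that would have been sent an AST

    def serial_cache(self, request):
        """Queue the request; True means the caller may proceed now."""
        self.queue.append(request)
        return self.queue[0] is request

    def release_cache(self):
        """Remove the owner; wake the next request, if any."""
        self.queue.popleft()
        if self.queue:
            self.woken.append(self.queue[0])
            return self.queue[0]
        return None
```

  A caller for which serial_cache returns False would put itself to sleep
  until it appears in the woken list.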




  3.1.7  Reserving Buffers

  Before a given file system operation is allowed to use any buffers  in
  the  cache,  it  must  first  "reserve"  the minimum number of buffers
  required to perform  the  operation.   This  is  done  by  maintaining
  counters  (BFR_CREDITS,  one for each pool) in the fixed overhead area
  that represent the number of buffers currently reserved by  concurrent
  file  system  activity.   This  is  the F11BC$L_POOLAVAIL vector.  The
  GET_REQD_BFR_CREDITS routine performs this  function.   The  currently
  required  buffer credits are:  1 bitmap block buffer, 2 directory data
  block buffers, 3  file  header  buffers,  1  directory  index  buffer.
  CACHE_HDR  and  AQB  are  initialized here.  This routine will stall a
  process until enough buffers are  available.   The  F11BC$Q_POOL_WAITQ
  vector  has listheads for each pool for the IRPs to be queued on while
  they wait.  The RETURN_CREDITS routine will  send  them  an  AST  when
  buffer credits are returned.  The reason for reserving in  advance  is
  that deadlocks could result if a partially completed operation already
  held buffers that were in turn required by another process, which  was
  itself  waiting  for those buffers before releasing its own.  Credits
  are  obtained  under  the  cache  interlock   (routines   SERIAL_CACHE
  and RELEASE_CACHE).

  RETURN_CREDITS returns the buffer credits to  the  free  pool  counts,
  under  the cache interlock and only if the buffers are not in use.  It
  will wake up some process if there is a process on the pool wait queue
  or  the  ambiguity  queue  (discussed  below) by adding such a process
  after our entry on the cache interlock queue.  This causes them to  be
  awakened when we release the cache interlock.
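
  The all-or-nothing reservation can be sketched as follows.   This  is
  a simplification under stated assumptions: CreditPool is a hypothetical
  class, a single wait queue replaces the per pool wait queues, and  the
  pool names are invented labels for the four pools.

```python
from collections import deque

# Required credits per operation, as listed in the text.
REQUIRED = {"bitmap": 1, "directory": 2, "header": 3, "dirindex": 1}

class CreditPool:
    def __init__(self, pool_counts):
        self.avail = dict(pool_counts)   # stands in for POOLAVAIL
        self.waiters = deque()

    def get_required_credits(self, request):
        """Reserve every required credit, or none and wait."""
        if all(self.avail[p] >= n for p, n in REQUIRED.items()):
            for p, n in REQUIRED.items():
                self.avail[p] -= n
            return True
        self.waiters.append(request)     # the request would stall here
        return False

    def return_credits(self):
        """Give the credits back; return a waiter to wake, if any."""
        for p, n in REQUIRED.items():
            self.avail[p] += n
        return self.waiters.popleft() if self.waiters else None
```

  Because a request never holds some of its buffers while waiting for the
  rest, two requests cannot each hold what the other needs.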


  3.1.8  Free Versus In-process Buffers

  When all is quiet, that is, there is no file system activity, the four
  values  in  the F11BC$L_POOLAVAIL vector will equal the four values in
  the F11BC$W_POOLCNT (pool counts).  In addition,  all  buffers  for  a
  given pool will be linked onto their respective F11BC$Q_POOL_LRU queue
  header.

  When a buffer  is  being  used  by  a  particular  process  during  an
  operation,  it is removed from the POOL_LRU queue, and inserted onto a
  per process BFR_LIST queue.  The BFR_LIST structure is itself a vector
  of  queue  headers,  one  for  each  pool.  Each process also has four
  element BFR_CREDITS and BFRS_USED vectors representing the  number  of
  buffers  reserved and the number actually in use.  The number of BFRDs
  on each queue  header  in  BFR_LIST  must  always  correspond  to  the
  BFRS_USED   value.   When  a  BFRD  is  on  the  BFR_LIST  queue,  the
  BFRD$W_CURPID field will contain  the  internal  PID  index  for  that
  process.




  3.1.9  Extending Buffer Credits During An Operation

  As mentioned earlier, a minimal number of  buffers  must  be  reserved
  before  any  operation  is  allowed  to  proceed,  in fact, before any
  operation is allowed to hold any locks.  For example, 3  buffers  from
  the  file  header  pool  are  always  reserved.   If there were only 6
  buffers in the file header pool (ACP_HDRCACHE sysgen  parameter)  only
  two  processes would be allowed to proceed concurrently.  Until one of
  them completes, another process coming along will be stalled.  In that
  situation,  if  a  file  with 4 headers is being accessed, the process
  will have to discard the first  header  read  from  its  BFR_LIST  and
  re-use that buffer to read the fourth header.  The FREE_ONE routine in
  RDBLOK will do this.  The BFR_LIST itself is managed LRU so  that  the
  oldest buffer gets tossed.  All callers of the READ_BLOCK routine must
  be prepared for this possibility.

  However, if there are more than 6 unreserved buffers in a given  pool,
  additional  buffer  credits  will  be  extended  to a process to avoid
  invalidating a recently accessed buffer, as  the  above  example  did.
  This  is  done  by decrementing POOLAVAIL and incrementing BFR_CREDITS
  when the additional buffer is  desired,  subject  to  POOLAVAIL  being
  greater  than  or  equal  to 6.  The number six is somewhat arbitrary.
  The intent is to preserve a certain amount of  concurrency  under  all
  conditions.
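
  The credit accounting described above can be sketched as a small  toy
  model.   This is only an illustration, not the XQP code:  the classes
  and the get_buffer method are hypothetical names.  Only the POOLAVAIL,
  BFR_CREDITS, BFRS_USED and BFR_LIST cells and the floor of 6 come from
  the text.  The sketch assumes a credit is extended only when POOLAVAIL
  stays at 6 or more after the decrement.

```python
from collections import deque

MIN_UNRESERVED = 6  # the "somewhat arbitrary" floor from the text


class PoolModel:
    """Toy model of one buffer pool (hypothetical name)."""

    def __init__(self, poolavail):
        self.poolavail = poolavail      # unreserved buffers left in the pool


class ProcessModel:
    """Toy model of one process's accounting for a single pool."""

    def __init__(self, pool, credits):
        self.pool = pool
        self.bfr_credits = credits      # buffers reserved for this process
        self.bfr_list = deque()         # in-process buffers, oldest first

    @property
    def bfrs_used(self):
        # Invariant from the text: queue length always equals BFRS_USED.
        return len(self.bfr_list)

    def get_buffer(self, lbn):
        """Obtain a buffer for lbn; return the LBN that was tossed, if any."""
        tossed = None
        if self.bfrs_used >= self.bfr_credits:
            if self.pool.poolavail > MIN_UNRESERVED:
                # Extend one credit; POOLAVAIL is still >= 6 afterwards.
                self.pool.poolavail -= 1
                self.bfr_credits += 1
            else:
                # No credit to extend: discard the oldest buffer and
                # re-use it (the FREE_ONE case).
                tossed = self.bfr_list.popleft()
        self.bfr_list.append(lbn)
        return tossed
```

  With POOLAVAIL at zero and three credits, reading a fourth header in
  this model tosses the first one, as in the example above.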


  3.1.10  The Ambiguity Queue

  The  queue  header  F11BC$Q_AMBIGQFL  is  the  ambiguity  queue.    As
  mentioned  earlier  in the serialization discussion, it is possible to
  serialize on the  wrong  lock  basis  when  attempting  to  access  an
  extension   header   directly.   If  that  same  extension  header  is
  concurrently being accessed by another process as an extension header,
  using the correct lockbasis, it is possible for one of those processes
  to locate the buffer as being "in use" by the  other  process  in  the
  cache.    Normal   file   number   serialization  usually  makes  this
  impossible, and, except for the specific case of file  headers,  would
  cause an XQPERR bugcheck.

  However, in this case, the process will put itself  on  the  ambiguity
  queue and go to sleep.  This is done in the RESOLVE_AMBIGUITY routine,
  which queues the IRP onto the F11BC$L_AMBIGQFL queue.  When  the  next
  operation completes, it will be awakened and look again.  This is done
  by the RETURN_CREDITS routine.  This can happen any  number  of  times
  until the ambiguity is resolved.

  FIND_BUFFER detects the ambiguity case (when it finds a buffer in  use
  in  some other process).  WRONG_LOCKBASIS and RETURN_CREDITS check the
  ambiguity queue for processes to waken.


  3.1.11  Multi-block Disk Reads

  The directory and quota file data block pool allows multi-block reads.
  A  contiguous  group  of  buffers will be assembled in the FIND_BUFFER
  routine to be used in a single multi-block QIO when the desired buffer
  was  not  already in the cache and the caller requested it.  Directory
  and quota file processing will request  it.   The  number  of  buffers
  assembled is limited by the sysgen parameter ACP_MAXREAD.

  The starting point for the contiguous assembly is the BFRD pulled from
  the  POOL_LRU  list.   We  first  try  to  assemble  adjacent BFRDs in
  ascending memory sequence.  If we bump into the end of  the  pool,  we
  attempt  to  proceed  from  the  starting  BFRD  in  descending memory
  sequence.

  If any BFRD is already in use (BFRD$W_CURPID non-zero), we  quit.   If
  the  LBN  we  intend  to  read  into that BFRD is already in the cache
  somewhere, we quit.  If we exceed  our  buffer  credits  and  are  not
  extended anymore, we quit.
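
  The assembly rules can be sketched as follows.  This is a hypothetical
  model, not the FIND_BUFFER code:  the function name and  its arguments
  are invented for illustration.  It assumes that buffers  assembled  in
  descending memory sequence receive the LBNs just below  the  requested
  one, and that "quit" means the assembly stops with whatever  has  been
  collected so far.  Buffer credits are modeled as a simple count.

```python
def assemble_run(curpid, cached_lbns, start, start_lbn, max_read, credits):
    """Pick a contiguous group of buffers for one multi-block read.

    curpid      -- per-buffer owner PID index; 0 means the buffer is free
    cached_lbns -- set of LBNs already held somewhere in the cache
    start       -- pool index of the buffer taken from the LRU list
    start_lbn   -- LBN the caller asked for (goes into buffer `start`)
    Returns (first_index, first_lbn, count).
    """
    limit = max(1, min(max_read, credits))
    lo = hi = start

    def usable(index, lbn):
        return lbn >= 0 and curpid[index] == 0 and lbn not in cached_lbns

    hit_end = False
    while hi - lo + 1 < limit:           # ascending memory sequence
        nxt = hi + 1
        if nxt == len(curpid):
            hit_end = True               # bumped into the end of the pool
            break
        if not usable(nxt, start_lbn + (nxt - start)):
            break                        # in use or already cached: quit
        hi = nxt

    if hit_end:                          # descending, only after hitting the end
        while hi - lo + 1 < limit and lo > 0:
            prv = lo - 1
            if not usable(prv, start_lbn - (start - prv)):
                break
            lo = prv

    return lo, start_lbn - (start - lo), hi - lo + 1
```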


  3.1.12  Disk Writes

  All writing to disk (except for normal  virtual  write  functions  and
  erase   functions)   is  performed  by  WRITE_BLOCK  (in  RDBLOK)  (or
  WRITE_HEADER, also  in  RDBLOK,  which  performs  a  checksum  first).
  Buffers can be explicitly written in this way.  WRITE_BLOCK is invoked
  automatically when it  is  necessary  to  remove  a  buffer  from  the
  in-process  list  (dirty buffers must be only on the in-process list).
  WRITE_DIRTY can be called to write out all  dirty  buffers  associated
  with  a lockbasis (0 implies write all buffers).  TOSS_CACHE_DATA will
  do the same given a lock array index, except that it also  invalidates
  the  cache  buffers.   This  is  done when closing a file opened using
  OPEN_FILE.

  Most operations that modify buffers will simply mark them as dirty and
  allow  CLEANUP  to  write  them  (WRITE_DIRTY (0)).  There are various
  exceptions.

  ERR_CLEANUP force writes the current directory buffer when it performs
  a re-enter function.

  CREATE_HEADER force writes the index file header  when  advancing  the
  EOF  (not  currently ever done).  CREATE_HEADER force writes blocks of
  the index file bitmap when filling the FID cache.  DELETE_FID performs
  likewise when returning FIDs to the index file bitmap.

  DEALLOCATE_BAD force writes  (WRITE_DIRTY  (lockbasis))  the  modified
  file  headers  itself.   SCAN_BADLOG  will force write the BADLOG file
  header when extending its header.


  MARK_DELETE force writes the updated (marked as  deleted  or  actually
  deleted) headers out to disk.  DELETE_FILE does likewise.

  WRITE_AUDIT performs a WRITE_DIRTY given the primary lockbasis  before
  doing the FID_TO_SPEC translation which will release the lockbasis.

  EXTEND_CONTIG force writes data blocks as it copies them  to  the  new
  extended contiguous file.  The new header is force written.  Likewise,
  SHUFFLE_DIR force writes directory blocks during its copy.

  TRUNCATE force writes the file header with the map pointers  truncated
  so  that  it  guarantees that the header is updated before the storage
  map shows the blocks as free.


  3.1.13  System Wide Buffer State

  Buffers are either in the system  list,  possibly  marked  as  validly
  containing  the  data  described  from  disk, or in an in-process list,
  again possibly valid and also possibly marked dirty (modified, not yet
  written  to  disk).   READ_BLOCK  takes a buffer and moves it from the
  system list to the in-process list.  In the process,  READ_BLOCK  will
  read the block if the buffer descriptor describes it as invalid (refer
  also to cluster wide cache validation, later).  CREATE_BLOCK does  the
  same,  except  that it is called when it is known that the disk block's
  contents are meaningless (such as  for  a  block  within  a  new  file
  extension).   Here,  CREATE_BLOCK simply zeros the block returned.  It
  is marked as dirty and valid.  (An exception is when the  desired  LBN
  is  -1, indicating that we simply want a free scratch buffer.  This is
  done in SHUFFLE_DIR, where spare blocks are needed to  hold  directory
  entries  being  moved.  The correct LBN for the buffers is established
  with RESET_LBN.)

  The buffers are moved back to the system list  via  RELEASE_LOCKBASIS,
  performed by RELEASE_SERIAL_LOCK (see below).


  3.1.13.1  INVALIDATE - INVALIDATE will move a buffer to the  front  of
  the  in-process  LRU  list  and  mark it as not valid (and not dirty).
  Several operations do this.

  The various readers and writers of the SCB (DISMOUNT in  ACPCNTRL)  do
  this  to  make sure that the SCB is not cached for a shadow set (since
  mount verification writes asynchronously to the SCB).

  CREATE_HEADER calls INVALIDATE upon a  file  header  if  it  finds  it
  doesn't  want  to use it.  This helps avoid confusion if the header is
  found in the cache when it shouldn't be.  READ_HEADER also invalidates
  headers that are invalid.


  When reading a new header (CREATE_HEADER), if the read fails, we  want
  to  test  to  see  if  we  can read/write the block.  So, something is
  written (WRITE_BLOCK), the buffer is invalidated, and a READ_BLOCK  is
  done again.

  SHUFFLE_DIR will do an invalidate on a buffer being squished out.

  MARK_DELETE reads, as a data block, the first  block  of  a  directory
  being  deleted  to  make  sure  it  is  empty.   The  buffer  read  is
  invalidated since it will not be needed.


  3.1.13.2  RESET_LBN - RESET_LBN  changes  the  LBN  recorded  with   a
  buffer.

  When modifying the index  file  header  (CREATE_HEADER),  the  LBN  is
  changed  to  reflect  the  alternate  index  file header so that a new
  WRITE_BLOCK will get it.  INVALIDATE is called immediately  thereafter
  to  avoid  screwing up.  A similar operation is performed when reading
  the index file header (READ_IDX_HEADER).  If the file  size  from  the
  header  is incorrect, the alternate index file header is read by doing
  a READ_BLOCK of the alternate  index  file  header  and  performing  a
  RESET_LBN on the buffer if that succeeds.  (If it fails, the buffer is
  simply invalidated and we punt.) EXTEND_INDEX will also do  this  when
  actually extending the index file.

  RESET_LBN is used when copying a contiguous file when it is  extended.
  READ_BLOCK reads the old file as data blocks, RESET_LBN is called, and
  the blocks are explicitly written.  They can remain in the cache since
  the  LBN  recorded  does  match their new location.  This operation is
  also done when compressing a directory.


  3.1.13.3  KILL_CACHE - KILL_CACHE invalidates all  buffers  associated
  with a particular UCB.  Buffers in the system list are purged, buffers
  in our process list are marked invalid, buffers in other process lists
  are left alone.

  KILL_CACHE is called to flush the cache of any buffers when the volume
  is being dismounted or is flagged as nocache (CLEANUP).


  3.1.13.4  KILL_BUFFERS - KILL_BUFFERS performs the  same  function  as
  KILL_CACHE  except  that  it  takes  a  pool  number  and  a lockbasis
  (directory data and directory index  pools  only)  and  works  against
  CURRENT_UCB.

  KILL_BUFFERS is called to  flush  directory  blocks  when  we  find  a
  directory  as  write  accessed  (CLEANUP).  This is done when not in a
  cluster, since when in a cluster the sequence numbers associated  with
  the  serialization  locks will protect these buffers (refer to cluster
  validation below).   KILL_BUFFERS  is  also  called  when  deleting  a
  directory,  to  flush its data blocks out of the cache.  Also, turning
  off the directory bit for a directory flushes directory blocks.

  The special file write  virtual  function  (READ_WRITEVB)  performs  a
  KILL_BUFFERS  when  the  user  writes  to the index file, bitmap file,
  quota file or a directory.  (In a cluster, this is  done  through  the
  buffer sequence numbers, described below.)

  An explicit WRITE_DIRTY (-1) is performed when de-accessing the  quota
  file  (QUOTAUTIL)  to  write out the quota file blocks (after clearing
  quota cache).  KILL_BUFFERS (1, -1) purges the buffers associated with
  the quota file (data blocks).


  3.2  Cluster Wide Buffer Validation

  In a cluster, there is a separate I/O buffer cache on each  processor.
  The  contents  of  a  given  buffer  in  the  I/O  cache on a specific
  processor will become stale if that disk block is modified by the file
  system on another processor.

  Because each buffer corresponds to some  on-disk  structure  the  file
  system  manipulates,  the reading and writing of those buffers must be
  serialized by one of the serialization locks discussed in the previous
  section.   Those locks, and their value blocks, are the key to cluster
  wide buffer validation.


  3.2.1  Use Of Value Blocks

  The basic scheme is to maintain a sequence number in the  value  block
  of  serialization locks (which are associated with specific buffers in
  the cache).  This sequence number is incremented whenever  any  buffer
  associated  with that lock is modified.  All buffers associated with a
  given lock in a given cache retain a copy of the sequence number as of
  the  last  time  those  buffers were used and valid.  When a buffer is
  subsequently found in the cache by a  later  operation,  the  retained
  sequence   number   is   compared   to  the  current  value  from  the
  serialization lock.  If they match, no other  processor  has  modified
  the associated disk block, and hence the contents of the cached buffer
  are valid.  If the retained sequence number and the  current  sequence
  number  do not match, the contents of the cached buffer are stale, and
  it must be refreshed by reading the current contents from disk.
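
  The comparison can be sketched as below.  This is a minimal model  of
  the scheme, not the FIND_BUFFER code:  the class and function names are
  hypothetical, the cache is a plain dictionary keyed by  LBN,  and  the
  current sequence number is simply passed in rather than  read  from  a
  lock value block.

```python
class CachedBuffer:
    def __init__(self, data, seqnum):
        self.data = data
        self.seqnum = seqnum    # sequence number retained when last valid


def read_validated(cache, lbn, current_seq, read_disk):
    """Return (data, outcome); outcome is 'hit', 'stale' or 'miss'."""
    buf = cache.get(lbn)
    if buf is not None and buf.seqnum == current_seq:
        # Retained and current sequence numbers match: contents are valid.
        return buf.data, "hit"
    outcome = "miss" if buf is None else "stale"
    # Not cached, or modified elsewhere: refresh from disk and retain
    # the current sequence number with the buffer.
    cache[lbn] = CachedBuffer(read_disk(lbn), current_seq)
    return cache[lbn].data, outcome
```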

  Different parts  of  different  value  blocks  are  used  to  validate
  different  buffers.  The following buffers are validated by the F11B$s
  file number serialization lock:


        o  File headers are validated by  the  FC_HDRSEQ  field  in  the
           F11B$s  lock  for  that  file.   Note  that a single sequence
           number is used to validate all  headers  for  a  given  file,
           therefore  modifying  just the primary header would cause all
           cached headers for that file elsewhere to become invalid.

        o  Directory data blocks are validated by the  FC_DATASEQ  field
           in  the F11B$s lock for a given directory file.  Same comment
           as above - all data blocks are validated by a single sequence
           number.   Note, however, that a directory file header and its
           data blocks are validated by  different  parts  of  the  same
           value  block,  hence  can  be  independently modified without
           invalidating each other.

           Data blocks of any file  opened  by  the  internal  OPEN_FILE
           routine are also validated by this field.

  The actual fields in the value  blocks  are  only  referenced  by  the
  SERIAL_FILE  and  RELEASE_SERIAL_FILE  routines.   They are referenced
  elsewhere by the LB_HDRSEQ and LB_DATASEQ vectors, indexed by the lock
  index   returned   by  SERIAL_FILE.   (The  LB_FILESIZE  values,  also
  corresponding to value block fields, are not used.)

  The following buffers are validated by the F11B$v allocation lock:

        o  Storage bitmap blocks (BITMAP.SYS data blocks)  use  the  low
           word of the VC_SEQNUM field.

        o  Index file bitmap blocks use the high word of  the  VC_SEQNUM
           field.

        o  Quota file data blocks use bits 1 through 15 of the  VC_FLAGS
           field.
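
  The packing of these three sequence numbers can be illustrated with a
  few bit operations.  The sketch assumes that VC_SEQNUM  is  a  32  bit
  longword and VC_FLAGS a 16 bit word, and that each counter wraps within
  its own field without disturbing its neighbors;  neither assumption is
  stated in the text.  The function names are hypothetical.

```python
WORD = 0xFFFF


def storage_map_seq(vc_seqnum):
    """Storage bitmap sequence number: low word of VC_SEQNUM."""
    return vc_seqnum & WORD


def index_map_seq(vc_seqnum):
    """Index file bitmap sequence number: high word of VC_SEQNUM."""
    return (vc_seqnum >> 16) & WORD


def quota_seq(vc_flags):
    """Quota file sequence number: bits 1 through 15 of VC_FLAGS."""
    return (vc_flags >> 1) & 0x7FFF


def bump_storage_map(vc_seqnum):
    # Increment the low word only; the high word is left untouched.
    return (vc_seqnum & 0xFFFF0000) | ((storage_map_seq(vc_seqnum) + 1) & WORD)


def bump_index_map(vc_seqnum):
    # Increment the high word only; the low word is left untouched.
    return (vc_seqnum & WORD) | (((index_map_seq(vc_seqnum) + 1) & WORD) << 16)


def bump_quota(vc_flags):
    # Increment bits 1..15 only; bit 0 keeps whatever flag it holds.
    return (vc_flags & 1) | (((quota_seq(vc_flags) + 1) & 0x7FFF) << 1)
```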

  The validation  for  buffers  found  in  the  cache  is  done  by  the
  FIND_BUFFER  routine  in  the  RDBLOK  module.   Modification  of  the
  sequence numbers is done by the WRITE_BLOCK routine.  This is the only
  routine that writes modified buffers to disk.

  Note that when a node fails, the very latest copy of the  value  block
  may  be  lost.   If  that  is possibly the case, the lock manager will
  return an SS$_VALNOTVALID warning status on $ENQ operations requesting
  the  value  block.  The SERIAL_FILE and ALLOCATION_LOCK routines check
  for this status and increment all of the sequence number fields in the
  value   block   to   force   a   cache  miss  if  that  happens.   The
  SS$_VALNOTVALID condition is cleared by rewriting the value block.


  3.2.2  Volume Status Value Block Fields

  The allocation lock value block also  contains  the  fields  IBMAPVBN,
  SBMAPVBN, VOLFREE and IDXFILEOF.  These fields are used as follows.


  3.2.2.1  Free Volume Block Count - VOLFREE is passed  around  so  that
  the  last  node to update the volume free block count can reflect that
  to other nodes.  Note that this must be considered  only  approximate,
  since a node may crash holding the value block, thereby not reflecting
  the true last  value.   The  description  that  follows  assumes  that
  VOLFREE is good.

  EXTEND_INDEX uses the volume free value in its algorithm  to  estimate
  the  number  of  files  likely  to  yet be created on the volume, when
  deciding how much to extend the index file.  Likewise, this figure  is
  used by SELECT_VOLUME to pick a likely victim for a file.

  The free figure is used when deciding how many blocks to record in the
  extent cache (SMALOC).

  When a volume is unlocked, the unlocking node (which  must  also  have
  been  the  locking  node) has the only good notion of free space.  So,
  LOCK_VOLUME saves the free space figure from the  VCB  and  writes  it
  back   into  the  VCB  under  the  allocation  lock.   (Acquiring  the
  allocation lock will update the volume free  count  from  some  random
  value block.)


  3.2.2.2  Index   Map   VBN - The   index   map   VBN   is   maintained
  (FILL_FID_CACHE  and  REMOVE_FILE_NUM)  in the VCB as a starting point
  for header allocation (CREATE_HEADER).  This value is  used  since  it
  reflects  the  last  point of interest in the index file map, a likely
  place to look for new headers.   When  allocating  file  headers,  the
  value is incremented to the index file map block from which we succeed
  in performing an allocation.  If we return to the map  (from  the  FID
  cache)  a FID below this value, the value is decremented so that other
  nodes filling their FID caches will start from here.   (Refer  to  the
  FID cache in the cache chapter.)


  3.2.2.3  Storage Map VBN - In a similar manner to the index  map  VBN,
  the  storage map VBN is kept to record a starting point of interest in
  the storage map.  It is updated only during storage  map  allocations.
  If  the desired blocks are not found from this point to the end of the
  map, a scan is started from the beginning.  As such, this value may be
  reset  to  a  lower value if the desired blocks are found lower in the
  map.
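
  The wrap-around scan can be sketched as follows.  The function name is
  hypothetical, map blocks are indexed from zero for simplicity, and each
  map block is reduced to a count of free clusters;  the  real  storage
  map search is more involved.

```python
def find_map_block(free_per_block, start_vbn, wanted):
    """Return the first map block holding at least `wanted` free clusters.

    The scan runs from start_vbn to the end of the map and then restarts
    from the beginning.  Returns None if no block qualifies.
    """
    n = len(free_per_block)
    for vbn in list(range(start_vbn, n)) + list(range(start_vbn)):
        if free_per_block[vbn] >= wanted:
            return vbn
    return None
```

  The value returned becomes the new starting point, which is  how  the
  saved VBN can end up lower than it was.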


  3.2.2.4  Index File EOF - The index file EOF (set in CREATE_HEADER and
  EXTEND_INDEX)  is  passed  around  as  an  obvious limit to the header
  search.


  3.2.3  Associating Locks With Buffers

  The lock manager maintains two structures for a given  lock.   One  is
  the  resource  block,  which  contains the resource name and the value
  block.  The other is the lock block, which represents a specific  lock
  on  that  resource.  The resource and lock blocks are created when the
  first lock is taken out on a given resource name.  The resource  block
  disappears when the last lock is de-queued.

  The locks used to serialize access to a given disk block are  used  to
  validate a cached copy of that disk block in a cluster.  These are the
  F11B$v and F11B$s locks discussed earlier.  However, the F11B$s  locks
  are  normally  de-queued  at  the end of an operation.  This means the
  resource block would be de-allocated  and  we  would  lose  the  value
  block.  Therefore, if a buffer is to remain in the cache, we must keep
  a lock on it.  The buffers in the cache really belong to  the  system,
  not  to any given process.  Therefore, the concept of a "system owned"
  lock was invented.  This allows a granted lock to  be  converted  such
  that it is no longer associated with a given process.

  When a buffer is in the cache it must have a  NL  mode,  system  owned
  lock  associated with it.  The BFRL structure is used to keep track of
  those locks.  Multiple buffers may have the same lock basis, and hence
  many  BFRDs may point to the same BFRL.  The BFRL contains a reference
  count of BFRDs so we know when the lock can be completely de-queued.
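
  The reference counting can be sketched as below.  This is an  assumed
  model only:  the Bfrl and LockTable classes  and  their  methods  are
  hypothetical, and the dequeue callback merely stands in for the  real
  lock manager call.

```python
class Bfrl:
    """One NL mode, system owned lock shared by all buffers on a lock basis."""

    def __init__(self, lockbasis, lock_id):
        self.lockbasis = lockbasis
        self.lock_id = lock_id
        self.refcount = 0           # number of BFRDs pointing at this BFRL


class LockTable:
    def __init__(self, dequeue):
        self.by_basis = {}
        self.dequeue = dequeue      # stand-in for de-queueing the lock

    def attach(self, lockbasis, lock_id):
        """Point one more buffer at the BFRL for lockbasis, creating it if needed."""
        bfrl = self.by_basis.get(lockbasis)
        if bfrl is None:
            bfrl = self.by_basis[lockbasis] = Bfrl(lockbasis, lock_id)
        bfrl.refcount += 1
        return bfrl

    def detach(self, bfrl):
        """Drop one buffer's reference; de-queue the lock on the last one."""
        bfrl.refcount -= 1
        if bfrl.refcount == 0:
            del self.by_basis[bfrl.lockbasis]
            self.dequeue(bfrl.lock_id)
```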

  A problem with this is that the sequence number  is  transmitted  from
  node  to  node via value blocks associated with the lock backing a set
  of buffers.  Since many buffers may be associated with  a  given  BFRL
  (for  example,  there  might  be 10 storage bitmap blocks for the same
  volume in the cache), it is necessary  for  them  all  to  have  their
  sequence numbers updated in sync.

  If one of them is modified by another  operation  on  that  node,  the
  BFRD$L_SEQNUM  field will be updated, as well as the appropriate value
  block field.   However,  a  subsequent  operation  that  references  a
  different  bitmap  block  in  the  cache  will  get  a mismatch on the
  sequence numbers because it will be comparing the  value  block  field
  from  the  last  operation  against  the sequence number it got when
  they were all brought in the first time.  It really is  valid  because
  no one has modified it, but we cannot tell that.

  Releasing the serialization locks and potentially converting  them  to
  system   owned   is  done  together  by  the  RELEASE_SERIAL_LOCK  and
  RELEASE_LOCKBASIS      routines.       RELEASE_SERIAL_LOCK       calls
  RELEASE_LOCKBASIS,  which  then  scans the in-process list of buffers
  searching for a given lock and associating a BFRL  with  them  if  one
  does   not   already   exist.    If  a  new  BFRL  has  been  created,
  RELEASE_LOCKBASIS returns with a status causing RELEASE_SERIAL_LOCK to
  convert  the  serialization lock to system owned and store the lock id
  in the BFRL, otherwise the serialization  lock  is  simply  de-queued.
  The  cache  serialization  interlock  is held during this scan, and we
  cannot stall while doing so.  For that reason,  all  modified  buffers
  must have been written prior to calling RELEASE_SERIAL_LOCK.  Explicit
  calls to WRITE_BLOCK for individual buffers,  or  to  the  WRITE_DIRTY
  routine to scan the lists will accomplish that.

  All of this stuff with locks is necessary for the cache to work  in  a
  cluster.  For a non-cluster system, there are no locks associated with
  the buffers, as they are not necessary.

  Since the allocation lock backs up storage bitmap and index  file  map
  blocks,  they  must  be written out and released before the allocation
  lock can be released.  ALLOCATION_UNLOCK performs this function.

  Note that the DELETE_FILE routine, when purging the  buffers  for  the
  extension  headers,  fabricates  a serial lock on the extension header
  file ID as a basis for purging the buffers.  This  buffer  purging  is
  done here, instead of waiting for cleanup, to avoid someone picking up
  these file IDs as primary headers later and getting our buffers.


  3.3  The Directory Index Cache Pool

  The directory index cache is the fourth  pool  in  the  buffer  cache.
  Its entries are not buffers, though; each is rather an  index  into  a
  given directory file, constructed on the fly as a given  directory  is
  processed.

  The directory index cache is managed by the buffer cache code  because
  it  essentially  has the same cluster wide content validation problems
  that buffers do.

  A directory index block has a small header area followed by  about  30
  cells of 15 bytes each.  These represent the highest record found in the
  corresponding directory data block.  This allows every block  to  have
  an entry for directory files smaller than 30 blocks, every other block
  for directories between 30 and 60, etc.  The V4  implementation  fixes
  the  cell size at 15 characters and limits it to 1 page.  15 bytes was
  picked because MAIL$800....  files that  are  about  a  day  apart  in
  creation vary in the fourteenth or fifteenth character.
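
  The cell layout can be sketched as follows.  This is  an  illustration
  under stated assumptions, not the UPDATE_INDX or DIR_SCAN  code:   the
  function names are hypothetical, exactly 30 cells are assumed,  record
  names are compared as plain strings, and the number of blocks  covered
  by each cell is taken literally from the text (one block per cell below
  30 blocks, two per cell from 30 up to 60, and so on).

```python
import bisect

CELLS = 30        # "about 30" cells per index block
CELL_BYTES = 15   # fixed cell size


def build_index(block_high_names):
    """block_high_names[i] is the highest record name in directory block i."""
    stride = len(block_high_names) // CELLS + 1   # blocks covered by one cell
    cells = []
    for first in range(0, len(block_high_names), stride):
        last = min(first + stride, len(block_high_names)) - 1
        # Each cell keeps only the first 15 characters of the name.
        cells.append(block_high_names[last][:CELL_BYTES])
    return stride, cells


def first_block_to_scan(stride, cells, name):
    """Return the directory block at which a search for `name` should start."""
    if not cells:
        return 0
    # First cell whose (truncated) high name is not below the target.
    cell = bisect.bisect_left(cells, name[:CELL_BYTES])
    return min(cell, len(cells) - 1) * stride
```

  Because cells hold truncated names, the result is only a  safe  place
  to begin a forward scan, not necessarily the exact block.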

  Instead of being located by hashing on LBN, a directory index block is
  pointed  to  by the directory FCB, the FCB$L_DIRINDX cell.  BFRD$L_LBN
  points back to the directory FCB.  The routine MAKE_DIRINDX in  RDBLOK
  is  called  from DIR_ACCESS in DIRACC to validate a directory FCB.  If
  the FCB has no corresponding DIRINDX block, one is  removed  from  the
  list  for  the directory index pool and linked to the FCB.  Otherwise,
  the block is used.  The block is  validated  from  the  LB_HDRSEQ  and
  LB_DATASEQ  values  for the directory.  Only one directory index block
  is allowed to be used by the process (since the XQP only works against
  a  single  parent  directory  in an operation).  KILL_DINDX breaks the
  linkage between the FCB and the directory  index  block.   ERR_CLEANUP
  will  call  KILL_DINDX  when  it  needs  to  delete a directory with a
  corresponding directory index block.  Likewise, when MARK_DELETE  goes
  to  delete  a  file  (reference count hits 0), it will call KILL_DINDX
  before purging the FCBs.

  Unhooking the buffer descriptor for a directory index block (done when
  it  pops  to  the  top  of  the  LRU  list and is being used for a new
  directory, or in KILL_BUFFERS or KILL_CACHE, or  in  KILL_DINDX)  will
  also  break the link between it and its FCB.  SET_DIRINDX (called from
  CLEANUP, and also CLOSE_FILE) tries to keep around  FCBs  for  popular
  directories and keep the association of the FCB to the directory index
  block.  If the caller of SET_DIRINDX finds that  the  reference  count
  for  the  FCB  goes to zero, they would normally delete the FCB chain.
  If, however, there is a directory index block lying around,  FCB$V_DIR
  is  set  (at  IPL$_SCHED).   SEARCH_FCB  will  notice  (at IPL$_SCHED)
  whether an FCB has been left lying around for this reason.   Unhooking
  a  directory index block must check for this case, and de-allocate the
  FCB chain.  UNHOOK_BFRD clears the FCB$L_DIRINDX value first, so  that
  a  SEARCH_FCB  will  not  find  the  FCB  lying  around by virtue of a
  directory index block.  We check to see if a SEARCH_FCB did  find  the
  FCB  prior  to  clearing  FCB$L_DIRINDX  by  virtue  of  the fact that
  SEARCH_FCB cleared the FCB$V_DIR bit (also at IPL$_SCHED).

  DIR_ACCESS requests the creation of a directory index  block.   If  it
  finds  a valid one, it also knows it has valid FCBs (due to the checks
  in MAKE_DIRINDX).  Otherwise, it reads the FCBs and calls MAKE_DIRINDX
  for  real.   If the directory is not really a directory, KILL_DINDX is
  called to get rid of  the  bogus  directory  index  block.   Likewise,
  turning   off   the   directory  bit  (WRITE_ATTRIB)  will  also  call
  KILL_DINDX.

  The directory index block is built by UPDATE_INDX (called by ENTER and
  DIR_SCAN)  as  they  walk  down the directory.  DIR_SCAN then uses the
  block to save work on subsequent scans.

  The routine ZERO_IDX  (CLENUP)  is  called  by  SHUFFLE_DIR  when  the
  directory's  header  is  to  be  updated.  This routine increments the
  FCB$W_DIRSEQ and also  sets  the  corresponding  INUSE  value  in  the
  directory  index  block  to  zero,  since  the  block  layout  is  now
  different.  (FCB$W_DIRSEQ  is  updated  when  a  direct  access  of  a
  directory  occurs,  when SHUFFLE_DIR must change the directory header,
  and after an access to a  directory  that  does  not  locate  a  valid
  directory index block.)


  3.4  Invalidation Of Cached Buffers By Users

  Because all volume structures that the file system  uses  to  maintain
  the  volume  are  themselves files, it is possible for random users to
  access those files and  read  and  write  the  blocks  in  them.   For
  example,  anyone  with  write  access  to  a  directory  can  open the
  directory file and write junk into its data blocks.  Or with a  little
  privilege,  open BITMAP.SYS and rewrite it.  A disk rebuild does this,
  for example.

  The problem here is how to invalidate cached copies  of  those  blocks
  that may be in the I/O buffer cache.

  The solution is to trap all write virtual requests in the file system.
  This  is  done  by  setting a flag, WCB$V_WRITE_TURN in the WCB of any
  write accessed directory file, INDEXF.SYS,  or  BITMAP.SYS.   This  is
  done when the file is accessed, either  by  the  ACCESS  routine  for
  directories or by the MAKE_ACCESS routines for the others.  Whenever a
  write virtual function is performed on one of those files, the QIO FDT
  routine forces a window turn.  In the READ_WRITEVB  routine,  the  XQP
  will  take  out  the  appropriate  serialization  lock  (if it doesn't
  already have it) that would validate that buffer  and  increment  the
  appropriate  field  in  the  value  block.   This  really  only  works
  correctly if the volume blocking lock is held  by  the  process  doing
  this.  There are all sorts of race conditions because the user just is
  not synchronized in any real way with  the  file  system.   To  do  it
  right,  you  would have to introduce locking semantics on read virtual
  that the file system would respect.  I'm  pretty  sure  the  QUOTA.SYS
  file is not even handled this well.

  If the node is not in a cluster, the KILL_BUFFERS routine is called by
  READ_WRITEVB to scan the cache and invalidate the correct buffers.

  When a process initially accesses one of these files for writing,  the
  appropriate  cache  is  flushed  under the allocation lock.  The cache
  write lock is taken out on the cache (refer to the chapter on caches).
  This lock will be released in MAKE_DEACCESS.


                                CHAPTER 4

                            ACCESS ARBITRATION


  Access arbitration is when you open a file  saying  "open  for  write,
  disallow  writers" and when someone else comes along and tries to open
  it for write, they fail.  For a  given  node,  access  arbitration  is
  handled  with  counters  in  the  FCB.  The FCB maintains counters for
  total accessors, total readers and total writers (on this node).


  4.1  Access Locks

  For a  cluster,  locks  are  used  to  control  access.   The  routine
  ARBITRATE_ACCESS  first  arbitrates  the  desired  access  against any
  pre-existing accesses on that node (reflected in the FCB).   If  those
  checks  fail, there is no point in checking further.  If they succeed,
  and the system is in a cluster (and the device is cluster accessible),
  either  the  NEW_ACCESS_LOCK  or  the  CONV_ACCLOCK  routines (both in
  LOCKERS) will be called, depending on whether an access  lock  already
  existed or not.

  The access lock is a root lock of the form

        F11B$a<volume id><file number>

  This is a system owned lock.

  The <volume id> follows the rules for F11B$v locks, and file number is
  like the same part in F11B$s locks.

  The various access and sharing combinations map  into  lock  modes  as
  follows:

        o  LCK$K_EXMODE - Read/write, disallow read/write

        o  LCK$K_PWMODE - Read/write, disallow write

        o  LCK$K_PRMODE - Read, disallow write

        o  LCK$K_CWMODE - Read/write, allow read/write

        o  LCK$K_CRMODE - Read, allow read/write

        o  LCK$K_NLMODE - ignore whatever anyone else says.
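
  The mapping above, combined with the standard lock manager  mode
  compatibility rules, is what makes cross-node arbitration work.   The
  sketch below is illustrative only:  the table and function names  are
  hypothetical, the lock modes are abbreviated to two letters,  and  the
  lock manager is reduced to a compatibility lookup.

```python
# (wants_write, others_may_read, others_may_write) -> access lock mode
MODE_FOR_ACCESS = {
    (True,  False, False): "EX",   # read/write, disallow read/write
    (True,  True,  False): "PW",   # read/write, disallow write
    (False, True,  False): "PR",   # read, disallow write
    (True,  True,  True):  "CW",   # read/write, allow read/write
    (False, True,  True):  "CR",   # read, allow read/write
}

# Standard lock mode compatibility: COMPATIBLE[held] is the set of
# modes that can be granted while `held` is granted to someone else.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)
```

  For example, a node holding PW (open for write, disallow writers) blocks
  a CW or PW request from another node but still admits a CR reader.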

  The current access lock mode is stored  in  FCB$B_ACCLKMODE  with  the
  lock id in FCB$L_ACCLKID.  Whenever a process comes along whose access
  is compatible with accessors on that node but whose access requires  a
  higher  lock,  ARBITRATE_ACCESS  is  called to raise that node's lock.
  CONV_ACCLOCK will lower the  lock  back  to  the  supplied  (previous)
  value.  If the FCB reference count is zero, CONV_ACCLOCK will de-queue
  the access lock outright, since no one is left on this node who  needs
  it.

  The MAKE_DEACCESS routine in CLENUP converts the lock if  the  process
  is  de-accessing  the file.  Using the updated reference counts in the
  FCB, a new ACCTL value  is  determined  which  LOCK_MODE  can  use  to
  determine  the  new  lock value.  If this lock value is lower than the
  current node lock (or the node's FCB reference count  goes  to  zero),
  CONV_ACCLOCK  is  called.   DEACC_QFILE (QUOTAUTIL) performs a similar
  computation when de-accessing the (node wide) quota file.

  NUKE_HEAD_FCB,  called  by  CLEANUP,   MARK_DELETE,   CLOSE_FILE   and
  UNHOOK_BFRD  (when  deleting a directory FCB lying around by virtue of
  its directory index block) will also request a CONV_ACCLOCK to NL mode
  for  the  purpose  of  possibly writing out the value block (discussed
  later).  CONV_ACCLOCK will de-queue the lock given that the  reference
  count for the FCB is zero.

  The normal call to  ARBITRATE_ACCESS  is  when  performing  an  access
  function.   CONN_QFILE  (QUOTAUTIL)  does  this when starting up quota
  operations.  DIRACC also uses it to determine if anyone has requested
  no write access to a directory (this would have to be from an explicit
  user open of a directory).  In this case, ARBITRATE_ACCESS  is  called
  to  see if we can write, but we return the access lock to its original
  mode (implying a null lock for us).  The same technique is  used  when
  opening  a  file  (except  for explicit interlock ignore) in OPEN_FILE
  (FILUTL).  The MODIFY functions of extend and truncate also  do  this.
  They  are  allowed to lower the access lock since they hold the serial
  lock on the file which will prevent any new accessors from  coming  in
  who might object to their intended operation.

  The combination of the FCB reference counts and the LOCK_COUNT of  the
  access lock indicates the other users of the file.  Certain operations
  do not allow other collections of users, even though they are  allowed
  by  the normal access rules.  The most obvious of these is truncation,
  described below.  Also, changing the security classification of a file
  (CHANGE_CLASS in RWATTR) allows no other accessors.


  4.2  Deferred Truncation

  A  writer  implicitly  disallows  truncation.   The  problem  is  that
  truncation depends on the ability to invalidate windows (WCBs) for all
  accessors.  This is especially difficult because it would be necessary
  to  revoke  I/O  that  was  in  driver  queues when the truncation was
  performed.

  The result is that truncation is  only  allowed  by  a  single  writer
  accessing  the  file.   If  there  are  readers when the truncation is
  performed, the actual truncation is deferred  until  the  last  reader
  de-accesses the file, in much the same way that a file marked for
  delete doesn't really go away until completely de-accessed.

  The access lock is the mechanism  used  to  determine  when  the  last
  accessor  goes away.  This is because there will be exactly one access
  lock per node that has any accessors at all, and if a $GETLKI function
  that  returns  a  count  of locks on an access lock returns 1, then we
  know no one else is out there.  The routine LOCK_COUNT  performs  this
  function.
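
  A toy model of this counting argument follows.  It does not call
  $GETLKI; the dictionary of accessor counts per node and both function
  names are hypothetical, chosen only to show why a count of 1 means
  "nobody else".

```python
# Illustrative model: each node with any accessors holds exactly one
# access lock on the file, regardless of how many accessors it has.
def lock_count(accessors_by_node):
    """Count access locks cluster-wide: one per node with >= 1 accessor."""
    return sum(1 for count in accessors_by_node.values() if count > 0)


def is_only_node(accessors_by_node):
    """True when the caller's lock is the only one on the resource."""
    return lock_count(accessors_by_node) == 1
```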

  When a truncation is being deferred, a flag is set in the value  block
  of  the  access  lock, as well as the VBN to truncate to.  This is how
  that information is passed to another node when the truncate operation
  occurs on one node, and the last de-accessor is somewhere else.  Since
  this information is also recorded in the  FCB  for  the  file,  it  is
  necessary to mark the FCBs stale cluster-wide.

  If, however, after the writer  requesting  truncation  de-accesses  the
  file,  another  writer comes along, the delayed truncation is canceled
  (see ACCESS).  This is done by forcing the access lock to at least  PW
  mode,  clearing  the delayed truncation flags in the FCB (implying the
  lock value block) and doing a lock conversion to that same mode (which
  will  write  the  value  block).   The  first conversion to PW mode is
  always possible immediately.  (If the delayed truncation flag  is  on,
  it  indicates  that someone requested truncation while having the file
  accessed in PW mode (no other writers).  Since we succeeded in locking
  the   file   for  writing,  that  other  exclusive  writer  must  have
  de-accessed the file.  Since the delayed truncation  flag  is  on,  it
  indicates  that  some readers are still present (who must have allowed
  writers for us to have succeeded in getting write access).  Thus,  the
  highest  mode  requested  in  the  cluster  must  be  CR mode, thereby
  allowing us to convert to PW.)

  DEACCESS checks for a  request  to  truncate.   If  we  are  the  only
  accessor,  this  is done directly.  Otherwise, the truncation validity
  checks are made and the values stored in the value block.   (The  lock
  is  upgraded  to  at  least  PW  mode.   The  lowering of the mode (in
  MAKE_DEACCESS in CLENUP) when we  actually  finish  de-accessing  will
  write the value block out.) Also, DEACCESS notices if we were a reader
  and were the last one  to  de-access  a  file.   If  so,  and  delayed
  truncation was requested, the truncation is performed.
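
  The life cycle of a deferred truncation can be sketched as a small
  state model.  Everything here is an assumption made for illustration:
  the ValueBlock and FileState classes, their field names, and the three
  functions are hypothetical, and the model ignores lock conversions and
  multiple nodes entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ValueBlock:
    """Stand-in for the access lock value block (field names assumed)."""
    delayed_truncate: bool = False
    truncate_vbn: int = 0


@dataclass
class FileState:
    size_vbn: int
    readers: int = 0
    writers: int = 0
    vb: ValueBlock = field(default_factory=ValueBlock)


def deaccess_writer(f, truncate_to=None):
    """Writer leaves; truncate now if alone, else defer via the value block."""
    f.writers -= 1
    if truncate_to is not None:
        if f.readers == 0 and f.writers == 0:
            f.size_vbn = truncate_to
        else:
            f.vb = ValueBlock(True, truncate_to)


def access_writer(f):
    """A new writer cancels any pending delayed truncation."""
    f.writers += 1
    f.vb = ValueBlock()


def deaccess_reader(f):
    """Last reader out performs the deferred truncation, if still pending."""
    f.readers -= 1
    if f.readers == 0 and f.writers == 0 and f.vb.delayed_truncate:
        f.size_vbn = f.vb.truncate_vbn
        f.vb = ValueBlock()
```

  With two readers present, a writer that de-accesses with a truncation
  request leaves the size unchanged until the second reader goes away.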


  MODIFY itself only allows a truncation request  if  we  are  the  only
  accessor,  since it always does the truncation at the time of request.
  If we have the file accessed, we can simply check the reference counts
  in  the  FCB and the lock count for the access lock.  If we don't have
  the file accessed,  we  must  do  an  ARBITRATE_ACCESS  specifying  no
  readers and then check for our being the only accessor.  (CONV_ACCLOCK
  restores the old lock.)


  4.3  Marking FCBs Stale

  FCBs are an in memory summary of a number  of  pieces  of  interesting
  information  about  a  given  file  header.   That is, whenever you do
  anything to a file, you first build an FCB from the  header.   For  an
  accessed  file,  there  is an FCB for each header in the file, and the
  FCBs stick around in memory.

  If we have a file accessed on our node, it is quite reasonable  for  a
  process on another node to be changing the file header(s) for it, such
  as the protection, allocation (adding extension headers),  or  marking
  it for delete.  However, if an FCB is present, the XQP will believe it
  without looking at the header.  We'd like to know if someone else  has
  been  messing  with  our accessed file without reading the headers all
  the time so we can rebuild our FCB chain from the headers when we need
  to.

  A system blocking routine associated with the access lock is used  for
  this   purpose.    The   XQP$FCBSTALE  blocking  routine  (SYS  module
  SYSACPFDT) is armed with the primary FCB  address  as  its  parameter.
  Whenever  a  piece  of code messes with headers in a way that fouls up
  the FCB contents, it calls the routine MAKE_FCB_STALE.   This  routine
  queues for an EX mode lock on the access lock, triggering the blocking
  routine, which simply sets the FCB$V_STALE flag  in  the  FCB$W_STATUS
  field.

  The main call to MAKE_FCB_STALE is  in  CLEANUP,  which  keys  off  the
  CLF_MARKFCBSTALE  flag  which  is  set  by  anyone who modifies a file
  header that would require rebuilding the FCBs.  This flag  is  set  by
  EXTEND  and RWATTR (protected attributes, UIC, class, file protection,
  ACL).  In the case of TRUNCATE, this  function  is  performed  by  its
  callers.   (In the case of MODIFY, truncation is allowed only if there
  are no other accessors, so MAKE_FCB_STALE is not called.  In DEACCESS,
  a  delayed truncation implies that there will be no accessors, so this
  is not needed.)

  Extending the quota file will explicitly call MAKE_FCB_STALE  for  the
  quota  file.   Performing  a  SHUFFLE_DIR  will  do  likewise  for the
  directory.

  An equivalent function is performed in  MARKDEL_FCB  when  an  FCB  is
  marked for delete.


  The stale flag will only be used for a cluster shareable device;  that
  is,  if  no  access lock is held by this node, stale will never be set
  and the FCB chain is good.

  DELETE fires the blocking AST (by hand) to mark that it has marked the
  file  as delete pending.  DEACCESS and SEARCH_FCB will force the local
  FCB chain stale if the access lock was held in NL mode (all  accessors
  on  our node requested no lock) since the FCBs are always questionable
  (blocking ASTs are not delivered to  NL  modes).   CREATE_HEADER  also
  sets the index file FCB stale, under a similar assumption.

  Various places in the XQP that look at the FCB test the stale flag and
  rebuild the FCB chain from the headers if it is set.  These include
  ACCESS (file to be accessed), SEARCH_QUOTA (quota file; this may also
  require serializing on the quota file), CREATE (file being accessed),
  DEACCESS (file being de-accessed), MARK_DELETE (file being deleted),
  OPEN_FILE (file being accessed), MODIFY (file being modified), and
  CONN_QFILE (quota file).

  When the FCB chain is found to be stale, REBLD_PRIM_FCB is called.  It
  will  initialize  a  new  FCB from the real file header, and rearm the
  blocking AST by converting  the  lock  to  the  same  mode.   This  is
  followed  by  BUILD_EXT_FCBS to pick up the extension FCBs which might
  also have changed.
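
  The stale-flag protocol amounts to "trust the cached copy until told
  otherwise, then rebuild on next use".  The following sketch shows that
  pattern in isolation.  The FCB class here is a toy: its fields and
  methods are hypothetical, a dictionary stands in for the on-disk
  header, and no locks are involved.

```python
class FCB:
    """Toy FCB: a cached summary of an on-disk header (fields assumed)."""

    def __init__(self, header):
        self.header = header            # the "on-disk" header (a dict)
        self.stale = False              # stands in for FCB$V_STALE
        self.cached = dict(header)      # summary built from the header

    def blocking_ast(self):
        """What the blocking routine does: just set the stale flag."""
        self.stale = True

    def get(self, key):
        """Trust the cached copy unless stale; rebuild from the header first."""
        if self.stale:
            self.cached = dict(self.header)   # rebuild from the real header
            self.stale = False                # blocking AST is re-armed
        return self.cached[key]
```

  Note that a change to the header is invisible through get() until
  something calls blocking_ast(), which is exactly why every header
  modifier must remember to mark the FCBs stale.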


                                CHAPTER 5

                                  CACHES


  Other than the buffer cache, described under  I/O  buffer  management,
  the  system maintains other caches to speed up file system operations.
  These are described below.


  5.1  RMS Directory Cache

  The RMS directory cache is a list of directory names  and  their  file
  IDs  that RMS has seen before.  This is how it normally avoids calling
  the XQP every step of the way down a 6 level deep sub-directory  tree,
  for  example.   However,  it  needs to know if anyone has been messing
  with the directory structure, like deleting or renaming one.   If  so,
  it needs to call the XQP to step down the tree.

  To make this test and keep the overhead low, real low, it picks up the
  sequence number UCB$W_DIRSEQ, and stores it with its cached directory
  entries.  Whenever the file system does something that changes the
  directory structure, it calls UPDATE_DIRSEQ (in CHKDMO).  This is done
  by ENTER when superseding a directory, by REMOVE when removing a
  directory name, and by RWATTR when the directory bit is turned off.  The
  sequence number is also incremented when the volume is mounted, to avoid
  some races.

  The volume  lock  is  used  for  this  purpose.   RM$ARM_DIRCACHE  (in
  RM0SETDID)   converts   the   volume   lock   to  CR  mode  specifying
  RM$DIRCACHE_BLKAST (in RMSRESET in SYS) as  a  blocking  AST  routine.
  When  UPDATE_DIRSEQ  does  a  QEX_N_CANCEL  on  the  volume lock, this
  routine will bump the sequence number on the distant nodes.  The  high
  order  bit  of  the sequence number indicates that the blocking AST is
  armed.  CHECK_DISMOUNT clears this bit when the  lock  (and  therefore
  the  blocking  AST)  is  disarmed.   The  bit is also cleared when the
  blocking AST goes off.  When the blocking AST is  successfully  armed,
  and the sequence number matches what we started with, the armed bit is
  set.
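
  A sequence-number check of this kind can be sketched as follows.  The
  classes Ucb and DirCache and their methods are hypothetical.  The
  sketch assumes a 16-bit sequence word whose high-order bit is the
  "armed" flag, and it assumes the comparison ignores that bit; the text
  does not say how RMS actually performs the comparison.

```python
ARMED = 0x8000          # assumed: high-order bit of a 16-bit sequence word
SEQ_MASK = 0x7FFF       # remaining bits hold the sequence count


class Ucb:
    def __init__(self):
        self.dirseq = 0                  # stands in for UCB$W_DIRSEQ

    def update_dirseq(self):
        """Directory structure changed: bump the count, clear the armed bit."""
        self.dirseq = (self.dirseq + 1) & SEQ_MASK


class DirCache:
    def __init__(self, ucb):
        self.ucb = ucb
        self.entries = {}                # name -> (file_id, sequence seen)

    def remember(self, name, file_id):
        self.entries[name] = (file_id, self.ucb.dirseq & SEQ_MASK)

    def lookup(self, name):
        """Return the cached file ID, or None if it may be out of date."""
        hit = self.entries.get(name)
        if hit is None or hit[1] != (self.ucb.dirseq & SEQ_MASK):
            return None                  # caller must ask the XQP
        return hit[0]
```

  The cost of a cache hit is one integer comparison, which is the point
  of the design: no call into the file system is needed to validate an
  entry.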


  5.2  File ID Cache

  The FID cache is effectively a pre-allocated section of the index file
  map.   It  is  found  from  VCB$L_CACHE and VCA$L_FIDCACHE.  The cache
  holds a list of known empty FIDs.  The  FID  cache  is  maintained  by
  CREATE_HEADER and DELETE_FID.

  When going to allocate a FID, the FID cache is checked first.  If  the
  cache  is empty, blocks will be read from the index file map until one
  is found with a free bit.  The free bits will  be  added  to  the  FID
  cache.  The index file map block is force written if this is done.  Of
  course, the file header for this FID must check out.

  If the FID cache is found not valid, we try  to  re-obtain  the  cache
  lock as part of making it valid.

  Entries are also added to the FID cache by DELETE_FID.  If the FID can
  be  put  into  the  cache successfully, fine.  Otherwise, some entries
  must be removed.  Since they will be written to the  index  file  map,
  and they will be read into (possibly) some other node's cache, we want
  to write out as many as possible that will fit in a given  index  file
  map  block.   The  index  map  VBN  of  this  block is recorded in the
  allocation lock value block.

  For a cache flush, all FIDs are returned.  A cache flush also  reduces
  the  cache lock to NL mode and marks the cache invalid (refer to cache
  flushing below).


  5.3  Extent Cache

  The extent cache is effectively a pre-allocated section of the  bitmap
  file.   It  is  found  from VCB$L_CACHE and VCA$L_EXTCACHE.  The cache
  holds a list of known free extents (LBN and size).  The  extent  cache
  is  maintained  by  routines  in  SMALOC.   The  idea is to maintain a
  certain fraction of the disk free space in the extent cache.

  When trying to allocate an extent, the extent cache is checked  first.
  If  this  fails,  allocation occurs directly from the bitmap.  If that
  fails, the extent cache is flushed to hedge our bets on another try at
  the  bitmap.   After the allocation, a try is taken at (re)filling the
  extent cache from the bitmap block in memory, and then from the bitmap
  itself.   The  VBN  of the block from which we succeed in finding free
  blocks is recorded in the storage map VBN field in the allocation lock
  value block.
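
  The order of attempts (cache, then bitmap, then flush and retry) can
  be sketched as below.  Lists of (lbn, length) pairs stand in for both
  the extent cache and the bitmap, and the function names are
  hypothetical.  The sketch assumes that flushing helps because cached
  extents can coalesce with adjacent free space in the bitmap; the
  refill step after allocation is omitted.

```python
def merge_adjacent(extents):
    """Coalesce sorted (lbn, length) extents that touch, in place."""
    merged = []
    for lbn, length in extents:
        if merged and merged[-1][0] + merged[-1][1] == lbn:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((lbn, length))
    extents[:] = merged


def take(pool, size):
    """First-fit allocation of `size` blocks from a list of free extents."""
    for i, (lbn, length) in enumerate(pool):
        if length >= size:
            if length == size:
                pool.pop(i)
            else:
                pool[i] = (lbn + size, length - size)
            return (lbn, size)
    return None


def allocate(size, extent_cache, bitmap_free):
    """Return an allocated (lbn, size) extent, or None if nothing fits."""
    got = take(extent_cache, size)        # 1. try the extent cache
    if got is None:
        got = take(bitmap_free, size)     # 2. try the bitmap directly
    if got is None:
        bitmap_free.extend(extent_cache)  # 3. flush the cache to the bitmap
        extent_cache.clear()
        bitmap_free.sort()
        merge_adjacent(bitmap_free)
        got = take(bitmap_free, size)     # 4. one more try at the bitmap
    return got
```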

  Returning blocks likewise returns first to the extent cache.  If  this
  overflows, some extents are purged back to the storage map.

  When operating on the extent  cache,  if  it  is  marked  invalid,  an
  attempt  is  made  to start keeping it valid by re-obtaining the cache
  lock.

  When removing entries from the extent cache,  an  effort  is  made  to
  remove  those  extents  that  would  map into the same bitmap block on
  disk.  This saves us writes  and  saves  other  nodes  reads  to  find
  sufficient free space.
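
  One way to choose extents that share a bitmap block is to bucket them
  by the block their first cluster falls in.  This is a sketch under
  stated assumptions: one bit per cluster, 512-byte bitmap blocks, and a
  hypothetical function name.  Extents that straddle a block boundary
  are attributed to their starting block only.

```python
from collections import defaultdict

BITS_PER_BLOCK = 512 * 8     # assumed: one bit per cluster, 512-byte blocks


def pick_extents_to_purge(extents, cluster_factor=1):
    """Choose the cached extents that share the most popular bitmap block.

    extents is a list of (lbn, length).  Returns (block, chosen_extents),
    or (None, []) when the cache is empty.
    """
    if not extents:
        return None, []
    by_block = defaultdict(list)
    for lbn, length in extents:
        block = (lbn // cluster_factor) // BITS_PER_BLOCK
        by_block[block].append((lbn, length))
    # Returning every extent in one block costs a single bitmap write.
    best = max(by_block, key=lambda b: len(by_block[b]))
    return best, by_block[best]
```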

  For a complete flush, though, we just write out all extents.  In  this
  case, the cache lock is reduced to NL mode.


  5.4  Cache Flushing

  The presence of the FID and extent caches implies that the index  file
  and  storage  maps  do  not  actually  reflect the true amount of free
  space.

  When you really need to know  everything  that  is  really  available,
  you've  got  to  get  everyone  else  to flush their cache back to the
  bitmap where you can find it.   This  is  done  with  system  blocking
  locks, with help from the CACHE_SERVER process (PID XQP$GL_FILESERVER,
  routine XQP$GL_FILSERV_ENTRY, which is set to CACHE_SERVER).

  Flushing back the extent cache can be triggered by various  conditions
  generated  when  allocating  storage in the SMALOC module.  It is also
  triggered by write accessing the BITMAP.SYS file.  Flushing  the  file
  number  cache can be triggered by the CREATE_HEADER routine failing to
  find a free header, or by write accessing the INDEXF.SYS file.

  Basically, the idea is that when those caches  are  being  used,  they
  hold  a  lock  with  the  blocking routine XQP$UNLOCK_CACHE (SYSACPFDT
  module, SYS facility), and an AST parameter that  identifies  the  UCB
  with the cache type encoded in the low bits.  This lock has the form

        F11B$c<lockbasis>

  where <lockbasis> is either the index  file,  bitmap  file,  or  quota
  file.   When you need to flush, you queue an incompatible lock for the
  appropriate F11B$c lock (routine CACHE_LOCK) and the blocking  routine
  in  turn  queues  an  AST  for  the  CACHE_SERVER process, with an AST
  parameter telling it what to do.  The CACHE_SERVER, in  turn,  does  a
  normal  XQP  call  with a special ACPCONTROL function that flushes the
  correct cache.  The cache flushing function needs a  specific  process
  since  the  blocking  AST, being from a system lock, may go off in any
  random process (in particular, one without the XQP  mapped  (the  null
  process, for instance)).

  In the normal case, each node will take out a PR  mode  lock  on  each
  cache  of  interest.  If any node needs to do something special (write
  access one of the files or otherwise request a cache flush elsewhere),
  the node takes out or converts its lock mode to CW, firing the other
  nodes' blocking ASTs.  The other nodes flush their caches, and then
  convert their locks to NL mode, allowing the requesting node to get its
  lock.
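
  The choice of PR for normal use and CW for a flush request follows
  from the lock manager's mode compatibility rules: PR locks are
  compatible with each other but not with CW.  The table below encodes
  the standard compatibility matrix; the helper function and the
  node-to-mode dictionary are hypothetical, added only to show which
  holders a given request would disturb.

```python
# COMPAT[held] is the set of modes that can be granted while a lock in
# mode `held` is already granted on the same resource.
COMPAT = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def blocked_holders(requested, held_by_node):
    """Return the nodes whose held mode blocks `requested`.

    These are the nodes whose blocking ASTs would fire.
    """
    return sorted(node for node, held in held_by_node.items()
                  if requested not in COMPAT[held])
```

  A CW request against nodes holding PR disturbs all of them, while
  nodes that have already dropped to NL are left alone.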

  Accessing for write one of the special files (INDEXF, BITMAP or QUOTA)
  causes  a  system lock in CW mode to be taken out on the cache.  Also,
  if quota processing is turned on, and we notice that someone  has  the
  quota  file  opened for write (while quota processing was off), the CW
  mode cache lock is taken.  We will wait for  other  nodes  to  release
  their PR locks.

  If CREATE_HEADER wants other nodes to flush their FID cache, it  takes
  out  a  process  CW  lock  on  the  cache  lock.   No  blocking AST is
  associated with this.  The lock is de-queued as soon  as  we  get  it.
  Likewise,  if  ALLOC_BLOCKS  fails  to  allocate,  it  will take out a
  process CW mode lock on the cache lock.  We wait to get the  lock,  so
  that  we  know  that  all other nodes gave up their PR locks (that is,
  that they flushed their caches).

  If a cache is found invalid (the starting state for a cache, also  the
  state  after  a flush), a PR mode lock is requested on the cache lock.
  We do not wait for this.  If we fail to take the lock, we simply  flag
  the  cache as invalid.  The quota cache will also be marked as needing
  flushing (since the quota software will still read quota records  into
  the  cache  and  modify them there).  We do not wait because some node
  may be holding a CW mode lock  indefinitely,  by  virtue  of  a  write
  access to the file associated with the cache lock.

  A complete flush of a cache will reduce the cache lock to NL  mode  to
  allow the requesting nodes to continue with their lock requests.

  CHECK_DISMOUNT  will  de-queue  any  special  cache  locks  it  finds.
  MAKE_DEACCESS  will  de-queue  a  cache write lock if held on the file
  being de-accessed.  DEACC_QFILE will de-queue the cache lock if held.


                                CHAPTER 6

                          QUOTA FILE PROCESSING


  As mentioned earlier, one of the last functions performed  in  an  XQP
  request  is to reflect new disk usage in the quota file.  This is done
  unless FIB$V_NOCHARGE is set (which a user cannot do, since GET_FIB will
  clear  it).   The  flag  is set by EXTEND_INDEX so that the index file
  isn't charged.  (EXTEND checks this flag.)


  6.1  Quota File Operations

  There are various ACPCONTROL functions that  a  user  can  request  to
  operate on the quota file.  These are implemented by QUOTA_FILE_OP.

  A problem QUOTA_FILE_OP has to start with is that it must acquire  the
  serial lock on the quota file before it can obtain the allocation lock
  for the volume.  It must acquire the allocation lock to prevent  quota
  figures  from  changing.   However,  the  allocation lock protects the
  existence of the quota file FCB; that is, VCB$L_QUOTAFCB is not stable
  except under the allocation lock.  So, QUOTA_FILE_OP loops, requesting
  the serial lock on the quota file (using whatever random value it gets
  by  looking  at VCB$L_QUOTAFCB), getting the allocation lock, checking
  that VCB$L_QUOTAFCB  matches  what  it  believes,  and  unlocking  and
  re-trying until it does.
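
  This is the familiar "guess, lock, validate, retry" pattern for taking
  locks in an order that conflicts with what they protect.  A minimal
  sketch follows, using ordinary Python mutexes in place of the
  distributed lock manager.  The classes Vcb and QuotaFCB and the
  function name are hypothetical, and the case where no quota file
  exists is not handled.

```python
import threading


class QuotaFCB:
    def __init__(self):
        self.serial_lock = threading.Lock()


class Vcb:
    def __init__(self, quota_fcb):
        self.quota_fcb = quota_fcb            # stands in for VCB$L_QUOTAFCB
        self.allocation_lock = threading.Lock()


def lock_quota_file(vcb):
    """Take the serial lock, then the allocation lock, re-trying on a race.

    Returns the FCB that was locked; both locks are held on return.
    """
    while True:
        guess = vcb.quota_fcb                 # unprotected read: may be stale
        guess.serial_lock.acquire()
        vcb.allocation_lock.acquire()
        if vcb.quota_fcb is guess:            # now stable; did we guess right?
            return guess
        vcb.allocation_lock.release()         # wrong FCB: undo and retry
        guess.serial_lock.release()
```

  The loop terminates as soon as the pointer read before locking matches
  the pointer seen under the allocation lock.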

  QUOTA_FILE_OP performs the  protection  checks  needed.   SEARCH_QUOTA
  locates  the  desired  quota  record.   (Note that this search is done
  directly against the quota file, not the quota cache.)

  A disable quota function flushes the quota  cache  (described  below),
  force  writes  any  quota  file  buffer  blocks, and then performs the
  actual quota file de-access (DEACC_QFILE).

  An examine quota function simply returns the quota file record.

  A remove quota function returns the old entry.  The entry is zeroed
  under an exclusive lock on the quota entry.


  A modify quota function patches the entry and writes  it.   Note  that
  the  process  must  hold  the volume blocking lock to modify the usage
  figure.

  An add quota function finds the next free record and writes the  quota
  information.   An  EXTEND_CONTIG will be done if necessary to grow the
  file.

  To actually de-access the quota file, DEACC_QFILE  starts  by  killing
  any quota file buffers (KILL_BUFFERS (1, -1)).  The access lock on the
  quota file is downgraded to show our de-access.  The  quota  cache  is
  de-allocated.  We also release our quota cache lock if we had it.

  The inverse  operation  of  connecting  the  quota  file  is  done  by
  CONN_QFILE.   CONN_QFILE  does a FIND to locate the quota file.  Under
  the quota file serialization lock, the FCB is found  or  created,  and
  the  extension  FCBs  built.   Write access is requested for the quota
  file.  MAKE_QFCB allocates the quota cache, linking  it  to  the  VCB.
  The  ACBs  in  the  cache  header  for  the  various blocking routines
  (described below) are set up here.  If the quota file is already write
  accessed, the quota cache lock is taken out.


  6.2  Quota Cache

  The quota cache has entries based on UIC  to  keep  track  of  allowed
  usage,  current  usage,  etc.,  without the need to read and write the
  QUOTA.SYS file itself all the time.  It is  allocated  in  paged  pool
  by  MAKE_DISK_MOUNT  in  the  MOUNT routine MOUDK2 and de-allocated by
  CHECK_DISMOUNT.

  The quota cache is found  by  chasing  through  VCB$L_QUOCACHE.   This
  points  to  a  VCA  block.   Each  entry  contains  a  UIC,  the quota
  information, a lock status block used with the quota cache entry locks
  (described below), the quota file record number, and LRU indexes.  The
  cache header contains an LRU counter.  When a new entry is added, the
  counter's current value is put into the entry and the counter is
  incremented.
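
  Counter-based LRU of this kind can be sketched in a few lines.  The
  QuotaCache class below is a toy with hypothetical names; it assumes
  the entry with the smallest stamp is the replacement victim and that
  the counter never wraps.

```python
class QuotaCache:
    """Toy quota cache keyed by UIC with counter-based LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru_counter = 0         # lives in the cache header
        self.entries = {}            # uic -> {"usage": ..., "lru": ...}

    def _stamp(self, entry):
        """Record the current counter in the entry, then advance it."""
        entry["lru"] = self.lru_counter
        self.lru_counter += 1

    def enter(self, uic, usage):
        """Add or refresh an entry, evicting the oldest one when full."""
        if uic not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u]["lru"])
            del self.entries[victim]
        entry = self.entries.setdefault(uic, {})
        entry["usage"] = usage
        self._stamp(entry)
```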

  FLUSH_QUO_CACHE returns each entry to disk.  The corresponding  record
  on  disk  is  located  and updated (CLEAN_QUO_CACHE).  Any quota entry
  locks are released, including a conversion to NL of  the  quota  cache
  lock itself.

  SCAN_QUO_CACHE looks up an entry in the cache.  If the cache is marked
  invalid,  this routine tries to get the normal (PR) cache lock so that
  the cache can stay valid.  If the entry is marked dirty, the record is
  updated  from  disk.   The quota entry lock is released for the entry.
  If the entry is not valid, the quota entry lock (PW)  is  obtained  to
  make it valid.

  CLEAN_QUO_CACHE updates the disk record from a cache entry.  The  disk
  buffer is marked dirty, the cache entry marked clean.

  ENTER_QUO_CACHE copies a given record into the cache.  The  LRU  index
  is updated if requested, the entry marked dirty if requested.

  If SCAN_QUO_CACHE finds the quota cache invalid and cannot obtain  the
  cache  lock, it sets the CACHEFLUSH flag in the cache header.  CLEANUP
  will check for this and flush the quota cache when  we  are  done  (to
  reflect our changes to the process holding the quota cache lock).


  6.3  Quota File Manipulation

  The main routine within quota file processing is  SEARCH_QUOTA.   This
  routine  locates  a  quota  record  for a given UIC.  It will scan the
  quota  cache,  updating  the  quota  file  from  the  cache  entry  if
  necessary.   If  the  record  can't  be found, or wild card search was
  requested, the quota file must be scanned.  The scan of the quota file
  is done before the quota cache in the wildcard case to get the records
  in proper order.  If the record returned is in the cache, the returned
  address  is  that  of DUMMY_REC within CHARGEQ.  This value is special
  cased elsewhere within CHARGEQ.  REAL_Q_REC  is  the  address  of  the
  buffer  containing  the  actual  disk  quota  record, if there is one.
  WRITE_QUOTA will update the cache entry, and/or the disk record  (mark
  the buffer dirty) depending on these variables.

  CHARGE_QUOTA  does  the  system  processing  of  charging  for  quota,
  checking  for  overdrawn, etc.  It will write out the new quota record
  if the quota charge is okay.


  6.4  Dynamic Quota Cache Entry Lock Passing

  For nodes in a cluster, for each entry in the quota  cache,  the  node
  holds a lock of the form

        F11B$q<volume id><UIC>

  in PW mode.  All the relevant information of the quota entry is packed
  into the value block, so it can be shared cluster-wide.  (If the value
  block comes back as invalid from the lock manager, we simply mark  the
  quota  cache entry as invalid.) When a new entry is added to the quota
  cache (by SCAN_QUO_CACHE), the quota entry lock is obtained.  When  an
  entry   is   removed   from   the  cache,  either  by  explicit  flush
  (FLUSH_QUO_CACHE) or by LRU replacement (SCAN_QUO_CACHE), the lock  is
  de-queued.   The  lock  is  held,  normally,  as  a system owned lock,
  specifying the XQP$REL_QUOTA blocking routine in the SYSACPFDT  module
  (SYS).   A  subsequent operation on that node hitting that cache entry
  will not do any lock conversions at all.


  When another node queues for the F11B$q lock, the blocking routine  is
  triggered  and sends an AST to the swapper (routine XQP$UNLOCK_QUOTA).
  If the quota entry is valid, the quota entry lock is demoted to CR mode
  (compatible with the PW that the other node is requesting).  This will
  write out the value block, which the other node will pick up  when  it
  succeeds  in  getting  its  PW  mode  lock.  If the quota entry is not
  valid, the lock is de-queued entirely and the entry marked vacant.
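
  The two outcomes of the blocking routine can be sketched as a single
  function.  The dictionary keys ("valid", "mode", "usage", "vacant")
  and the function name are hypothetical; the returned dictionary merely
  stands in for the value block that a downward conversion writes out.

```python
def unlock_quota(entry):
    """React to another node queuing for this entry's lock.

    entry is a dict describing one quota cache entry.  Returns the value
    block handed to the other node, or None if the lock was dropped.
    """
    if entry["valid"]:
        entry["mode"] = "CR"                 # demote: compatible with PW
        return {"usage": entry["usage"]}     # conversion writes value block
    entry["mode"] = None                     # de-queue the lock entirely
    entry["vacant"] = True
    return None
```

  Keeping a CR lock after demotion means the node still learns, through
  a later blocking AST, when someone wants the entry in EX mode.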

  When a quota entry is being removed from the quota  file  itself,  the
  quota  entry  lock is requested in EX mode.  This forces any node that
  holds any lock to it to perform a cache flush of the entry.


                                CHAPTER 7

                           DIRECTORY OPERATIONS


  Operations upon directories, other than explicit accessing of
  directories, are handled by the modules DIRACC, DIRSCN, FIND, ENTER,
  REMOVE and SHFDIR.

  FIND is called from  the  main  processing  routines  ACCESS,  DELETE,
  MODIFY and CONN_QFILE.  ENTER is called from CREATE.  REMOVE is called
  from FIND to perform a requested deletion.  These are  the  interfaces
  into the directory routines.

  FIND does the processing of taking the directory FID from the FIB  and
  locating  and accessing the directory, and finding the directory entry
  to get the file's FID.  ENTER adds a  new  directory  entry,  possibly
  removing/superseding one.  REMOVE removes a directory entry.

  DIR_ACCESS accesses a directory.  This is a  less  involved  operation
  than  accessing  a  file since the directory access only lasts for the
  duration of an XQP operation.  DIR_SCAN does the  walking  down  of  a
  directory.

  SHUFFLE_DIR extends and contracts directories as  requested  by  ENTER
  and REMOVE.

  ERR_CLEANUP can perform various directory operations.  It will  remove
  an  entry  if  need  be,  or  restore an entry by calling RESTORE_DIR,
  DIR_SCAN and MAKE_ENTRY directly.


  7.1  FIND

  FIND looks up a  directory  entry.   It  takes  as  input  the  buffer
  descriptors  and  the  FIB  supplied  by  the user.  It can optionally
  remove the entry (requested by  DELETE)  and  set  the  version  limit
  (sometimes  requested  by  MODIFY).   These sub-functions are provided
  because the calling routines do not wish to operate on  the  directory
  entry  themselves.   FIND  will  also return the resultant name string
  (requested by DELETE).


  The real work of FIND is performed by DIR_ACCESS and  DIR_SCAN.   FIND
  handles a few cases of wild card searching.

  Note that the directory search is metered as a sub-operation.


  7.2  DIR_ACCESS

  DIR_ACCESS accesses a directory.  It is  called  by  FIND  and  ENTER.
  (REMOVE  works  against the directory entry described by the directory
  context area.) The arguments to DIR_ACCESS are the user's FIB and  the
  necessary  access  type  (read/write/execute).   DIR_FCB  is  set as a
  result.  The serialization lock is obtained on the directory.   Access
  locks are manipulated just to make sure that no node is preventing our
  access (due to explicit opening of the directory);  the  serialization
  lock obtained obviates the need to hold an access lock.

  Note that DIR_ACCESS is basically a no-op if DIR_FCB is set.   In  the
  case  of  create-if, ACCESS would have called DIR_ACCESS (via FIND) to
  try to find the directory entry.   This  would  have  done  an execute
  protection  check.   The  write  protection  check  that  CREATE needs
  (normally done by DIR_ACCESS called from ENTER) will be skipped  since
  DIR_FCB  is  still  set.   So, CREATE performs the protection check in
  this case.


  7.3  DIR_SCAN

  DIR_SCAN locates a particular directory record.  It takes as inputs  a
  name  descriptor  block, a FID to locate, a starting block, record and
  version, a predecessor record, and a number of records  to  scan.   In
  the  case where a normal name lookup is to be done (supersede cleanup,
  ENTER test for duplicate file  name),  only  the  name  descriptor  is
  passed.   If  we need to remove an entry, a DIR_SCAN is done given the
  new starting point, specifying a  predecessor  record  and  no  record
  limit (find version set to -32768) to find the oldest version.  In the
  FIND case of supplied wild card context, FIND asks to  find  FID  (-1.
  -1.   -1) for the given starting block and record number from the wild
  card context (this is the case in which a record limit  is  provided).
  This  will fail, but will position to the desired record.  A full wild
  search is done from that starting point.  If  a  resultant  string  is
  supplied,  search  the  indicated  block  for the given entry.  If the
  search fails immediately (no records are traversed), search again from
  the  start.   If  the  version  is  not wild, we search for the oldest
  version so we are positioned at the start of the next name.   Thus  we
  are  left  positioned at the record, or where it used to be.  When all
  of this processing is done, the actual desired entry is found.

  The directory index block is used by DIR_SCAN to reduce  search  time.
  The  directory  index  block gives the last file name present in every
  nth block of the directory (where n is normally  1).   This  block  is
  used to select a starting and ending block for the search.
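
  Selecting a starting block from such an index is a binary search over
  the recorded last names.  The sketch below assumes the index is a
  sorted Python list with one cell per n directory blocks, that names
  compare as plain strings, and that blocks are numbered from zero; the
  function name is hypothetical and the real cell format is not shown in
  this document.

```python
from bisect import bisect_left


def starting_block(index, name, n=1):
    """Pick the first directory block that could hold `name`.

    index[i] is the last file name in block (i + 1) * n - 1, so cell i
    describes blocks i * n through (i + 1) * n - 1.
    """
    # First cell whose recorded last name is >= name; the entry cannot
    # live in any block described by an earlier cell.
    cell = bisect_left(index, name)
    return cell * n
```

  A result equal to len(index) * n means the name sorts after everything
  the index describes, so the scan starts in blocks the index does not
  yet cover.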

  The directory is processed a block at  a  time.   This  is  necessary,
  since  the buffer manager can only guarantee reading a single block at
  a time.  This is just as well, because this gives an  easy  handle  to
  build the directory index block.


  7.4  NEXT_REC

  The DIR_SCAN routine NEXT_REC returns a pointer to the next record  in
  the current block.  This routine validates the format of the entry.


  7.5  UPDATE_INDX

  The directory index block is  maintained  by  UPDATE_INDX,  called  by
  DIR_SCAN,  as  it  walks  down  directory  blocks not described by the
  directory index block, and by ENTER when it changes a block.  Updating
  the  index  is  a  simple string copy into the correct directory index
  block cell.


  7.6  NEXT_DIR_REC

  NEXT_DIR_REC will find the next directory entry, including reading the
  next  block  if  necessary.   The  routine returns a value only if the
  entry matches the name of the previous entry supplied.   This  routine
  is used by FIND and ENTER.


  7.7  ENTER

  ENTER adds a new entry into a directory.   It  also  includes  a  FIND
  operation.   ENTER  takes a buffer descriptor block and a FIB, as does
  FIND, as well as returning the resultant name string.  ENTER is called
  from CREATE (MAKE_ENTRY is called during ERR_CLEANUP).  As such, ENTER
  makes its own DIR_ACCESS call.  In the case of  superseding  an  entry
  (meaning  to  replace an existing entry that matches in name, type and
  version), this can be done in line.  Otherwise, the work  is  done  by
  MAKE_ENTRY.


  7.8  MAKE_ENTRY

  MAKE_ENTRY does the work of  actually  entering  an  entry.   This  is
  called  in  ENTER, and also in ERR_CLEANUP to undo a remove operation.
  The inputs are a file name block and the user's FIB.  MAKE_ENTRY  keys
  off the directory position set by the caller's DIR_SCAN.  If the entry
  is added to the end of a block, the  directory  index  block  cell  is
  updated.

  MAKE_ENTRY also handles the case in which an entry must be removed.
  In such a case, the directory context denoting the insertion point is
  saved.  A second DIR_SCAN locates the oldest entry to remove.  After
  removing the entry, the directory context is restored.


  7.9  RESTORE_DIR

  RESTORE_DIR restores a saved directory context.  The only  interesting
  aspect  of this is the possible re-reading of a directory block.  This
  is done during the MAKE_ENTRY search for an entry to remove; also  the
  ERR_CLEANUP restoration of directory context before undoing a remove.


  7.10  REMOVE

  A directory entry is removed by REMOVE.  REMOVE  is  called  by  FIND,
  when  requested  to  remove  the  found  entry,  by  ENTER, when it is
  necessary to remove an  entry  due  to  version  limitations,  and  by
  ERR_CLEANUP, to undo an enter operation.  As an option, it will keep a
  name with no versions.  (ENTER uses this option when it removes an
  entry but wants the name left in place, in case that was the only
  version, for the entry it is about to add.)  Note that removing an
  entry does not change
  the  directory  index  block,  since  the  old  name is just as good a
  pointer to this directory block as is  the  new  last  record  in  the
  block.


  7.11  SHUFFLE_DIR

  SHUFFLE_DIR expands/contracts a directory.  This is done by ENTER  and
  REMOVE.   The  argument  tells the direction to expand (1) or contract
  (-1).  The operation keys off the directory context generated  by  the
  caller.

  Performing a directory shuffle is a secondary context operation.

  If the operation is an extend, the directory  will  be  expanded  into
  unused  end  blocks  if  present.   Otherwise,  a contiguous extend is
  needed.  The directory is expanded by half its present size, with  the
  old blocks copied into the newly allocated space.  The copy is done so
  as to duplicate blocks, for safety in a crash.  The current  block  is
  split into two parts, depending on the current position.

  For a compression, a block is squished out.  The blocks following  the
  squish are copied downward.

  The directory file header is updated to show the new blocks/EOF.   The
  FCBs  are  rebuilt  to  map.   Finally,  the  directory index block is
  cleared out, now that we have lost track of the names in the block
  squished/added.
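
  The two directions of a shuffle can be pictured with the small Python
  model below.  It is only a sketch: a directory is modelled as a list
  of blocks, each a list of records, and the names shuffle_dir and
  grown_size, the rounding used for "half its present size", and the
  omission of the crash-safe block duplication are all assumptions of
  the example.

```python
def grown_size(blocks_in_use):
    """New allocation after a contiguous extend: grow by half (at least 1)."""
    return blocks_in_use + max(1, blocks_in_use // 2)


def shuffle_dir(blocks, current, position, direction):
    """Return a new block list after expanding (1) or contracting (-1).

    blocks    list of blocks, each a list of records
    current   index of the block named by the directory context
    position  record offset within that block where the split occurs
    """
    if direction == 1:
        # Split the current block in two at the current position.
        head = blocks[current][:position]
        tail = blocks[current][position:]
        return blocks[:current] + [head, tail] + blocks[current + 1:]
    if direction == -1:
        # Squish the block out; the following blocks move down.
        return blocks[:current] + blocks[current + 1:]
    raise ValueError("direction must be 1 or -1")


expanded = shuffle_dir([["A", "B", "C"], ["D"]], 0, 1, 1)
contracted = shuffle_dir([["A"], [], ["D"]], 1, 0, -1)
```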


                                   7-5


                                CHAPTER 8

                              ACL OPERATIONS


  The ACL for a file is an obvious parameter to CHECK_PROTECT.  The  ACL
  is stored as an in-memory list (paged pool), located through the ORB,
  which in turn is located through the primary FCB for the file.  The
  ACL is created (copied
  from  the file header chain) by FILL_FCB, when the CLF_NOBUILD flag is
  off (that is, only during initial FCB creation).  The ACL is  threaded
  onto   the   ACL   queue  of  the  ORB,  with  ACL_INIT_QUEUE  called.
  Correspondingly, REBLD_PRIM_FCB calls ACL_DELETEACL upon this ACL, and
  then  calls  INIT_FCB2  (which  calls FILL_FCB) without CLF_NOBUILD so
  that the ACL is rebuilt.

  NUKE_HEAD_FCB,  called  by  CLEANUP,   MARK_DELETE,   CLOSE_FILE   and
  UNHOOK_BFRD  (when  deleting a directory FCB lying around by virtue of
  its directory index block), is the main path to deleting an ACL
  (ACL_DELETEACL).  The ACL will also be deleted by CHECK_DISMOUNT when
  it de-allocates any FCBs associated with the device.

  The ACL can be returned to the user via a  read  attributes  function.
  It is set by WRITE_ATTRIB.  GET_FIB initially sets FIB$L_ACL_STATUS to
  success.  This value gets its real value in  READ_ATTRIB/WRITE_ATTRIB.
  READ_ATTRIB  calls  ACL_DISPATCH  for  ACL  operations other than add,
  delete and modify.   WRITE_ATTRIB  will  pass  on  all  operations  to
  ACL_DISPATCH.   Writing  to  the ACL causes the FCB to be marked stale
  cluster-wide (forces rebuilding the in-memory ACL).  The  file  header
  ACL chain is rebuilt (ACL_BUILDACL).

  The ACL is initially built in a file header chain by  CREATE.   For  a
  file  just being entered, PROPAGATE_ATTR will copy an ACL.  For a real
  create, a WRITE_ATTRIB will copy the user's ACL.  If the file is owned
  by  other  than  the  creator, an explicit ACL term for the creator is
  added (ACL_ADDENTRY).  CREATE will explicitly  call  ACL_BUILDACL  for
  good  measure,  in  case  it  wasn't picked up in any ACL manipulation
  earlier.  For a new file, PROPAGATE_ATTR  (COPY_INFO,  actually)  will
  copy (ACL_COPYACL) the default protection ACL from the DIR_FCB.  For a
  new version of an existing file, the ACL is copied from the old file's
  FCB (obtained in secondary context).


  8.1  ACL_BUILDACL

  This routine copies the in-memory ACL  into  the  file  header  chain.
  During  this copy, the BADACL bit is set in the primary header so that
  the presence of the corrupted ACL can be seen.  The file's  header  is
  extended if we run out of extension headers.


  8.2  ACL_COPYACL

  ACL_COPYACL copies specified ACEs from one  FCB  to  another.   It  is
  called to copy the entire ACL from a file to a new version thereof, or
  the default protection ACEs from a  directory  to  a  new  file.   The
  operation is a simple copy and thread operation.


  8.3  ACL_DISPATCH

  READ_ATTRIB/WRITE_ATTRIB call ACL_DISPATCH to  return/affect  the  ACL
  for  a file.  ACL_DISPATCH simply calls the desired routine, given the
  operation code specified in the user attribute area.  The caller  must
  reflect  any changes into the file headers.  The actual operations are
  done by routines contained in ACLSUBR.  The return value is stored  in
  FIB$L_ACL_STATUS.


  8.3.1  ACL_INIT_QUEUE

  ACL_INIT_QUEUE is called before any explicit operations upon the  ACL.
  An  ACL  exists  as  a threaded list from the ORB, manipulated under a
  mutex (at elevated IPL).  The mutex in the ORB is initialized by  this
  routine,  the  mutex  locked, and the queue head set to be empty under
  the lock.  The routine leaves with the mutex unlocked.


  8.3.2  ACL_ADDENTRY

  Adds an ACE to an ACL.  Note that ACL  segments  are  limited  to  512
  bytes, so adding an ACE may require splitting an ACL segment.
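
  A minimal Python sketch of the splitting rule follows.  Only the
  512-byte limit comes from the text; the list-of-lists model, the name
  add_ace, and the choice to fill the first segment and spill the rest
  into a second one are assumptions of the example.  It also assumes
  that no single ACE is longer than 255 bytes.

```python
SEGMENT_LIMIT = 512  # bytes per ACL segment, per the text


def add_ace(segments, seg_index, position, ace):
    """Insert `ace` (bytes) and return the new list of segments.

    segments is a list of segments, each a list of ACEs (bytes).  If the
    target segment would exceed SEGMENT_LIMIT, it is split in two.
    """
    seg = list(segments[seg_index])
    seg.insert(position, ace)
    if sum(len(a) for a in seg) <= SEGMENT_LIMIT:
        replacement = [seg]
    else:
        # Fill the first segment up to the limit, spill the rest.
        first, used = [], 0
        for a in seg:
            if used + len(a) > SEGMENT_LIMIT:
                break
            first.append(a)
            used += len(a)
        replacement = [first, seg[len(first):]]
    return segments[:seg_index] + replacement + segments[seg_index + 1:]


x, y, z = b"x" * 200, b"y" * 200, b"z" * 200
split = add_ace([[x, y]], 0, 1, z)      # 600 bytes: must split
unsplit = add_ace([[x]], 0, 1, y)       # 400 bytes: fits as is
```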


  8.3.3  ACL_DELENTRY

  Delete an ACE from an ACL.  The segment containing the  old  entry  is
  always deleted; the remaining ACEs are copied to a new segment.


  8.3.4  ACL_MODENTRY

  Modifies an entry; this is just a delete followed by an add.


  8.3.5  ACL_FINDENTRY

  The basic ACE finder.  Matches ACEs depending on context of use.


  8.3.6  ACL_FINDTYPE

  Locate an ACE based on type.


  8.3.7  ACL_DELETEACL

  Delete the entire ACL.  Also called when deleting FCBs.


  8.3.8  ACL_READACL

  Return as much of the ACL as fits in the user area.


  8.3.9  ACL_ACLLENGTH

  Return the length of the total ACL.


  8.3.10  ACL_READACE

  Return a single ACE.


  8.3.11  ACL_LOCATEACE

  Locate ACE by context value.


                                   8-3


                                CHAPTER 9

                          USER BUFFER PROCESSING


  The IRP sent to the XQP contains an  address  to  an  ACP  I/O  buffer
  packet  in  IRP$L_SVAPTE.   AIB$L_DESCRIPT  points to a blockvector of
  elements, each a buffer descriptor.  Buffer descriptor here refers  to
  a user area that is supplying, or is to be supplied with, information.
  Each element (ABD) contains an offset to the data, the size,  and  the
  user  virtual  address  of the data.  The offset plus one added to the
  address of the buffer descriptor gives the address of the buffer  (the
  byte  preceding  that is the access mode taken from IRP$B_RMOD).  Each
  possible user buffer has a reserved index  in  the  blockvector.   The
  indexes are zero origin.  The last element reserved corresponds to the
  read/write  attribute  user  function.   All  buffers  from  then   on
  correspond  to  read/write attribute buffers.  IRP$L_BCNT contains the
  number of buffer descriptors present.  (Note that for a  window  turn,
  IRP$V_COMPLX is off, and none of this applies.)
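
  The address arithmetic above can be tried out with the Python sketch
  below.  The text says only that each element holds an offset, a size
  and a user virtual address; the 8-byte little-endian layout (word
  offset, word size, longword address) and the name read_buffer are
  assumptions made so that the example runs.

```python
import struct

ABD_FORMAT = "<HHI"  # assumed: word offset, word size, longword user VA
ABD_SIZE = struct.calcsize(ABD_FORMAT)


def read_buffer(packet, descr_base, index):
    """Return (access_mode, user_va, data) for descriptor `index`."""
    desc_addr = descr_base + index * ABD_SIZE
    offset, size, user_va = struct.unpack_from(ABD_FORMAT, packet, desc_addr)
    mode = packet[desc_addr + offset]      # byte preceding the buffer
    start = desc_addr + offset + 1         # offset plus one past the descriptor
    return mode, user_va, bytes(packet[start:start + size])


# Two descriptors followed by their (mode byte, data) areas.
packet = (
    struct.pack(ABD_FORMAT, 16, 4, 0x1000)   # descriptor 0 at address 0
    + struct.pack(ABD_FORMAT, 13, 2, 0x2000) # descriptor 1 at address 8
    + bytes([3]) + b"ABCD"                   # mode byte at 16, data at 17
    + bytes([3]) + b"hi"                     # mode byte at 21, data at 22
)
first = read_buffer(packet, 0, 0)
second = read_buffer(packet, 0, 1)
```

  Note that each offset is relative to its own descriptor, not to the
  start of the blockvector, which is why the second descriptor carries
  13 and not 21.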

  The actual user buffers  are  copied  into  the  AIB  buffers  by  FDT
  processing (SYS module SYSACPFDT).  They are copied back to the user's
  area by I/O processing completion (BUFPOST in SYS module IOCIOPOST).

  The first entry is for returning the window pointer.  This  is  not  a
  user supplied buffer.  BUILDACPBUF (in FDT processing) sets the window
  pointer return address to CCB$L_WIND.  GET_REQUEST  zeros  the  window
  pointer  return  length (except for window turns) so that the value is
  not returned.  MAKE_ACCESS restores the window pointer  return  length
  (to  4) and returns the window pointer.  ZCHANNEL cleanup (which aborts
  a failed access attempt) returns a zero for the window pointer.

  The user's FIB occupies one element of the buffer descriptor list.  It
  is  copied  into LOCAL_FIB by GET_FIB.  The updated FIB is copied back
  to the FIB buffer by IO_DONE.

  The file name buffer is passed as the input to PARSE_NAME  from  ENTER
  and  FIND  to parse the user's file name into the internal name block.
  COPY_NAME (called from CREATE and FIND (for a spooled device))  copies
  the  file name buffer into the result string buffer.  It also sets the
  result string length buffer  value.   IO_DONE  clears  the  file  name
  return length to inhibit write-back of it.


  For quota file operations (QUOTA_FILE_OP), the  file  name  buffer  is
  used to pass a quota file transfer block (DQF).

  For operations on a spooled device, FDT processing places the user
  name and account in the file name string, to be placed in the file
  header.

  RETURN_DIR,  called  from  ENTER  and  FIND,  returns  the  name  from
  DIR_ENTRY  and  DIR_VERSION into the result string buffer.  The result
  string length buffer is also set.  The result string is itself  passed
  to PARSE_NAME from FIND when processing a wild card search.

  Quota file operations call RET_QENTRY (QUOTAUTIL) to return the  quota
  record  (DQF) into the result string buffer.  The result string length
  is set here.

  If a user attribute buffer exists, a  read/write  attributes  function
  (READ_ATTRIB/WRITE_ATTRIB)  is  performed.   ACCESS  will  perform  an
  attribute read.  CREATE, DEACCESS, MODIFY will  perform  an  attribute
  write.   IO_DONE  sets  IRP$L_BCNT during non-read operations so as to
  inhibit write-back of the attributes.

  The attribute list sometimes contains placement  data  (processed  for
  compatibility)  when FIB$V_ALLOCATR is set.  GET_LOC_ATTR, called from
  CREATE and MODIFY, will scan the user's attribute list  for  placement
  data, copying it into standard format in the FIB.


                                   9-2


                                CHAPTER 10

                          SPOOL FILE OPERATIONS


  Spool files are non-entered files that are flagged as  spooled.   Such
  files  will  be sent to the symbiont when de-accessed.  The idea is to
  allow a process to pretend to be writing to a printer, when in fact it
  is  not.  For spool files, IRP$L_UCB (which is loaded into CURRENT_UCB
  by GET_REQUEST) refers to the spool file.  IRP$L_MEDIA is set  to  the
  spooled  device UCB (a printer).  Spool file operations are recognized
  when a process does a  create  function  to  a  printer  specified  as
  spooled.   This  will be translated into a creation of a spooled file.
  Requests to operate on spool files are recognized  in  FDT  processing
  (SYS  module  SYSACPFDT).   For  implicit spooling, the file name user
  buffer is replaced by the user name and account, to  become  the  file
  name  in  the header.  GET_REQUEST notices that IRP$L_UCB is different
  from IRP$L_MEDIA, setting CLF_SPOOLFILE.

  The SPOOL flag is set in the file header for spool  files  by  CREATE.
  This  flag  causes FILL_FCB to set FCB$V_SPOOL.  The SPOOL header flag
  is one of the characteristics that cannot be changed by WRITE_ATTRIB.

  Directory operations are treated as no-ops for spool files.

  DEACCESS will set CLF_DOSPOOL when de-accessing  a  spool  file.   The
  CLF_DOSPOOL  cleanup  flag causes the file to be sent to the symbiont.
  SEND_SYMBIONT  picks  up  the  queue  name  for   the   request   from
  VCB$B_QNAMECNT  (an ASCIC string).  An itemlist is built providing the
  name, FID, etc.  of the file.   This  is  sent  to  $SNDJBC.   $SNDJBC
  notices  that it is being called from kernel mode, and does not do the
  various access checks it would otherwise  (which  would  cause  us  to
  hang).  If the symbiont request fails, the file is deleted (by setting
  CLF_DELFILE), with the job controller error status being  returned  to
  the user (in USER_STATUS[1]).


                                   10-1


                                CHAPTER 11

                             ACCESS OPERATION


  The ACCESS function is invoked  if  an  IO$_ACCESS  function  code  is
  specified.  The basic steps follow.
  Find the directory entry if necessary.
  Serialize processing on the file.  (This  allows  stable  FCBs  to  be
  found.)
  Find the FCBs.  This is the point at which  we  detect  that  we  have
  serialized  on  the  wrong  lockbasis (the user is trying to access an
  extension header directly).  In this case, we release the serial  lock
  and  serialize  on the correct (primary header) lockbasis.  Create the
  primary FCB if not found.
  Check for access conflicts; obtain the access lock.
  Create a window to the file.  Call MAKE_ACCESS (which threads the
  window, updates access counts, and returns the window address in the
  user's attribute area).
  Set V_WRITE_TURN in the window for directories  or  other  interesting
  files.
  See if the expiration date needs updating.
  Build/check extension FCBs.
  Check user access to the file.
  If the file is interesting, flush caches  of  the  file.   Obtain  the
  appropriate cache lock.
  Check access for, and read attributes, if requested.
  Check the need for cathedral windows.


                                   11-1


                                CHAPTER 12

                             CREATE OPERATION


  The CREATE function is invoked  if  an  IO$_CREATE  function  code  is
  specified,  or  if  DISPATCHER detects SS$_NOSUCHFILE from ACCESS when
  IO$V_CREATE was specified.  The basic steps are as follows.
  Clean-up from a previous  access  attempt,  if  this  is  a  create-if
  operation.   Perform  the  write access check on the parent directory,
  since the FIND within the access attempt only did an execute check.
  Find a volume for the file.  Check user's access  to  create  on  that
  volume.
  Create/allocate a file header.
  Create the primary FCB.
  Enter the file into the supplied directory.
  For a propagate operation, serialize on the file.  Search/build/create
  its FCBs.  Copy attributes to the file.
  Check the back link/name.  Update the header if no good.
  Do write attribute processing; correct ACL to include creator.
  Charge quota for the file.
  Access the file if requested.
  Perform extension, if requested.
  Update the file headers with the ACL.
  Re-map the file if extended (and cathedral windowed).
  Delete any file superseded/removed.


                                   12-1


                                CHAPTER 13

                             MODIFY OPERATION


  The MODIFY function is invoked  if  an  IO$_MODIFY  function  code  is
  specified.  The basic steps are as follows.
  Locate the directory entry.
  Serialize on the file.
  Search/build/create the FCBs.
  Check accessors; obtain access lock.  Check for access to file.
  Perform write attributes processing.
  Perform extension or truncation.
  Update file header.


                                   13-1


                                CHAPTER 14

                             DELETE OPERATION


  The DELETE function is invoked  if  an  IO$_DELETE  function  code  is
  specified.  The basic steps are as follows.
  Find the directory entry.
  Serialize on the file.  Read its header.
  Search/build/create FCBs.
  Check access on file.
  Make sure, if a directory, that it is empty.
  Audit the deletion.
  Check for other accessors.
  Mark the header for delete.
  Kill any cache buffers for the file (directories only).
  Mark the FCB for deletion.
  If we are the only accessor, delete the file.
  Restore  the  access  lock  (manipulated  when  checking   for   other
  accessors).
  Delete any FCBs.
  For a directory entry removal (no file deletion), request that cleanup
  remove the entry.  Otherwise, directory entry removal is done by
  DELETE_FILE.


                                   14-1


                                CHAPTER 15

                            DEACCESS OPERATION


  The DEACCESS function is invoked if an IO$_DEACCESS function  code  is
  specified.  The basic steps are as follows.
  Serialize on the file.
  Rebuild the FCBs if something must be done to the file.
  Request cleanup deletion of the file if marked for deletion and we are
  the last accessor.
  Update revision count, etc.
  Update the file high water mark.
  Clear the de-access lock flag if attributes are being written.   Write
  the attributes.
  Perform any requested truncation.  If we are not  the  only  accessor,
  set  up  for  delayed  truncation.   If  we  are the last accessor and
  delayed truncation was requested, do it.

  DEACCESS returns with a zero as a condition value (error).  This  will
  invoke ERR_CLEANUP, which will de-access and possibly delete the file.


                                   15-1


                                CHAPTER 16

                 WINDOW TURNING AND BAD BLOCK PROCESSING


  If the FDT routines, when processing a virtual  read  or  write,  find
  that the existing WCBs do not map the desired VBN, they force a window
  turn.   The  request  will  be  turned   into   an   IO$_READPBLK   or
  IO$_WRITEPBLK  operation.   Blocks  declared  as  bad will likewise be
  mapped into these functions to be sent to the  XQP.   DISPATCHER  will
  forward  these  I/O  function codes directly to READ_WRITEVB.  MAP_VBN
  does the work of mapping the VBN.  Bad block processing is started  by
  marking the FCB as having bad blocks.


  16.1  VBN Mapping

  MAP_VBN makes the FCBs valid, if needed.   (READ_WRITEVB  obtains  the
  serialization  lock on the file.) For an incompletely mapped cathedral
  window, the file is simply re-mapped.  We walk down the FCB chain,  to
  find  the  volume containing the desired blocks.  Given fresh FCBs, we
  try once more to map given the current windows (MAP_WINDOW).  If  this
  fails, TURN_WINDOW is called.
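
  The mapping attempt itself amounts to walking the window's retrieval
  pointers.  The Python sketch below shows the idea under simplifying
  assumptions: a window is modelled as a starting VBN plus a list of
  (block count, starting LBN) pairs, and the function name and the use
  of None to mean "not mapped, turn the window" are inventions of the
  example, not the XQP's interface.

```python
def map_window(start_vbn, pointers, vbn):
    """Return the LBN for `vbn`, or None if the window does not map it.

    start_vbn  first VBN described by the window
    pointers   list of (block_count, start_lbn) retrieval pointers
    """
    if vbn < start_vbn:
        return None                 # desired block lies before the window
    rel = vbn - start_vbn
    for count, lbn in pointers:
        if rel < count:
            return lbn + rel        # found the extent holding the block
        rel -= count
    return None                     # desired block lies after the window


# Hypothetical window: VBNs 1..10 at LBN 100, VBNs 11..15 at LBN 500.
window = [(10, 100), (5, 500)]
lbn = map_window(1, window, 11)
```

  A result of None is the point at which the real code would fall back
  to TURN_WINDOW.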

  TURN_WINDOW contains the gruesome code to update  WCBs.   The  routine
  handles cases where the file was truncated or extended, where the WCBs
  describe VBNs before or after the desired area, etc.  The new  desired
  window  pointers  are  built in a buffer within the routine.  They are
  copied into the actual WCB  at  IPL$_SYNCH  to  synchronize  with  FDT
  routines trying to map other virtual requests.

  READ_WRITEVB checks to see if this is an operation upon an interesting
  (storage  system)  file.   If  so,  cached  buffers are killed.  For a
  cluster,  the  sequence  numbers  are  incremented,  invalidating  our
  buffers.   The  appropriate lock (allocation or serial) is obtained so
  that the sequence number in the value block is also updated for  other
  nodes  to  see.  If this is not a cluster, the buffers are purged from
  the cache outright.

  Once the block has been mapped, the IRP is  re-queued  to  the  driver
  (REQUEUE_REQ) for I/O.


  16.2  Bad Block Processing

  In the case of a bad block processing request, the FCB is marked bad
  (MARKBAD_FCB)  and  SCAN_BADLOG is called to add the bad blocks to the
  BADLOG file (in secondary context).

  Setting FCB$V_BADBLK will cause DEACCESS to set the BADBLOCK  flag  in
  the  file  header.   Likewise,  INIT_FCB2  will set FCB$V_BADBLK if it
  finds FH2$V_BADBLOCK set.  Setting FH2$V_BADBLOCK, in turn, causes
  DELETE_FILE to send the file to the bad block scanner for deletion.

  SEND_BADSCAN hands the file to the bad block scanner by sending a
  message through the bad block scanner mailbox (ACP$BADBLOCK_MBX,
  created by INIT_FCP during SYSINIT) specifying the UCB and FID of the
  file to be deleted.  If this succeeds, a request is made for a process
  (name BADBLOCK_SCAN, all privileges, UIC [1, 3]) to run BADBLOCK.EXE.

  BADBLOCK.EXE  is  generated  from  the  BADBLK  facility.   The   main
  processing routine, MAIN_BAD (BADBLK module GETREQ) reads each message
  from the bad block mailbox.  For each, it patches the UCB address in a
  CCB  it holds for the purpose to that of the erring file.  The routine
  SCAN (BADBLK module SCANFILE) scans down the file separating  the  bad
  from the good.

  SCAN tests each block of the file (user mode I/O inhibiting  retries).
  As  it  does,  it truncates the trailing blocks from the file.  If the
  block is found to be bad,  SCAN  uses  the  special  MARKBAD  truncate
  option.   Either  way,  all  blocks  are truncated from the file.  The
  empty file is deleted.
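
  The shape of the scan loop can be sketched in Python as follows.
  This is a toy model only: it works a block at a time from the end of
  the file, whereas the text notes that MARKBAD acts on the last
  cluster, and the names scan_file and is_readable are hypothetical.

```python
def scan_file(blocks, is_readable):
    """Empty `blocks` from the end, returning the blocks found bad.

    blocks       list of block numbers still allocated to the file
    is_readable  callable that tests one block (True when it is good)
    """
    bad = []
    while blocks:
        block = blocks.pop()        # truncate the trailing block
        if not is_readable(block):
            bad.append(block)       # stands in for the MARKBAD option
    return bad                      # the file is now empty


allocated = [10, 11, 12, 13]
bad_blocks = scan_file(allocated, lambda b: b != 12)
```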

  The truncate option FIB$V_MARKBAD causes the  specified  blocks  (only
  the  last  cluster)  to  be  sent  to  DEALLOCATE_BAD.  (This requires
  SYSPRV.)

  DEALLOCATE_BAD, in secondary context, serializes on the BADBLOCK file.
  A  map pointer is added to the last header to map the bad blocks.  The
  EOF  mark  and  highwater  mark  are  set  to  include  these  blocks.
  SCAN_BADLOG  is called to remove the BADLOG entry for these blocks, if
  one exists.  The bad block scanner will also check the BADLOG file for
  any references to the file when it is done.


                                   16-2


                                CHAPTER 17

                           ACPCONTROL FUNCTIONS


  All ACP control functions find their way into ACPCONTROL.


  17.1  Quota Operations

  User invoked quota operations are  done  by  QUOTA_FILE_OP,  described
  elsewhere.   The  only  interesting  point  is  that the block lock is
  requested for an add quota function.  This is done because  DISPATCHER
  does  not request the block lock for ACP control functions, since some
  affect the block lock state.

  Enabling quota processing on a volume  makes  a  call  to  CONN_QFILE.
  Disabling quota processing is handled by QUOTA_FILE_OP, as are adding,
  examining, modifying and deleting quota file entries.


  17.2  REMAP

  The re-map function invokes REMAP_FILE under  the  file  serialization
  lock.


  17.3  LOCK Volume

  Lock volume takes the block lock  for  the  volume  (TAKE_BLOCK_LOCK)
  and sets VCB$V_NOALLOC.


  17.4  UNLOCK Volume

  The unlock volume function clears VCB$V_NOALLOC, establishes the  free
  space  figure  cluster-wide  (refer  to the free space allocation lock
  block field), and de-queues the block lock.


  17.5  Force Mount Verification

  Mount verification can be requested by a  process  with  SYSPRV.   The
  routine patches on PHY_IO to do an IO$_SHADMV.


  17.6  Dismount

  The dismount operation flushes all caches and marks the SCB to show
  that the volume is being dismounted.  The interesting aspect of this
  is that the SCB is possibly written to (asynchronously) by mount
  verification.
  The  SCB  I/O  must  be re-tried to make sure that a consistent SCB is
  read/written.


                                   17-2


                                CHAPTER 18

                            EVENT NOTIFICATION


  The file system outputs two sets of messages.  A privileged  user  can
  request  notification  of  interesting file system events.  The system
  itself requests notification of security relevant events.   These  two
  sets of events are reported as follows.


  18.1  NOTIFY_USER

  The undocumented command WATCH allows a suitably privileged  user  to
  ask  the file system to notify the user when significant events occur.
  The list of  significant  events  is  stored  as  bits  in  the  array
  PIO$GW_DFPROT  indexed  by the XQP event index.  Various places in the
  file system check their corresponding bit and call NOTIFY_USER to send
  the user a message.

  To be able to capture these messages, it is  necessary  to  perform  a
  normal  $PUT  on  them,  not  allowed  in kernel mode.  Since the file
  system can't jump out to exec mode, it  is  necessary  to  send  these
  messages  via  declaring  an AST in a higher mode.  Since RMS can't be
  called from exec mode AST level, it  is  necessary  to  send  them  to
  supervisor  mode.   So,  NOTIFY_USER  builds  the  message  in an area
  allocated in the CLI data area (CTL$AG_CLIDATA), passing  the  address
  as  an  AST  parameter  to  an  AST  routine in supervisor mode.  This
  routine is copied into an allocated area in the CLI  data  area.   The
  address  of  the  routine  is stored in XQP$_AST_ROUTINE.  The routine
  itself is NOTIFY_AST, found in XQPMSG.

  XQPMSG contains the descriptive messages for the various  file  system
  operations.   Each  is  described  by  a  descriptor.   Unfortunately,
  descriptors are not position independent, so it is necessary to fix up
  the   descriptors   when   the   XQP  is  mapped.   This  is  done  by
  FIXUP_MESSAGES, called from INIT_FCP.

  The NOTIFY_AST routine opens,  if  needed,  a  FAB/RAB  to  SYS$OUTPUT
  (variables  XQP$_IFI  and  XQP$_ISI).   The message is $PUT here.  The
  space used by the message is de-allocated in the CLI data area.


  18.2  PERFORM_AUDIT

  When all file system activity for a request is done, PERFORM_AUDIT  is
  called  if  necessary.  During the course of the request, audit blocks
  were placed in AUDIT_ARGLIST (by CHECK_PROTECT).  These  requests  are
  passed  to  NSA$EVENT_AUDIT  one  at a time.  The reason why they were
  deferred until the request was processed is that for each audit entry,
  a  FID_TO_SPEC  translation  is done.  FID_TO_SPEC disturbs other file
  system operations so much (it releases the primary serialization lock)
  that it is best to do this last.

  The exception is that a WRITE_AUDIT call appears in MARK_DELETE, since
  the  file  will  not  exist  to be audited after the operation of that
  routine.


                                   18-2


                                CHAPTER 19

                    ERROR HANDLING, STATUS AND CLEANUP


  As a general rule, the file system  modules  do  not  clean  up  after
  themselves.   An  operation  performed in secondary context must clean
  itself up before returning to primary context, but the primary context
  need  not be cleaned up.  This is because the dispatcher will invoke a
  routine that will clean up everything before considering  the  request
  finished.  If the dispatcher notices that the operation completed with
  error  (USER_STATUS),  ERR_CLEANUP  is  invoked.   If  this  succeeds,
  CLEANUP  is called, otherwise ERR_CLEANUP is called again.  If CLEANUP
  fails, ERR_CLEANUP is tried again.  This is repeated for a very large,
  but not infinite, number of times before we give up.
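
  The retry logic just described can be written out as the Python
  sketch below.  It is one reading of the paragraph above, not the
  dispatcher's code: the callables return True on success, the name
  finish_request is invented, and the default of 1000 attempts is a
  placeholder for the unspecified "very large" limit.

```python
def finish_request(failed, err_cleanup, cleanup, max_attempts=1000):
    """Run cleanup for one request; return True once CLEANUP succeeds.

    failed       True if the operation completed with an error status
    err_cleanup  callable standing in for ERR_CLEANUP
    cleanup      callable standing in for CLEANUP
    """
    need_err_cleanup = failed
    for _ in range(max_attempts):
        if need_err_cleanup and not err_cleanup():
            continue                # ERR_CLEANUP failed: call it again
        if cleanup():
            return True             # the request is finished
        need_err_cleanup = True     # CLEANUP failed: back to ERR_CLEANUP
    return False                    # give up after max_attempts


# A successful request goes straight to the cleanup callable.
done = finish_request(False, lambda: True, lambda: True)
```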

  Errors can occur at various points in the  processing  of  a  request.
  Some routines return an error status, which is handled by the calling
  routine.  Some routines signal errors.  A fatal error is signaled in
  such a way that the dispatcher knows to run ERR_CLEANUP.  ERR_CLEANUP
  knows how to clean up secondary context.


  19.1  Error Handling

  A routine, when detecting an error, can do one of  three  things.   It
  can  return  the error as a return status.  Secondly, it can store the
  error  status  in  the  user  return  status  (USER_STATUS)  by  using
  ERR_STATUS.   (ERR_STATUS only stores the status value if the existing
  value is success or informational.) This is done for  errors  detected
  by  main  line  code  that are not fatal but that the user should see.
  Since  invoking  ERR_STATUS  writes  USER_STATUS   directly,   calling
  routines can't intercept the error, so only errors the user truly must
  see should be reported this way.
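
  The "first error wins" rule of ERR_STATUS can be shown in a few lines
  of Python.  The sketch relies on the VMS convention that success and
  informational condition values have the low bit set; the function
  name and the numeric codes used below are illustrative only and are
  not taken from the XQP sources.

```python
def err_status(user_status, code):
    """Store `code` in user_status[0] unless an error is already there.

    user_status is the two-longword status vector, modelled as a list.
    A value with the low bit set is success or informational, so it may
    be overwritten; anything else is an earlier error and is kept.
    """
    if user_status[0] & 1:
        user_status[0] = code
    return user_status


status = [1, 0]                 # starts out as success
err_status(status, 0x12)        # first error (low bit clear) is stored
err_status(status, 0x1A)        # a later error does not replace it
```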

  The third way to report an error is to invoke  ERR_EXIT.   This  macro
  basically  signals  the condition value.  (It actually performs a CHMU
  of the argument, which translates into a signal in kernel  mode.   The
  macro  will perform a return, given what is left in R0, if the handler
  returns.) DISPATCHER establishes a  condition  handler  (MAIN_HANDLER)
  which copies the argument into USER_STATUS (again, only if USER_STATUS
  does not already indicate an error), places USER_STATUS into the value
  that  will  be  restored  into  R0,  and  unwinds  to the routine that
  established the handler.   As  such,  the  mainline  call  to  an  XQP
  processing  routine  will  effectively  return  with  the status value
  passed to ERR_EXIT.  The processing routine will be aborted.   No  XQP
  routines handle the unwind condition.

  Various routines will establish their own handler to intercept  errors
  that they feel should not be fatal.

  ACL_BUILDACL has a handler that causes it to abort  itself,  returning
  the  error  status, but not to return the error status to USER_STATUS,
  and not to abort the entire XQP operation.

  Both the CREATE routines PROPAGATE_ATTR and its  subroutine  COPY_INFO
  have  a  condition  handler  that aborts the called routine, returning
  zero.  The actual error is ignored.  This allows the secondary context
  operation to not abort the mainstream of CREATE.

  READ_NEW_HEADER, called in CREATE_HEADER to  read  the  new  potential
  header,  has  a handler to trap disk errors.  Surface errors cause the
  value returned from the READ_BLOCK call  to  be  zero.   Other  errors
  cause a re-signalling of the error.

  The processing of a delayed truncation by DEACCESS has  a  handler  so
  that  the  de-accessing  user  doesn't  have  to see such errors.  The
  handler simply aborts the truncate operation.

  DELETE establishes a handler around the reading of the header so  that
  it  is  possible to delete a bad header.  The handler aborts the read,
  restoring USER_STATUS to its value prior to the read.  Note that the
  USER_STATUS before the read is saved in SAVE_STATUS, which is an XQP
  impure area variable.  This is necessary because the handler can't
  access any local variables in DELETE, and the XQP cannot have any own
  variables.

  Flushing the special caches (FID, quota and extent) is  done  under  a
  handler  that  aborts  an  I/O (simply returning zero from the I/O) to
  ensure completion even against I/O errors.  (Flushing these caches  is
  not essential, but it might as well take its best shot.)

  Contiguous file extension has a condition handler enabled  around  the
  truncation  of  the  old blocks.  Since a good new copy exists at this
  time, we don't want the new file to be lost by aborting the  extension
  operation.   The  handler  aborts  the  truncate,  without  an  error.
  SHUFFLE_DIR does the same thing  when  truncating  the  old  directory
  during an extend.

  OPEN_FILE has a handler to clean up the aborted open.  The handler
  always re-signals the error.  However, it will examine the state
  around it to see what it should clean up.  If we did not hold a
  serialization lock on the file (STSFLGS [STS_KEEP_LOCK] is clear),
  then we clean up the FCBs in the usual way.  This is important since
  OPEN_FILE is always called in secondary context, and we need to clean
  up after ourselves.

  READ_ATTRIB  has  a  condition   handler   to   ignore   errors   from
  MAKE_NAMEBLOCK.


  19.2  USER_STATUS

  As mentioned above, USER_STATUS is the repository for the status  from
  the  various routines.  Actually, USER_STATUS is a two longword vector
  that is returned to the user (IRP$L_MEDIA).  These two longwords  will
  form  the  IOSB  returned  to the user.  Various places set the second
  longword; it is saved and restored across certain operations.

  EXTEND sets this second longword to the size  extended.   EXTEND_INDEX
  purposely  zeros  this field since it knows that EXTEND would have set
  it, and the extension of the index file is  not  of  interest  to  the
  user.

  For a contiguous extend (ALLOC_BITMAP),  this  value  is  the  largest
  contiguous extent size found.

  For  a  spool  operation  (SEND_SYMBIONT),  this  value  is  the   job
  controller error status.

  For a file truncation (TRUNCATE), this value is the number of blocks
  that had to be kept on the file beyond the point the user requested
  (that is, the shortfall in the truncation) so that the truncated file
  has an integral number of clusters.

  READ_WRITEVB sets this value to the second  word  of  the  I/O  status
  block returned by the I/O.
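
  The handling of the second longword can be pictured with a small
  Python sketch.  This is a model under stated assumptions, not the XQP
  code: the function names mirror the routines named above, the list
  stands in for the two longword vector, and the byte packing simply
  shows two little-endian longwords forming an 8 byte status block.

```python
import struct

SS_NORMAL = 1  # success status, used here only as a sample value


def extend(user_status, blocks_added):
    """Model EXTEND: report the size extended in the second longword."""
    user_status[1] = blocks_added


def extend_index(user_status, blocks_added):
    """Model EXTEND_INDEX: extend, then zero the field on purpose."""
    extend(user_status, blocks_added)
    user_status[1] = 0  # index file growth is of no interest to the user


def to_iosb(user_status):
    """Pack the two longwords as 8 little-endian bytes."""
    return struct.pack("<LL", user_status[0], user_status[1])
```

  The sketch leaves out the save and restore of the second longword
  across operations, which the text mentions but does not detail.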


  19.3  Status Flags

  The variable STSFLGS contains  various  status  bits  set  and  sensed
  throughout  the  XQP.   They  serve  a similar function to the cleanup
  flags, which indicate certain operations that must be done at the  end
  of an operation.  The status flags are global flags that allow special
  processing to be requested by a routine (admittedly  a  bad  practice)
  without having to pass extra arguments to the routine.


  19.3.1  STS_LEAVE_FILEHDR

  This flag is set for READ_HEADER.  When set, READ_HEADER will not set
  the value of the returned header into FILE_HEADER.  SEARCH_QUOTA will
  set this if it needs to un-stale the quota file FCBs.  DIR_ACCESS
  does so as well.  They do this since these are secondary operations
  to start with, and the real file header is to be saved.
  BUILD_EXT_FCBS, when passed the optional primary FCB argument, will
  also set this flag.  This is how its callers keep FILE_HEADER from
  being overwritten.


  19.3.2  STS_DISK_READ

  This flag is set by READ_BLOCK to indicate whether a request required
  a disk read or not.  This flag is only used by READ_HEADER.  If
  READ_HEADER requests a file header, and it was in the cache, then
  either it was validated on its way in (previous READ_HEADER call), or
  it was internally generated, and clearly validated and checksummed on
  its way out.  When the flag is clear, therefore, READ_HEADER need not
  validate the returned buffer.
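
  A minimal Python sketch of this optimization follows.  It assumes a
  dictionary for the cache and another for the disk, and it counts
  validations so the effect is visible; the class and its attributes
  are hypothetical stand-ins, not the real READ_BLOCK or READ_HEADER.

```python
class HeaderReader:
    """Toy model of READ_BLOCK / READ_HEADER and STS_DISK_READ."""

    def __init__(self, disk):
        self.disk = disk            # maps LBN -> header contents
        self.cache = {}             # maps LBN -> cached contents
        self.sts_disk_read = False  # models the STS_DISK_READ flag
        self.validations = 0        # how many times we validated

    def read_block(self, lbn):
        # Record whether this request had to go to the disk.
        if lbn in self.cache:
            self.sts_disk_read = False
        else:
            self.cache[lbn] = self.disk[lbn]
            self.sts_disk_read = True
        return self.cache[lbn]

    def validate(self, header):
        # Placeholder for header validation and checksum checking.
        self.validations += 1

    def read_header(self, lbn):
        header = self.read_block(lbn)
        # Only a header fresh from the disk needs validating.
        if self.sts_disk_read:
            self.validate(header)
        return header
```

  Reading the same header twice validates it once; the second request
  is satisfied from the cache and skips the check.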


  19.3.3  STS_HAD_LOCK and STS_KEEP_LOCK

  These flags are convenient for secondary context operations, where the
  secondary  context file might be the same as the primary context file.
  In such a case, we might already hold a serialization lock when we  go
  for  it.   SERIAL_FILE  sets  STS_HAD_LOCK  when  this  was  the case.
  OPEN_FILE will see this  flag,  and  set  STS_KEEP_LOCK,  for  use  in
  CLOSE_FILE.  CLOSE_FILE will see STS_KEEP_LOCK set and not release the
  serialization lock.  (It does, however, clear PRIM_LCKINDX.  This will
  keep  cleanup  from  trying  to release this lock when cleaning up the
  secondary operation.   Restoring  the  primary  context  will  restore
  PRIM_LCKINDX.)

  FID_TO_SPEC will key off this flag also, to  decide  when  to  release
  locks it finds on the way to the MFD.
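
  The interplay of the two flags can be sketched as follows.  This is
  a simplified Python model: a set stands in for the locks held, the
  lock basis is any hashable value, and the release path also clearing
  PRIM_LCKINDX is an assumption.  FID_TO_SPEC is not modelled.

```python
class Context:
    """Toy model of the state touched by the lock flags."""

    def __init__(self):
        self.locks_held = set()     # lock bases currently held
        self.sts_had_lock = False   # models STS_HAD_LOCK
        self.sts_keep_lock = False  # models STS_KEEP_LOCK
        self.prim_lckindx = None    # models PRIM_LCKINDX


def serial_file(ctx, basis):
    # Note whether the lock was already held before taking it.
    ctx.sts_had_lock = basis in ctx.locks_held
    ctx.locks_held.add(basis)
    ctx.prim_lckindx = basis


def open_file(ctx, basis):
    serial_file(ctx, basis)
    # Remember for CLOSE_FILE that the lock is not ours to release.
    ctx.sts_keep_lock = ctx.sts_had_lock


def close_file(ctx):
    if not ctx.sts_keep_lock:
        ctx.locks_held.discard(ctx.prim_lckindx)
    # Cleared either way, so cleanup will not try to release the lock.
    ctx.prim_lckindx = None
```

  If the primary context already holds the lock, the secondary open and
  close leave it in place; otherwise the close releases it.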


  19.4  Cleanup

  The steps in normal cleanup are as follows.

  Leave secondary context if in it.  Secondary context is responsible
  for performing its own normal cleanup.  (ERR_CLEANUP will clean up
  secondary context before leaving secondary context.)

  Flush the quota cache if so indicated.  (CACHEFLUSH was set in the
  cache header if we failed to get the quota cache lock, which would
  have been caused by someone write locking the quota file.)  For any
  volume marked as being dismounted or /NOCACHE, flush out the buffer
  caches.  Either way, write out all dirty buffers.  (This is done
  storage map buffers first, in the hope that the storage map is
  updated before the file headers.  This needs to be made deterministic
  someday.)

  Invalidate windows if requested.

  Clean out the directory FCB.  As usual, the FCB is saved if there is a
  directory  index  block associated with it.  If the directory is write
  accessed, though, we kill any directory  buffers  and  invalidate  the
  directory index block.  Generally forget about the directory.

  Mark the primary FCB stale cluster-wide, if requested.  Purge the FCBs
  unless  they  should  be  saved  (accessed  or  directory  index block
  associated).


  19.5  Error Cleanup

  The purpose of error cleanup is to undo various things done within
  routines that were not undone because of errors occurring in the
  routines.  The steps in ERR_CLEANUP are driven by various cleanup
  flags and variables set in the processing routines.
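
  The flag-driven style can be illustrated with a short Python sketch.
  The flag names are taken from this chapter, but the table, the order
  of the actions, and the clearing of each flag after its action are
  assumptions made for the example; the real ERR_CLEANUP is not shown
  here.

```python
def err_cleanup(cleanup_flags, actions):
    """Run the undo action for every cleanup flag that is set.

    cleanup_flags: set of flag names currently set.
    actions: ordered list of (flag name, callable) pairs.
    Returns the list of results from the actions that ran.
    """
    performed = []
    for flag, action in actions:
        if flag in cleanup_flags:
            performed.append(action())
            cleanup_flags.discard(flag)  # assumed: each undo runs once
    return performed


# Illustrative table; the ordering is an assumption.
ACTIONS = [
    ("CLF_TRUNCATE", lambda: "truncate CURRENT_FIB file back"),
    ("CLF_REMOVE", lambda: "remove directory entry"),
    ("CLF_DELFILE", lambda: "delete CURRENT_FIB file"),
]
```

  A processing routine sets a flag once it has done something that
  would need undoing; if it fails later, only the flagged undo steps
  run.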


  19.6  Cleanup Flags and Actions

  The various  cleanup  flags  and  variables  and  their  meanings  and
  corresponding clean up actions are listed below.


  19.6.1  CLF_CLEANUP

  Set by ERR_CLEANUP to indicate that cleanup is in progress.  The copy
  of this bit saved in the context save area is set by SAVE_CONTEXT.
  Cleared by RESTORE_CONTEXT.  Causes REMOVE to not flag a cleanup
  re-enter of the entry being removed.


  19.6.2  CLF_CLOSEFILE

  Set by OPEN_FILE once the file is opened.  (Prior to this,  a  handler
  in  OPEN_FILE will close the file.) Cleared by CLOSE_FILE.  Causes the
  internal file associated with CURRENT_WINDOW to be closed.




  19.6.3  CLF_DEACCESS

  Set by DEACCESS to cause the de-access to occur.  Set  by  MAKE_ACCESS
  when  the  access  is  complete.   Causes  the  header associated with
  PRIM_LCKINDX to be de-accessed (MAKE_DEACCESS).


  19.6.4  CLF_DEACCQFILE

  Set by MAKE_QFCB once CURRENT_VCB[VCB$L_QUOTAFCB] is set.  Causes  the
  quota file to be de-accessed.


  19.6.5  CLF_DELFILE

  Set in CREATE to indicate that a header was created.  Set by DEACCESS
  to request the deletion of a file for which deferred deletion was
  requested and for which we are the last accessor.  Set by
  SEND_SYMBIONT if the job controller call failed (thereby indicating
  that the job controller will not delete the file and so we must).
  Causes the file associated with CURRENT_FIB to be deleted.  The
  directory index block is killed as well.


  19.6.6  CLF_DELWINDOW

  Set by ACCESS and CREATE in case their access attempt fails.  Set by
  DEACCESS to cause the de-access to occur.  Causes the window blocks
  linked from CURRENT_WINDOW to be de-allocated.


  19.6.7  CLF_DIRECTORY

  Set by GET_FIB to indicate that  a  directory  operation  (lookup)  is
  necessary.  Has no associated cleanup.


  19.6.8  CLF_DOSPOOL

  Set by DEACCESS.  Causes the header associated with PRIM_LCKINDX to be
  sent to the symbiont.




  19.6.9  CLF_FIXFCB

  Set by EXTEND_CONTIG when it is decided that the file  must  truly  be
  extended.   Set  by  EXTEND  once  an  extend has started.  Cleared by
  DEACCESS after performing a truncation requested at de-access.  Set by
  WRITE_ATTRIB.  Set by TRUNC.  Causes the FCB chain to be re-built.


  19.6.10  CLF_FIXLINK, PREV_LINK, PREV_INAME

  Set in CREATE when the backlink/name in the header  is  changed  (done
  when  the  previous  link  is  zero).   Set  in  MARK_DELETE  when the
  corresponding directory entry is being removed,  thereby  zeroing  the
  backlink.   Causes the backlink and file name in the header associated
  with PRIM_LCKINDX to be reset.


  19.6.11  CLF_FLUSHFID

  Causes the FID cache to be flushed.  (Currently not used.)


  19.6.12  CLF_GRPOWNER

  Set by GET_REQUEST.  Indicates that the process has effective GROUP
  privilege to the volume.  Has no associated cleanup.


  19.6.13  CLF_HDRNOTCHG

  Set in CREATE once a header is  created  and  recorded  (FILE_HEADER).
  Cleared  when  the  quota  for it has been charged.  It is also set by
  PROPAGATE_ATTR in CREATE to avoid bungling quotas when file  owner  is
  different  from  process  UIC.  Causes WRITE_ATTRIB, when changing the
  owner, to not include the  blocks  taken  by  the  file  headers  when
  changing the charging of the blocks.  Causes DELETE_FILE to not return
  the blocks taken by the headers to the quota account.


  19.6.14  CLF_INCOMPLETE

  Set by TURN_WINDOW in  various  cases.   Causes  (in  REMAP_FILE)  the
  CURRENT_WINDOW to be marked incomplete.




  19.6.15  CLF_INVWINDOW

  Set by TRUNC.  Causes the windows associated with  PRIMARY_FCB  to  be
  invalidated.


  19.6.16  CLF_MARKFCBSTALE

  Set by EXTEND.  Set by  WRITE_ATTRIB  for  the  following  attributes:
  protected,  UIC,  access class, file protection, ACL.  Causes the FCBs
  associated with PRIMARY_FCB to be marked stale cluster wide.


  19.6.17  CLF_NOBUILD

  Set by UPDATE_FCB when it performs its FILL_FCB.  Set by EXTEND_HEADER
  when  it performs its INIT_FCB2.  Set by SHUFFLE_DIR before generating
  new FCBs for the shuffled directory.  Causes INIT_FCB2 to not build an
  ACL segment.  Cleared when INIT_FCB2 completes the FCB update.


  19.6.18  CLF_NOTCHARGED

  Set by EXTEND once an extend has started but the blocks are not yet
  charged.  Cleared once the blocks are charged.  Causes DELETE_FILE to
  not return the blocks taken by a file to the quota account.  Checked
  by TRUNC to see if the blocks should be uncharged.


  19.6.19  CLF_PFCB_REF_UP

  Set by FID_TO_SPEC to indicate that it has incremented  the  reference
  count  for  PRIMARY_FCB  to  hold  the FCB while it chases back links.
  Causes the reference count to be decremented.


  19.6.20  CLF_REENTER, PREV_NAME, PREV_VERSION

  Cleared in MARK_DELETE once deletion is committed.  Set in  REMOVE  if
  CLF_CLEANUP  is  not  set.  Causes a MAKE_ENTRY of the old name entry.
  With CLF_SUPERSEDE on, this causes the re-enter of SUPER_FID.




  19.6.21  CLF_REMAP

  Causes the file to be re-mapped.  (Not currently used.)


  19.6.22  CLF_REMOVE

  Set by ENTER once the entry has been recorded.  Causes the removal  of
  the entry indicated by the directory context area.


  19.6.23  CLF_SPOOLFILE

  Set  by  GET_REQUEST.   Indicates  a  spoolfile  operation.   Has   no
  associated cleanup.


  19.6.24  CLF_SUPERSEDE

  Set by ENTER when the enter causes an entry to be superseded.   CREATE
  will  check  this  flag  and  delete  a  file removed during its enter
  operation.  Causes the directory  record  to  be  fixed  up  with  the
  SUPER_FID file ID superseded.


  19.6.25  CLF_SYSPRV

  Set by GET_REQUEST.  Indicates that  the  user  has  effective  SYSPRV
  privilege on the volume.  Has no associated cleanup.


  19.6.26  CLF_TRUNCATE

  Set by EXTEND once an extend has started.  Causes the CURRENT_FIB file
  to be truncated back.


  19.6.27  CLF_VOLOWNER

  Set by GET_REQUEST.  Indicates that the process has effective
  ownership of the volume.  Has no associated cleanup.




  19.6.28  CLF_ZCHANNEL

  Set by ACCESS and CREATE in case their access attempt fails.  Set by
  DEACCESS to cause the de-access to occur.  Causes the window pointer
  being returned to the user to be zeroed, and the user to be credited
  for closing a file (JIB$W_FILCNT).


  19.6.29  NEW_FID, NEW_FID_RVN

  Set by CREATE_HEADER to indicate a  FID  that  was  created.   Set  by
  DELETE_FILE  to  request  deletion  of  the FID.  NEW_FID is zeroed in
  CREATE once the FID is recorded (FILE_HEADER) and in DELETE_FILE  once
  the  FID  is  deleted.   Cleared  in  EXTEND_HEADER  when  the  header
  extension is complete.  Causes the specified FID to be deleted.


  19.6.30  UNREC_COUNT, UNREC_RVN, UNREC_LBN

  Set by SELECT_VOLUME when asked to allocate a certain number of blocks
  on  some  volume.   These  blocks will be picked up by EXTEND.  Set by
  EXTEND_CONTIG when a new extent into which to copy the file  has  been
  allocated  but  not  recorded  in  the  map pointers.  Likewise set by
  SHUFFLE_DIR.  Cleared once the copy  has  been  done  and  the  header
  updated.  Set in EXTEND during a header extension to record blocks
  allocated that have yet to be added to the header since we must extend
  it.  Cleared in EXTEND once the blocks are returned if we failed to
  add them, or once we succeed in adding them to the header.  Causes the
  specified blocks to be returned to the storage map.
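
  The life cycle of the unrecorded block cells can be sketched in
  Python.  This is a single-volume model, so UNREC_RVN is left out, and
  a set of free LBNs stands in for the storage map; the class and
  method names are hypothetical.

```python
class AllocationState:
    """Toy model of UNREC_LBN / UNREC_COUNT handling."""

    def __init__(self, free_lbns):
        self.free = set(free_lbns)  # stand-in for the storage map
        self.unrec_lbn = 0
        self.unrec_count = 0

    def allocate(self, lbn, count):
        # Take blocks out of the map and note them as unrecorded.
        self.free -= set(range(lbn, lbn + count))
        self.unrec_lbn, self.unrec_count = lbn, count

    def record(self, map_pointers):
        # Blocks are now in the header, so nothing is left to undo.
        map_pointers.append((self.unrec_lbn, self.unrec_count))
        self.unrec_count = 0

    def err_cleanup(self):
        # Return any blocks that never made it into a header.
        if self.unrec_count:
            end = self.unrec_lbn + self.unrec_count
            self.free |= set(range(self.unrec_lbn, end))
            self.unrec_count = 0
```

  Blocks allocated but never recorded go back to the map on error
  cleanup; blocks already recorded in the header are left alone.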




                                CHAPTER 20

                             XQP STORAGE AREA


  The breakdown, and usage, of the XQP storage area follows.


  20.1  IO_CCB (non-impure)

  REF BBLOCK:  CCB for IO_CHANNEL, created by INIT_FCP.  IO_CCB$L_UCB is
  set to CURRENT_UCB by GET_REQUEST and to the new UCB by SWITCH_VOLUME.
  IO_CCB$L_UCB is used to refer to the desired UCB by WRITE_BLOCK (since
  buffer   writes   due   to  LRU  replacement  may  be  to  other  than
  CURRENT_UCB).


  20.2  IO_CHANNEL (non-impure)

  LONG:   channel  assigned  by  INIT_FCP.   Used  for   forcing   mount
  verification  on  shadow sets, issuing an unload/available function at
  volume dismount, erasing blocks of the index file when extending  EOF,
  reading  and  writing random blocks (READ_BLOCK, WRITE_BLOCK), erasing
  blocks for highwater and erase on return processing.


  20.3  BLOCK_LOCKID (non-impure)

  LONG:  lock id of activity blocking lock held by this  process  (refer
  to activity blocking)


  20.4  USER_STATUS

  VECTOR [2]:  XQP status  to  be  returned  to  user  (refer  to  error
  processing)




  20.5  IO_STATUS

  VECTOR [2]:  status block for XQP I/O


  20.6  IO_PACKET

  REF BBLOCK:  address of current I/O request packet, set in DISPATCHER


  20.7  CURRENT_UCB

  REF BBLOCK:  address of UCB of current request, set in GET_REQUEST and
  SWITCH_VOLUME


  20.8  CURRENT_VCB

  REF BBLOCK:  address of VCB of current request, set in GET_REQUEST and
  SWITCH_VOLUME


  20.9  CURRENT_RVT

  REF BBLOCK:  RVT of current volume set, or UCB, set in GET_REQUEST


  20.10  CURRENT_RVN

  LONG:  RVN of current volume, set in  GET_REQUEST  and  SWITCH_VOLUME.
  This value drives APPLY_RVN.


  20.11  SAVE_VC_FLAGS

  WORD:  save volume context flags.   These  flag  bits  belong  to  the
  allocation  lock  value  block.   They  contain  the quota file buffer
  sequence number in bits 1 to 15.


  20.12  STSFLGS

  BITVECTOR [8]:  various internal status flags (refer to status flags)




  20.13  BLOCK_CHECK

  BYTE:  make operation blocking check (refer to basic request flow)


  20.14  NEW_FID

  LONG:  file number of unrecorded file ID (refer to cleanup processing)


  20.15  NEW_FID_RVN

  LONG:  RVN of NEW_FID (refer to cleanup processing)


  20.16  HEADER_LBN

  LONG:  LBN of last  file  header  read  (CREATE_HEADER,  READ_HEADER).
  This  value  is  placed  into FCB$L_HDLBN by FILL_FCB.  The setting of
  this value by READ_HEADER is another reason why headers often need  to
  be  re-read  after various operations, especially secondary operations
  such as badblock processing.


  20.17  BITMAP_VBN

  LONG:  VBN of current storage map block.  This  value  is  used  along
  with  BITMAP_RVN  to  determine  the  validity of BITMAP_BUFFER.  This
  value is cleared when the allocation  lock  is  released,  since  we'd
  better not have a bitmap buffer active at this time.  An INVALIDATE of
  the BITMAP_BUFFER will also clear this.


  20.18  BITMAP_RVN

  LONG:  RVN of current storage map block (BITMAP_BUFFER).


  20.19  BITMAP_BUFFER

  REF BBLOCK:  address of current storage map block.  This value is used
  as  an  optimization  in  ALLOC_BLOCKS to decide if it needs to read a
  storage map block.  The validity of  BITMAP_BUFFER  is  decided  by  a
  non-zero value in BITMAP_VBN.
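
  The validity test on the cached storage map block can be sketched in
  Python.  The class is a hypothetical stand-in for the three cells
  BITMAP_VBN, BITMAP_RVN and BITMAP_BUFFER, and the read routine is
  supplied by the caller; this is not the ALLOC_BLOCKS code.

```python
class BitmapCache:
    """Toy model of the current storage map block cells."""

    def __init__(self, read_block):
        self.read_block = read_block  # callable (rvn, vbn) -> buffer
        self.bitmap_vbn = 0           # zero means "no valid buffer"
        self.bitmap_rvn = 0
        self.bitmap_buffer = None

    def get(self, rvn, vbn):
        # The buffer is reusable only if the VBN cell is non-zero and
        # both VBN and RVN match the block wanted.
        hit = (self.bitmap_vbn != 0
               and self.bitmap_vbn == vbn
               and self.bitmap_rvn == rvn)
        if not hit:
            self.bitmap_buffer = self.read_block(rvn, vbn)
            self.bitmap_vbn, self.bitmap_rvn = vbn, rvn
        return self.bitmap_buffer

    def invalidate(self):
        # Done on INVALIDATE and when the allocation lock is released.
        self.bitmap_vbn = 0
```

  Because a valid VBN is never zero, clearing BITMAP_VBN alone is
  enough to force the next request to read the block again.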




  20.20  SAVE_STATUS

  LONG:  saved status during CREATE's attribute  copy,  READ_IDX_HEADER.
  In  DELETE,  it  is  used to restore the old USER_STATUS if the delete
  fails, so as to ignore the delete of a bad header.


  20.21  PRIVS_USED

  BBLOCK [4]:  Privileges used  to  gain  access.   This  bit  array  is
  maintained  by  CHECK_PROTECT.   This  value can be returned as a read
  attribute.


  20.22  ACB_ADDR

  REF BBLOCK:  address of ACB for cross process ASTs, set in  READ_BLOCK
  to the CDRP portion of the IO_PACKET.


  20.23  BFR_LIST

  BLOCKVECTOR [4,8,BYTE]:  listheads for in-process  buffers  (refer  to
  buffer management)


  20.24  BFR_CREDITS

  VECTOR [4,WORD]:  buffers credited to this process  (refer  to  buffer
  management)


  20.25  BFRS_USED

  VECTOR  [4,WORD]:   buffers  actually  in-process  (refer  to   buffer
  management)


  20.26  CACHE_HDR

  REF   BBLOCK:    Address   of   buffer   cache    header,    set    by
  GET_REQD_BFR_CREDITS.




  20.27  CLEANUP_FLAGS (save context area)

  BITVECTOR [32]:  cleanup action flags (refer to cleanup processing)


  20.28  FILE_HEADER (save context area)

  REF  BBLOCK:   address  of  current  file  header,  set   by   CREATE,
  CREATE_HEADER.  Mainly set by READ_HEADER, unless STS_NOUPDHDR is set.
  EXTEND_HEADER sets this to  the  new  extension  header.   DELETE_FILE
  zeros FILE_HEADER when it writes out the deleted header.


  20.29  PRIMARY_FCB (save context area)

  REF BBLOCK:  address of primary file FCB, set  by  GET_REQUEST.   Also
  set  by  ACCESS,  CREATE,  MARK_DELETE,  EXTEND_CONTIG,  EXTEND_INDEX,
  OPEN_FILE (cleared by CLOSE_FILE),  MODIFY,  DEACC_QFILE,  CONN_QFILE,
  SHUFFLE_DIR.   Cleared  by  MARK_DELETE  when we are done deleting the
  file.  Cleared by GET_FIB, ACCESS, MODIFY when the FID in  the  user's
  FIB  does  not match that of the FCB associated with the channel (okay
  for access if only a find was desired).


  20.30  CURRENT_WINDOW (save context area)

  REF BBLOCK:  address of file window, set by GET_REQUEST.  Also set  by
  ACCESS,  CREATE,  EXTEND_INDEX,  OPEN_FILE  (cleared  by  CLOSE_FILE).
  Cleared by GET_FIB, ACCESS, DELETE, MODIFY when the FID in the  user's
  FIB does not match that of the FCB associated with the channel.


  20.31  CURRENT_FIB (save context area)

  REF BBLOCK:  pointer to FIB currently in  use,  set  to  LOCAL_FIB  by
  GET_FIB and GET_REQUEST.  Set to SECOND_FIB by SAVE_CONTEXT (LOCAL_FIB
  is not in the save context area).


  20.32  CURR_LCKINDX (save context area)

  LONG:  Current file header  lock  index  (refer  to  serialization  of
  activity).




  20.33  PRIM_LCKINDX (save context area)

  LONG:  Primary file  lock  basis  index  (refer  to  serialization  of
  activity).


  20.34  LOC_RVN (save context area)

  LONG:  RVN specified by placement data, set by GET_LOC.


  20.35  LOC_LBN (save context area)

  LONG:  LBN specified by placement data, set by GET_LOC.


  20.36  UNREC_LBN (save context area)

  LONG:  start LBN of unrecorded blocks (refer to cleanup processing).


  20.37  UNREC_COUNT (save context area)

  LONG:  count of unrecorded blocks (refer to cleanup processing).


  20.38  UNREC_RVN (save context area)

  LONG:  RVN containing unrecorded blocks (refer to cleanup processing).


  20.39  PREV_LINK (save context area)

  BBLOCK [FID$C_LENGTH]:  old  back  link  of  file  (refer  to  cleanup
  processing).


  20.40  CONTEXT_SAVE

  VECTOR [CONTEXT_SIZE, BYTE]:  area to save primary context




  20.41  LB_LOCKID

  VECTOR  [LB_NUM]:   serial  lock  ids  (refer  to   serialization   of
  activity).


  20.42  LB_BASIS

  VECTOR  [LB_NUM]:   lock  name  bases  (refer  to   serialization   of
  activity).


  20.43  LB_HDRSEQ

  VECTOR [LB_NUM]:  file header cache sequence numbers (refer to  buffer
  management)


  20.44  LB_DATASEQ

  VECTOR [LB_NUM]:  file data block  cache  sequence  number  (refer  to
  buffer management)


  20.45  LB_FILESIZE

  VECTOR [LB_NUM]:  value block file size (refer to buffer management)


  20.46  DIR_FCB

  REF BBLOCK:  FCB of directory file, set  in  DIR_ACCESS.   Cleared  in
  DELETE if we are deleting the directory itself.


  20.47  DIR_LCKINDX

  LONG:  Directory lock basis index (refer to serialization of activity)


  20.48  DIR_RECORD

  LONG:  record number of found directory entry within the block.
  Maintained by DIR_SCAN and FIND.  Zeroed before an ENTER operation.
  DIR_RECORD + 1 becomes the low order 6 bits of the wild card context
  (FIB$L_WCC) returned to the user.


  20.49  DIR_CONTEXT

  BBLOCK  [DCX_LENGTH]:   current  directory  context.   The   directory
  context  is  saved  within  ENTER  when  it is necessary to do another
  DIR_SCAN,  to  find  the  lowest  entry  to  remove.    Restored   (by
  RESTORE_DIR) when a directory operation is to be done at cleanup time.


  20.50  DIR_VBN (directory context)

  LONG:  VBN of DIR_BUFFER.  DIR_VBN - 1 forms the high 10 bits  of  the
  wild card context (FIB$L_WCC) returned to the user.
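
  The arithmetic for the wild card context can be sketched in Python.
  The text gives only the two field widths; that the two fields sit
  next to each other in a 16 bit value (bits 0 to 5 and bits 6 to 15)
  is an assumption of this sketch, as are the function names.

```python
RECORD_BITS = 6   # low order field: DIR_RECORD + 1
VBN_BITS = 10     # high field: DIR_VBN - 1 (adjacency is assumed)


def pack_wcc(dir_vbn, dir_record):
    """Combine DIR_VBN and DIR_RECORD into one context value."""
    low = dir_record + 1
    high = dir_vbn - 1
    if not 0 <= low < (1 << RECORD_BITS):
        raise ValueError("record number does not fit in 6 bits")
    if not 0 <= high < (1 << VBN_BITS):
        raise ValueError("VBN does not fit in 10 bits")
    return (high << RECORD_BITS) | low


def unpack_wcc(wcc):
    """Recover (DIR_VBN, DIR_RECORD) from a context value."""
    low = wcc & ((1 << RECORD_BITS) - 1)
    high = wcc >> RECORD_BITS
    return high + 1, low - 1
```

  The offsets of plus one and minus one mean that the first record of
  the first block packs to a non-zero value under this layout.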


  20.51  DIR_BUFFER (directory context)

  REF BBLOCK:  pointer to current directory block.


  20.52  DIR_ENTRY (directory context)

  REF BBLOCK:  pointer to current directory  entry.   A  non-zero  value
  indicates the presence of a directory entry/block/version.


  20.53  DIR_VERSION (directory context)

  REF BBLOCK:  pointer to current directory version entry.


  20.54  DIR_END (directory context)

  REF BBLOCK:  pointer to end of directory entries


  20.55  DIR_PRED (directory context)

  REF BBLOCK:  pointer to record preceding record found




  20.56  VERSION_LIMIT (directory context)

  WORD:  version limit of current entry


  20.57  VERSION_COUNT (directory context)

  WORD:  number of versions found


  20.58  LAST_ENTRY (directory context)

  VECTOR [,BYTE]:  name string of last record in previous block (counted
  string)


  20.59  OLD_VERSION_FID

  BBLOCK [FID$C_LENGTH]:  Old version's FID, set by DIR_SCAN


  20.60  PREV_VERSION

  LONG:  version number of previous directory entry, used  for  re-enter
  or un-supersede cleanup.


  20.61  PREV_NAME

  VECTOR [FILENAME_LENGTH+1, BYTE]:  name  of  previous  entry  (counted
  string) used for re-enter cleanup.


  20.62  PREV_INAME

  VECTOR [FILENAME_LENGTH+6, BYTE]:  previous internal file  name  (from
  file header) used for backlink/name cleanup (rename function).


  20.63  SUPER_FID

  BBLOCK [FID$C_LENGTH]:  file ID of  superseded  file.   Re-entered  by
  ERR_CLEANUP if necessary.




  20.64  LOCAL_FIB

  BBLOCK  [FIB$C_LENGTH]:   primary   FIB   of   this   operation   (see
  CURRENT_FIB)


  20.65  SECOND_FIB

  BBLOCK  [FIB$C_LENGTH]:   FIB  for  secondary  file   operation   (see
  CURRENT_FIB)


  20.66  LOCAL_ARB

  BBLOCK [ARB$C_HEADER]:  local copy of caller's ARB


  20.67  QUOTA_RECORD

  LONG:  record number  of  quota  file  entry,  returned  as  wild-card
  context to user.


  20.68  FREE_QUOTA

  LONG:  record number of free quota file entry


  20.69  REAL_Q_REC

  REF BBLOCK:  buffer address of quota record read


  20.70  QUOTA_INDEX

  LONG:  cache index of cache entry found


  20.71  DUMMY_REC

  BBLOCK  [DQF$C_LENGTH]:   dummy  quota  record  for  cache   contents.
  Special  cased  in  WRITE_QUOTA  to mean that the quota record pointer
  does not point into a cache buffer.




  20.72  AUDIT_COUNT

  LONG:  number of argument lists in AUDIT_ARGLIST


  20.73  MATCHING_ACE (non-impure)

  BBLOCK [ATR$S_READACL]:  Matching ACE storage, set by CHECK_PROTECT to
  the   ACE   upon  which  the  access  check  matched,  returnable  via
  READ_ATTRIB.


  20.74  FILE_SPEC_LEN (non-impure)

  VECTOR [1, WORD]:  current length of FULL_FILE_SPEC


  20.75  FULL_FILE_SPEC (non-impure)

  VECTOR [1022, BYTE]:  storage area to hold output of FID_TO_SPEC, used
  by WRITE_AUDIT and READ_ATTRIB.


  20.76  PMS Metering Cells (non-impure)

  LONG:  used to record total disk reads, total disk writes, total cache
  reads,  number  of  reads/writes/cache  reads/CPU/page  faults for the
  function and sub-function


  20.77  AUDIT_ARGLIST (non-impure)

  BBLOCK  [AUDIT_LENGTH*MAX_AUDIT_COUNT]:   used  to  accumulate   audit
  records




                                CHAPTER 21

                               ROUTINE LIST


  ACCESS, facility F11X module ACCESS
        calling sequence:
  main driver for access function

  ACL_ACLLENGTH, facility F11X module ACLSUBR
        calling sequence:  (ACL_QUEUE_HEAD, ACL_CONTEXT, COUNT, LENGTH)
  determine total ACL length

  ACL_ADDENTRY, facility F11X module ACLSUBR
        calling   sequence:    (ACL_QUEUE_HEAD,   ACL_CONTEXT,   LENGTH,
              ACE_BUFFER)
  add ACE to ACL

  ACL_BUILDACL, facility F11X module ACLCNTRL
        calling sequence:  (FIRST_FCB)
  copy the memory ACL to the file header(s)

  ACL_COPYACL, facility F11X module ACLCNTRL
        calling sequence:  (OLD_FILE_FCB, NEW_FILE_FCB, OPTION)
  copy ACEs from one file to another

  ACL_DELENTRY, facility F11X module ACLSUBR
        calling sequence:  (ACL_QUEUE_HEAD, ACL_CONTEXT, COUNT, ACE)
  delete ACE from ACL

  ACL_DELETEACL, facility F11X module ACLSUBR
        calling sequence:  (ACL_QUEUE_HEAD, ACL_CONTEXT)
  delete whole ACL

  ACL_DISPATCH, facility F11X module ACLCNTRL
        calling sequence:  (CODE, ADDRESS, COUNT, ACE)
  dispatch on ACL request to ACL utilities

  ACL_FINDENTRY, facility F11X module ACLSUBR
        calling sequence:   (ACL_QUEUE_HEAD,  ACL_CONTEXT,  COUNT,  ACE,
              INTERNAL)
  find ACE




  ACL_FINDTYPE, facility F11X module ACLSUBR
        calling sequence:   (ACL_QUEUE_HEAD,  ACL_CONTEXT,  COUNT,  ACE,
              INTERNAL)
  find specific type of ACE

  ACL_INIT_QUEUE, facility F11X module ACLSUBR
        calling sequence:  (ORB_ADDRESS)
  initialize ACL as a mutex protected list

  ACL_LOCATEACE, facility F11X module ACLSUBR
        calling  sequence:   (ACL_QUEUE_HEAD,  ACE_INDEX,   ACL_POINTER,
              ACL_SPLIT)
  locate ACE by context

  ACL_MODENTRY, facility F11X module ACLSUBR
        calling sequence:  (ACL_QUEUE_HEAD, ACL_CONTEXT, COUNT, ACE)
  modify ACE

  ACL_READACE, facility F11X module ACLSUBR
        calling sequence:  (ACL_QUEUE_HEAD, ACL_CONTEXT, COUNT, ACE)
  read ACE

  ACL_READACL, facility F11X module ACLSUBR
        calling   sequence:    (ACL_QUEUE_HEAD,   ACL_CONTEXT,   LENGTH,
              ACE_BUFFER)
  read as much of ACL as possible

  ACPCONTROL, facility F11X module ACPCNTRL
        calling sequence:
  main dispatch for ACP control functions

  ALLOCATE, facility F11X module ALLOCB
        calling sequence:
  allocate unpaged memory

  ALLOCATION_LOCK, facility F11X module LOCKERS
        calling sequence:  NOVALUE
  acquire index file/storage map allocation lock

  ALLOCATION_UNLOCK, facility F11X module LOCKERS
        calling sequence:  NOVALUE
  release allocation lock

  ALLOC_BLOCKS, facility F11X module SMALOC
        calling sequence:  (FIB, BLOCKS_NEEDED, START_LBN, BLOCKS_ALLOC)
  allocate a contiguous set of blocks

  ALLOC_PAGED, facility F11X module ALLOCB
        calling sequence:
  allocate paged memory

  ARBITRATE_ACCESS, facility F11X module LOCKERS
        calling sequence:  (ACCTL, FCB)
  check, obtain access lock

  BLOCK_WAIT, facility F11X module LOCKERS
        calling sequence:  NOVALUE
  wait for volume blocking lock to be released

  BUILD_EXT_FCBS, facility F11X module EXTFCB
        calling sequence:  (PRIMHDR, PFCB) :  NOVALUE
  build the extension FCB chain

  CACHE_LOCK, facility F11X module LOCKERS
        calling sequence:  (LOCK_BASIS, LOCK_ID, MODE)
  acquire cache lock, possibly flushing caches on other nodes

  CACHE_SERVER, facility F11X module FILESERV
        calling sequence:
  run  in  cache  server  process  to  call  ACPCONTROL  functions  when
  requested

  CHANGE_OWNER, facility F11X module RWATTR
        calling sequence:  (UIC, ORG_FCB, ORG_HEADER)
  change owner fields in header chain, change quota

  CHARGE_QUOTA, facility F11X module CHARGEQ
        calling sequence:  (UIC, BLOCK_COUNT, FLAGS) :  NOVALUE
  charge/credit quota to user

  CHECKSUM, facility F11X module CHKSUM
        calling sequence:
  checksum file header

  CHECK_DISMOUNT, facility F11X module CHKDMO
        calling sequence:  NOVALUE
  check for device now idle, dismount if requested and idle

  CHECK_HEADER2, facility F11X module CHKHD2
        calling sequence:  (HEADER, FILE_ID, HEADER_STATUS)
  validate file header

  CHECK_PROTECT, facility F11X module CHKPRO
        calling sequence:  (ACCESS,  HEADER,  FCB,  ACMODE,  ALT_ACCESS,
              REQUIRED)
  check user's right to access a file

  CLEANUP, facility F11X module CLENUP
        calling sequence:
  cleanup after an operation

  CLEAN_QUO_CACHE, facility F11X module CHARGEQ
        calling sequence:  (J, Q_RECORD) :  NOVALUE
  mark quota record dirty, clean up cache entry




  CLOSE_FILE, facility F11X module FILUTL
        calling sequence:  (WINDOW) :  NOVALUE
  close internally opened file

  CONN_QFILE, facility F11X module QUOTAUTIL
        calling sequence:  (ABD, FIB) :  NOVALUE
  open and connect quota file

  CONTINUE_THREAD, facility F11X module DISPATCH
        calling sequence:
  AST routine to continue operation upon event completion

  CONV_ACCLOCK, facility F11X module LOCKERS
        calling sequence:  (LCKMODEARG, FCBARG)
  convert the access lock for an FCB to exclude an accessor

  COPY_NAME, facility F11X module CPYNAM
        calling sequence:  (ABD)
  copy name from buffer descriptor to result string

  CREATE, facility F11X module CREATE
        calling sequence:
  main driver for the create function

  CREATE_BLOCK, facility F11X module RDBLOK
        calling sequence:  (LBN, COUNT, TYPE)
  create a zeroed cache block (do not read the block)

  CREATE_FCB, facility F11X module CREFCB
        calling sequence:  (HEADER, PRIMFCB)
  allocate, initialize FCB, add to VCB

  CREATE_HEADER, facility F11X module CREHDR
        calling sequence:  (FILE_ID)
  find a new file id

  CREATE_WINDOW, facility F11X module CREWIN
        calling sequence:  (ACCTL, SIZE, HEADER, PID, FCB)
  create a window

  DALLOC_PAGED, facility F11X module ALLOCB
        calling sequence:
  de-allocate paged memory

  DEACCESS, facility F11X module DEACCS
        calling sequence:
  main driver for the de-access function

  DEACC_QFILE, facility F11X module QUOTAUTIL
        calling sequence:
  de-access quota file, remove cache


  DEALLOCATE, facility F11X module ALLOCB
        calling sequence:
  de-allocate unpaged memory

  DEALLOCATE_BAD, facility F11X module DELBAD
        calling  sequence:   (FIB,  FILE_HDR,  POINTER,  LAST_COUNT)   :
              NOVALUE
  remove blocks from file into BADBLOCK file

  DELETE, facility F11X module DELETE
        calling sequence:
  main driver for delete function

  DELETE_FID, facility F11X module DELFIL
        calling sequence:  (FILENUM) :  NOVALUE
  release the file id to FID cache/index file map, possibly flush cache

  DELETE_FILE, facility F11X module DELFIL
        calling sequence:  (FIB, FILEHEADER) :  NOVALUE
  delete contents of file and header

  DEL_EXTFCB, facility F11X module CLENUP
        calling sequence:  (START_FCB)
  delete extension FCBs, decrement VCB transaction counts

  DEQ_LOCK, facility F11X module LOCKERS
        calling sequence:  (LOCK_ID) :  NOVALUE
  de-queue random lock

  DIR_ACCESS, facility F11X module DIRACC
        calling sequence:  (FIB, WRITE) :  NOVALUE
  access a directory

  DIR_SCAN, facility F11X module DIRSCN
        calling sequence:  (NAME_DESC, FILE_ID, START_BLOCK,  START_REC,
              START_VER, START_PRED, REC_COUNT)
  look up a name/FID in a directory

  DISPATCH, facility F11X module DISPATCH
        calling sequence:
  dispatch from XQP queue

  DISPATCHER, facility F11X module DISPAT
        calling sequence:  NOVALUE
  main request dispatching

  ENTER, facility F11X module ENTER
        calling sequence:  (ABD, FIB, RESULT_LENGTH, RESULT) :  NOVALUE
  main driver for enter function

  ERASE_BLOCKS, facility F11X module ERASE
        calling sequence:  (START_LBN, BLOCK_COUNT, CHANNEL)
  do logical I/O to erase contiguous blocks of file

  ERR_CLEANUP, facility F11X module CLENUP
        calling sequence:
  clean up after an aborted operation

  EXTEND, facility F11X module EXTEND
        calling sequence:  (USER_FIB, FILEHEADER) :  NOVALUE
  extend a file, possibly also header

  EXTEND_CONTIG, facility F11X module EXTCONTIG
        calling sequence:  (FIB, FCB, SIZE)
  extend contiguous file (by copying blocks)

  EXTEND_HEADER, facility F11X module EXTHDR
        calling   sequence:    (FIB,   OLD_HEADER,   FCB,    NEW_VOLUME,
              BLOCKS_NEEDED)
  create an extension header

  EXTEND_INDEX, facility F11X module EXTIDX
        calling sequence:  NOVALUE
  extend the index file

  FID_TO_SPEC, facility F11X module RWATTR
        calling sequence:  (HEADER) :  NOVALUE
  translate FID to filespec via backlinks

  FILE_SIZE, facility F11X module FILESIZE
        calling sequence:  (HEADER)
  find file size from a given header

  FILL_FCB, facility F11X module INIFC2
        calling sequence:  (FCBARG, HDRARG, PRIMFCB) :  NOVALUE
  fill in FCB from file header

  FIND, facility F11X module FIND
        calling sequence:  (ABD, FIB, FIND_MODE, RESLEN_ARG, RESULT_ARG)
              :  NOVALUE
  locate, operate upon directory entry

  FINISH_REQUEST, facility F11X module DISPATCH
        calling sequence:
  lower volume activity count, possibly release block lock

  FIXUP_MESSAGES, facility F11X module XQPMSG
        calling sequence:
  fix up message descriptors when XQP mapped

  FLUSH_QUO_CACHE, facility F11X module QUOTAUTIL
        calling sequence:  NOVALUE
  flush dirty entries to quota file, release cache


  FMG$MATCH_NAME, facility F11X module MATCHNAME
        calling sequence:
  wildcard matching

  GET_FIB, facility F11X module GETFIB
        calling sequence:  (ABD)
  copy user FIB, set default values

  GET_LOC, facility F11X module GETLOC
        calling sequence:  (FIB, LOCRVN, LOCLBN) :  NOVALUE
  find desired LBN/RVN for file placement

  GET_LOC_ATTR, facility F11X module GTLCAT
        calling sequence:  (ABD, FIB) :  NOVALUE
  convert compatibility mode placement data into FIB format

  GET_MAP_POINTER, facility F11X module GETPTR
        calling sequence:  NOVALUE
  decode file map pointer

  GET_QUOTA_LOCK, facility F11X module CHARGEQ
        calling sequence:  (J, MODE) :  NOVALUE
  lock quota cache entry (cluster only)

  GET_REQD_BFR_CREDITS, facility F11X module RDBLOK
        calling sequence:  NOVALUE
  reserve the minimum required cache buffers

  GET_REQUEST, facility F11X module GETREQ
        calling sequence:
  get request from XQP queue

  GET_TIME, facility F11X module GETTIM
        calling sequence:  (BUFFER, TIME) :  NOVALUE
  ODS-1 conversion

  INITXQP, facility F11X module DISPATCH
        calling sequence:
  initialize XQP

  INIT_FCB2, facility F11X module INIFC2
        calling sequence:  (FCBARG, HEADER, PRIMFCB) :  NOVALUE
  fill in FCB and filesize

  INIT_FCP, facility F11X module INIFCP
        calling sequence:
  create impure storage area

  INIT_FID_CACHE, facility F11X module CREHDR
        calling sequence:  (CACHE) :  NOVALUE
  set FID cache valid


  INVALIDATE, facility F11X module RDBLOK
        calling sequence:  (BUFFER) :  NOVALUE
  invalidate a cache buffer's contents

  IOC$BUFPOST, facility SYS module IOCIOPOST
        calling sequence:
  post completion of buffered I/O

  IOC$DALLOC_DMT, facility SYS module IOSUBPAGD
        calling sequence:
  de-allocate device on dismount

  IOC$MAPVBLK, facility SYS module IOSUBRAMS
        calling sequence:
  map using window chain

  IO_DONE, facility F11X module IODONE
        calling sequence:
  post QIO completion directly

  KILL_BUFFERS, facility F11X module RDBLOK
        calling sequence:  (POOL, LOCKBASIS) :  NOVALUE
  kill buffers associated with CURRENT_UCB (and a lockbasis)

  KILL_CACHE, facility F11X module RDBLOK
        calling sequence:  (UCB) :  NOVALUE
  toss out buffers associated with a UCB

  KILL_DINDX, facility F11X module RDBLOK
        calling sequence:  (FCB) :  NOVALUE
  invalidate a directory index block

  LOCK_COUNT, facility F11X module LOCKERS
        calling sequence:  (LOCKID)
  return number of lockers for a LOCKID

  LOCK_IODB, facility F11X module LOCKDB
        calling sequence:
  lock I/O data base mutex

  LOCK_MODE, facility F11X module LOCKERS
        calling sequence:  (ACCTL)
  compute access lock mode for request

  MAKE_ACCESS, facility F11X module MAKACC
        calling sequence:  (FCB, WINDOW, ABD) :  NOVALUE
  hook up windows to FCB, update VCB fields for new file access

  MAKE_DIRINDX, facility F11X module RDBLOK
        calling sequence:  (FCB)
  find/validate a directory index (cache) block


  MAKE_ENTRY, facility F11X module ENTER
        calling sequence:  (NAME_DESC, FIB) :  NOVALUE
  actually make the directory entry

  MAKE_FCB_STALE, facility F11X module LOCKERS
        calling sequence:  (FCBARG) :  NOVALUE
  make all nodes mark the given FCB stale

  MAKE_NAMEBLOCK, facility F11X module MAKNMB
        calling sequence:  (LENGTH, STRING, NAMEBLOCK) :  NOVALUE
  ODS-1 conversion

  MAKE_POINTER, facility F11X module MAKPTR
        calling sequence:  (COUNT, LBN, FILE_HEADER, PLACEMENT_CODE)
  encode map pointer

  MAKE_STRING, facility F11X module MAKSTR
        calling sequence:  (NAMEBLOCK, STRING)
  ODS-1 conversion

  MAP_IDX, facility F11X module CREHDR
        calling sequence:  (VBN, COUNT)
  map block of index file

  MAP_VBN, facility F11X module MAPVBN
        calling sequence:  (VBN, WINDOW, BLOCK_COUNT, UNMAPPED_BLOCKS)
  map VBN to LBN

  MAP_WINDOW, facility F11X module MPWIND
        calling sequence:
  caller for IOC$MAPVBLK

  MARKDEL_FCB, facility F11X module DELETE
        calling sequence:  (FCB)
  mark the FCB as delete pending (cluster wide)

  MARK_COMPLETE, facility F11X module WITURN
        calling sequence:  (WINDOW) :  NOVALUE
  mark a window chain as totally mapped

  MARK_DELETE, facility F11X module DELETE
        calling sequence:  (FIB,  DO_DELETE,  RESULT_LENGTH,  RESULT)  :
              NOVALUE
  mark file for deletion, delete if possible

  MARK_DIRTY, facility F11X module RDBLOK
        calling sequence:  (BUFFER) :  NOVALUE
  mark a cache buffer as modified

  MARK_INCOMPLETE, facility F11X module WITURN
        calling sequence:  (FIRST_BLOCK) :  NOVALUE
  mark a window chain as not totally mapped


  MODIFY, facility F11X module MODIFY
        calling sequence:
  main driver for modify (extend/truncate) function

  MOUNT, facility F11X module MOUNT
        calling sequence:
  mark UCB as mounted

  NEXT_DIR_REC, facility F11X module DIRSCN
        calling sequence:  (OLD_REC, VBN)
  advance to next directory record (and block) for the same name

  NEXT_HEADER, facility F11X module NXTHDR
        calling sequence:  (HEADER, FCB, EXT_FID, SEGNUM)
  read the next extension header

  NEXT_REC, facility F11X module DIRSCN
        calling sequence:  (ENTRY)
  find the next directory entry within the current block

  NOTIFY_AST, facility F11X module XQPMSG
        calling sequence:
  supervisor mode routine to notify user

  NOTIFY_USER, facility F11X module DISPAT
        calling sequence:  (CONTROL_STRING, FAO_ARGS) :  NOVALUE
  sends message to user (via supervisor mode AST routine)

  NUKE_HEAD_FCB, facility F11X module CLENUP
        calling sequence:  (FCB) :  NOVALUE
  remove FCB from volume list, delete all appendages and locks

  OPEN_FILE, facility F11X module FILUTL
        calling sequence:  (FID, WRITE)
  open a file for internal use

  PARSE_NAME, facility F11X module PARSNM
        calling  sequence:   (NAME_DESC,  NAME_BUFFER,  COUNT,   STRING,
              FLAGS) :  NOVALUE
  convert file name to name block

  PMS_END, facility F11X module PMS
        calling sequence:  NOVALUE
  end metering main function

  PMS_END_SUB, facility F11X module PMS
        calling sequence:  NOVALUE
  end metering sub-function

  PMS_START, facility F11X module PMS
        calling sequence:  NOVALUE
  start metering main function


  PMS_START_SUB, facility F11X module PMS
        calling sequence:  (INDEX) :  NOVALUE
  start metering sub-function

  PURGE_EXTENT, facility F11X module SMALOC
        calling sequence:  (ENTRY_COUNT, CACHE_LIMIT) :  NOVALUE
  purge entries from extent cache, return to BITMAP

  QEX_N_CANCEL, facility F11X module LOCKERS
        calling sequence:  (LOCKID)
  jiggle locks to fire blocking AST

  QUOTA_FILE_OP, facility F11X module QUOTAUTIL
        calling sequence:  (ABD, FIB) :  NOVALUE
  basic dispatching for quota file operations

  READ_ATTRIB, facility F11X module RWATTR
        calling sequence:  (HEADER, ABD)
  read user requested attributes

  READ_BLOCK, facility F11X module RDBLOK
        calling sequence:  (LBN, COUNT, TYPE)
  read a block (and possibly some more)

  READ_DATA, facility F11X module FILUTL
        calling sequence:  (WINDOW, VBN, COUNT)
  read data from internal file

  READ_HEADER, facility F11X module RDHEDR
        calling sequence:  (FILE_ID, FCB, REALBASIS_A)
  read main or extension file header

  READ_IDX_HEADER, facility F11X module CREHDR
        calling sequence:
  read primary/alternate index header

  READ_WRITEVB, facility F11X module RWVB
        calling sequence:
  read/write virtual block, including writing special files

  REBLD_PRIM_FCB, facility F11X module CREFCB
        calling sequence:  (PFCB, HEADER)
  delete old ACL and extension FCB chain, refill FCB

  RELEASE_CACHE, facility F11X module RDBLOK
        calling sequence:  NOVALUE
  release the buffer cache lock, wake others

  RELEASE_LOCKBASIS, facility F11X module RDBLOK
        calling sequence:  (LCKINDX)
  release buffers associated with a lockbasis value


  RELEASE_SERIAL_LOCK, facility F11X module LOCKERS
        calling sequence:  (LOCK_INDEX) :  NOVALUE
  release serial lock, lock block, check caches

  REL_QUOTA_LOCK, facility F11X module CHARGEQ
        calling sequence:  (J) :  NOVALUE
  release quota cache lock

  REMAP_FILE, facility F11X module ACPCNTRL
        calling sequence:  NOVALUE
  completely map file

  REMOVE, facility F11X module REMOVE
        calling sequence:  (KEEP_NAME) :  NOVALUE
  remove a directory entry

  REQUEUE_REQ, facility F11X module REQUEU
        calling sequence:
  re-queue request to driver (as a physical/logical request)

  RESET_LBN, facility F11X module RDBLOK
        calling sequence:  (BUFFER, LBN) :  NOVALUE
  change the LBN associated with a buffer

  RESTORE_CONTEXT, facility F11X module GETREQ
        calling sequence:  NOVALUE
  restore context area after XQP sub-function

  RESTORE_DIR, facility F11X module ENTER
        calling sequence:  (CONTEXT) :  NOVALUE
  reposition directory according to saved context, restore context

  RETURN_BLOCKS, facility F11X module SMALOC
        calling sequence:  (START_LBN, BLOCK_COUNT,  ERASE_REQUESTED)  :
              NOVALUE
  return a set of blocks to the storage map

  RETURN_CREDITS, facility F11X module RDBLOK
        calling sequence:  NOVALUE
  return the reserved cache buffers

  RETURN_DIR, facility F11X module RETDIR
        calling sequence:  (COUNT, STRING, ABD) :  NOVALUE
  return result data from directory scan to user's result string

  RM$ARM_DIRCACHE, facility RMS module RM0SETDID
        calling sequence:
  arm blocking AST to notice UCB$W_DIRSEQ change

  RM$DIRCACHE_BLKAST, facility SYS module RMSRESET
        calling sequence:
  increment DIRSEQ in UCB


  SAVE_CONTEXT, facility F11X module GETREQ
        calling sequence:  NOVALUE
  save context area for XQP sub-function

  SCAN_BADLOG, facility F11X module BADSCN
        calling sequence:  (FID, BASE_VBN, BASE_LBN, MODE,  BLOCK_COUNT)
              :  NOVALUE
  add/remove entry from BADLOG file

  SEARCH_FCB, facility F11X module SCHFCB
        calling sequence:  (FILE_ID)
  look for FCB off of VCB

  SEARCH_QUOTA, facility F11X module CHARGEQ
        calling sequence:  (UIC, FLAGS, START_REC, USE_CACHE)
  search quota file/cache for a UIC

  SELECT_VOLUME, facility F11X module SELVOL
        calling sequence:  (FIB, BLOCKS_NEEDED) :  NOVALUE
  pick the best volume for an allocation

  SEND_BADSCAN, facility F11X module SNDBAD
        calling sequence:  (FID) :  NOVALUE
  send message to, spawn badblock routine

  SEND_ERRLOG, facility F11X module SNDERL
        calling sequence:  (MODE, UCB)
  generate error log entry

  SEND_SYMBIONT, facility F11X module SNDSMB
        calling sequence:  (HEADER, FCB) :  NOVALUE
  send spool request to job controller

  SERIAL_CACHE, facility F11X module RDBLOK
        calling sequence:  NOVALUE
  serialize (lock) the buffer cache

  SERIAL_FILE, facility F11X module LOCKERS
        calling sequence:  (FID_ADDR)
  acquire serial lock, return lock block

  SET_DIRINDX, facility F11X module CLENUP
        calling sequence:  (FCB)
  try to make an association between a directory and its index block

  SET_EXPIRE, facility F11X module ACCESS
        calling sequence:
  marks the window as needing expiration recording when closed

  SET_REVISION, facility F11X module DEACCS
        calling sequence:  (HEADER, MODE) :  NOVALUE
  update revision date in header


  SHUFFLE_DIR, facility F11X module SHFDIR
        calling sequence:  (DIRECTION) :  NOVALUE
  extend/compress a directory

  START_REQUEST, facility F11X module DISPATCH
        calling sequence:
  test for volume activity blocking and raise volume activity count

  SWITCH_CHANNEL, facility F11X module SWITVL
        calling sequence:  (UCB) :  NOVALUE
  switch XQP to new UCB for new volume

  SWITCH_VOLUME, facility F11X module SWITVL
        calling sequence:  (NEW_RVN) :  NOVALUE
  switch volume context, switch allocation lock to new volume if held

  TAKE_BLOCK_LOCK, facility F11X module LOCKERS
        calling sequence:  NOVALUE
  acquire volume activity blocking lock for the process

  TOSS_CACHE_DATA, facility F11X module RDBLOK
        calling sequence:  (LCKINDX) :  NOVALUE
  write, invalidate all buffers associated with a lockbasis

  TRUNCATE, facility F11X module TRUNC
        calling sequence:  (FIB, FILEHEADER, TRNVBN) :  NOVALUE
  truncate blocks off end of file

  TRUNCATE_HEADER, facility F11X module TRUNC
        calling sequence:  (FIB, HEADER, POINTER, LAST_COUNT) :  NOVALUE
  return truncated blocks to storage map, clear map pointers

  TRUNC_CHECKS, facility F11X module TRUNC
        calling sequence:  (FIB, HEADER) :  NOVALUE
  perform validity checks on truncate request

  TURN_WINDOW, facility F11X module WITURN
        calling sequence:  (WINDOW, HEADER, DESIRED_VBN, START_VBN)
  create window block(s), turn windows to desired VBN

  UNHOOK_BFRL, facility F11X module RDBLOK
        calling sequence:  (BFRDARG) :  NOVALUE
  unhook a buffer descriptor from a buffer lock

  UNLOCK_IODB, facility F11X module LOCKDB
        calling sequence:
  unlock I/O data base mutex

  UNLOCK_XQP, facility F11X module DISPAT
        calling sequence:  NOVALUE
  release all locks


  UPDATE_DIRSEQ, facility F11X module CHKDMO
        calling sequence:
  causes update  of  DIRSEQ  in  UCB  for  all  nodes  (invalidates  RMS
  directory caches)

  UPDATE_FCB, facility F11X module CREFCB
        calling sequence:  (HEADER) :  NOVALUE
  update (fill) primary FCB from given header

  UPDATE_INDX, facility F11X module DIRSCN
        calling  sequence:   (BLOCK,  STR_SIZE,  STR_ADDR,   DIRFCB)   :
              NOVALUE
  record the directory entry name in the directory index cache cell

  WAIT_FOR_AST, facility F11X module DISPATCH
        calling sequence:
  suspend XQP activity, return to user, wait for event

  WRITE_ATTRIB, facility F11X module RWATTR
        calling sequence:  (HEADER, ABD, CONTROL_ACCESS) :  NOVALUE
  write user attributes

  WRITE_AUDIT, facility F11X module DISPAT
        calling sequence:  (AUDIT_BLOCK) :  NOVALUE
  generate, write audit record

  WRITE_BLOCK, facility F11X module RDBLOK
        calling sequence:  (BUFFER) :  NOVALUE
  write a buffer back to disk

  WRITE_DIRTY, facility F11X module RDBLOK
        calling sequence:  (LOCKBASIS) :  NOVALUE
  write buffers associated with a lockbasis back to disk

  WRITE_HEADER, facility F11X module RDBLOK
        calling sequence:  NOVALUE
  write a file header (checksum before write block)

  WRITE_QUOTA, facility F11X module CHARGEQ
        calling sequence:  (Q_RECORD) :  NOVALUE
  mark quota record for writing

  WRONG_LOCKBASIS, facility F11X module RDBLOK
        calling sequence:  (HEADER) :  NOVALUE
  return a buffer found to have the wrong lockbasis

  XQP$BLOCK_ROUTINE, facility SYS module SYSACPFDT
        calling sequence:
  block volume activity (decrement activity), invoke  XQP$DEQBLOCKER  if
  idle

  XQP$DEQBLOCKER, facility SYS module SYSACPFDT
        calling sequence:
  de-queue blocking lock (swapper)

  XQP$FCBSTALE, facility SYS module SYSACPFDT
        calling sequence:
  blocking routine to mark FCB stale

  XQP$REL_QUOTA, facility SYS module SYSACPFDT
        calling sequence:
  blocking AST to invoke XQP$UNLOCK_QUOTA

  XQP$UNLOCK_CACHE, facility SYS module SYSACPFDT
        calling sequence:
  pass system blocking cache AST on to cache server process

  XQP$UNLOCK_QUOTA, facility SYS module SYSACPFDT
        calling sequence:
  de-queue/demote quota cache entry lock

  XQPMERGE, facility F11X module XQPMERGE
        calling sequence:
  force a new file system into P1 space

  ZERO_IDX, facility F11X module CLENUP
        calling sequence:  NOVALUE
  clear directory index block

  ZERO_ON_ERROR, facility F11X module DISPAT
        calling sequence:  (SIGNAL, MECHANISM)
  handler to return zero as routine value

  ZERO_WINDOWS, facility F11X module CLENUP
        calling sequence:  (FCB)
  de-allocate windows off the FCB

