VS018, SE010
**************************************************************
0145:	I/O PLUMBING - I/O Interception Without Pain
	Glenn C. Everhart, RAXCO, Inc.

   Intercepting control during the I/O process is part of the craft
of system programming that has numerous uses. Now, in general, you want
to use well defined interfaces because:

	* Control flow is well defined there, may even be documented
		and stable.
	* All control narrows to flow through these few interfaces,
		rather than being spread over many paths.
	* Data structures have to contain all information here, rather
		than having control info encoded in misc. parts of
		machine state.

   It is a fundamental concept of systems programming to find new uses
for existing interfaces and to intercept control there.

   On real OSs, it is also fundamental to avoid assuming you have
complete control of those interfaces. Manufacturers and other systems
programmers use the same ones. It is important to design your applications
so multiple users of an interface can exist cleanly. The PC world has had
many problems because its interfaces started to be used by a large group
who did not do this.

   Now back to VMS control stealing.

   Consider the major gateways that I/O passes through.

  I/O Flow:

 (Leave out other hacks like stealing the CHMK or CHME vectors by bashing
  the SCB; I consider only I/O here.)

  Process -> QIO call
  EVERY one of these points can be a place to intercept the operation.
1	-> sysqioreq code in VMS kernel; sets up IRP and device
		independent fields. Validates that driver can
		do the operation (using 1st FDT mask, 64 bits)
	-> sets buffered bit if appropriate (generally so for
		XQP calls).
2	-> calls driver FDT routines inside a tight loop. (In the
		FDT routines the kernel stack has a JSB - pushed PC
		and one CALLG frame from the qio call.) Finds FDT table
		from DDT, pointed to by ucb$l_ddt.
3	-> FDT routines do additional setup, finish getting IRP
		set up (P1 to P6 in arg list on VAX; in IRP already
		on Alpha). Exit via RSB (getting back to loop and to
		next FDT), or to error exit, or to code to queue
		IRP to driver start_io or to XQP. In these cases
		control eventually reaches exe$qioreturn or friends
		for the intermediate return (pending I/O completion).
		Return pops stack back, sets IPL to 0, sets R0 to
		code (usually 1 where all's well).
4	   (note: you can patch the global XQP entry point also if you
	          like: IRP is complete here and you get here via
		  normal kernel AST.)
5	-> Driver start-io entry. Does actual hardware protocol setup.
		I won't go into this; hardware can be timing dependent
		and patching from past here thru interrupt code is not
		usually desirable.
6	-> Post processing queue handles status return to IOSB, sometimes
		buffer copy/interpret, etc.


7       You can similarly patch almost anywhere else in VMS, so long as
		you can figure out how not to break synchronization. (Yes,
		you can even steal the CHMK handler from the SCB!)

What One Does with these Patch Locations

1. Patch at the system call - use to monitor calls (per process...means you
	bash the table in P1 space) and record whatever user info you like.
	Difficulty: you don't usually own any process space for dedicated
	records or for replacing arguments, and clobbering user calls is
	risky or worse. FTS012 is an example program that patches here.
	Your code can lie to the process too... Note: patching the SCB to steal
	the CHMK or CHME vector is more complete and can catch calls via S0
	entry points too.

2. You can steal the driver's FDT table by pointing the DDT at your own
	after you insert your FDT processing ahead of what's there.
	Table is a series of (function mask, address) pairs. Code at
	"address" gets control when (and ONLY when) function mask bits
	for the current I/O function are set. You're in user process context
	at IPL 2 here and can allow, disallow, or to some extent modify the
	I/O or its results at this point.
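
	The table mechanics just described can be modeled outside the
	kernel. What follows is a Python toy, not VMS code: the names
	Ddt, fdt_dispatch and intercept_fdt and the function code
	values are made up for illustration, the legal-function mask
	check is left out, and a routine returning a value stands in
	for an FDT routine that finishes or aborts the request instead
	of doing an RSB back to the loop.

```python
# Toy model of FDT dispatch; all names and function codes are illustrative.
IO_READ, IO_WRITE, IO_ACCESS = 1, 2, 3   # hypothetical function codes

def mask(*funcs):
    """Build a 64-bit function mask with one bit per I/O function code."""
    m = 0
    for f in funcs:
        m |= 1 << f
    return m

class Ddt:
    """Stands in for the driver dispatch table; holds the FDT pointer."""
    def __init__(self, fdt):
        self.fdt = fdt            # list of (function mask, routine) pairs

def fdt_dispatch(ddt, func, irp):
    """Walk the FDT in order, calling each routine whose mask has the bit."""
    for fmask, routine in ddt.fdt:
        if fmask & (1 << func):
            status = routine(irp)
            if status is not None:    # routine completed or aborted the I/O
                return status
    return "fell off end of FDT"

def intercept_fdt(ddt, fmask, routine):
    """Build a new table with our entry first and point the DDT at it.
    The original table is returned so the patch can be undone."""
    old = ddt.fdt
    ddt.fdt = [(fmask, routine)] + list(old)
    return old

def driver_read(irp):
    return "queued to start_io"

def deny_secret(irp):
    """Our inserted routine: reject some requests, pass the rest along."""
    if irp.get("file") == "SECRET.DAT":
        return "denied"
    return None                       # like RSB: continue with the next entry

ddt = Ddt([(mask(IO_READ, IO_WRITE), driver_read)])
saved = intercept_fdt(ddt, mask(IO_READ), deny_secret)
r1 = fdt_dispatch(ddt, IO_READ, {"file": "SECRET.DAT"})
r2 = fdt_dispatch(ddt, IO_READ, {"file": "NOTES.TXT"})
r3 = fdt_dispatch(ddt, IO_WRITE, {"file": "SECRET.DAT"})
```

	The inserted routine sees only reads, because only the read bit
	is set in its mask; the write goes straight to the driver's
	own entry.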

	Here's how.

	It is possible to interrupt FDT processing PROVIDED you save the
	full context (registers), don't disturb the IRP, and make DARN
	sure the user process can't clobber its inputs. Remember that
	at FDT time you do not yet busy the driver, so you must save the
	I/O per channel, per process when doing this. Finding a fast
	way to get to this info is important; a base pointer can be stored
	in a pseudodriver UCB, a lock value block, a logical name, or
	(if you can find one) some unused system cell. The last is usually
	ill advised.
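
	As a rough picture of "save the I/O per channel, per process",
	here is a Python toy keyed on (pid, channel). The class name
	ContextStore and its methods are invented for this sketch. It
	assumes at most one held request per channel, and it copies
	only the argument values, not the buffers they point to.

```python
# Toy model: saved I/O contexts keyed by (process id, channel).
class ContextStore:
    def __init__(self):
        self._saved = {}

    def save(self, pid, chan, registers, qio_args):
        key = (pid, chan)
        if key in self._saved:
            raise RuntimeError("I/O already held on this channel")
        # Copy the values so later changes by the caller cannot alter them.
        self._saved[key] = (dict(registers), tuple(qio_args))

    def restore(self, pid, chan):
        """Remove and return the saved context for reissuing the I/O."""
        return self._saved.pop((pid, chan))

store = ContextStore()
args = ["IO$_READVBLK", 0x1000, 512]
store.save(pid=0x21, chan=3, registers={"r0": 1, "r5": 0x80001000},
           qio_args=args)
args[2] = 0            # caller scribbles on its argument list afterwards
regs, saved_args = store.restore(0x21, 3)
```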

	Once you save the I/O context you can notify a process about
	what you want; techniques such as using sch$postef to set
	a local event flag or exe$writembx to write to a MB: unit
	are examples. You then need to return to user context.

	It is VITAL if intercepting here not to let user processing
	occur. Blocking ASTs is optional, but the mainline can't be
	allowed to run or args can get modified (RMS does this).
	You can inhibit ASTs via the PCB$B_ASTEN byte (set to 1 to
	allow only kernel mode ASTs, for example), and inhibit a process
	with no side effects by putting it in RWAST state (see the calls
	to EXE$RWAIT) momentarily. This does not disturb other wait states
	that the process might enter. You can use other waits if desired.
	Also be sure the CCB doesn't go idle. Blocking AST processing
	while your daemon runs gives added safety. You can select which
	modes to block, too, so user mode code can be kept quiet
	while inner modes are undisturbed. Some care is needed due to
	possible locks; you must be sure you will fairly quickly
	and without fail reenable the process lest you cause deadlocks
	somewhere.

	To get back into your processing, and undo a wait, using a special
	kernel AST gets you to much the same state as AST processing.
	Your AST must restore registers, reenable ASTs to where they
	were, and reissue the user's I/O, then exit the AST.

	Reissuing is ALMOST straightforward...just duplicate the
	FDT call loop lower on the stack. NOTE though that you
	return at IPL 0, so must maintain synch. "by hand" and
	return to IPL 2 as soon as you get back. This means the
	technique is not 100% general, but works fairly well. An
	advantage of special vs. regular kernel ASTs here is their
	simplicity of scheduling by VMS.


	Uses:  You can do whatever you like, in process code, to check
		what should happen on certain I/Os. You can move files,
		deny or allow access to them, return info to alter
		process priority once back in the process, change
		extend parameters, change where reads/writes go,
		monitor I/O file transactions, build extra responses
		in to different I/O patterns, play any trick you like.

	       This is not a good place to try to do caching, though,
		due to lack of synchronization. Also, remember that MSCP
		served disks act by sending IRPs to driver start_io, so
		in a cluster everyone needs to have this processing locally.

3. Stealing FDT entries themselves can be done. It's easier to add your
	own, though.

4. Stealing the XQP entry point. This is a very good place to monitor
	I/O. It is system wide and requires you decode the IRP packet
	formats for XQP operations, and has the same difficulty 2 does
	of finding a place for its data. You get control in a kernel
	AST (for the XQP) with the IRP all built. Creating an I/O
	error here is less well defined than in FDTs, a disadvantage,
	but you can twiddle I/O operations similarly to 2 and need
	not block a process, as all its arguments are fully encapsulated
	in the IRP by this point. Adding your own call to a process here
	is possible, though care is needed to ensure the process' I/O
	that you call is not blocked. This is an issue in 2 also.

	Uses: Applications that modify file extensions or monitor I/O
		are reasonable fits here. Other applications, such as
		those mentioned for 2, are generally possible, though
		changing I/O can be more involved here, as the XQP
		has its own methods for doing I/O which are not that
		similar to normal user ones. The synchronization is
		however cleaner in that you start in a kernel AST and
		can resume in one, and do not have hardcoded SETIPL #0
		instructions to work around.

5. Stealing driver start_io entry. This location is often taken for
	purposes of implementing cache systems. You just change the
	pointer in the DDT (Driver Dispatch Table) and your code
	gains control before the regular driver's. At this point
	(or in the earlier ones) you can "steal" the IRP completion
	by filling in IRP$L_PID with the address of your completion
	routine. This location gains control at I/O postprocessing
	then. A "normal" driver's start_io entry is controlled by
	driver busy, so only one set of cells is needed to hold data
	until I/O done.
	Paul Sorenson and John Osudar suggested moving the DDT into
	a pseudodriver UCB, so that on a call, the UCB$L_DDT pointer
	of the intercepted driver points to a known offset in the
	intercepting driver's UCB. This makes access to data very
	fast. My driver ([VAX92B.GCE92B.NET92B]QDRIVERSKEL.MAR) on the
	Fall 1992 SIG tapes is an example of code that does this with
	some extra work to allow multiple applications to steal the
	same entries. Both FDT and DDT stealing are there in the code.
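
	One way to let several interceptors share a start_io entry is
	for each to remember the pointer it replaced and chain to it.
	The Python toy below illustrates only that idea. It is not a
	description of how QDRIVERSKEL does it, and the names Ddt, Hook
	and remove are made up here.

```python
# Toy model of stealing a start_io pointer so several hooks can coexist.
class Ddt:
    def __init__(self, start_io):
        self.start_io = start_io

class Hook:
    """One interceptor. It remembers the entry it replaced and chains to it."""
    def __init__(self, name, log):
        self.name, self.log, self.prev = name, log, None

    def install(self, ddt):
        self.prev = ddt.start_io
        ddt.start_io = self

    def __call__(self, irp):
        self.log.append(self.name)    # our processing runs first...
        return self.prev(irp)         # ...then whoever was there before

def remove(ddt, hook):
    """Unhook one interceptor without disturbing the others."""
    if ddt.start_io is hook:
        ddt.start_io = hook.prev
        return True
    node = ddt.start_io
    while isinstance(node, Hook):     # follow the chain to find our parent
        if node.prev is hook:
            node.prev = hook.prev
            return True
        node = node.prev
    return False

log = []
def real_start_io(irp):
    log.append("driver")
    return "started"

ddt = Ddt(real_start_io)
a, b = Hook("A", log), Hook("B", log)
a.install(ddt)
b.install(ddt)
first = ddt.start_io({"lbn": 7})
order_both = list(log)
del log[:]
removed = remove(ddt, a)              # A leaves; B must keep working
ddt.start_io({"lbn": 8})
order_after = list(log)
```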

	CDdriver is a good example of the use of stealing start_io; it
	implements a single CPU cache. (To do this across a cluster
	means taking out block or file locks across cluster, in the
	cacher, so when anyone starts to write a block the other nodes
	can disable it. There are some very touchy timing issues
	that make getting this right a hard problem. I'm not expert
	in them. Clearly, though, if you rely on the lock manager,
	you must arrange that access be delayed at any node wanting
	to write if that node doesn't own the lock. This means either
	rolling your own delay mechanism or using some mechanism that
	VMS uses, like RMS file locks. The delay is needed to block
	a write to a block that's in cache on another CPU, so the
	other CPU can invalidate its cache.
	  This can be done by the techniques that fddriver or ztdriver
	(remote virtual tape/disk drivers on the SIG tapes) use: set
	the IRP aside until your communicating process that's dealing
	with the locks finishes its operation, and then continue
	the operation either with a fake interrupt in the pseudodriver
	you've got the extra code in, or via some FDT level entry
	that forks and reissues the IRP along from fork level to
	get it back to its original track. Incidentally, if your driver
	is using altstart and keeping its own queues, this gets much
	more complex.)

6. Stealing the post processing queue. This technique involves some
	earlier access to use the IRP$L_PID hook, and in general you
	need to at least call COM$POST eventually to complete I/O
	on the packet. However, you can use this point to edit what
	the I/O returns. (You can steal the iopost (ipl4) interrupt
	instead, but the IRP$L_PID hook is much easier to use.)
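
	The IRP$L_PID hook can be pictured with a Python toy. Here the
	pid field holds either a number or a routine, and callable()
	stands in for the kernel telling a routine address from a PID.
	The names com_post, io_post and steal_completion are stand-ins
	invented for this sketch, not the VMS routines themselves.

```python
from collections import deque

post_queue = deque()          # stands in for the I/O postprocessing queue
delivered = []                # (pid, status) pairs "written to the IOSB"

def com_post(irp):
    """Stand-in for COM$POST: queue the packet for postprocessing."""
    post_queue.append(irp)

def io_post():
    """Stand-in for the postprocessing interrupt service."""
    while post_queue:
        irp = post_queue.popleft()
        if callable(irp["pid"]):
            irp["pid"](irp)           # a routine address, not a PID: call it
        else:
            delivered.append((irp["pid"], irp["status"]))

def steal_completion(irp, editor):
    """Swap our routine into the pid field, remembering the real PID."""
    real_pid = irp["pid"]
    def completion(packet):
        packet["status"] = editor(packet["status"])
        packet["pid"] = real_pid      # put the real PID back...
        com_post(packet)              # ...and let normal completion finish
    irp["pid"] = completion

irp = {"pid": 0x42, "status": "SS$_NORMAL"}
steal_completion(irp, lambda s: "edited:" + s)
com_post(irp)                         # the driver finishes the I/O
io_post()
```

	The packet passes through the queue twice: once to reach the
	stolen completion routine, and once more, with the real PID
	restored, so that ordinary completion delivers the edited status.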

	I have heard reports of insertion of close at this point also.
	Now, in general, you can do $QIO from kernel mode (leaving out
	issues of synchronization), provided that:
		1. All arguments are r/w from KERNEL mode
		2. The QIO mode is kernel
		3. On VAX, previous mode needs to be Kernel since
			PROBE instructions use previous mode for their
			checks.
		4. Kernel AST delivery needs to be enabled if you wish to avoid
			hangs, for XQP processed operations like deaccess
			(=close).

	The difficulty you have is dropping down from IPL4 to do the I/O
	without breaking synchronization and lousing up the IPL4 queue.
	The simplest thing to do is probably to send an AST to yourself
	and do the operations there; this is well documented. You can
	request another interrupt at IPL 4 after having requeued the
	packet (see fddriver sources for a sample fragment) so you
	can get back, and even get the system to complete the I/O for
	you.

7. Stealing generic VMS locations. ("blue sky")
	Suppose we have an IPL 0 site within VMS, a process, an image,
	or the like and want to insert some process code. (If a site
	runs at some higher IPL or synchronization level, those
	synchronization issues have to be handled also.)

	At the site, insert a $cmkrnl call to get into your processing
	in kernel mode with your own entry mask. (Actual details of
	bashing a particular site are not the issue here; you need to
	be able to stash your patch into nonpaged pool and replicate
	whatever instructions you bash.)

	Your patch will look like this:

	patch:	duplicate instructions
		IF the process is not the service process THEN
		  store off any desired info from the site
		  movq r0,-(sp)
		  $cmkrnl_s routin=mypatch2
		  movq (sp)+,r0
		END IF
		return to original site

	.entry mypatch2,^m<r2,r3,r4,r5,r6,r7,r8,r9,r10,r11>

	allocate a message buffer
	Fill in with address of patchAST (below), and whatever
	other info is desired.
	fill in r3,r4,r5 to point at the buffer and the mailbox UCB
	of the mailbox set up by your service process

	call EXE$WRITEMBX to send the message buffer to the
		mailbox if at high IPL, or just use $qio
		if not.

	Free the buffer
	Set up (clear) a local semaphore for this process
	Disable some AST deliveries if appropriate
	Loop calling SCH$RWAIT until your semaphore is set.
	(This prevents your process from noticing a wait
	while the service process runs.)

	Pick up any results from the patchAST and bash whatever is
	appropriate. Reenable AST deliveries if appropriate.

	movl	#ss$_normal,r0
	RET

; patchAST entered via kernel or special kernel AST fired off from the
; service process.
	.entry patchAST,^M<regs>
	Pick off any arguments from the ACB so we can return 
	them to the "mypatch2" procedure.

	Set the local semaphore so the SCH$RWAIT loop terminates
	RET


The service process' outline is:

	Establish the mailbox and stash its UCB address where the
	patch can get it fast.

	forever:
	  Read the mailbox
	  Do whatever the process darn well pleases with the
		information there, having access to the entire
		machine/cluster/net...
	  Send an AST to the address received in the mailbox message
	  Loop

	Again, if you're not starting at IPL 0, you need to handle
forking issues to get to the desired synchronization. Remember that
you can queue ASTs to get to IPL 2 or fork to other levels.
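
	The handshake between the patch and the service process can be
	tried out in user mode. In this Python toy a queue.Queue stands
	in for the mailbox, a threading.Event for the local semaphore,
	and a plain function call for the AST. Unlike a real AST, that
	call runs in the service thread rather than in the patched
	process. The names and the allow/deny rule are invented for
	the illustration.

```python
import queue
import threading

mailbox = queue.Queue()               # stands in for the MB: unit

def service_process():
    """Read requests forever; answer each by 'sending an AST'."""
    while True:
        msg = mailbox.get()
        if msg is None:               # shutdown marker, for this demo only
            break
        verdict = "allow" if msg["info"] != "SECRET.DAT" else "deny"
        msg["ast"](verdict)           # call the routine named in the message

def patched_site(info):
    """Model of mypatch2: send a message, then wait for the AST."""
    done = threading.Event()          # the local semaphore, initially clear
    result = {}

    def patch_ast(verdict):           # model of patchAST
        result["verdict"] = verdict   # hand the arguments back
        done.set()                    # lets the wait loop terminate

    mailbox.put({"ast": patch_ast, "info": info})
    while not done.is_set():          # loop, as the SCH$RWAIT loop does
        done.wait(timeout=0.1)
    return result["verdict"]

server = threading.Thread(target=service_process, daemon=True)
server.start()
answers = [patched_site("NOTES.TXT"), patched_site("SECRET.DAT")]
mailbox.put(None)
server.join(timeout=5)
```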


What can you do with this? Some example hacks:
	* Steal RMS entries from a process and get some other process
		to do things first. (Want to duplicate transactions
		without buying rms journalling??)
	* Steal VMS entries and filter them with user mode code
		any way you please. This can be per process or per
		system (IF you're careful!)
	* Stick a patch where DCL comes up with its common error message

  %DCL-W-IVVERB, unrecognized command verb - check validity and spelling

		which is ALWAYS the same and boring. Instead, why not
		have your system generate messages like:

	The way you type, we could be here all night.
	Things seem slow to you? I've got four SPACEWARs and an Ada compile
		running.
	I don't understand either. That command should have worked.
	You didn't really want the answer to that, did you?
	A puff of orange smoke appears and indicates...you screwed up again.


	A process can readily generate such things, blast them over, and
then flag that the patch should skip around the code that
comes up with the boring normal error message. (The process could
also perhaps even try a second time to figure out what you might
really want.)

	* You can from this mode change things like process privs, prompts,
		priority, etc. Doing this randomly is not so great, but
		imagine the following actions: <evil grin>

	"This person is opening the Ada compiler. NOBODY should be
	running THAT interactively! We'll lower THAT boy's
	priority to zero and null all his privs..."

	"Here's someone running TECO. Since he's obviously a wizard,
	raise his priority to 15 and give him SETPRV."

	"This is the 500th I/O with this guy in MAIL. Let's teach him
	a lesson by changing his DCL prompt to "MAIL> "."

	"This guy is running SYSGEN; he's fair game. Let me scare the
	wits out of him by changing his prompt to ">>>" and responding
	to a few commands like a system console for a few lines."

	"Here's a person who is in MAIL and using a lot of swear
	words or scatological language. Tsk tsk. He shouldn't have
	such a foul mouth, so I'll tell my process (HISMOMMY) to take
	over his terminal and demand an apology, and I won't let
	him get real control back until he types I'M SORRY."

	"Joe SystemProgrammer is running games. We don't want the 
	boss to catch him, so make him invisible."

	"J. RandomUser is running games. Generate some message
	about an earthquake ruining Colossal Cave and kick him
	off."		(who says we have to be fair??)

	"The guy in the next cubicle is kind of paranoid and always
	seems to sign his vaxmail with "top secret crypto nuclear".
	He just typed that in. So I'll break in on him with some
	message like

%SECURMON-I-FWDNSA, message copy forwarded to approved monitor

	and then let him go on...oughta really freak him out."


***********************************************************************
0147: Pro-active security
Many security tools found on sig tapes or commercially available limit
themselves to reporting security problems. This session will discuss a
few tools the author has worked on which go beyond reporting and can
actively help prevent security violations from happening. Topics will
include cryptodisks, write-once disks, access monitors, dynamic
privilege and rights controls, network-wide identifiers, and possibly
others and will discuss the utility of these techniques in control
of system activity. A brief discussion of ways of testing untrusted
images will also be given, using commonly available tools.
  

	Pro-Active Security

	There are lots of security checking tools...things like
Guess, Password Policy, and the various suites of command files
that check for under-protected files, weak passwords, writeable
files, and the like. Under Unix, there are COPS, Crack, and Tripwire,
all of which have been on sig tapes.
	These are Good Things, and necessary before you can go further.
	You MUST know which people correspond to which accounts and have
		some control over who may alter what.

	Problem with them: They assume you can use the underlying system
		to close the holes, and that adequate security is
		possible via just the vendor OS.

	But the vendor OS may assume a level of trust in an individual.
		Some people wear multiple hats in multiple colors.

	What is a "subject" in the security sense? A person?

	Counterexample: Joe Clerk.
		Using PAYROLL.EXE on PAYROLL.DAT, his role is "pay clerk".
		Using COPY.EXE on PAYROLL.DAT, what's his role?

	The vendor model might at best report access. But if he renames
	his file copy program "RECORDCHECK.EXE", how will you know?
	Even keystroke records can be obfuscated.


	How to deal with this?
	Make parts of the subject identity vary with what a person
		is doing.

	Get the computer to "know" more about what people are doing in
	realtime.

	One technique for this involves monitoring terminal activity.
	This is actually quite good for a lot of things; terminal
	input, for example, of lots of strings with [*...] in them
	over a short period is much more concise in telling you the
	person is looking for something or working with directory
	trees than file system monitoring. However, DCL input can be
	disguised. If the user were to use

$ a = "["
$ b = "*..."
$ c = "]"
	and somewhat later did
$ myhome = a+b+c

	it becomes very hard for any terminal monitor to tell what
	is going on. If the characters are hex, even a human has
	trouble. (a[0,8] = %X5B anyone?)

	The file system is harder to fool, though. If a user wants
	to open a file, generally the file system has to know about it
	and monitoring code added here can be a valuable addition to
	a security suite. Note that these kinds of things can be done
	to directory files too; access controls on [000000]000000.dir
	work nicely.
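
	The difference between the two vantage points is easy to show
	in a toy. The Python sketch below is not DCL: evaluate() is a
	made-up stand-in that understands only quoted literals and "+",
	and terminal_monitor() is a deliberately naive substring match.

```python
def terminal_monitor(lines, pattern="[*...]"):
    """Return the input lines that contain the pattern literally."""
    return [ln for ln in lines if pattern in ln]

def evaluate(lines):
    """Tiny stand-in for DCL symbol handling: string literals and '+' only."""
    symbols = {}
    for ln in lines:
        name, expr = ln.lstrip("$ ").split(" = ", 1)
        parts = [p.strip() for p in expr.split("+")]
        symbols[name] = "".join(
            p.strip('"') if p.startswith('"') else symbols[p] for p in parts)
    return symbols

plain = ["$ dir [*...]*.dat"]
disguised = ['$ a = "["', '$ b = "*..."', '$ c = "]"', "$ myhome = a+b+c"]

seen_plain = terminal_monitor(plain)
seen_disguised = terminal_monitor(disguised)
what_file_system_sees = evaluate(disguised)["myhome"]
```

	The monitor flags the plain command and misses the disguised
	one, yet the string finally handed to the file system is the
	same wildcard in both cases.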

	Let me describe some things I've done.

	By watching access to sensitive files (requires hooking into
		the I/O subsystem), you can arrange a database driven
		monitor which can:

		* Check user permission to use the file, etc.
			- at time of day
			- at terminal/workstation
			- by image ID
			- by default directory
			- check file integrity
			- etc.
		   ---and allow/deny access only after
			a whole series of tests at much finer grain
			than normal VMS (unix) ones.
		* Monitor can get info about what is written so it
			can dynamically tell if suspicious things
			are going on based on site criteria, and
			optionally freeze the access until an authority
			can check.

		* Force priv masks to a particular state when
			running an image.
			- Semi-trusted image can be used, with guarantee
				that it never has BYPASS, CMKRNL, etc.
			- Can enforce confinement of file access by 
				ensuring the image never gets write
				access to, e.g. SYSUAF.DAT.

		* Grant/revoke identifiers depending on what you're
			running.
			- Useful where the information confinement needs
				to be more elaborate
			- Lets you guarantee that an application will
				ALWAYS have the identifier in effect, so
				an identifier gets to control an application
				as a subject.
			- It can be simpler however to build an access
				monitor to manage who may open what, using
				a database as needed, rather than try to
				manage multiway access controls by adding
				long ACLs. If coarse granularity will do,
				though, consider device ACLs.


	These facilities let you get much finer grain control over
	what access to sensitive files is allowed. Since the decisions
	get made by process (user mode) code, audit journalling of the
	results is simple & essentially free.
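
	A database driven check of this kind might look, in outline,
	like the Python toy below. The rule layout, the field names and
	check_access itself are assumptions made for illustration. An
	image name stands in for a real image ID, which a working
	monitor would have to establish by something harder to forge
	than a name.

```python
from datetime import time

# Hypothetical rule database: one entry per protected file.
RULES = {
    "PAYROLL.DAT": {
        "images": {"PAYROLL.EXE"},
        "terminals": {"TTA1", "TTA2"},
        "hours": (time(8, 0), time(18, 0)),
    },
}
audit_log = []

def check_access(user, filename, image, terminal, now):
    """Allow the open only if every test for the file passes; log the result."""
    rule = RULES.get(filename)
    if rule is None:
        verdict, reason = True, "file not monitored"
    elif image not in rule["images"]:
        verdict, reason = False, "image not approved"
    elif terminal not in rule["terminals"]:
        verdict, reason = False, "terminal not approved"
    elif not (rule["hours"][0] <= now <= rule["hours"][1]):
        verdict, reason = False, "outside permitted hours"
    else:
        verdict, reason = True, "all tests passed"
    audit_log.append((user, filename, image, terminal, now, verdict, reason))
    return verdict

noon = time(12, 0)
ok = check_access("JOE", "PAYROLL.DAT", "PAYROLL.EXE", "TTA1", noon)
copy = check_access("JOE", "PAYROLL.DAT", "COPY.EXE", "TTA1", noon)
late = check_access("JOE", "PAYROLL.DAT", "PAYROLL.EXE", "TTA1", time(23, 0))
```

	Joe Clerk gets in as a pay clerk, but not with COPY.EXE and not
	at 11 at night, and every decision lands in the audit list as
	a side effect of making it.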

	Another assumption: audit files will be useful.
	Once you get someone who gets his "favorite set of privs",
	(thanks, Bruce) ALL BETS ARE OFF wrt audit files.

	What to do?

	The simple approach: send the files to a separate machine
	that ONLY receives data and logs it, as they are written.
	A PC running a terminal emulator in log-session mode, recording
	all the audit messages, isn't feasible to break into remotely.
	A periodic dump of audit files to tape is pretty hard to edit
	also, but not as hard. A disk file, on the other hand, is VERY
	easy to edit and alter.

	Someone who can get privs OFTEN knows to go after audit files and
	how to turn auditing off.

	If you don't have a spare machine to dedicate, there are
	alternatives.

	* A virtual write once, read many disk can be set up. This is
		a driver and symbiont program pair, which look like
		a disk to VMS, but use a file on a real disk. A
		secret transform of the data is used to keep
		the file data from being directly interpretable,
		so it must be accessed via the virtual WORM. In
		addition to doing this transform, the driver
		  - Disallows delete altogether
		  - Checks on writes that the block being written
			has not previously been written and allows
			the write only where it is new.
		  - Reports a fatal error when these are tried.

		The driver must be "told" to allow modifications
		to directories and the index file. Simple solution
		is to have a "fence" block, place index file, bitmap,
		and directories below it.
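
	The write-once rule with a fence block can be sketched as the
	following Python toy. The class name WormDisk and its methods
	are invented here, and the secret data transform is left out;
	only the checks the driver makes are modeled.

```python
class WormDisk:
    """Toy write-once disk: blocks at or above the fence may be written once."""
    def __init__(self, nblocks, fence):
        self.blocks = [None] * nblocks
        self.fence = fence       # index file, bitmap, directories live below

    def write(self, lbn, data):
        if lbn >= self.fence and self.blocks[lbn] is not None:
            raise PermissionError("block %d already written" % lbn)
        self.blocks[lbn] = data

    def delete(self, lbn):
        raise PermissionError("delete is not allowed on this disk")

    def read(self, lbn):
        return self.blocks[lbn]

disk = WormDisk(nblocks=16, fence=4)
disk.write(10, b"audit record 1")
try:
    disk.write(10, b"forged record")
    rewrite_allowed = True
except PermissionError:
    rewrite_allowed = False
disk.write(0, b"index v1")
disk.write(0, b"index v2")       # below the fence, so rewriting is allowed
```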

	Strength: A file found there cannot be tracelessly modified.
	Weakness: Files can be scrambled by messing up container file,
		or headers can be messed up. The files will still tend
		to be not fragmented (write-once, remember?) and
		can be recovered with some human intervention if this
		happens.

	If you find a corrupted WORM disk, you HAVE a problem.

	You can also secure files. Against priv'd users only
		cryptography works.

		- There are lots of file encrypt/decrypt tools
			in PD and commercially. Use:
		  > Decrypt file
		  > Use or modify file
		  > Re-encrypt file

		- Some editors have this built in so file never
			appears in clear on disk. OK if you want
			only to edit file. Not so hot for 
			PAYROLL.DAT.

		- Cryptodisk is a virtual disk that encrypts
			everything on it, decrypts on access.
			Can have driver level access list.
		- Automatic encrypt/decrypt, no forgetting to
			re-encrypt.
		- Process that encrypts is a subproc. of disk owner
			so it automatically goes away if owner does.
		- Do NOT use DES since software DES is a terrible
			CPU hog and would inhibit use of system.
			Use XOR with very long random bitstring
			instead...blindingly fast. (yes, other
			algorithms exist...)
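
	The XOR transform is simple enough to show whole. The Python
	sketch below is an illustration under assumptions of its own:
	picking the key offset from the block number, the 512 byte
	block size, and drawing the key from os.urandom are choices
	made for the example, not details of the cryptodisk driver.

```python
import os

def xor_block(data, key, lbn, block_size=512):
    """XOR one disk block with the slice of the key chosen by block number."""
    start = (lbn * block_size) % len(key)
    out = bytearray(len(data))
    for i, b in enumerate(data):
        out[i] = b ^ key[(start + i) % len(key)]
    return bytes(out)

key = os.urandom(4096)                # stand-in for the long random bitstring
clear = b"PAYROLL record for J. Clerk".ljust(512, b"\0")
on_disk = xor_block(clear, key, lbn=3)
back = xor_block(on_disk, key, lbn=3)
```

	The same routine serves for both directions, since applying the
	XOR twice with the same key slice gives back the original block.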