VS018, SE010
**************************************************************
0145: I/O PLUMBING - I/O Interception Without Pain
Glenn C. Everhart, RAXCO, Inc.

Intercepting control during the I/O process is part of the craft of system programming that has numerous uses. Now, in general, you want to use well defined interfaces because:

  * Control flow is well defined there, may even be documented and stable.
  * All control narrows to flow through these interfaces, rather than going through many paths.
  * Data structures have to contain all information here, rather than having control info encoded in misc. parts of machine state.

It is a fundamental concept of systems programming to find new uses for existing interfaces and to intercept control there. On real OSs, it is also fundamental to avoid assuming you have complete control of those interfaces. Manufacturers and other systems programmers use the same ones. It is important to design your applications so multiple users of an interface can exist cleanly. The PC world has had many problems because its interfaces started to be used by a large group who did not do this.

Now back to VMS control stealing. Consider the major gateways that I/O passes through.

I/O Flow:
(Leave out other hacks like stealing the CHMK or CHME vectors by bashing the SCB; I consider only I/O here.)

     Process -> QIO call
     EVERY one of these points can be a place to intercept the operation.
  1  -> sysqioreq code in VMS kernel; sets up IRP and device independent fields. Validates that driver can do the operation (using 1st FDT mask, 64 bits)
     -> sets buffered bit if appropriate (generally so for XQP calls).
  2  -> calls driver FDT routines inside a tight loop. (In the FDT routines the kernel stack has a JSB-pushed PC and one CALLG frame from the qio call.) Finds FDT table from DDT, pointed to by ucb$l_ddt.
  3  -> FDT routines do additional setup, finish getting IRP set up (P1 to P6 in arg list on VAX; in IRP already on Alpha).
     Exit via RSB (getting back to the loop and to the next FDT), or to an error exit, or to code that queues the IRP to driver start_io or to the XQP. In these cases control eventually reaches exe$qioreturn or friends for the intermediate return (pending I/O done). The return pops the stack back, sets IPL to 0, and sets R0 to the status code (usually 1 where all's well).
  4  (Note: you can patch the global XQP entry point also if you like. The IRP is complete here and you get here via a normal kernel AST.)
  5  -> Driver start-io entry. Does actual hardware protocol setup. I won't go into this; hardware can be timing dependent and patching from here on through the interrupt code is not usually desirable.
  6  -> Post processing queue handles status return to IOSB, sometimes buffer copy/interpret, etc.
  7  You can similarly patch almost anywhere else in VMS, so long as you can figure out how not to break synchronization. (Yes, you can even steal the CHMK handler from the SCB!)

What One Does with these Patch Locations

1. Patch at the system call. Use this to monitor calls (per process, which means you bash the table in P1 space) and record whatever user info you like. Difficulty: you don't usually own any process space for dedicated records or for replacing arguments, and clobbering user calls is risky or worse. FTS012 is an example program that patches here. Your code can lie to the process too...

Note: patching the SCB to steal the CHMK or CHME vector is more complete and can catch calls via S0 entry points too.

2. You can steal the driver's FDT table by pointing the DDT at your own after you insert your FDT processing ahead of what's there. The table is a series of (function mask, address) pairs. Code at "address" gets control when (and ONLY when) the function mask bit for the current I/O function is set. You're in user process context at IPL 2 here and can allow, disallow, or to some extent modify the I/O or its results at this point. Here's how.
It is possible to interrupt FDT processing PROVIDED you save the full context (registers), don't disturb the IRP, and make DARN sure the user process can't clobber its inputs. Remember that at FDT time you do not yet busy the driver, so you must save the I/O per channel, per process when doing this. Finding a fast way to get to this info is important; a base pointer can be stored in a pseudodriver UCB, a lock value block, a logical name, or (if you can find one) some unused system cell. The last is usually ill advised.

Once you save the I/O context you can notify a process about what you want; techniques such as using sch$postef to set a local event flag or exe$writembx to write to a MB: unit are examples. You then need to return to user context.

It is VITAL if intercepting here not to let user processing occur. Blocking ASTs is optional, but the mainline can't be allowed to run or args can get modified (RMS does this). You can inhibit ASTs via the PCB$B_ASTEN byte (set to 1 to allow only kernel mode ASTs, for example), and inhibit a process with no side effects by putting it in RWAST state (see the calls to EXE$RWAIT) momentarily. This does not disturb other wait states that the process might enter. You can use other waits if desired. Also be sure the CCB doesn't go idle.

Blocking AST processing while your daemon runs gives added safety. You can select which modes to block, too, so user mode code can be kept quiet while inner modes are undisturbed. Some care is needed due to possible locks; you must be sure you will fairly quickly and without fail reenable the process lest you cause deadlocks somewhere.

To get back into your processing, and undo a wait, using a special kernel AST gets you to much the same state as AST processing. Your AST must restore registers, reenable ASTs to where they were, and reissue the user's I/O, then exit the AST. Reissuing is ALMOST straightforward...just duplicate the FDT call loop lower on the stack.
NOTE though that you return at IPL 0, so must maintain synch. "by hand" and return to IPL 2 as soon as you get back. This means the technique is not 100% general, but works fairly well. An advantage of special vs. regular kernel ASTs here is their simplicity of scheduling by VMS.

Uses: You can do whatever you like, in process code, to check what should happen on certain I/Os. You can move files, deny or allow access to them, return info to alter process priority once back in the process, change extend parameters, change where reads/writes go, monitor I/O file transactions, build extra responses in to different I/O patterns, play any trick you like. This is not a good place to try to do caching, though, due to lack of synchronization. Also, remember that MSCP served disks act by sending IRPs to driver start_io, so in a cluster everyone needs to have this processing locally.

3. Stealing FDT entries themselves can be done. It's easier to add your own, though.

4. Stealing the XQP entry point. This is a very good place to monitor I/O. It is system wide and requires you decode the IRP packet formats for XQP operations, and has the same difficulty 2 does of finding a place for its data. You get control in a kernel AST (for the XQP) with the IRP all built. Creating an I/O error here is less well defined than in FDTs, a disadvantage, but you can twiddle I/O operations similarly to 2 and need not block a process, as all its arguments are fully encapsulated in the IRP by this point. Adding your own call to a process here is possible, though care is needed to ensure the process' I/O that you call is not blocked. This is an issue in 2 also.

Uses: Applications that modify file extensions or monitor I/O are reasonable fits here. Other applications such as have been mentioned for 2 are generally possible, though changing I/O can be more involved here, as the XQP has its own methods for doing I/O which are not that similar to normal user ones.
The synchronization is, however, cleaner in that you start in a kernel AST and can resume in one, and do not have hardcoded SETIPL #0 instructions to work around.

5. Stealing driver start_io entry. This location is often taken for purposes of implementing cache systems. You just change the pointer in the DDT (Driver Dispatch Table) and your code gains control before the regular driver's. At this point (or in the earlier ones) you can "steal" the IRP completion by filling in IRP$L_PID with the address of your completion routine. That routine then gains control at I/O postprocessing.

A "normal" driver's start_io entry is controlled by driver busy, so only one set of cells is needed to hold data until the I/O is done.

Paul Sorenson and John Osudar suggested moving the DDT into a pseudodriver UCB, so that on a call, the UCB$L_DDT pointer of the intercepted driver points to a known offset in the intercepting driver's UCB. This makes access to data very fast. My driver ([VAX92B.GCE92B.NET92B]QDRIVERSKEL.MAR) on the Fall 1992 SIG tapes is an example of code that does this, with some extra work to allow multiple applications to steal the same entries. Both FDT and DDT stealing are there in the code.

CDdriver is a good example of the use of stealing start_io; it implements a single CPU cache.

(To do this across a cluster means taking out block or file locks across the cluster, in the cacher, so that when anyone starts to write a block the other nodes can disable it. There are some very touchy timing issues that make getting this right a hard problem. I'm not expert in them. Clearly, though, if you rely on the lock manager, you must arrange that access be delayed at any node wanting to write if that node doesn't own the lock. This means either rolling your own delay mechanism or using some mechanism that VMS uses, like RMS file locks. The delay is needed to block a write to a block that's in cache on another CPU, so the other CPU can invalidate its cache.
This can be done by the techniques that fddriver or ztdriver (remote virtual tape/disk drivers on the SIG tapes) use: save the IRP aside until the communicating process that's dealing with the locks finishes its operation, then continue the operation either with a fake interrupt in the pseudodriver you've got the extra code in, or via some FDT level entry that forks and reissues the IRP from fork level to get it back on its original track. Incidentally, if your driver is using altstart and keeping its own queues, this gets much more complex.)

6. Stealing the post processing queue. This technique involves some earlier access to use the IRP$L_PID hook, and in general you need to at least call COM$POST eventually to complete I/O on the packet. However, you can use this point to edit what the I/O returns. (You can steal the iopost (ipl4) interrupt instead, but the IRP$L_PID hook is much easier to use.) I have heard reports of insertion of close at this point also.

Now, in general, you can do $QIO from kernel mode (leaving out issues of synchronization), provided that:

  1. All arguments are r/w from KERNEL mode.
  2. The QIO mode is kernel.
  3. On VAX, previous mode needs to be Kernel, since PROBE instructions use previous mode for their checks.
  4. Kernel AST delivery needs to be enabled if you wish to avoid hangs, for XQP processed operations like deaccess (=close).

The difficulty you have is dropping down from IPL4 to do the I/O without breaking synchronization and lousing up the IPL4 queue. The simplest thing to do is probably to send an AST to yourself and do the operations there; this is well documented. You can request another interrupt at IPL 4 after having requeued the packet (see fddriver sources for a sample fragment) so you can get back, and even get the system to complete the I/O for you.

7. Stealing generic VMS locations. ("blue sky")

Suppose we have an IPL 0 site within VMS, a process, an image, or the like and want to insert some process code.
(If a site runs at some higher IPL, the synchronization issues have to be handled also.) At the site, insert a $cmkrnl call to get into your processing in kernel mode with your own entry mask. (Actual details of bashing a particular site are not the issue here; you need to be able to stash your patch into nonpaged pool and replicate whatever instructions you bash.) Your patch will look like this:

    patch:  duplicate instructions
            IF the process is not the service process THEN
                store off any desired info from the site
                movq    r0,-(sp)
                $cmkrnl_s routin=mypatch2
                movq    (sp)+,r0
            END IF
            return to original site

            .entry  mypatch2,^m
            allocate a message buffer
            Fill in with address of patchAST (below), and whatever other info is desired.
            fill in r3,r4,r5 to point at the buffer and the mailbox UCB of the mailbox set up by your service process
            call EXE$WRITEMBX to send the message buffer to the mailbox if at high IPL, or just use $qio if not.
            Free the buffer
            Set local semaphore for this process
            Disable some AST deliveries if appropriate
            Loop calling SCH$RWAIT until your semaphore is set.
            (This prevents your process from noticing a wait while the service process runs)
            Pick up any results from the patchAST and bash whatever is appropriate.
            Reenable AST deliveries if appropriate.
            movl    #ss$_normal,r0
            RET

    ; patchAST entered via kernel or special kernel AST fired off from the
    ; service process.
            .entry  patchAST,^M
            Pick off any arguments from the ACB so we can return them to the "mypatch2" procedure.
            Set the local semaphore so the SCH$RWAIT loop terminates
            RET

The service process' outline is:

    Establish the mailbox and stash its UCB address where the patch can get it fast.
    forever:
        Read the mailbox
        Do whatever the process darn well pleases with the information there, having access to the entire machine/cluster/net...
        Send an AST to the address received in the mailbox message
        Loop

Again, if you're not starting at IPL 0, you need to handle forking issues to get to the desired synchronization. Remember that you can queue ASTs to get to IPL 2 or fork to other levels.

What can you do with this? Some example hacks:

  * Steal RMS entries from a process and get some other process to do things first. (Want to duplicate transactions without buying RMS journalling??)

  * Steal VMS entries and filter them with user mode code any way you please. This can be per process or per system (IF you're careful!)

  * Stick a patch where DCL comes up with its common error message

        %DCL-W-IVVERB, unrecognized command verb - check validity and spelling

    which is ALWAYS the same and boring. Instead, why not have your system generate messages like:

        The way you type, we could be here all night.
        Things seem slow to you? I've got four SPACEWARs and an Ada compile running.
        I don't understand either. That command should have worked.
        You didn't really want the answer to that, did you?
        A puff of orange smoke appears and indicates...you screwed up again.

    A process can readily generate such things, blast them over, and then flag that the patch should skip around the code that comes up with the boring normal error message. (The process could also perhaps even try a second time to figure out what you might really want.)

  * You can from this mode change things like process privs, prompts, priority, etc. Doing this randomly is not so great, but imagine the following actions:

    "This person is opening the Ada compiler. NOBODY should be running THAT interactively! We'll lower THAT boy's priority to zero and null all his privs..."

    "Here's someone running TECO. Since he's obviously a wizard, raise his priority to 15 and give him SETPRV."

    "This is the 500th I/O with this guy in MAIL. Let's teach him a lesson by changing his DCL prompt to "MAIL> "."

    "This guy is running SYSGEN; he's fair game.
    Let me scare the wits out of him by changing his prompt to ">>>" and responding to a few commands like a system console for a few lines."

    "Here's a person who is in MAIL and using a lot of swear words or scatological language. Tsk tsk. He shouldn't have such a foul mouth, so I'll tell my process (HISMOMMY) to take over his terminal and demand an apology, and I won't let him get real control back until he types I'M SORRY."

    "Joe SystemProgrammer is running games. We don't want the boss to catch him, so make him invisible."

    "J. RandomUser is running games. Generate some message about an earthquake ruining Colossal Cave and kick him off." (who says we have to be fair??)

    "The guy in the next cubicle is kind of paranoid and always seems to sign his vaxmail with "top secret crypto nuclear". He just typed that in. So I'll break in on him with some message like

        %SECURMON-I-FWDNSA, message copy forwarded to approved monitor

    and then let him go on...oughta really freak him out."

***********************************************************************
0147: Pro-active security

Many security tools found on SIG tapes or commercially available limit themselves to reporting security problems. This session will discuss a few tools the author has worked on which go beyond reporting and can actively help prevent security violations from happening. Topics will include cryptodisks, write-once disks, access monitors, dynamic privilege and rights controls, network-wide identifiers, and possibly others, and the session will discuss the utility of these techniques in control of system activity. A brief discussion of ways of testing untrusted images will also be given, using commonly available tools.

Pro-Active Security

There are lots of security checking tools...things like Guess, Password Policy, and the various suites of command files that check for under-protected files, weak passwords, writeable files, and the like.
Under unix, there are cops, crack, and tripwire, all of which have been on SIG tapes. These are Good Things, and necessary before you can go further. You MUST know what people correspond to what accounts and have some control over who may alter what.

Problem with them: They assume you can use the underlying system to close the holes, and that adequate security is possible via just the vendor OS. But the vendor OS may assume a level of trust in an individual. Some people wear multiple hats in multiple colors.

What is a "subject" in the security sense? A person?

Counterexample: Joe Clerk. Using PAYROLL.EXE on PAYROLL.DAT, his role is "pay clerk". Using COPY.EXE on PAYROLL.DAT, what's his role? The vendor model might at best report access. But if he renames his file copy program "RECORDCHECK.EXE", how will you know? Even keystroke records can be obfuscated.

How to deal with this? Make parts of the subject identity vary with what a person is doing. Get the computer to "know" more about what people are doing in realtime.

One technique for this involves monitoring terminal activity. This is actually quite good for a lot of things; terminal input, for example, of lots of strings with [*...] in them over a short period is much more concise in telling you the person is looking for something or working with directory trees than file system monitoring.

However, DCL input can be disguised. If the user were to use

    $ a = "["
    $ b = "*..."
    $ c = "]"

and somewhat later did

    $ myhome = a+b+c

it becomes very hard for any terminal monitor to tell what is going on. If the characters are hex, even a human has trouble. (a[0,8] = 91 anyone?)

The file system is harder to fool, though. If a user wants to open a file, generally the file system has to know about it, and monitoring code added here can be a valuable addition to a security suite. Note that these kinds of things can be done to directory files too; access controls on [000000]000000.dir work nicely.
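
The disguise described above can be illustrated with a toy model. The sketch below is plain Python, not DCL or VMS code; the function name naive_terminal_monitor and the sample command lines are hypothetical and exist only for illustration. It assumes a monitor that simply scans each typed line for the literal wildcard string, and shows that such a monitor catches the direct form but sees nothing in the piecewise form.

```python
# Toy model (not VMS code): a terminal monitor that scans typed lines
# for the directory-wildcard string "[*...]".  All names are hypothetical.
SUSPICIOUS = "[*...]"

def naive_terminal_monitor(lines):
    """Return the typed lines that literally contain the wildcard string."""
    return [line for line in lines if SUSPICIOUS in line]

# The user types the wildcard directly.
direct = ['$ dir [*...]*.dat']

# The user builds the same string from pieces, as in the text above.
disguised = [
    '$ a = "["',
    '$ b = "*..."',
    '$ c = "]"',
    '$ myhome = a+b+c',
]

flagged_direct = naive_terminal_monitor(direct)
flagged_disguised = naive_terminal_monitor(disguised)

print(len(flagged_direct))     # prints 1: the direct form is caught
print(len(flagged_disguised))  # prints 0: the piecewise form is not
```

A monitor sitting in the file system would not have this problem, since by the time the open request arrives the file specification has already been assembled.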

Let me describe some things I've done.

By watching access to sensitive files (requires hooking into the I/O subsystem), you can arrange a database driven monitor which can:

  * Check user permission to use the file, etc.
      - at time of day
      - at terminal/workstation
      - by image ID
      - by default directory
      - check file integrity
      - etc.
    ...and allow or deny access only after a whole series of tests at much finer grain than normal VMS (unix) ones.

  * Monitor can get info about what is written so it can dynamically tell if suspicious things are going on based on site criteria, and optionally freeze the access until an authority can check.

  * Force priv masks to a particular state when running an image.
      - Semi-trusted image can be used, with guarantee that it never has BYPASS, CMKRNL, etc.
      - Can enforce confinement of file access by ensuring the image never gets write access to, e.g., SYSUAF.DAT.

  * Grant/revoke identifiers depending on what you're running.
      - Useful where the information confinement needs to be more elaborate.
      - Lets you guarantee that an application will ALWAYS have the identifier in effect, so an identifier gets to control an application as a subject.
      - It can be simpler, however, to build an access monitor to manage who may open what, using a database as needed, rather than try to manage multiway access controls by adding long ACLs. If coarse granularity will do, though, consider device ACLs.

These facilities let you get much finer grain control over what access to sensitive files is allowed. Since the decisions get made by process (user mode) code, audit journalling of the results is simple & essentially free.

Another assumption: audit files will be useful. Once you get someone who gets his "favorite set of privs" (thanks, Bruce), ALL BETS ARE OFF wrt audit files. What to do?

The simple approach: send the files to a separate machine that ONLY receives data and logs it, as they are written.
A PC running a terminal emulator in log-session mode, recording all the audit messages, isn't feasible to break into remotely. A periodic dump of audit files to tape is pretty hard to edit also, but not as hard. A disk file, on the other hand, is VERY easy to edit and alter. Someone who can get privs OFTEN knows to go after audit files and how to turn auditing off.

If you don't have a spare machine to dedicate, there are alternatives.

  * A virtual write once, read many disk can be set up. This is a driver and symbiont program pair, which look like a disk to VMS, but use a file on a real disk. A secret transform of the data is used to keep the file data from being directly interpretable, so it must be accessed via the virtual WORM. In addition to doing this transform, the driver
      - Disallows delete altogether.
      - Checks on writes that the block being written has not previously been written, and allows the write only where it is new.
      - Reports a fatal error when these are tried.

    The driver must be "told" to allow modifications to directories and the index file. A simple solution is to have a "fence" block, and place the index file, bitmap, and directories below it.

    Strength: A file found there cannot be tracelessly modified.

    Weakness: Files can be scrambled by messing up the container file, or headers can be messed up. The files will still tend to be not fragmented (write-once, remember?) and can be recovered with some human intervention if this happens. If you find a corrupted WORM disk, you HAVE a problem.

You can also secure files. Against priv'd users only cryptography works.

  - There are lots of file encrypt/decrypt tools in PD and commercially. Use:
      > Decrypt file
      > Use or modify file
      > Re-encrypt file
  - Some editors have this built in so the file never appears in clear on disk. OK if you want only to edit the file. Not so hot for PAYROLL.DAT.
  - Cryptodisk is a virtual disk that encrypts everything on it, decrypts on access. Can have driver level access list.
      - Automatic encrypt/decrypt, no forgetting to re-encrypt.
      - Process that encrypts is a subproc. of disk owner so it automatically goes away if owner does.
      - Do NOT use DES since software DES is a terrible CPU hog and would inhibit use of system. Use XOR with very long random bitstring instead...blindingly fast. (yes, other algorithms exist...)
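
The XOR transform mentioned in the last item can be sketched in a few lines. This is a minimal Python illustration, not the cryptodisk driver's code: the names make_pad and xor_block, the 512-byte block size, and the choice to pick the slice of the random bitstring by block number (wrapping around if the disk is larger than the bitstring) are all assumptions made for the example. The text above says only that a very long random bitstring is XORed with the data.

```python
import os

BLOCK_SIZE = 512  # bytes per disk block (assumed for this sketch)

def make_pad(n_bytes):
    """Generate the long random bitstring (hypothetical helper)."""
    return os.urandom(n_bytes)

def xor_block(pad, block_number, data):
    """XOR one disk block with the slice of the pad chosen by block number.

    The pad is reused cyclically if the disk is larger than the pad.
    Applying the function twice with the same arguments restores the data.
    """
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    if len(pad) == 0 or len(pad) % BLOCK_SIZE != 0:
        raise ValueError("pad length must be a non-zero multiple of BLOCK_SIZE")
    start = (block_number * BLOCK_SIZE) % len(pad)
    key = pad[start:start + BLOCK_SIZE]
    return bytes(d ^ k for d, k in zip(data, key))

pad = make_pad(64 * BLOCK_SIZE)
plain = b"PAYROLL RECORD".ljust(BLOCK_SIZE, b"\0")
stored = xor_block(pad, 7, plain)       # what would go to the container file
recovered = xor_block(pad, 7, stored)   # what a read would hand back
```

Because XOR is its own inverse, the same routine serves for both the write path and the read path, which is part of why the approach is so cheap in CPU terms.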