Enterprise Access Control

This discussion is not about authentication (determining that a subject is who he says he is); strong authentication techniques are known. It is about access control. Access control is often defined as "measures which ensure that resources are available only to people authorized to use them". This is an attempt to encode in the computer the principle of NEED TO KNOW which exists in the non-computer world. In the non-computer world, however, one can always ask someone for what purpose he needs to know some information, and it is easy for humans to judge when the use to which someone is putting information is inappropriate. Computers don't have the luxury of human pattern recognition and common sense, but they do have available some information about what use information is being put to or how it is used. For example, computers generally can know when an access is made, from where it is made, what programs are being used to do it, what privileges a user has, how often he is getting at data, and whatever other session context one has resources to gather. Sometimes they can know more still. Some companies may computerize the record of when someone has called in sick, and thus make available information about who is absent. Some places may record who is in whose family, and be able to tell when the data accessed is about the subject or his family.

A more complete definition of access control, which might express more of what NEED TO KNOW is intended to deal with, could be "measures which ensure that resources are available only to people authorized to use them, in the performance of authorized activities". "Authorized activities" covers a great deal of ground, but obviously implies that more work needs to be done ahead of time to specify which activities are OK and which are not. Such specification cannot be done completely, and very often won't be worth doing. However, most enterprises have a few extremely sensitive data collections which can be identified by hand if necessary, and which are the high-value targets for tampering, fraud, or theft. Rather than attempt to define everything that can be done to a file, we can reduce misuse, fraud, and abuse by specifying some limitations on what can be done. (It is a useful research topic to define what the best set of such limits might be.) This effort is not supported by current operating systems; it needs the construction of some new technology. Let us now define some of this technology and how to build it. Later, we'll talk about some of the ways such added decision abilities can be used.

If we focus on access to computer files, it is clear that an open operation exists for essentially all filesystems, at which point those OSs which do access control at all will check permissions. This operation can quite generally be intercepted. In VMS, my Safety application (see the Spring 1997 SIG tapes) does this with code inserted at FDT time ahead of open (and a few other file operations). Windows NT can use a similar design, adding a pseudodriver that replaces the dispatch table and inserts code ahead of the filesystem driver. A Linux design can involve replacing some entry points in the filesystem entry table. The key technique needed here is to permit one to add a decision process ahead of the operations being performed, which can either allow the operation to take place, return an error, or take some other evasive action.
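To make the interception idea concrete, the sketch below shows the general shape in generic C rather than in any real kernel's interface; fs_dispatch, access_decision, and install_open_hook are invented names, and in a real system the table being spliced would be the VMS FDT entries, the NT dispatch table, or the Linux filesystem entry table mentioned above.

    /* Illustrative only: a generic dispatch-table splice, not any real
     * kernel's API.  All names here are invented for the sketch. */

    #include <errno.h>

    typedef int (*open_fn)(const char *path, int mode, void *ctx);

    struct fs_dispatch {
        open_fn open;               /* other entry points omitted */
    };

    static open_fn orig_open;       /* the vendor's original entry point */

    /* The added decision process: 0 = allow, nonzero = refuse. */
    extern int access_decision(const char *path, int mode, void *ctx);

    static int hooked_open(const char *path, int mode, void *ctx)
    {
        if (access_decision(path, mode, ctx) != 0)
            return -EIO;            /* deny, with a deliberately nonspecific error */
        return orig_open(path, mode, ctx);   /* authorized: continue as normal */
    }

    /* Installation: save the original entry and splice ours ahead of it. */
    void install_open_hook(struct fs_dispatch *d)
    {
        orig_open = d->open;
        d->open = hooked_open;
    }

The VMS, NT, and Linux variants are all versions of this same splice; they differ mainly in where the table lives and in what request context is visible at the hook.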
Safety, for example, allows one to cause a different file to be accessed, hiding the file which was not authorized while permitting one to collect information about what an unauthorized user might have intended. It is particularly powerful to allow the decision process to take place, at least for selected cases, in user mode. User-mode daemons have access to all OS facilities, networking, and even whole applications. While one does well to avoid such calls for EVERY decision, they are likely to be wanted for the MOST valuable files an enterprise owns. (In the Safety intercept, the file number was available; a simple bitmap in kernel space allowed me to specify whether a file number (modulo 16K in that case) might correspond to a file that should be examined by a short kernel thread or, if that indicated possible interest, by a daemon in user mode. For files not marked, this allowed processing to continue with very few added instructions.)

Schematically, then, we construct a piece of kernel code which is inserted at a suitable internal interface. At this point it must save the entire machine state (including such niceties as previous mode on DEC processors, or the equivalent on others). The simplest kind of intercept simply stalls at this point using a convenient process-based wait primitive. No new wait primitives are needed; as a general matter (and specifically for NT and VMS) the process context is still valid, so the process can wait at this point. The intercept must then send a message to a separate daemon process, preregistered with the kernel intercept, passing whatever information is available about the access being requested and the process requesting it. The daemon then uses whatever information it wishes to make the access decision and causes the wait to end, at which point the context is restored and the processing either is terminated with an error or is allowed to continue as normal. Note that the error returned need not signal a security violation to the accessor who was denied; it may be preferable to indicate some obscure hardware error, for example. The accessor does not, after all, need to know that he failed an access check. Similar logic can be used for other operations like create, delete, or even directory lookup (provided a clean directory lookup interface exists; in VMS none really did until Spiralog came along, NT starts life with a cleaner one, and Unix comes with a clean interface also). The processing described above must of course ensure that it does not interfere with whatever its daemons do, nor with internal I/O done by the filesystem itself or by lower-level drivers. Failure to observe this restriction leads, in general, to hangs.

The nature of the actual access decision can be almost anything. The daemon is a user-mode process, and can be supplied at least locally with the file identity being accessed, its name, and the accessing process name, and it can in general find out locally the user doing the access and the program the user is running. Time and location information is also generally available locally. (The time overhead of such a system, even where it has to do a full examination of a local database, was measured for Safety as a tad over 1% in runtime.) Where one is dealing with a network, the remote user and remote application may be available only on the source machine for the connection.
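Before turning to the network case, here is a minimal sketch of the local fast path and the stall-and-ask pattern just described. It is illustrative only: send_to_daemon() and wait_for_reply() are invented stand-ins for whatever messaging and wait primitives the host OS provides, and the request structure is a placeholder for the real context.

    /* Sketch of the fast-path filter and the stall-and-ask-the-daemon pattern. */

    #include <stdint.h>

    #define FID_SLOTS (16 * 1024)              /* file numbers taken modulo 16K */
    static uint8_t watch_map[FID_SLOTS / 8];   /* one bit per slot: "maybe interesting" */

    static int fid_watched(uint32_t fid)
    {
        uint32_t slot = fid % FID_SLOTS;
        return (watch_map[slot / 8] >> (slot % 8)) & 1;
    }

    struct access_req {            /* whatever is known about the request */
        uint32_t fid;              /* file number / identity */
        uint32_t pid;              /* requesting process */
        char     image[64];        /* program doing the access */
    };

    /* Invented stand-ins for the OS's messaging and wait primitives. */
    extern void send_to_daemon(const struct access_req *r);
    extern int  wait_for_reply(const struct access_req *r);   /* 0 = allow */

    int check_open(const struct access_req *r)
    {
        if (!fid_watched(r->fid))
            return 0;              /* unmarked file: continue, a few added instructions */

        /* Marked file: machine state already saved; stall and ask the daemon. */
        send_to_daemon(r);
        return wait_for_reply(r);  /* nonzero means fail the open, perhaps with an
                                      obscure "hardware" error rather than a
                                      security alarm */
    }

Unmarked files pay only for the bitmap test and a branch; only the marked, high-value files pay for the trip to the daemon.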
For network accesses, then, a robust system will need a correctly implemented, cryptographically secured protocol with which to ask authenticators at the source machine whatever it wants to know about the connection. Such a system would of course need to cache information to avoid imposing unacceptable delays, and would need a logging strategy to supply access information for the accesses it mediates. (Attempts to access very sensitive files are, after all, very high quality input for intrusion detection algorithms, and the intercept collects them with very low overhead.)

The result could be a system which allows one to regulate access to files based on

* Permitted and forbidden roles
* Permitted and forbidden programs
* Permitted and forbidden times of day
* Permitted and forbidden locations
* Whether the user has "too many" privileges enabled

and on other conditions, such as denying access where a user appears to be coming from two different locations at once, or where a user is recorded as not being in the facility. It would also be simple to forbid access where the user is running an image known to run autonomous code, or code downloaded autonomously, such as an ActiveX applet. The simple example is that we can let our clerks access our general ledger files or customer lists with the GL or Customer applications, but keep them from doing so with a copy application. Provision for "override" is needed, of course; a defragger, for example, wants to see everything.

In general it is not feasible to add new on-disk structures to hold the new control information. Rather, the daemon needs to maintain its own data files and records to track access permissions. (These permissions must be keyed by something immutable like a file number or inode number, since a file name can generally be altered easily. Safety has provision to look up the file number if it is not made directly available by the filesystem interface; because the daemon's accesses are not interfered with, it can do this lookup using supported calls.) Thus we are adding our own databases, protected by the new facilities themselves, "on the side" of an existing filesystem. This means that we are not constrained by the underlying design of the filesystem; it also means that these databases, being accessed from user-mode code, can reside anywhere. An enterprise-wide replicated database speaking something like LDAP over an IPSEC authentication layer is a perfectly reasonable structure to erect for the added information, and one which does not impose a single point of failure. (The daemon also offers a convenient place to keep cached information in its own per-machine address space, and cache sizes can be adjusted to prevent excessive network traffic.)
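A record in such a side database might look something like the sketch below. This is illustrative only; the field names, sizes, and rule_allows() logic are invented for the example rather than taken from Safety.

    /* Sketch of a side-database record keyed by an immutable file identifier. */

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct access_rule {
        uint64_t file_id;                  /* file number / inode: immutable key   */
        const char *allowed_roles[8];      /* e.g. "gl_clerk"                      */
        const char *allowed_images[8];     /* e.g. the GL application, not a copy  */
        int  earliest_hour, latest_hour;   /* permitted time-of-day window         */
        bool deny_if_privileged;           /* refuse if "too many" privileges set  */
    };

    struct access_ctx {                    /* what the daemon learned about the request */
        uint64_t file_id;
        const char *role;
        const char *image;
        int hour;
        bool has_extra_privs;
    };

    static bool in_list(const char *v, const char *const *list, int n)
    {
        for (int i = 0; i < n && list[i]; i++)
            if (strcmp(v, list[i]) == 0)
                return true;
        return false;
    }

    bool rule_allows(const struct access_rule *r, const struct access_ctx *c)
    {
        if (c->hour < r->earliest_hour || c->hour > r->latest_hour) return false;
        if (r->deny_if_privileged && c->has_extra_privs)            return false;
        if (!in_list(c->role,  r->allowed_roles, 8))                return false;
        if (!in_list(c->image, r->allowed_images, 8))               return false;
        return true;   /* clerk + GL application + business hours: allowed */
    }

Because the record is keyed by file number rather than name, renaming the file does not defeat the rule, and the same record can carry whatever further conditions (location, connection origin, and so on) the daemon is able to learn.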
Do All We Can

While the described structure will allow us to keep added security information about our filesystem, the mechanisms here actually permit a more general structure. Where normal files are opened, a filesystem request presents some file information and issues a request to locate the file involved. The same position in an OS which handles OPEN requests also handles directory reads and lookups, and we can insert our own processing ahead of the normal processing where we want to change what goes on. The normal lookups present a file identifier and return retrieval information for that file; normally this kind of operation becomes inefficient as directories grow. It is possible instead to feed the lookup to a DBMS which returns some or all of the retrieval information, perhaps modifying the request along the way. (DBMS systems are designed to make retrieval fast even for large amounts of stored data, which is why we use one.) Part of the stored information could be, for a very simple kind of operation, the volume on which the file resides. A more interesting case, however, is when the DBMS stores security and other information along with the file identifier. The query can then be implicitly qualified by "environmental" information, so that files a person is not permitted to access would, to a degree, not even be visible. By encoding content information in the stored relation, which can be gathered with techniques similar to those used to index the Web, one might allow only files with selected content to appear.

The possibility of content-sensitive directories is extremely important generally. Consider how difficult it is to find files referring to some particular topic where the filename gives no help. Normally it requires a very time consuming linear search, and if the search is not carried out by the author of the information it can be quite difficult to succeed with it. Where a DBMS exists to do the searching, the normal filesystem can be of value in classifying things. One might implement the desired "environmental" selection by adding some "syntactic sugar" to directory pathnames to encode added qualifiers. Imagine a change directory command like cd $key=payroll$ to show files, within whatever else was in the path, only if they contain the keyword "payroll". The intercept layer we create will see and can strip these qualifiers when passing requests to its DBMS, so any lower-level requests passed on have no such oddities buried within them. Programs need no changes to use such a system; they already work with directory paths and defaults. However, searches for desired information might run three orders of magnitude faster, because appropriate directory information would have been built into the system. Generalizing the notion of "directory" in this way, from a list of filenames to a list of access, retrieval, and selection information, will be a vital feature of enterprise systems in the future. Just now nobody has it; eventually everybody will need it. This gives a structure which offers value not only in the enterprise security area, but which has uses in general computing, in that it gives machine assistance to the problem of finding or classifying information in almost any desired way. While what has been described is not directly usable by other databases, providing a network API in our distributed LDAP-speaking database will encourage DBMS vendors to allow their products to check the same access controls.
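The qualifier-stripping step sketched above might look roughly like the following, assuming the $key=...$ convention used in the example; the function and its behavior are invented for illustration, and a real intercept would work on whatever name string the filesystem interface presents.

    /* Split a name containing "$key=...$" into a cleaned name and a keyword.
     * Returns 1 if a well-formed qualifier was found and removed. */

    #include <stdio.h>
    #include <string.h>

    int strip_key_qualifier(const char *in, char *path_out, size_t plen,
                            char *key_out, size_t klen)
    {
        const char *start = strstr(in, "$key=");
        if (start) {
            const char *end = strchr(start + 5, '$');
            if (end) {
                /* keyword lies between "$key=" and the closing '$' */
                snprintf(key_out, klen, "%.*s", (int)(end - (start + 5)), start + 5);
                /* cleaned name: everything before and after the qualifier */
                snprintf(path_out, plen, "%.*s%s", (int)(start - in), in, end + 1);
                return 1;
            }
        }
        /* no (well-formed) qualifier: pass the name through untouched */
        snprintf(path_out, plen, "%s", in);
        key_out[0] = '\0';
        return 0;
    }

For a name like [ACCOUNTS]$key=payroll$ this yields the cleaned path [ACCOUNTS] and the keyword "payroll"; the cleaned name goes to the normal directory code while the keyword qualifies the DBMS query, so nothing below the intercept ever sees the added syntax.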
It should be noted that by simply counting accesses to files, and making the counts or count rates available to the checking code, many abuses and intrusions can be caught early. Modern DBMS systems are gaining the ability to handle time-based information, and even simpler per-host schemes that read limits can be useful in some situations. Consider a Social Security Administration employee caught a year or so ago. She had been working with a credit card fraud ring, supplying authentication information to the ring from Social Security databases she had normal access to. Her accesses were not unusual, except that she was looking at thousands of records per month where everyone else was looking at hundreds per year. Where insiders are abusing or stealing data, access-rate checks are likely to point it up.

Who Can Build It?

Obviously an OS vendor is in a good position to build a system like this. However, a third party can also do it; the required effort is perhaps two to three man-years at most, and Safety is an existence proof that it can be done. An OS vendor must be concerned with whether such a system can be booted from, and may add other constraints, such as an insistence on building everything in kernel-mode code, which would make the work slow, hard to test, and unreliable, in that kernel mode tends not to tolerate program faults. A third party is generally in a better position to avoid these problems. Those businesses which can identify their, say, dozen most valuable files, or which find significant time spent hunting for files with particular contents (and they are many), will see without much difficulty the value of a system which can reduce the risk to those files.

The filesystem call interface has been highly stable in VMS, NT, and Unix for some time, and is likely to remain so. The device driver interface, and to some extent the filesystem interface, have been documented and used for a variety of vendor code. In the case of NT, where the vendor has in general been willing to change internal interfaces, the filesystem interface is nevertheless one which has been stable (and very similar to the VMS one) for several versions, and which is used by a variety of different pieces of vendor code whose replacement would further destabilize a system for which stability is the current key problem. The system described above does not remove the vendor's access controls, nor its lookup, but adds new processing above them. This processing may (perhaps will) make the previous processing redundant, but it does not remove it or make it behave differently. A third-party vendor writing such a system might find, of course, that his system became the preferred way for developers to control security or access. The security enhancement aspects of the system make it worth having even if only a small set of enterprise files is selected for added protection. The data location aspects are of course more useful for larger sets of data, but a characteristic of the added processing is that as much or as little of a set of files can be enhanced as desired.

A final enhancement is worth noting: the file deletion calls are also captured. This makes it possible to check whether one should save off a copy of a deleted file before it is actually deleted, making a clean undelete operation possible, or making it possible to archive (selected) deleted material. Safety allows selection by all or part of the filename (so one need not save .LIS, .MAP, .OBJ, or .TMP files, for example). It has been suggested that such a facility allows one to treat mail files normally yet still keep copies for historical archives as US Government regulations require.
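In outline, such a delete intercept reduces to a check of the name against a skip list and a save-off before the real delete runs. The sketch below is illustrative only; the skip list, archive_copy(), and real_delete() are invented placeholders rather than Safety's actual interfaces.

    /* Decide from the file name whether to save a copy before deletion. */

    #include <stdbool.h>
    #include <string.h>
    #include <strings.h>   /* strcasecmp (POSIX) */

    static const char *skip_suffixes[] = { ".LIS", ".MAP", ".OBJ", ".TMP", NULL };

    static bool worth_saving(const char *name)
    {
        size_t n = strlen(name);
        for (int i = 0; skip_suffixes[i]; i++) {
            size_t s = strlen(skip_suffixes[i]);
            if (n >= s && strcasecmp(name + n - s, skip_suffixes[i]) == 0)
                return false;            /* scratch file: just let it go */
        }
        return true;
    }

    extern int archive_copy(const char *name);   /* save-off for undelete/archive */
    extern int real_delete(const char *name);    /* the vendor's original delete  */

    int hooked_delete(const char *name)
    {
        if (worth_saving(name))
            archive_copy(name);          /* copy is complete before the delete runs */
        return real_delete(name);
    }

Because the check runs ahead of the vendor's delete path, the saved copy exists by the time the real delete happens, which is what makes a clean undelete (or selective archiving) possible.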
Finally, it is clear that such a system can be built by anyone on a base like Linux, where the sources are freely available. It is also clear that network access, as well as file access, can be controlled. The key point in selecting a control locus in an OS is that control should be intercepted at as low a level as possible above the point which defines an abstraction. By intercepting just above a filesystem, we can control that abstraction at a point where other OS components do not modify it or provide back doors, and can therefore build a tight control system whose characteristics are well known to its builders. To a degree, the person who does this takes control of the abstraction from the OS vendor. Because an extended control facility will have its own controls, access control and information classification will take place within the extended system, not in the native OS facilities. The extended system can of course track the native one where the native interface is clean and passes filesystem calls through, but in general one must prepare customers for the idea of setting up defaults in the native OS and using the extended system for fine-grained controls.

The inclusion of extended data location facilities, and extended delete protection, is probably essential for the success of a third-party product. It is fortunate that access by content is provided essentially nowhere at present, and that reasonable protection from accidental erasure is rare. Everyone has had the "Oh, no!!" experience of accidentally deleting something, and the "now WHERE did I put that??" experience. And more corporations wind up finding out about fraud or abuse long after the fact (by looking at audit trails), or not until it turns up in the news, than generally want to admit it. What is built here can be viewed as a super-firewall which lives just above each filesystem. It is, as it must be, network aware, but there is minimal code below it which can be spoofed or made to malfunction, and it won't care whether data on the net is encrypted. Rather it will provide access protection and intrusion detection in a sensitive, low-overhead way, and offer some unique services in the bargain. The company which implements it will provide great value to the industry, and will have extended both security and filesystem technology in ways which are for the most part not yet available even as vaporware.

Glenn C. Everhart

Appendix: Generalizing Directories in VMS

During the talk I spoke of generalizing the notion of "directory" by using a DBMS lookup instead of the normal directory lookup. This can be accomplished for VMS by extending the intercept used in Safety (which was, for resource sizing purposes, roughly a man-year of effort stretched over a longer period). What is to be set up is a processing layer below RMS and above the XQP which gets inserted transparently to the rest of VMS. This is accomplished by also intercepting FDT calls for the functions that do directory reads, and by parsing directory names. The same logic used in Safety to perform soft links (i.e., resetting the channel UCB where a different device is needed, and having underlying opens done by File ID to actually open a file) allows access to existing files to be handled; the conditional soft-link operations in Safety already do almost everything needed for this. To create new files, we must create the file on a selected volume and then, before returning to the user, enter it into our database. Note that this logic already exists in the Safety sources (see the Spring 1997 VMS SIG tapes), though the user-mode parts of create support are not complete. In that code the idea was to let the daemon that handles the directory DBMS lookup create the file and pass its ID to the kernel code, which then converts the create into an open of that existing file.
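In outline, and with all VMS-specific mechanism (FDT context, channel UCBs, File IDs) hidden behind invented names, that create path looks like the sketch below; daemon_create_in_dbms() and open_existing_by_id() are placeholders for the daemon request and the open-by-File-ID described above, not real calls.

    /* Conceptual sketch of "create by name" turned into "open existing by ID". */

    #include <stdint.h>

    struct file_id { uint32_t volume; uint32_t num; uint16_t seq; };

    /* The daemon creates the file on a volume it selects, records it in the
     * directory DBMS, and returns the new file's identifier; the kernel-side
     * code then opens that existing file on behalf of the caller. */
    extern int daemon_create_in_dbms(const char *name, struct file_id *out);
    extern int open_existing_by_id(const struct file_id *id, void *chan);

    int hooked_create(const char *name, void *chan)
    {
        struct file_id id;
        int status = daemon_create_in_dbms(name, &id);  /* stall until the daemon answers */
        if (status != 0)
            return status;      /* daemon refused or failed; report that instead */
        return open_existing_by_id(&id, chan);
    }

The name-to-volume decision and the database update happen in user mode in the daemon; the kernel side needs only the returned identifier.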
Since the intercept does not touch the daemon's own I/O, this keeps it straightforward to get working, while a kernel thread such as the one used for opening an existing file can be constructed later to reduce context switches. The functions which read directory entries are the most painful to alter for ODS-2 disks in VMS because of the many shortcuts added since VMS 1.0. Fortunately, when Spiralog was written, a clean interface to these functions was added. By declaring a file structure to be that of Spiralog, we can get entries for reading directories or directory entries which can be intercepted. (A difficulty which may yet exist is the "direct call" interface to Spiralog.) In principle, the open and create modifications alone, using a real Spiralog disk holding empty files that point to other files (soft links), could essentially allow a Spiralog directory structure to control many ODS-2 or ODS-5 disks. This is however an aside; we want to add much more content to directories to meet search and security needs.

It should be added that the perturbations to this design needed to get it running on Windows NT are not large; the directory interface there is already clean, and file system intercepts are defined and used by various pieces of Microsoft code, whose presence makes it unlikely the basic interface will be altered. Signalling will need to use special ioctl primitives instead of invented FDT-time function calls, but the overall logic remains the same. For Unix flavors, soft-link and read-directory primitives are already present, so they need not be invented as they must be for NT (and as they were for VMS). The soft-link operation amounts to a close followed by an open of a new file, which one hopes would be no more difficult in NT than it is in VMS. In Unix systems the obstacle lies more in those filesystems which lack open or close.

Note that the user identity in these systems is presumed to be defined in a common way, so that each system can know users without conflict. This can be set up by hand, by another scheme that ensures all systems understand users in common, or by a distributed authentication system. When building a distributed access control system, the most reasonable implementation strategy is to include authentication information as well, but distributed authentication is outside the scope of this paper.

What we construct in this way can be viewed as a sort of virtual firewall, located just above the defining point of the file abstraction (or, by extension, of network abstractions) so that the amount of code below it is minimal. As VPN technology causes more and more of the traffic on networks to be encrypted, this kind of host-based strategy will become more and more the only usable one for controlling access. That it also provides vast increases in the ability of machines to help users find information is a benefit which may make it more feasible to develop such systems.