Adding Credential Checks to 4.4BSD

The Problem

A commonly asked Unix security question is "how do I prevent my users from executing telnet?" The typical answers are "remove the telnet binary" or "place the user in a restricted shell".

Unfortunately, neither of these solutions is really adequate. Removing "telnet" from the system (or moving it, or changing permissions on it) doesn't work at all; a user can just bring her own copy of "telnet" onto the system and run it. Restricted shells are hard to implement right, hard to manage, overly confining, and based on a fundamentally flawed design (any aspect of any allowed program that allows a user to execute an arbitrary program breaks the restricted shell).

As it turns out, there is no real solution to this problem using vanilla BSD. The system simply doesn't provide operators with facilities to restrict networking facilities from users; the fact that a user can write her own telnet program by using unprotected network system calls means that virtually anything done in userland to address this problem will be little more than a hack.

The Solution

The answer, then, is to modify the kernel to provide the facilities needed to restrict networking from users. To many people, this may seem like a daunting task; the kernel is a mysterious lump of convoluted C code upon which the entire operating system depends. Fortunately, modifying the kernel is easier than it seems; the fact of the matter is, mysterious and convoluted as it may be, the kernel really is just a big C program, and, like any other C program, if you can make sense of it, you can make it do whatever you want.

There are many ways to solve the problem of providing access control for networking in BSD Unix. The method we're going to look at here --- credential checks based on "sysctl" variables --- is among the simpler ways to approach the problem, and it can be applied to many different scenarios. Additionally, after learning how to add variables to the kernel and expose knobs for them to userland, it becomes easy to experiment with kernel hacking by adding knobs and buttons to any area of the kernel that becomes interesting.

Any project that involves modifying the kernel presents some safety problems. The entire operating system depends on the kernel to run; break the kernel, and you will break your entire system. More seriously, the integrity of your data depends on the kernel as well; break the kernel, and you may very well break your filesystems and lose data. You can mitigate these risks in two ways:

Always keep a backup copy of your kernel source and your original, known-good working kernel (in the root directory!).
Always test kernel mods on a non-critical machine.

What is a Credential Check?

In order to prevent selected users from executing "telnet" on our system, we're going to modify the kernel to add a credential check in the networking code. To do this, it would help to know what a credential is.

Every process on a Unix system is represented by an entry in the process table. Each process table entry is an instance of struct proc, which is defined in "/usr/include/sys/proc.h". A proc structure defines a process, including its open files, resource limits, process group, CPU usage, and credentials.

A process' credentials define the user running it. We're already familiar with the mechanisms Unix uses to identify users --- UIDs (user identifiers) and GIDs (group identifiers).

BSD keeps track of the UIDs and GIDs of a process in a ucred structure. A ucred structure simply contains the current effective UID and GIDs of a process. Each process has a ucred structure associated with it, along with a set of special UID and GID identifiers representing the "real" UID and GID (the original executor of an SUID program, for instance), and the "saved" UID and GID (used when selectively enabling and disabling credentials in privileged code).

A "credential check" is simply a point in the kernel where the current process' UIDs and GIDs are examined to determine if an operation is permitted. In this case, we're going to add a credential check that ensures a user has (or does not have) a certain UID or GID in order to open up a network socket.
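To make the layout described above a little more concrete, here is a rough userland sketch of a process's credentials and the simplest possible check against them. The names model_ucred, model_pcred, model_uid_permitted and MODEL_NGROUPS are invented for this example; they only loosely mirror the kernel's structures, and the real definitions in the kernel headers differ in detail.

```c
#include <sys/types.h>	/* uid_t, gid_t */

#define MODEL_NGROUPS 16	/* assumed group-list size for this sketch */

/* Effective identity: what most permission checks look at. */
struct model_ucred {
	uid_t	cr_uid;				/* effective UID */
	short	cr_ngroups;			/* number of entries in cr_groups */
	gid_t	cr_groups[MODEL_NGROUPS];	/* effective GID first, then the rest */
};

/* Per-process credentials: the effective set plus real and saved IDs. */
struct model_pcred {
	struct model_ucred *pc_ucred;	/* current effective credentials */
	uid_t	p_ruid, p_svuid;	/* real and saved UID */
	gid_t	p_rgid, p_svgid;	/* real and saved GID */
};

/* A credential check: is the effective UID the one we allow? */
int
model_uid_permitted(const struct model_pcred *pc, uid_t allowed)
{
	return (pc->pc_ucred->cr_uid == allowed);
}
```

Note that the check looks only at the effective UID; the real and saved IDs are carried along but play no part in it.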

Where Do We Add the New Check?

The easiest way to prevent users from accessing the network is to prevent them from opening sockets. In order to send and receive data on the network, a user (usually) must open a network socket and use it as a descriptor for I/O calls. Programs like "telnet" rely on sockets to access the network, and if sockets cannot be obtained, the program will fail. To control access to "telnet", we'll simply cause the "socket" system call to fail for unauthorized users.

It's important to remember that implementing kernel network access control in this manner is a shortcut. "socket()" isn't the only way to obtain a socket descriptor, and socket descriptors aren't the only way to access the network. Authorized users can use descriptor passing (a facility offered by the kernel that allows one process to pass a descriptor to another) to hand sockets they have opened to unauthorized users. Privileged programs may obtain sockets while running with superuser privileges (for instance, "rlogin" will not be affected by this modification). Finally, other kernel facilities (like the portal filesystem) may allow processes to access the network without a socket.

Most of these problems can be addressed outside the kernel. Unauthorized users can only steal socket descriptors if they already control an authorized account, or if they can find a bug in a privileged program. A quick "chmod" on rlogin makes it inaccessible to unauthorized users --- users can't copy their own rlogin over, since it's SUID. Compile portalfs out of the kernel, don't mount it, or use "chmod" to restrict access to the mount point.

How Do We Add the New Check?

The 4.4BSD kernel is a rather large piece of code. It would take quite a bit of time to completely understand the entire thing. Fortunately, we don't need to completely understand the kernel to make minor changes like this one.

The first thing we need to do is find out where the code that handles the "socket" system call is. This is easy.

Look at "init_sysent.c" in "sys/kern". This C file contains the kernel system call switch table, which maps system call numbers to function pointers. Find the system call number for "socket" and look at its switch table entry. Each entry consists of the number of arguments taken by the system call, and a pointer to the function that handles it. In "socket"'s case, we see that the socket system call is, strangely enough, handled by a function called "socket".
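The idea of a switch table is easy to model outside the kernel. The sketch below is not the contents of "init_sysent.c"; the type and function names (model_sysent, model_syscall and friends) and the system call numbers are made up, and only the shape (an argument count plus a function pointer, indexed by call number) follows the description above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: takes a pointer to the packed arguments. */
typedef int model_call_t(void *args);

struct model_sysent {
	int		 sy_narg;	/* number of arguments */
	model_call_t	*sy_call;	/* function that implements the call */
};

static int model_nosys(void *args)  { (void)args; return (ENOSYS); }
static int model_getpid(void *args) { (void)args; return (0); }
static int model_socket(void *args) { (void)args; return (0); }

/* Index == system call number. The numbers here are invented. */
static struct model_sysent model_sysent[] = {
	{ 0, model_nosys },	/* 0 */
	{ 0, model_getpid },	/* 1 */
	{ 3, model_socket },	/* 2: "socket" is handled by model_socket() */
};

#define MODEL_NSYSENT (sizeof(model_sysent) / sizeof(model_sysent[0]))

/* Dispatch: look up the entry by number and call through the pointer. */
int
model_syscall(size_t code, void *args)
{
	if (code >= MODEL_NSYSENT)
		return (ENOSYS);
	return (model_sysent[code].sy_call(args));
}
```

Reading a table like this from the call number to the function pointer is exactly the lookup we do by eye when hunting for a handler.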

Whipping out trusty "grep", we see that "socket()" is defined in "uipc_syscalls.c". Pull up a copy of the file and look at the "socket" function.

The first thing you should notice is that socket(), like all system call handlers, takes two arguments. The first is the process table entry of the calling process, and the second is a structure holding the arguments passed to the system call.

If we wanted to turn "socket" off outright, we could do so simply by adding a credential check at the beginning of the function. To do this, we need to know the UID/GID we want to permit to call socket(), and then, by looking at the proc structure passed to socket(), see if the calling process has these credentials.


#define NETWORKUID	10
#define NETWORKGID	10

	int socket(struct proc *p, struct socket_args *uap) {

		...

		if(p->p_ucred->cr_uid != NETWORKUID &&
			!groupmember(NETWORKGID, p->p_ucred))
			return(EPERM);

		...
	}

The current UID of the calling process is stored in p->p_ucred->cr_uid; checking it is simply a matter of comparing it. Groups present a trickier problem, since a process can belong to multiple groups at once. Although writing a function to check group credentials is trivial, the kernel already provides "groupmember()", which returns nonzero if the credentials of a process indicate that it is a member of a group.
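For readers who want to see how little is involved, here is a userland sketch of a groupmember()-style lookup and the UID-or-GID test built on it. It is a model written for this page, not the kernel's source: model_ucred, model_groupmember, model_socket_check and MODEL_NGROUPS are hypothetical names, and only the NETWORKUID and NETWORKGID constants come from the example above.

```c
#include <errno.h>
#include <sys/types.h>

#define MODEL_NGROUPS 16	/* assumed group-list size for this sketch */

struct model_ucred {
	uid_t	cr_uid;				/* effective UID */
	short	cr_ngroups;			/* number of entries in cr_groups */
	gid_t	cr_groups[MODEL_NGROUPS];	/* group list */
};

/* Return 1 if gid appears in the credential's group list, else 0. */
int
model_groupmember(gid_t gid, const struct model_ucred *cred)
{
	int i;

	for (i = 0; i < cred->cr_ngroups; i++)
		if (cred->cr_groups[i] == gid)
			return (1);
	return (0);
}

#define NETWORKUID	10
#define NETWORKGID	10

/* Allow NETWORKUID, or any member of NETWORKGID; deny everyone else. */
int
model_socket_check(const struct model_ucred *cred)
{
	if (cred->cr_uid != NETWORKUID &&
	    !model_groupmember(NETWORKGID, cred))
		return (EPERM);
	return (0);
}
```

A process passes if either condition holds, so a user outside NETWORKGID can still be let in by UID, and vice versa.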

The cleanest way to implement these kinds of checks is to deny access to the calling process if it doesn't bear the right creds, and continue processing otherwise. To deny access, we simply return an error condition from the system call handler; the error we pass will be placed in the "errno" variable of the calling process, and the socket() function will return "-1" to it.
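From the calling program's side, a denial looks like any other failed system call. The short userland sketch below shows what a program might do; try_inet_socket is a name made up for this example, and on an unmodified kernel the call will normally just succeed.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to create an IPv4 TCP socket. Returns 0 on success, or the errno
 * value the kernel handed back (EPERM if a check like ours denied it).
 */
int
try_inet_socket(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s == -1) {
		int err = errno;	/* save it before calling anything else */

		fprintf(stderr, "socket: %s\n", strerror(err));
		return (err);
	}
	close(s);
	return (0);
}
```

A program like "telnet" that gets -1 back here has nothing to connect with, which is the whole point of the check.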

Getting More Specific

We probably don't want to restrict access to the entire socket() system call; socket() is used for other things besides network access, such as local IPC (via Unix domain sockets). In this case, we want to restrict access to networking programs like "telnet", without breaking the rest of the system. It's fairly easy to do this.

The thing to remember is that socket() gives a user an IP network socket only when the "domain" argument to the system call is "AF_INET". To restrict access only to IP network sockets, we might perform access control only when socket() is called with an "AF_INET" domain argument.


		...

		if(uap->domain == AF_INET)
			if(p->p_ucred->cr_uid != NETWORKUID &&
				!groupmember(NETWORKGID, p->p_ucred))
				return(EPERM);

		...

As you can see, we get the "domain" argument to socket() by pulling it out of the "socket_args" struct passed as the second argument to the system call handler.

If you read the system call handler for socket(), you find that it is basically a wrapper around another function (not a system call handler) called "socreate()", which creates the new socket. "socreate()" is really the guts of the socket() system call, and it is used by multiple system calls to create new sockets. It might be a good idea to perform our credential check inside of socreate() rather than in socket(), so that all system calls that create sockets will perform the same access control check.

Finding "socreate()" is easy; just grep for it in "sys/kern". It's implemented in "uipc_socket.c", and it's the first function in the file.

The first argument to socreate() is the domain of the new socket to create. Fortunately for us, socreate() also takes the proc structure of the calling process as an argument, so performing the access control check here is as easy as it is in socket().


#define NETWORK_UID 	10
#define NETWORK_GID	10

	int socreate(int dom, struct socket **aso, 
			int type, int proto, struct proc *p) {

		...

		if(dom == AF_INET) 
			if(p->p_ucred->cr_uid != NETWORK_UID &&
				!groupmember(NETWORK_GID, p->p_ucred))
				return(EPERM);

		...
	}

Two important points here. The first is that we can't just assume that we can "deny access" by returning EPERM in arbitrary kernel functions; it happens that this works here, as socreate() passes an error value through the system call handler directly. Other kernel functions might not do this; it would be an exceptionally bad idea to return EPERM from a function that normally returns a pointer. You'll need to read the functions you're modifying and figure out how they signal error conditions in order to perform credential checks in the guts of the kernel.
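The difference between the two error conventions can be shown with a pair of toy functions. Everything in this sketch (model_obj, model_create, model_lookup) is hypothetical and exists only to contrast a function that returns an error number with one that returns a pointer.

```c
#include <errno.h>
#include <stddef.h>

struct model_obj { int id; };
static struct model_obj model_the_obj = { 42 };

/* Style 1: returns an errno value; the result comes back through *out. */
int
model_create(int allowed, struct model_obj **out)
{
	if (!allowed)
		return (EPERM);	/* safe: callers treat nonzero as an error */
	*out = &model_the_obj;
	return (0);
}

/* Style 2: returns a pointer; NULL is the only way to signal failure. */
struct model_obj *
model_lookup(int allowed)
{
	if (!allowed)
		return (NULL);	/* returning EPERM here would be read as a pointer */
	return (&model_the_obj);
}
```

socreate() is a function of the first kind, which is why a bare "return(EPERM)" works there; in a function of the second kind the denial has to be expressed as NULL and translated into an error by whoever called it.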

The next issue is that socreate() is not as low as we can go into the bowels of the system to perform access control. Indeed, if you read socreate(), you find that much of the work it does is actually performed by another function, which is linked to the type of socket being created (socreate() looks up the "protocol switch" for the type of socket being created, which is a structure containing functions to implement network facilities for a given protocol) --- in this case, socreate() calls the "attach" function for the given protocol.

We could hunt down the "attach" functions for each of the IP protocols we want to control access to. In this manner, we could restrict access to UDP, but not TCP. If we did this, we would find other access control checks, such as the one that prevents anyone but the superuser from creating a raw socket.
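A per-protocol check might look something like the following userland model. It is only a sketch of the protocol switch idea: model_protosw, model_inetsw, the attach functions and MODEL_UDP_UID are invented names, and the real protocol switch carries far more than one function pointer per entry.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>	/* AF_INET, SOCK_STREAM, SOCK_DGRAM */

/* One row per protocol: which socket type it serves and how to attach. */
struct model_protosw {
	int	pr_domain;
	int	pr_type;
	int	(*pr_attach)(uid_t uid);
};

#define MODEL_UDP_UID 10	/* hypothetical: only this UID (or root) may use UDP */

static int model_tcp_attach(uid_t uid) { (void)uid; return (0); }

static int
model_udp_attach(uid_t uid)
{
	/* Per-protocol check: restricts UDP without touching TCP. */
	if (uid != 0 && uid != MODEL_UDP_UID)
		return (EPERM);
	return (0);
}

static const struct model_protosw model_inetsw[] = {
	{ AF_INET, SOCK_STREAM, model_tcp_attach },
	{ AF_INET, SOCK_DGRAM,  model_udp_attach },
};

/* Roughly what socreate() does: find the protocol, then call its attach. */
int
model_socreate(int dom, int type, uid_t uid)
{
	size_t i;

	for (i = 0; i < sizeof(model_inetsw) / sizeof(model_inetsw[0]); i++)
		if (model_inetsw[i].pr_domain == dom &&
		    model_inetsw[i].pr_type == type)
			return (model_inetsw[i].pr_attach(uid));
	return (EPROTONOSUPPORT);
}
```

The check lives in one attach function only, so anything that reaches the network through a different protocol entry never sees it; that is the flexibility, and also the risk.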

Unfortunately, the more specific you get with access control checks that you add to someone else's code, the more likely it is that you'll fail to check an important case, and your check will be evadable. A balance needs to be struck between limiting as little as possible (and thus keeping the system flexible), and making the access control check as reliable as possible. Checks need to be applied deep enough into the kernel that multiple system calls will have consistent semantics, but not so deep that the check will not apply to all cases.

Making the Check Configurable

We now know how to add access control to arbitrary system calls inside the kernel; we could use the code we've already seen to implement as much control as needed over the system. However, we're missing a fairly important piece --- configurability.

In order to change the UID or GID needed to pass our socket() check, we'd need to edit kernel source, recompile the kernel, and reboot the system into the new kernel. Needless to say, this is not practical for mission-critical systems. Fortunately, 4.4BSD provides us with a facility designed to allow system operators to tune kernel parameters; we can use this to change the credentials needed to access the network, and to selectively enable and disable access control entirely.

This facility is called "sysctl". 4.4BSD sysctl defines a MIB-like table of values that can be retrieved and altered by the system operator; these values correspond to different pieces of kernel state information. Each entry in the sysctl MIB is tied down to an actual C variable inside the kernel; by adding new entries to the MIB, we can allow admins to change selected kernel variables from userland.
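The central idea, a named entry tied to a plain C variable, can be modelled in a few lines of userland code. This is not how the kernel implements sysctl; model_oid, model_mib, model_sysctl_get and model_sysctl_set are hypothetical, and the entry name is just an example string.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Kernel-style global that the rest of the code reads directly. */
int model_netinet_restrict = 0;

/* One MIB entry: a name, an access flag, and the variable it controls. */
struct model_oid {
	const char	*name;
	int		 writable;
	int		*valp;
};

static struct model_oid model_mib[] = {
	{ "kern.ipc.netinet_restrict", 1, &model_netinet_restrict },
};

static struct model_oid *
model_find(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(model_mib) / sizeof(model_mib[0]); i++)
		if (strcmp(model_mib[i].name, name) == 0)
			return (&model_mib[i]);
	return (NULL);
}

/* Read a value by name; returns 0 or an errno value. */
int
model_sysctl_get(const char *name, int *out)
{
	struct model_oid *o = model_find(name);

	if (o == NULL)
		return (ENOENT);
	*out = *o->valp;
	return (0);
}

/* Write a value by name; the change is visible through the global. */
int
model_sysctl_set(const char *name, int val)
{
	struct model_oid *o = model_find(name);

	if (o == NULL)
		return (ENOENT);
	if (!o->writable)
		return (EPERM);
	*o->valp = val;
	return (0);
}
```

Because the table stores a pointer to the variable rather than a copy, code elsewhere that tests model_netinet_restrict sees a new setting immediately, with no recompile or reboot.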

The code that implements sysctl() is sick and twisted. Fortunately, we don't need to understand any of it to use its basic facilities; the header file "sys/sysctl.h" defines macros that can be used to tie global variables in the kernel to sysctl entries.

For our credential check, we need to allow the operator to choose which UID can call socket(), which GID can call socket(), which GID cannot call socket(), and whether to enable or disable socket() access control. Each of these configuration parameters will correspond to an integer variable inside the kernel, which we will tie to the sysctl MIB using SYSCTL* macros.

The first step in doing this is to define the actual variables we want to use to configure the kernel. These must be global variables (so that the sysctl() system call handler can access them), although we can place them in any of the kernel modules we'd like. We'll put ours in the same file as the credential check:


	int netinet_uid = 0;
	int netinet_gid = 0;
	int netinet_restrict = 0;
	int nonetinet_gid = 0;

Now we need to export the variables to userland via sysctl. To do this, we will use a macro defined in "sysctl.h" named SYSCTL_INT. SYSCTL_INT takes 7 arguments: the point in the sysctl MIB to add the variable at, the numeric ID for the sysctl variable, its name, the access permitted to the variable, a pointer to the integer to tie the variable to, the value of the integer, and a description for it.

We don't need to understand all these arguments to use the macro; most of them will remain constant every time we call SYSCTL_INT. The first argument tells us what point on the tree to add the variable to; it's a good idea to look for other SYSCTL_INT calls in the same module and use the same point they do (in this case, "_kern_ipc"). The next argument is the OID number for this variable; we can set this to "OID_AUTO" and have the kernel figure it out for us. The "name" is simply the name we want to give this variable; it can be any arbitrary string. Since we're adding these variables as configuration parameters, we need to provide read and write access to them, so the fourth argument is "CTLFLAG_RW". The next argument is simply a pointer to the integer we want to set, and the last 2 arguments are always "0" and "", respectively. So, to export our variables to userland, we simply do:

	int netinet_uid = 0;
	int netinet_gid = 0;
	int netinet_restrict = 0;
	int nonetinet_gid = 0;

	SYSCTL_INT(_kern_ipc, OID_AUTO, netinet_uid, CTLFLAG_RW,
			&netinet_uid, 0, "");
	SYSCTL_INT(_kern_ipc, OID_AUTO, netinet_gid, CTLFLAG_RW,
			&netinet_gid, 0, "");
	SYSCTL_INT(_kern_ipc, OID_AUTO, netinet_restrict, CTLFLAG_RW,
			&netinet_restrict, 0, "");
	SYSCTL_INT(_kern_ipc, OID_AUTO, nonetinet_gid, CTLFLAG_RW,
			&nonetinet_gid, 0, "");

We can then use the variables directly in our credential check:


	int socreate(int dom, struct socket **aso, 
			int type, int proto, struct proc *p) {

		...

		if(dom == AF_INET && netinet_restrict) { 
			if(groupmember(nonetinet_gid, p->p_ucred))
				return(EPERM);

			if(p->p_ucred->cr_uid != netinet_uid &&
				!groupmember(netinet_gid, p->p_ucred))
				return(EPERM);
		}

		...
	}

Our credential check is now enabled only when "netinet_restrict" is set nonzero, and is based off sysctl variables rather than source code constants.
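The decision logic is small enough to try out in userland before touching a kernel. The sketch below reuses the four variable names from above as ordinary globals; model_ucred, model_groupmember, model_inet_check and MODEL_NGROUPS are hypothetical stand-ins for the kernel's structures and helpers.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>	/* AF_INET, AF_UNIX */

#define MODEL_NGROUPS 16	/* assumed group-list size for this sketch */

struct model_ucred {
	uid_t	cr_uid;
	short	cr_ngroups;
	gid_t	cr_groups[MODEL_NGROUPS];
};

/* Stand-ins for the sysctl-backed kernel globals. */
int netinet_uid = 0;
int netinet_gid = 0;
int netinet_restrict = 0;
int nonetinet_gid = 0;

static int
model_groupmember(gid_t gid, const struct model_ucred *cred)
{
	int i;

	for (i = 0; i < cred->cr_ngroups; i++)
		if (cred->cr_groups[i] == gid)
			return (1);
	return (0);
}

/* Same shape as the check in socreate(): 0 to allow, EPERM to deny. */
int
model_inet_check(int dom, const struct model_ucred *cred)
{
	if (dom == AF_INET && netinet_restrict) {
		/* The deny group is tested first, so it always wins. */
		if (model_groupmember(nonetinet_gid, cred))
			return (EPERM);

		if (cred->cr_uid != (uid_t)netinet_uid &&
		    !model_groupmember(netinet_gid, cred))
			return (EPERM);
	}
	return (0);
}
```

One thing the model makes obvious: with every variable left at 0, switching on netinet_restrict denies any process that is a member of GID 0, so the UID and GID knobs want setting before the restriction is enabled. Since the variables hang off "_kern_ipc", they would presumably show up to the sysctl utility under names like kern.ipc.netinet_restrict.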

Going Further

This page describes only the basics of modifying the 4.4BSD kernel to provide enhanced configurability and access control. There are many other things that can be done to tailor your kernel to your application; hopefully, the techniques described here will provide a decent starting point for more work.

For example, the "rsh" and "ping" programs present Unix security problems because they need to run as "root" to access protected network resources (privileged ports and raw sockets, respectively). This is unfortunate, because it means that a security problem in either "rsh" or "ping" will allow an attacker to gain "root".

We can limit the scope of a security problem in either of these programs by modifying the kernel to allow "rsh" to bind a privileged port without being root, and allow "ping" to get a raw socket without being "root", by replacing the static credential checks for "root" with sysctl-configurable checks. You can find places where static "root" credential checks are being performed by searching for calls to the "suser()" function, which returns 0 if the credentials it is given belong to the superuser, and an error (EPERM) otherwise.
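As a sketch of what such a replacement could look like, the userland model below contrasts a static root-only test for privileged ports with one that also honours a configurable group. The knob model_resvport_gid and all the function names are invented for this example; model_suser() follows the BSD convention of returning 0 for the superuser and an error otherwise.

```c
#include <errno.h>
#include <sys/types.h>

#define MODEL_IPPORT_RESERVED 1024	/* ports below this are "privileged" */

/* Hypothetical sysctl-backed knob: GID allowed low ports; -1 means nobody. */
int model_resvport_gid = -1;

/* 0 means "is the superuser", EPERM means "is not". */
static int
model_suser(uid_t uid)
{
	return (uid == 0 ? 0 : EPERM);
}

/* Static check: only root may bind a privileged port. */
int
model_bind_static(uid_t uid, int port)
{
	if (port < MODEL_IPPORT_RESERVED && model_suser(uid) != 0)
		return (EACCES);
	return (0);
}

/* Configurable check: root, or a process whose GID matches the knob. */
int
model_bind_configurable(uid_t uid, gid_t gid, int port)
{
	if (port >= MODEL_IPPORT_RESERVED)
		return (0);
	if (model_suser(uid) == 0)
		return (0);
	if (model_resvport_gid >= 0 && gid == (gid_t)model_resvport_gid)
		return (0);
	return (EACCES);
}
```

With the knob left at -1 the two functions behave identically, so the modified kernel acts like a stock one until the operator opts in.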

Another Unix security problem is chroot(). It is desirable to limit the use of some system calls (such as ptrace(), which allows a process to take over another process, and mknod(), which allows a process to create a new device) while a process is chroot()'d, because these system calls can allow an attacker to escape chroot().

We can easily determine whether the current process is chroot()'d by looking at p->p_fd->fd_rdir (the vnode pointer to the root directory); this pointer is NULL if the process isn't chroot'd. Instead of checking credentials in system calls, you can add a chroot check, and disable the system call if it is dangerous inside of chroot().
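In model form, a chroot check is a one-line pointer test. The structures below are cut-down, hypothetical stand-ins (model_proc, model_filedesc, model_vnode) that keep only the p_fd and fd_rdir members mentioned above; model_mknod is a placeholder for whichever handler you decide to guard.

```c
#include <errno.h>
#include <stddef.h>

struct model_vnode { int v_dummy; };

struct model_filedesc {
	struct model_vnode *fd_rdir;	/* root directory; NULL if not chroot()'d */
};

struct model_proc {
	struct model_filedesc *p_fd;
};

static int
model_is_chrooted(const struct model_proc *p)
{
	return (p->p_fd->fd_rdir != NULL);
}

/* A mknod()-like handler that refuses to run inside a chroot. */
int
model_mknod(const struct model_proc *p)
{
	if (model_is_chrooted(p))
		return (EPERM);
	/* ... the real work would go here ... */
	return (0);
}
```

As with the socket check, the denial is just an early return of an error value from the handler.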

We've seen only a small part of the sysctl interface, which can be used to set strings and tables as well. The sysctl MIB can be extended to add new branches, off of which other variables can hang. We could add a branch containing a variable for each system call number, which could be set to "0" to disable that system call in a given situation (for instance, we could implement "kern.chroot.mknod" to turn off mknod() inside of chroot). Of course, for this to work, you'll need to find the root of the system call handler code, and perform the check before the actual system call handler function is called. Finding the right place to add this check is left as an (interesting) exercise.
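Without giving away where the check belongs in the real kernel, the idea of a per-system-call switch can be modelled like this. The call numbers, the model_chroot_allow array and model_dispatch are all invented for the sketch; in a real implementation each array slot would be tied to its own sysctl entry.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NSYS 3

/* Invented system call numbers for the model. */
enum {
	MODEL_SYS_GETPID = 0,
	MODEL_SYS_MKNOD  = 1,
	MODEL_SYS_PTRACE = 2
};

/* One knob per system call: 1 = allowed inside chroot, 0 = disabled. */
int model_chroot_allow[MODEL_NSYS] = { 1, 1, 1 };

typedef int model_call_t(void);

static int model_ok(void) { return (0); }

static model_call_t *model_table[MODEL_NSYS] = {
	model_ok, model_ok, model_ok
};

/* Consult the knob before the handler is ever called. */
int
model_dispatch(size_t code, int chrooted)
{
	if (code >= MODEL_NSYS)
		return (ENOSYS);
	if (chrooted && !model_chroot_allow[code])
		return (EPERM);
	return (model_table[code]());
}
```

Putting the test in the dispatcher means no individual handler has to change, at the cost of one extra array lookup on every system call made by a chroot()'d process.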

A working set of patches to implement the "turn off IP network access for selected users" on FreeBSD 3.0 is available here.