A Programming Fusion Technique for Windows NT
by Greg Hoglund
Tue Dec 07 1999

Using c/c++ and assembly together under Windows NT to stackguard, boobytrap, and otherwise get your hands dirty.

Part One
-Greg Hoglund, 1999 ( http://www.rootkit.com )
Copyright Security-Focus.com 1999

Introduction

Assembly language is a useful weapon. I must admit that assembly is complicated for many. So many of us program in 'c' or c++, and yearn for the leverage of assembly. Think about patching the interrupt table under NT. Think about writing steel-belted inner loops. Think about stack-guarding and boobytraps. All of this is granted by assembly. The difficult part comes with the development environment. Do you have TASM 5 installed? How about SoftIce? As it turns out, you don't need them for some tasks. Right there in your MS Dev Studio - with VC++ - you can brandish the power of machine language. Through a few simple tricks you can begin down a deep and secret path. And to go there, you don't always need SoftIce - or TASM. All you need is VC++ and some time to forge ahead.

The power of C

Higher level languages give you an incredible productivity tool. Coding in C or C++ can take far less time than writing and debugging code in assembly. If you're a hard case, you CAN write in 100% assembly. The point is that you don't *have* to go there. You can still benefit from assembly. Using VC++, you can exploit the powers of your runtime libraries, your graphics libraries, and even PHAT codebases like MFC. I personally would rather use the MFC window classes than re-write my own from scratch.
Sure, it's bigger, slower - but those aren't the parts of my program that *need* speed. There is a trade-off and it's based on time. Think about this - the faster you can crank out apps - the more you will make - the faster your armies will spread - the more people will download *your* tool instead of tomorrow's tool.

Using this Fusion technique you can still take advantage of OpenGL macros and libraries, MS-DevStudio app-wizards, MFC, C++ classes, DCOM, and the Standard Template Library. Using the Fusion technique you can stack-guard your subroutines, patch interrupts, hook system calls, overwrite system tables, make your own system calls, use undocumented functions, exploit systems, write solid inner loops - use it or lose it.

A simple 'hello fusion'

The following code demonstrates a simple use of __declspec( naked ) and shows how you can write your own code, unhindered by the VC code generator. The benefit is control. Like many engineers - I can't *stand it* when I can't control my code. Here is the source:

    void _func1(void);

    void main(void)
    {
        _func1();
    }

    void __declspec(naked) _func1(void)
    {
        __asm
        {
            ret
        }
    }

As you can see, you must supply your own 'ret' instruction - which means "return". Since we have declared the function naked, we must manually tell the processor to return. The normal, invisible code that makes that happen for us is *not* generated when we declare a function naked. Everything is manual.
Let's look at what the assembly language looks like for this code (output as shown by the MSDev debugger - hit ALT-8 when debugging to see this):

    12:   void main(void)
    13:   {
    0040A440 55                   push        ebp
    0040A441 8B EC                mov         ebp,esp
    14:       _func1();
    0040A443 E8 C2 6B FF FF       call        @ILT+10(?_func1@@YAXXZ)(0x0040100a)
    15:   }
    0040A448 5D                   pop         ebp
    0040A449 C3                   ret
    16:
    17:   void __declspec(naked) _func1(void)
    18:   {
    0040A44A C3                   ret
    19:       __asm
    20:       {
    21:           ret
    22:       }
    23:   }

At first, the program sets up for the main() call - that is the ebp, esp stuff in the first two lines. Next, we actually call our function - that is the 'call' statement. As you can see, your function is exactly 1 byte long ('C3') and says 'ret' - return from the function. Trace this yourself in the debugger to witness how this works. The next two lines simply set up the main() function for exit.

How does this differ from a normal, everyday call? Well, let's look! A normal call would *not* be declared naked. It would look like this:

    void _func3(int a);

    void main(void)
    {
        _func3(1);
    }

    void _func3(int a)
    {
        a = 0;
    }

Note that our function takes one argument. For simplicity, we are simply going to zero it out. Let's look now at the corresponding assembly language:

    12:   void main(void)
    13:   {
    0040A440 55                   push        ebp
    0040A441 8B EC                mov         ebp,esp
    14:       _func3(1);
    0040A443 6A 01                push        1
    0040A445 E8 C5 6B FF FF       call        @ILT+15(?_func3@@YAXH@Z)(0x0040100f)
    0040A44A 83 C4 04             add         esp,4
    15:   }
    0040A44D 5D                   pop         ebp
    0040A44E C3                   ret

    33:   void _func3(int a)
    34:   {
    0040A451 55                   push        ebp
    0040A452 8B EC                mov         ebp,esp
    35:       a = 0;
    0040A454 C7 45 08 00 00 00 00 mov         dword ptr [a],0
    36:   }
    0040A45B 5D                   pop         ebp
    0040A45C C3                   ret

For the most part it looks the same as our previous example. Because we are passing an argument we must first 'push' the value on the stack. Note the 'push' directly before the 'call'. Also, because we aren't using the 'naked' directive, the compiler has created a stack frame for our function.
Note that the base pointer (ebp) is pushed onto the stack and set to the current stack pointer (esp). The 'mov' instruction corresponds to the "a = 0". When the work is all done, we must pop the base pointer back off of the stack and return. Finally, the compiler corrects the stack position for us - note the 'add esp,4' in main() - which moves the stack pointer back to its original position prior to the call. All of this was done automatically.

Now, you ask, why would anyone wish to use 'naked'? Well, let's say you want to pass all of your arguments in registers. This can be done for performance reasons - and using a naked function you can do this. For example:

    /* takes argument in eax, adds 6, and returns result in eax */
    /* declared int so the value left in eax can be assigned by the caller */
    int __declspec( naked ) _function(void)
    {
        __asm
        {
            add eax, 6
            ret
        }
    }

    void main(void)
    {
        int result;
        __asm mov eax, 10
        result = _function();
    }

For more complicated functions, or when calling a function series, this can be very useful. You can track buffer position in one register, while storing a function pointer in another, and storing a heap pointer in a third. FLAGS could be stored in yet another register. If calling several hundred functions in a row on a single dataset, this removes all of the function prologue and epilogue code and the per-call stack frame. This can increase your performance quite a bit.

The following function takes a pointer to a structure in ebx and dereferences it. It assumes that a pointer to a destination buffer has already been set up in edi. It's functions like these that can be ganged together to work on large datasets. Keep it in the registers!
    struct _function
    {
        void *mRunFunction;
        void *mHeapData;            /* stores state in each instance */
        struct _function *next;
    };

    void __declspec(naked) f_charseq(void)
    {
        __asm
        {
            call [ebx]              ; do some work (mRunFunction)
            mov esi, [ebx + 4]      ; mHeapData
            mov eax, [esi]          ; move a number
            mov [edi], eax          ; mov to destination and
                                    ; increment destination pointer
            inc edi
            ret
        }
    }

Pulling some neat tricks

Now that you know the technique, let's use it for some useful tricks. One trick I recently developed is stack-guarding your function calls. A stack-overflow depends, usually, on the ability to overflow a locally defined variable (sometimes called an 'automatic variable'). These variables are allocated on the stack. If they are overflowed, then it follows that the stack can become corrupted. Obviously, then, the return address can be overwritten and the buffer-overflow delivers its payload. Let's explore a trick to prevent the stack-smash from working.

The following function stores an extra copy of the return-address onto the stack. Because of the way VC++ sets up for a function call, our *manual* 'push' places a copy of the return address LAST on the stack. This is important because any buffers local to this function come BEFORE this value, and will be filled in the *opposite* direction (away from our saved return address). What this means is, no matter how much of an overflow is cast, this last value we pushed cannot be overwritten. The code:

    void _func2(int a, int b)
    {
        // stack guarded function
        // note that this 'push' places a copy of eip
        // *LAST* on the stack, and therefore safe from
        // any local buffer that may be overflowed
        __asm
        {
            push [ebp + 4]          ; eip
        }

        char s[10];
        a = 5;
        b = 10;
        strcpy(s, "XXXXXXXXXXXX");

        // pop saved EIP and check, throw
        // debug interrupt if things aren't cool.
        __asm
        {
            pop eax
            cmp eax, [ebp + 4]
            je BUFFER_OK
            int 3
        BUFFER_OK:
        }
    }

At the end of the call, we pop our saved value and check it against the real return-address.
If they do not match we throw a debug-break (interrupt 3) and the program halts w/o any damage to the system. The hacker will have only succeeded in a DoS attack. Of course, we could throw an exception and gracefully exit the program. Additionally, we could gracefully restart the program - or even handle the exception internally. I leave these as exercises for the engineer, as these are fairly design-specific decisions. The technique is sound. If you want to make your code look cleaner, use the following macros:

    #define START_GUARD __asm push [ebp + 4]

    #define END_GUARD __asm         \
    {                               \
        __asm pop eax               \
        __asm cmp eax, [ebp + 4]    \
        __asm je BOK                \
        __asm int 3                 \
        __asm BOK:                  \
    }

    void _func2(int a, int b)
    {
        START_GUARD

        char s[10];
        a = 5;
        b = 10;
        strcpy(s, "XXXXXXXXXXXX");

        END_GUARD
    }

Let's take another step and overwrite the system service table. Because kernel-memory (the upper few gigs of address space) is off-limits to the lowly user-process, you must write a driver. (A 'native application', despite the name, still runs in user mode; it is the driver that runs in ring-0 and has access to kernel structures.) As a note, leveraging the SE_DEBUG privilege to write to kernel-memory doesn't work from user-mode. As soon as you try to query a remotely interesting page of memory, you are denied access. VirtualQuery() simply fails to return anything. Try to query anything up at 0x80000000 and you'll see what I mean. This doesn't leave us in the dust however - SE_DEBUG *will* allow us to inject code into other processes - and this can be interesting.

Hooking the interrupt table under NT

Inline assembly within VC++ turns out not to be so fruitful here. A simple use of the 'sidt' instruction, in a feeble attempt to read the interrupt table address, returns me the following scrub:

    fatal error C1001: INTERNAL COMPILER ERROR
    (compiler file 'E:\utc\src\P2\x86\inasm.c', line 471)

Oh dear Microsoft, thank you for your help.
It turns out that we actually *can* use this instruction, of course, but only from the DDK build utility - not from VC++ itself. Also, TASM will not compile that instruction either. TASM returns the following error:

    error** maingui.asm(322) Illegal instruction for currently selected processor(s)

This complicates things a bit and means you must have the Windows NT DDK installed. Writing a driver is rather simple, however, so let's explore patching the interrupt table under NT. The following code is a basic driver:

    #include "ntddk.h"
    #include "stdarg.h"
    #include "stdio.h"

    // print macro that only turns on when checked builds are on;
    // call it with double parentheses: DbgPrint(("text\n"));
    #if DBG
    #define DbgPrint(arg) DbgPrint arg
    #else
    #define DbgPrint(arg)
    #endif

    // forward declaration so DriverEntry can reference the handler
    NTSTATUS OnDispatchGeneral( IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp );

    NTSTATUS DriverEntry( IN PDRIVER_OBJECT DriverObject,
                          IN PUNICODE_STRING RegistryPath )
    {
        DbgPrint(("Entering DriverEntry\n"));

        /* for now all dispatches point to the same place */
        DriverObject->MajorFunction[IRP_MJ_READ] =
        DriverObject->MajorFunction[IRP_MJ_CREATE] =
        DriverObject->MajorFunction[IRP_MJ_CLOSE] =
        DriverObject->MajorFunction[IRP_MJ_FLUSH_BUFFERS] =
        DriverObject->MajorFunction[IRP_MJ_CLEANUP] =
        DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = OnDispatchGeneral;

        return STATUS_SUCCESS;
    }

    NTSTATUS OnDispatchGeneral( IN PDEVICE_OBJECT DeviceObject,
                                IN PIRP Irp )
    {
        PIO_STACK_LOCATION currentIrpStack = IoGetCurrentIrpStackLocation(Irp);
        PIO_STACK_LOCATION nextIrpStack = IoGetNextIrpStackLocation(Irp);

        /* Default to success. */
        Irp->IoStatus.Status = STATUS_SUCCESS;
        Irp->IoStatus.Information = 0;

        /* hand the finished IRP back to the I/O manager */
        IoCompleteRequest(Irp, IO_NO_INCREMENT);
        return STATUS_SUCCESS;
    }

This code will compile under the NT DDK. I won't go into detail here on making a correct 'SOURCES' file, or 'Makefile' - there is plenty of sample code in the DDK that can help you build an environment. Let's explore something a little more exciting - patching the interrupt table.

The interrupt table on x86 processors is pointed to by the IDT register.
There are two instructions for dealing with the IDT register, 'lidt' and 'sidt', which load and store the value of the register, respectively. Using a driver, which is running in ring-0, will allow us to use these instructions. User-mode (ring-3) does not have the privilege of loading a value into the IDT register.

The IDT register contains a 6-byte value. When it is stored to memory with 'sidt', the first 16 bits contain the size of the IDT (strictly, its limit: the size in bytes minus one) and the remaining 32 bits contain the base address of the IDT itself. The IDT under NT contains 256 'entries'. Each entry corresponds directly with an interrupt. Each entry in the IDT (when running under NT, hence protected-mode) is exactly 8 bytes long, therefore the IDT under NT is exactly 2KB in length. These entries are descriptors (Intel calls them gate descriptors). Let's talk about these.

IDT entries can be one of three types. The first and most important is called an 'interrupt-gate'. There is also a 'task-gate' and a 'trap-gate', but I am going to focus only on the interrupt-gate for now. When an interrupt occurs, the corresponding interrupt-descriptor is read. Within the descriptor are two values - a code-segment selector and an offset. Under NT, the code-segment is almost always set to 0x08. The processor will look up the corresponding code-segment in the GDT (Global Descriptor Table) to find out where it starts in memory. Code-segment 0x08 starts at memory location 0x00000000 so this is fairly easy. Then, the processor will add the offset and jump to the corresponding code. So, when debugging around, you can almost always simply jump to the offset without worrying so much about the GDT. You can explore all of these structures using SoftIce with the 'gdt' and 'idt' commands.
The structures for the IDT register and the IDT descriptors can be defined as follows:

    #pragma pack(1)
    typedef struct
    {
        WORD offset_lo;
        WORD selector;
        BYTE reserved_lsb;
        unsigned char reserved_msb:5;
        unsigned char DPL:2;
        unsigned char SEGMENT_PRESENT:1;
        WORD offset_hi;
    } IDT_DESCRIPTOR;

    typedef struct
    {
        WORD size;
        WORD base_lo;
        WORD base_hi;
    } IDT_REGISTER;
    #pragma pack()

In our driver, let's go ahead and install our own hook into the IDT. We can hook any interrupt we choose, but for demonstration purposes I am choosing interrupt 2Eh. Interrupt 2Eh is the System Service interrupt and is what user-mode programs use, by way of NTDLL, to call the system services in NTOSKRNL. As you realize, this can be fairly powerful. The code to actually hook the IDT is as follows:

    IDT_REGISTER    gRegister;
    IDT_DESCRIPTOR *gHeadDescriptorP = NULL;
    IDT_DESCRIPTOR *g_2E_DescriptorP = NULL;
    void           *gSystemCallPtr   = NULL;

    __asm lea eax, gRegister
    __asm sidt [eax]            ; store the IDT register into gRegister

    gHeadDescriptorP = (IDT_DESCRIPTOR *) MAKELONG( gRegister.base_lo,
                                                    gRegister.base_hi);

Find the interrupt 2E descriptor:

    g_2E_DescriptorP = &gHeadDescriptorP[0x2E];

Get the function address that is stored within the descriptor:

    gSystemCallPtr = (void *) MAKELONG( g_2E_DescriptorP->offset_lo,
                                        g_2E_DescriptorP->offset_hi);

Replace the address stored within the descriptor:

    __asm
    {
        cli                     ; disable interrupts
        lea eax, MyHookFunction
        mov ebx, g_2E_DescriptorP
        mov [ebx], ax           ; offset_lo
        shr eax, 16
        mov [ebx+6], ax         ; offset_hi
        sti                     ; re-enable interrupts
    }

Note that we must disable interrupts while messing around with the IDT. We wouldn't want an interrupt to be serviced while we have half-loaded the descriptor with a new address! ;-) Also note that we are replacing the address in the descriptor with that of a function called 'MyHookFunction'. Let's explore that now.

To hook the interrupt we want to inject our own code that must be run. When our code is finished, we want to call the original code.
To do this we are going to revisit our old friend __declspec( naked ).

    void __declspec(naked) MyHookFunction(void)
    {
        __asm
        {
            // call number is in eax
            // do something here...
            jmp gSystemCallPtr
        }
    }

Obviously we could pull off a number of tricks here. We could, for instance, determine the process-ID of the caller. Easily, we could alter the parameters that are being passed. We could even 'add' functionality to certain system calls. We could even add our own system calls! (And this without actually adding them to the system service call table ... rather stealthy eh?). Viruses and Internet worms will have a field day with such a technique. We could hide all sorts of information from user-mode. We could hide processes, files, and even redirect requests based on process-id.

Malicious yes, but there are many good and legitimate uses for this. For example, when designing a host-based IDS recently, I was able to profile system-call usage based on process. This enabled me to build an anomaly detector. Furthermore, there are several good books on neural networks on the market (with source code) - you could easily build a neural network process-profiler with this technology. On a simpler note, you could simply watch to see what files are being opened or touched with NtCreateFile(). You wouldn't need to have NT auditing turned on - it just works. Obviously debugging is another use for a technique such as this. Other interesting ideas include enforcing independent access-controls on files (think B1 security), user-profiling, and tracking resource usage.

Getting the process ID is a bit more tricky than the hook itself.
While reversing NTOSKRNL, I found that the following location is called when interrupt 2E occurs (unassembled in SoftIce):

    :u 805b3e31 L 40
    0008:805b3e31  push  8013cbd0
    0008:805b3e36  jmp   8055f2ab
    0008:805b3e3b  xor   [eax],eax
    0008:805b3e3d  add   eax,fs:[eax+66290468]
    0008:805b3e44  sub   cl,5a
    0008:805b3e47  xor   bh,al
    0008:805b3e49  push  dword ptr [ebx]
    0008:805b3e4b  add   [eax+ebx-80],ah
    0008:805b3e4f  push  80033044
    0008:805b3e54  jmp   8053681c
    0008:805b3e59  xor   al,00
    0008:805b3e5b  add   eax,fs:[eax+66726468]
    0008:805b3e62  sub   cl,e9
    0008:805b3e65  sub   eax,edi
    0008:805b3e67  invalid
    0008:805b3e6a  add   eax,fs:[eax+13c37868]

As it turns out, this location is within SoftIce itself - so I cannot set a breakpoint here. SoftIce is up to something fishy - but it's a debugger and could be up to anything. I followed this a little further and ended up in the following code (within NTOSKRNL):

    ---------------------------------------------------------------------------
    8013CBD0
    8013CBD0 loc_8013CBD0:          ; DATA XREF: INIT:801C7A50o
    8013CBD0   push 0
    8013CBD2   push ebp
    8013CBD3   push ebx
    8013CBD4   push esi
    8013CBD5   push edi
    8013CBD6   push fs
    8013CBD8   mov ebx, 30h         ; in GDT at FFDFF000h
    8013CBDD   db 66h
    8013CBDD   mov fs, bx

save old value. Usually -1, but sometimes a pointer:

    8013CBE0   push dword ptr ds:0FFDFF000h

put -1 in place:

    8013CBE6   mov dword ptr ds:0FFDFF000h, 0FFFFFFFFh

set up pointer to struct:

    8013CBF0   mov esi, ds:0FFDFF124h

save the old value of ___ :

    8013CBF6   push dword ptr [esi+137h]

make room on the stack:

    8013CBFC   sub esp, 48h
    8013CBFF   mov ebx, [esp+6Ch]   ; some variable
    8013CC03   and ebx, 1

new value into struct:

    8013CC06   mov [esi+137h], bl

prep for call:

    8013CC0C   mov ebp, esp

mov from struct to automatic @ 3Ch:

    8013CC0E   mov ebx, [esi+128h]
    8013CC14   mov [ebp+3Ch], ebx
    8013CC17   mov [esi+128h], ebp
    8013CC1D   cld
    8013CC1E   test byte ptr [esi+2Ch], 0FFh
    8013CC22   jnz loc_8013CB48

It looks like the FS register (a data segment register) is being loaded with 0x30 - checking the GDT reveals that the data segment 0x30 points to memory location 0FFDFF000h - just like the code shows - imagine that. A couple of offsets are being checked from that data-segment so let's dump them.

At offset 124 there is a pointer to some structure; we store that in esi:

    mov esi, ds:0FFDFF124h

Offset into the structure, at 137, is something interesting:

    push dword ptr [esi+137h]

Also offset into the structure, at 128, is something:

    mov ebx, [esi+128h]

Finally, I note another tidbit at 2Ch:

    test byte ptr [esi+2Ch], 0FFh

Digging around in this pile isn't too fruitful - but the process id ends up being at offset 0x01E0 from esi - so it is possible to now learn the process responsible for the system call:

    __asm
    {
        push edx
        push fs
        mov bx, 0x30
        mov fs, bx
        mov edx, fs:[0x1E0]     ; edx has process id
        // do something with this data
        pop fs
        pop edx
    }

Okay, enough said. Next week we will go back to user-mode and take a look at leveraging the SE_DEBUG privilege to inject code into other processes.

To Be Continued...