Ever since Microsoft introduced KVA Shadowing (KVAS), its Meltdown mitigation analogous to Linux’s KPTI, hooking system calls (among other potentially malicious things) has become increasingly difficult on Windows. While updating my virtualization toolset, which uses syscall hooking to assist in control flow analysis, I had trouble adding support for any Windows version with KVAS enabled. This is because Windows maps the syscall handler KiSystemCall64Shadow into the kernel shadow page tables. When I attempted to hook system calls using the LSTAR MSR, I found that the only way to do so was to manually add my custom LSTAR system call handler to the shadow page tables using MmCreateShadowMapping. This worked well up until the Windows 10 1809 update. Since 1809, the pages of the shadow mapping code in the PAGE section of the kernel are discarded shortly after initialization.
After brainstorming possible solutions, I decided to take a shot at hooking using the Extended Feature Enable Register (EFER) in order to exit on each SYSCALL and subsequent SYSRET instruction and emulate their operations (you can find the definition of the EFER MSR in the Intel Software Developer’s Manual, Volume 3A, under section 2.2.1 Extended Feature Enable Register). Now you’re probably thinking, how is that possible? But the possibilities are nearly endless when you have a subverted processor on your hands!
By setting the appropriate bits in the MSR Bitmap, you can intercept guest reads and writes of the EFER MSR, and so control and mask the value of its SYSCALL Enable (SCE) bit. Referencing the Intel Software Developer’s Manual, Volume 2B, under section 4.3 INSTRUCTIONS (M-U), we can see how the SYSCALL instruction operates and notice that we can take advantage of the EFER SCE bit (the AMD64 Architecture Programmer’s Manual V3 r3.26 has a practically equivalent instruction reference on page 419, which some may find easier to follow).
Taking from the Intel SDM, the SYSCALL instruction operation is as follows:
    IF (CS.L ≠ 1 ) or (IA32_EFER.LMA ≠ 1) or (IA32_EFER.SCE ≠ 1)
    (* Not in 64-Bit Mode or SYSCALL/SYSRET not enabled in IA32_EFER *)
        THEN #UD;
    FI;
    RCX ← RIP; (* Will contain address of next instruction *)
    RIP ← IA32_LSTAR;
    R11 ← RFLAGS;
    RFLAGS ← RFLAGS AND NOT(IA32_FMASK);
    CS.Selector ← IA32_STAR[47:32] AND FFFCH (* Operating system provides CS; RPL forced to 0 *)
    (* Set rest of CS to a fixed value *)
    CS.Base ← 0; (* Flat segment *)
    CS.Limit ← FFFFFH; (* With 4-KByte granularity, implies a 4-GByte limit *)
    CS.Type ← 11; (* Execute/read code, accessed *)
    CS.S ← 1;
    CS.DPL ← 0;
    CS.P ← 1;
    CS.L ← 1; (* Entry is to 64-bit mode *)
    CS.D ← 0; (* Required if CS.L = 1 *)
    CS.G ← 1; (* 4-KByte granularity *)
    CPL ← 0;
    SS.Selector ← IA32_STAR[47:32] + 8; (* SS just above CS *)
    (* Set rest of SS to a fixed value *)
    SS.Base ← 0; (* Flat segment *)
    SS.Limit ← FFFFFH; (* With 4-KByte granularity, implies a 4-GByte limit *)
    SS.Type ← 3; (* Read/write data, accessed *)
    SS.S ← 1;
    SS.DPL ← 0;
    SS.P ← 1;
    SS.B ← 1; (* 32-bit stack segment *)
    SS.G ← 1; (* 4-KByte granularity *)
The first line lists the conditions that cause an Undefined Opcode exception (#UD), and one of them is a check of the EFER SCE bit. So if we clear EFER.SCE, every SYSCALL instruction raises #UD, and by setting the #UD bit in the Exception Bitmap we can VM-exit on each one.
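As a minimal sketch of that first check, the predicate below mirrors the opening IF of the pseudocode. The EFER bit positions are architectural; the function name and the `cs_long` parameter (standing in for CS.L) are illustrative and not part of the original toolset.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_EFER bit positions (architectural). */
#define EFER_SCE (1ULL << 0)   /* SYSCALL Enable */
#define EFER_LME (1ULL << 8)   /* Long Mode Enable */
#define EFER_LMA (1ULL << 10)  /* Long Mode Active */
#define EFER_NXE (1ULL << 11)  /* No-Execute Enable */

/*
 * Hypothetical helper: returns true when SYSCALL (or SYSRET) would raise #UD
 * according to the first IF of the instruction pseudocode.
 * cs_long models the CS.L bit of the current code segment.
 */
static bool syscall_raises_ud(bool cs_long, uint64_t efer)
{
    return !cs_long || !(efer & EFER_LMA) || !(efer & EFER_SCE);
}
```

With a typical long-mode EFER value of 0xD01 (NXE, LMA, LME, SCE) the predicate is false; clearing only SCE (0xD00) makes it true, which is exactly the condition the hook relies on.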
However, every SYSCALL instruction should have a subsequent SYSRET instruction inside the system call handler to resume execution in the previous context. SYSRET operates similarly to SYSCALL, and you can think of it as the little cousin of the IRET instruction.
Taking from the Intel SDM again, the SYSRET instruction operation is as follows:
    IF (CS.L ≠ 1 ) or (IA32_EFER.LMA ≠ 1) or (IA32_EFER.SCE ≠ 1)
    (* Not in 64-Bit Mode or SYSCALL/SYSRET not enabled in IA32_EFER *)
        THEN #UD;
    FI;
    IF (CPL ≠ 0) OR (RCX is not canonical)
        THEN #GP(0);
    FI;
    IF (operand size is 64-bit)
        THEN (* Return to 64-Bit Mode *)
            RIP ← RCX;
        ELSE (* Return to Compatibility Mode *)
            RIP ← ECX;
    FI;
    RFLAGS ← (R11 & 3C7FD7H) | 2; (* Clear RF, VM, reserved bits; set bit 2 *)
    IF (operand size is 64-bit)
        THEN CS.Selector ← IA32_STAR[63:48]+16;
        ELSE CS.Selector ← IA32_STAR[63:48];
    FI;
    CS.Selector ← CS.Selector OR 3; (* RPL forced to 3 *)
    (* Set rest of CS to a fixed value *)
    CS.Base ← 0; (* Flat segment *)
    CS.Limit ← FFFFFH; (* With 4-KByte granularity, implies a 4-GByte limit *)
    CS.Type ← 11; (* Execute/read code, accessed *)
    CS.S ← 1;
    CS.DPL ← 3;
    CS.P ← 1;
    IF (operand size is 64-bit)
        THEN (* Return to 64-Bit Mode *)
            CS.L ← 1; (* 64-bit code segment *)
            CS.D ← 0; (* Required if CS.L = 1 *)
        ELSE (* Return to Compatibility Mode *)
            CS.L ← 0; (* Compatibility mode *)
            CS.D ← 1; (* 32-bit code segment *)
    FI;
    CS.G ← 1; (* 4-KByte granularity *)
    CPL ← 3;
    SS.Selector ← (IA32_STAR[63:48]+8) OR 3; (* RPL forced to 3 *)
    (* Set rest of SS to a fixed value *)
    SS.Base ← 0; (* Flat segment *)
    SS.Limit ← FFFFFH; (* With 4-KByte granularity, implies a 4-GByte limit *)
    SS.Type ← 3; (* Read/write data, accessed *)
    SS.S ← 1;
    SS.DPL ← 3;
    SS.P ← 1;
    SS.B ← 1; (* 32-bit stack segment*)
    SS.G ← 1; (* 4-KByte granularity *)
The conditions on the first line that cause a #UD exception are the same as those for the SYSCALL instruction. At this point we know we’re good to start causing VM-exits and emulating system calls, but let’s recap everything we have to do:
- Enable VMX.
- Set up the VM-entry controls in the VMCS to load the EFER MSR on VM entry.
- Set up the VM-exit controls in the VMCS to save the EFER MSR on VM exit.
- Set up the MSR Bitmap in the VMCS to exit on reads and writes of the EFER MSR.
- Set up the Exception Bitmap in the VMCS to exit on #UD exceptions.
- On EFER MSR read VM-exits, set the SCE bit in the value returned to the guest.
- On EFER MSR write VM-exits, clear (mask off) the SCE bit in the value the guest wrote.
- Handle the #UD exception to emulate either the SYSCALL or SYSRET instruction.
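The bitmap and masking steps in the recap can be sketched roughly as follows. The MSR bitmap offsets follow the layout documented in the Intel SDM (a 4-KByte page split into four 1-KByte regions), EFER is MSR C0000080H, and #UD is vector 6. All function names are hypothetical, and the sketch assumes the guest always expects SCE to read back as set, which is a simplification.

```c
#include <stdint.h>

#define MSR_EFER            0xC0000080u
#define EFER_SCE            (1ULL << 0)
#define EXCEPTION_VECTOR_UD 6u

/* Intel MSR bitmap: one 4-KByte page made of four 1-KByte regions. */
#define MSR_BITMAP_SIZE       4096u
#define MSR_BITMAP_READ_HIGH  1024u  /* reads of MSRs C0000000H..C0001FFFH  */
#define MSR_BITMAP_WRITE_HIGH 3072u  /* writes of MSRs C0000000H..C0001FFFH */

/* Mark both reads and writes of a high-range MSR as VM-exiting. */
static void msr_bitmap_intercept_high(uint8_t *bitmap, uint32_t msr)
{
    uint32_t index = msr - 0xC0000000u;
    bitmap[MSR_BITMAP_READ_HIGH + index / 8]  |= (uint8_t)(1u << (index % 8));
    bitmap[MSR_BITMAP_WRITE_HIGH + index / 8] |= (uint8_t)(1u << (index % 8));
}

/* Return the exception bitmap with the #UD bit added. */
static uint32_t exception_bitmap_with_ud(uint32_t bitmap)
{
    return bitmap | (1u << EXCEPTION_VECTOR_UD);
}

/* RDMSR exit: the value reported to the guest has SCE set, hiding the hook. */
static uint64_t efer_value_for_guest_read(uint64_t guest_efer)
{
    return guest_efer | EFER_SCE;
}

/* WRMSR exit: the value stored in the guest EFER field has SCE cleared. */
static uint64_t efer_value_for_guest_write(uint64_t written)
{
    return written & ~EFER_SCE;
}
```

In a real hypervisor the bitmap page would be allocated zeroed and its physical address written to the VMCS, and the two EFER helpers would be called from the RDMSR and WRMSR exit handlers respectively.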
The next problem is detecting whether the #UD was caused by a SYSCALL or SYSRET instruction. For the sake of simplicity, reading opcodes from RIP is sufficient to determine which instruction caused the #UD. KVAS slightly complicates things, however, so we need to handle this a little differently if the CR3 PCID indicates a user mode directory table base. There are of course more optimal methods than reading the instruction opcodes (e.g. hook the interrupt table itself, or use a toggle or counter to switch between handling SYSCALL and SYSRET if it’s safe to assume nothing else will cause a #UD).
Emulating the SYSCALL and SYSRET instructions is as easy as following the instruction operations outlined in the manual. The following code is just a basic emulation; I have purposely left out handling of compatibility and protected mode, as well as the SYSRET #GP exception, for simplicity:
    //
    // SYSCALL instruction emulation routine
    //
    static BOOLEAN VmmpEmulateSYSCALL( IN PVIRTUAL_CPU VirtualCpu )
    {
        X86_SEGMENT_REGISTER Cs, Ss;
        UINT64 MsrValue;

        //
        // Save the address of the instruction following SYSCALL into RCX and then
        // load RIP from MSR_LSTAR. On a #UD exit the guest RIP still points at the
        // SYSCALL instruction itself, so step over its two opcode bytes (0F 05).
        //
        MsrValue = ReadMSR( MSR_LSTAR );
        VirtualCpu->Context->Rcx = VirtualCpu->Context->Rip + 2;
        VirtualCpu->Context->Rip = MsrValue;
        VmcsWrite( VMCS_GUEST_RIP, VirtualCpu->Context->Rip );

        //
        // Save RFLAGS into R11 and then mask RFLAGS using MSR_FMASK.
        //
        MsrValue = ReadMSR( MSR_FMASK );
        VirtualCpu->Context->R11 = VirtualCpu->Context->Rflags;
        VirtualCpu->Context->Rflags &= ~(MsrValue | X86_FLAGS_RF);
        VmcsWrite( VMCS_GUEST_RFLAGS, VirtualCpu->Context->Rflags );

        //
        // Load the CS and SS selectors with values derived from bits 47:32 of MSR_STAR.
        //
        MsrValue = ReadMSR( MSR_STAR );
        Cs.Selector = (UINT16)((MsrValue >> 32) & ~3);        // STAR[47:32] & ~RPL3
        Cs.Base = 0;                                          // flat segment
        Cs.Limit = (UINT32)~0;                                // 4GB limit
        Cs.Attributes = 0xA9B;                                // G+L+P+S+DPL0+Code
        VmcsWriteSegment( X86_REG_CS, &Cs );

        Ss.Selector = (UINT16)(((MsrValue >> 32) & ~3) + 8);  // STAR[47:32] + 8
        Ss.Base = 0;                                          // flat segment
        Ss.Limit = (UINT32)~0;                                // 4GB limit
        Ss.Attributes = 0xC93;                                // G+DB+P+S+DPL0+Data
        VmcsWriteSegment( X86_REG_SS, &Ss );

        return TRUE;
    }
    //
    // SYSRET instruction emulation routine
    //
    static BOOLEAN VmmpEmulateSYSRET( IN PVIRTUAL_CPU VirtualCpu )
    {
        X86_SEGMENT_REGISTER Cs, Ss;
        UINT64 MsrValue;

        //
        // Load RIP from RCX.
        //
        VirtualCpu->Context->Rip = VirtualCpu->Context->Rcx;
        VmcsWrite( VMCS_GUEST_RIP, VirtualCpu->Context->Rip );

        //
        // Load RFLAGS from R11. Clear RF, VM, reserved bits.
        //
        VirtualCpu->Context->Rflags =
            (VirtualCpu->Context->R11 & ~(X86_FLAGS_RF | X86_FLAGS_VM | X86_FLAGS_RESERVED_BITS)) | X86_FLAGS_FIXED;
        VmcsWrite( VMCS_GUEST_RFLAGS, VirtualCpu->Context->Rflags );

        //
        // SYSRET loads the CS and SS selectors with values derived from bits 63:48 of MSR_STAR.
        //
        MsrValue = ReadMSR( MSR_STAR );
        Cs.Selector = (UINT16)(((MsrValue >> 48) + 16) | 3);  // (STAR[63:48]+16) | 3 (RPL forced to 3)
        Cs.Base = 0;                                          // flat segment
        Cs.Limit = (UINT32)~0;                                // 4GB limit
        Cs.Attributes = 0xAFB;                                // G+L+P+S+DPL3+Code
        VmcsWriteSegment( X86_REG_CS, &Cs );

        Ss.Selector = (UINT16)(((MsrValue >> 48) + 8) | 3);   // (STAR[63:48]+8) | 3 (RPL forced to 3)
        Ss.Base = 0;                                          // flat segment
        Ss.Limit = (UINT32)~0;                                // 4GB limit
        Ss.Attributes = 0xCF3;                                // G+DB+P+S+DPL3+Data
        VmcsWriteSegment( X86_REG_SS, &Ss );

        return TRUE;
    }
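To sanity-check the selector and RFLAGS arithmetic outside of a hypervisor, the derivations from the SDM pseudocode can be written as small pure functions. This is only a sketch: the function names are hypothetical, and the canonical check assumes 48-bit linear addresses (no 5-level paging). It also shows the RCX canonical test that the emulation routines above leave out.

```c
#include <stdbool.h>
#include <stdint.h>

/* SYSCALL: CS = STAR[47:32] AND FFFCH, SS = STAR[47:32] + 8. */
static uint16_t syscall_cs(uint64_t star)
{
    return (uint16_t)((star >> 32) & 0xFFFC);
}

static uint16_t syscall_ss(uint64_t star)
{
    return (uint16_t)(((star >> 32) & 0xFFFF) + 8);
}

/* 64-bit SYSRET: CS = (STAR[63:48] + 16) OR 3, SS = (STAR[63:48] + 8) OR 3. */
static uint16_t sysret_cs(uint64_t star)
{
    return (uint16_t)(((star >> 48) + 16) | 3);
}

static uint16_t sysret_ss(uint64_t star)
{
    return (uint16_t)(((star >> 48) + 8) | 3);
}

/* SYSRET: RFLAGS = (R11 AND 3C7FD7H) OR 2, dropping RF, VM and reserved bits. */
static uint64_t sysret_rflags(uint64_t r11)
{
    return (r11 & 0x3C7FD7ULL) | 2;
}

/* Assumes 48-bit linear addresses: bits 63:47 must all be equal. */
static bool is_canonical48(uint64_t address)
{
    uint64_t upper = address >> 47;  /* the 17 most significant bits */
    return upper == 0 || upper == 0x1FFFF;
}
```

For an illustrative STAR value of 0x0023001000000000, these yield kernel selectors 0x10/0x18 on SYSCALL and user selectors 0x33/0x2B on SYSRET. A fuller SYSRET emulation would inject #GP(0) when `is_canonical48` fails for the guest RCX.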
You can simply call the SYSCALL and SYSRET emulation routines from your #UD handler, which also does the detection of what instruction caused the exception. Here is a quick example including code supporting KVAS:
    #define IS_SYSRET_INSTRUCTION(Code) \
        (*((PUINT8)(Code) + 0) == 0x48 && \
         *((PUINT8)(Code) + 1) == 0x0F && \
         *((PUINT8)(Code) + 2) == 0x07)

    #define IS_SYSCALL_INSTRUCTION(Code) \
        (*((PUINT8)(Code) + 0) == 0x0F && \
         *((PUINT8)(Code) + 1) == 0x05)

    static BOOLEAN VmmpHandleUD( IN PVIRTUAL_CPU VirtualCpu )
    {
        UINTN GuestCr3;
        UINTN OriginalCr3;
        UINTN Rip = VirtualCpu->Context->Rip;

        //
        // Due to KVA Shadowing, we need to switch to a different directory table base
        // if the PCID indicates this is a user mode directory table base.
        //
        GuestCr3 = VmxGetGuestControlRegister( VirtualCpu, X86_CTRL_CR3 );
        if ((GuestCr3 & PCID_MASK) != PCID_NONE)
        {
            OriginalCr3 = ReadCr3( );
            WriteCr3( PsGetCurrentProcess( )->DirectoryTableBase );

            if (IS_SYSRET_INSTRUCTION( Rip ))
            {
                WriteCr3( OriginalCr3 );
                goto EmulateSYSRET;
            }

            if (IS_SYSCALL_INSTRUCTION( Rip ))
            {
                WriteCr3( OriginalCr3 );
                goto EmulateSYSCALL;
            }

            WriteCr3( OriginalCr3 );
            return FALSE;
        }
        else
        {
            if (IS_SYSRET_INSTRUCTION( Rip ))
                goto EmulateSYSRET;

            if (IS_SYSCALL_INSTRUCTION( Rip ))
                goto EmulateSYSCALL;

            return FALSE;
        }

        //
        // Emulate SYSRET instruction.
        //
    EmulateSYSRET:
        LOG_DEBUG( "SYSRET instruction => 0x%llX", Rip );
        return VmmpEmulateSYSRET( VirtualCpu );

        //
        // Emulate SYSCALL instruction.
        //
    EmulateSYSCALL:
        LOG_DEBUG( "SYSCALL instruction => 0x%llX", Rip );
        return VmmpEmulateSYSCALL( VirtualCpu );
    }
If it has been determined that a SYSCALL or SYSRET instruction has caused the #UD exception, then just skip injecting the exception into the guest as the exception has been caused intentionally, and resume back to the guest gracefully. Example:
    case X86_TRAP_UD: // INVALID OPCODE FAULT
        LOG_DEBUG( "VMX => #UD Rip = 0x%llX", VirtualCpu->Context->Rip );

        //
        // Handle the #UD, checking if this exception was intentional.
        //
        if (!VmmpHandleUD( VirtualCpu ))
        {
            //
            // If this #UD was found to be unintentional, inject a #UD interruption into the guest.
            //
            VmxInjectInterruption( VirtualCpu, InterruptVectorType, VMX_INTR_NO_ERR_CODE );
        }

        // continued code flow then return back to guest....
So how can we use this effectively? Well, in the SYSCALL emulation handler we have access to the guest registers, which contain the system call index and the associated parameters according to the x64 ABI in use, so we have free rein to do whatever we want with this!
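As a rough illustration of what that could look like on Windows x64, where the system call number is passed in RAX and the first four arguments in R10, RDX, R8 and R9, a filter might be sketched like this. The `GuestRegs` structure, the function names, and the watched numbers are all hypothetical; real system call numbers differ between Windows builds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the guest registers relevant to a system call. */
typedef struct GuestRegs {
    uint64_t rax;  /* system call number */
    uint64_t rdx;  /* second argument */
    uint64_t r8;   /* third argument */
    uint64_t r9;   /* fourth argument */
    uint64_t r10;  /* first argument (RCX is clobbered by SYSCALL) */
} GuestRegs;

/* Placeholder numbers only; look them up for the specific target build. */
static const uint32_t kWatchedSyscalls[] = { 0x0055u, 0x0026u };

/* Returns true when the pending system call is one we want to inspect. */
static bool is_watched_syscall(const GuestRegs *regs)
{
    uint32_t number = (uint32_t)regs->rax;
    size_t count = sizeof(kWatchedSyscalls) / sizeof(kWatchedSyscalls[0]);

    for (size_t i = 0; i < count; ++i) {
        if (kWatchedSyscalls[i] == number) {
            return true;
        }
    }
    return false;
}

/* Returns register argument 0..3; remaining arguments live on the guest stack. */
static uint64_t syscall_register_arg(const GuestRegs *regs, unsigned index)
{
    switch (index) {
    case 0:  return regs->r10;
    case 1:  return regs->rdx;
    case 2:  return regs->r8;
    case 3:  return regs->r9;
    default: return 0;
    }
}
```

A check like `is_watched_syscall` could run at the top of the SYSCALL emulation routine, before RIP is redirected to LSTAR, to log or alter the call.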
11 thoughts on “Syscall Hooking via Extended Feature Enable Register (EFER)”
Can I monitor syscalls without enabling VT? Can the EFER register be modified directly in the kernel without VT?
If you disabled PG and hooked the #UD interrupt handler you could clear the SCE bit and handle syscalls in the #UD handler, but in reality that would be a ton of work and I don’t see anyone reasonably doing this.
Yes, EFER can be modified, however you cannot trap #UD without modifying IDT. That’s why this is easier with a hypervisor, you can use the exception bitmap to trap on #UD exceptions and perform the operations necessary to do syscall hooking.
Hello, Could you upload all code to github or this website? Thanks.
Nice tutorial, thank you. But I encountered a problem when I tried to implement your design in my project. After I set up the VM-entry controls in the VMCS to load the EFER MSR on VM entry, I initialized the EFER in the VMCS to 0xd00 (SCE disabled), and the VM always exited with InvalidGuestState and a BSOD. I looked through the Intel SDM and cannot find any reference. Do you know why this happens?
I’d have to see a dump of the register/non-register state to give an accurate answer. Does it function fine without the SYSCALL hooking method implemented? How are you setting up the VM-entry controls to load/save the EFER MSR?
Hello, this was a great read. I have 2 questions:
1. Can you emulate the syscall on MSR read exits?
2. If you pretend that the syscalls are enabled on MSR reads, wouldn’t that avoid #UD exception?
I am having trouble understanding the difference between GuestCr3, OriginalCr3 and PsGetCurrentProcess( )->DirectoryTableBase.