Opening Thoughts
Over the past couple of months I have gone through the Zero2Automated malware RE course and the Malware Development course from MalDev Academy.
The malware RE course is pretty easy to practice as there are plenty of samples and challenges online to reverse, but finding a challenge for malware development is a bit trickier.
Fortunately, I ran into some problems while studying for the OSCP exam and doing HackTheBox.
Many of the boxes running Windows 10 were able to detect msfvenom, mimikatz, and other malicious PowerShell scripts, which made it very difficult to get a reverse shell on what should have been very simple boxes.
The OSCP course teaches you to use Shellter, but even this tool was failing to bypass Defender.
At this point I figured it’d be a good investment to build my own loader so that I wouldn’t ever have to worry about this issue again.
I had actually done this project once in C++, but hindsight is 20/20 and I arrived at the conclusion that writing it in C is the best option.
My reasoning is that a loader isn’t super complicated, so the organization and object-oriented approach of C++ doesn’t really offer any benefit.
And the added overhead from C++ just makes the final executable larger.
Since I’ll be working on boxes where the network bandwidth isn’t great, a small and portable executable is better.
Design
I was talking to this one person who’s been developing game cheats for over 10 years and he told me that all you really need is memory injection and streaming relocations.
Streaming relocations is a bit overkill for this project, so I’ll skip that but the injection is definitely optimal.
In the first version I had used APC injection, and ran into issues with my shell dying once the original process terminated.
I’m sure I could fix it, but since this is supposed to be an improvement over my 1.0 loader I want to go with remote APC injection, aka Early Bird APC injection.
I will also need a quick and easy way of hashing strings and encrypting my msfvenom payloads.
For these requirements I’ll just use the same code snippets as taught in the MalDev course, since crypto isn’t one of my strengths.
Hell’s Gate
Windows syscalls are the low-level routines that carry out the actual work when a WinAPI function is called.
The example given in the course is that VirtualAlloc and VirtualAllocEx both use NtAllocateVirtualMemory to carry out their actions.
All syscalls return an NTSTATUS value that indicates either success or an error code.
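As a small illustration of reading an NTSTATUS, here is a minimal sketch of the usual success check. The NT_SUCCESS macro mirrors the one in the Windows headers. The int32_t typedef is a stand-in so the snippet compiles without them, and nt_severity and nt_status_self_check are hypothetical helper names of my own, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Windows typedef (NTSTATUS is a signed 32-bit LONG). */
typedef int32_t NTSTATUS;

/* Same shape as the macro in the Windows headers: any non-negative value
   (success or informational) counts as success. */
#define NT_SUCCESS(status) (((NTSTATUS)(status)) >= 0)

/* Hypothetical helper: the top two bits encode the severity.
   0 = success, 1 = informational, 2 = warning, 3 = error. */
static unsigned nt_severity(NTSTATUS status) {
    return (unsigned)(((uint32_t)status) >> 30);
}

/* Hypothetical self-check using two well-known status codes. */
static void nt_status_self_check(void) {
    NTSTATUS ok = (NTSTATUS)0x00000000u;      /* STATUS_SUCCESS */
    NTSTATUS denied = (NTSTATUS)0xC0000022u;  /* STATUS_ACCESS_DENIED */

    assert(NT_SUCCESS(ok));
    assert(!NT_SUCCESS(denied));
    assert(nt_severity(denied) == 3);
}
```

In practice the return value of every Nt* call would go through a check like NT_SUCCESS before the code continues.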
We can use the following docs since most syscalls aren’t documented by Microsoft:
The main advantage to using syscalls is that we can evade hooked WinAPI. Syscalls will have the following structure:
|
|
SSN refers to the syscall service number that the kernel uses to distinguish one syscall from another. It’s important to note that the value for a given syscall can differ between Windows versions and builds.
Hell’s Gate is a technique that can read through ntdll.dll to find and execute syscalls.
It’s a pretty complex technique that I won’t go over here since that isn’t the purpose of this post.
But you can read about it here.
We can follow the guidance in that paper to get the technique working.
Long story short, we will add a VX_TABLE_ENTRY to our VX_TABLE for every syscall we want to use:
|
|
This table will be populated via the GetVxTableEntry(...) function given in the paper.
Then the HellsGate function just loads up the correct SSN to be called, and HellDescent actually makes the call:
|
|
Early Bird APC Injection
Asynchronous Procedure Calls (APCs) are functions that execute asynchronously in the context of a specific thread. We can queue an APC to a thread, and it will run the next time that thread enters an alertable state, which just means that it is in a “wait” state that allows queued APCs to be delivered. The Early Bird part of this technique refers to targeting a newly created, suspended remote process instead of the local one. So we would do the following:
- Spawn process in suspended state.
- Write payload to address space of suspended process.
- Get a handle to the suspended thread.
- Pass the address of the payload in the new process, and the thread handle, to QueueUserAPC.
- Resume thread and wait for payload to run.
Something interesting to note is that normally you’d spawn the process like this:
|
|
However, I am using the syscall NtCreateUserProcess, so the call needs to be set up a bit differently.
|
|
To learn more about how NtCreateUserProcess works I recommend this blog.
Now we just write our payload into the memory space of this process. For this, we need to update our syscall table structure with 3 more syscalls:
|
|