Saturday, July 20, 2013

How to make an old Delphi application DEP compatible

The problem


We have an application, our core application, that used to do a couple of things that resulted in DEP violations (read more about it here: Data Execution Prevention).
These two things are:
  • Self-patching framework procedures, functions or class methods
  • Generating code at runtime through scripting engines that perform just-in-time compilation

The solution


Self patching code


Developers normally rely on self-patching code for one of three reasons: the framework they are using doesn't behave the way they consider proper by default, they need to extend a framework that is otherwise impossible to modify, or they need to fix a bug in the framework. There might be other reasons I'm omitting here, but these are the most common I've seen. In our case, we have self-patching code that falls under all three descriptions.

Most self-patching code uses one variation or another of the same technique, which involves overwriting the first bytes of a procedure with a JMP instruction of some kind to the new code that replaces the old code.
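To make the technique concrete, here is a minimal sketch of how the five bytes of a 32-bit relative JMP could be computed. The program name and the MakeJump helper are made up for this example, and the addresses are arbitrary. It only builds the bytes and doesn't patch anything.

```pascal
program JumpSketch;
{$APPTYPE CONSOLE}

type
  // Layout of a 32-bit x86 "JMP rel32" instruction (5 bytes in total)
  TJump = packed record
    OpCode   : Byte;     // $E9 = JMP rel32
    Distance : Longint;  // signed displacement
  end;

// Hypothetical helper: builds the bytes that redirect the code at FromAddr
// to ToAddr. The displacement is relative to the instruction that follows
// the 5-byte JMP, hence the "+ SizeOf(TJump)".
function MakeJump(FromAddr, ToAddr : Cardinal) : TJump;
begin
  Result.OpCode := $E9;
  Result.Distance := Longint(ToAddr - (FromAddr + SizeOf(TJump)));
end;

var
  J : TJump;
begin
  J := MakeJump($00401000, $00402000);
  Writeln(SizeOf(TJump));  // 5
  Writeln(J.Distance);     // 4091, i.e. $1000 - 5
end.
```

A backward jump simply produces a negative displacement, so the same helper covers both directions as long as both addresses fit in 32 bits.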

This is an example of a procedure we use to self-patch code:

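procedure PatchMemory(p : Pointer; DataSize : Integer; Data : Pointer; OldData : Pointer);
{$IFNDEF DELPHIXE2}
type
  SIZE_T = DWORD;
{$ENDIF}
var
  OldProtect : DWORD;
  BytesWritten : SIZE_T;
begin
  // Make the target pages writable while keeping them executable
  if not VirtualProtect (p, DataSize, PAGE_EXECUTE_READWRITE, OldProtect) then
    Exit;
  // Save the original bytes so the patch can be undone later
  if OldData <> nil then
    Move (p^, OldData^, DataSize);
  // Overwrite the code with the new bytes
  WriteProcessMemory(GetCurrentProcess, p, Data, DataSize, BytesWritten);
  // Restore the original page protection
  VirtualProtect (p, DataSize, OldProtect, OldProtect);
end;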


The key pieces of the code above are the calls to VirtualProtect() and WriteProcessMemory().

Before you overwrite a piece of memory in the code segment, you MUST unprotect the memory with a call to VirtualProtect(), passing PAGE_EXECUTE_READWRITE as the new protection option.

The second thing to take into consideration is *how* you overwrite the memory in the code segment. I've seen implementations that simply do a Move() call with the source and target memory addresses, only to see the code fail with a DEP violation even when VirtualProtect() was properly called beforehand.
It's interesting that Microsoft's WriteProcessMemory() documentation page makes no special mention of this, because it's the only way I know of to overwrite a piece of memory in the code segment without getting a DEP violation error.

With a function like the one above, as long as you follow the basic premise of unprotecting the memory first and then patching it with WriteProcessMemory(), you will be pretty much covered against the typical failures that come from overwriting the code segment in any other way.


Script engine JIT compilers


Most extensible applications/frameworks rely on some kind of scripting language, from simple yet powerful "configuration" dialects to full-blown scripting languages. We use a couple of scripting languages in our programs, both of them dialects of Pascal.
One of these scripting languages is newer and does its job the right way: it allocates the memory for the JIT-generated code using VirtualAlloc(), passing the memory protection attribute PAGE_EXECUTE_READWRITE. That makes the generated code executable by simply jumping into it.
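As a rough illustration of why this works, the sketch below allocates a small executable buffer with VirtualAlloc(), copies six bytes of hand-written x86 machine code into it and calls it. It is not taken from either of our scripting engines. The program name, the TIntFunc type and the machine code are assumptions made for the example.

```pascal
program JitSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

type
  TIntFunc = function : Integer;

const
  // x86 machine code for: mov eax, 42 / ret
  Code : array[0..5] of Byte = ($B8, $2A, $00, $00, $00, $C3);

var
  Mem : Pointer;
  F   : TIntFunc;
begin
  // Ask Windows directly for pages that are readable, writable AND executable
  Mem := VirtualAlloc(nil, SizeOf(Code), MEM_COMMIT or MEM_RESERVE,
    PAGE_EXECUTE_READWRITE);
  if Mem = nil then
    Halt(1);

  // "Compile": copy the generated code into the executable buffer
  Move(Code, Mem^, SizeOf(Code));
  FlushInstructionCache(GetCurrentProcess, Mem, SizeOf(Code));

  // Jump into the generated code; the result comes back in EAX
  @F := Mem;
  Writeln(F());  // 42

  VirtualFree(Mem, 0, MEM_RELEASE);
end.
```

If the buffer came from GetMem() or a TMemoryStream instead, the call through F would be the point where DEP stops the program.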

The older of our scripting engines simply created a Delphi TMemoryStream object and wrote the generated code into it. After it was done compiling, it tried to jump into the generated code, and that of course failed with a DEP violation.
The problem in this case is that memory allocated by Delphi's default memory allocator is not PAGE_EXECUTE_READWRITE but PAGE_READWRITE. This is fine, and you don't want to change this default behavior.

The solution for this particular scripting engine was to replace the default TMemoryStream class, which allocates normal Delphi heap memory, with a descendant of the TCustomMemoryStream class that implements its own memory allocation by calling VirtualAlloc() directly with the memory protection attribute PAGE_EXECUTE_READWRITE.

Gist for the class: TWinVMMemoryStream
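For readers who can't reach the Gist, the following is a simplified sketch of the idea and not the actual TWinVMMemoryStream code. The unit and class names (ExecMemStream, TExecMemoryStream) are hypothetical, and to keep it short the buffer has a fixed capacity chosen at construction time instead of growing on demand.

```pascal
unit ExecMemStream;

interface

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical fixed-capacity stream backed by executable pages
  TExecMemoryStream = class(TCustomMemoryStream)
  private
    FCapacity : Longint;
  public
    constructor Create(ACapacity : Longint);
    destructor Destroy; override;
    function Write(const Buffer; Count : Longint) : Longint; override;
  end;

implementation

constructor TExecMemoryStream.Create(ACapacity : Longint);
var
  P : Pointer;
begin
  inherited Create;
  // Bypass the Delphi heap and request executable pages from Windows
  P := VirtualAlloc(nil, ACapacity, MEM_COMMIT or MEM_RESERVE,
    PAGE_EXECUTE_READWRITE);
  if P = nil then
    RaiseLastOSError;
  FCapacity := ACapacity;
  SetPointer(P, 0);  // buffer is allocated, stream is still empty
end;

destructor TExecMemoryStream.Destroy;
begin
  if Memory <> nil then
    VirtualFree(Memory, 0, MEM_RELEASE);
  inherited Destroy;
end;

function TExecMemoryStream.Write(const Buffer; Count : Longint) : Longint;
var
  Pos, NewEnd : Longint;
begin
  Result := 0;
  Pos := Longint(Position);
  if (Count <= 0) or (Pos < 0) then
    Exit;
  NewEnd := Pos + Count;
  if NewEnd > FCapacity then
    Exit;  // this sketch never grows the buffer
  Move(Buffer, (PAnsiChar(Memory) + Pos)^, Count);
  if NewEnd > Size then
    SetPointer(Memory, NewEnd);  // extend the logical size of the stream
  Position := NewEnd;
  Result := Count;
end;

end.
```

A scripting engine that already writes its output through a TStream could then be handed a TExecMemoryStream, and the Memory property would point at code that is safe to jump into.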


Conclusion


Just by attacking these two problems we made our application DEP compliant.
If you are having a hard time identifying where your non-compliant code might be, my suggestion would be to first narrow down when the violations happen.

If violations happen when starting the app, it's likely that self-patching code is being invoked from the initialization section of one of your units/modules.

If they happen later down the road, once the app is up and running, it's more likely that some JIT compiler is the culprit.

Anyway, once you know the two tricks you have to apply:
  • Always call VirtualProtect() before self-patching code, and do the patching with WriteProcessMemory().
  • Make sure to allocate memory for JIT-generated code using VirtualAlloc() with the memory protection attribute PAGE_EXECUTE_READWRITE.
you will get to DEP compliance in a breeze.

Happy coding!

Friday, July 19, 2013

DLLs deadlocking when getting unloaded if attempting to exit threads

The Problem

At Convey, we use many different languages to construct our solutions, one of them and probably the most commonly used today for a lot of our back-end services is Delphi. 
As any developer with even minor knowledge of Delphi knows, applications based on it are broken up into units, and units have an "initialization" block of code and a "finalization" block of code.
Typically these two blocks are used to initialize and finalize the globals used by each module. They are guaranteed to be called upon startup/first use of a module and when the module goes out of scope, either because the containing library is unloaded or because the program finishes.
Traditionally, developers using other languages such as C rely on explicit calls to initialize or finalize resources on modules... but not in Delphi.
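For readers less familiar with Delphi, this is roughly what such a unit looks like. The unit name and the Cache variable are made up for illustration.

```pascal
unit GlobalCache;

interface

uses
  Classes;

var
  Cache : TStringList;  // global owned by this unit

implementation

initialization
  // Runs once when the unit is loaded (program start, or DLL/BPL load)
  Cache := TStringList.Create;

finalization
  // Runs when the program ends or the containing DLL/BPL is unloaded
  Cache.Free;
  Cache := nil;

end.
```

Nothing in the host program has to call these blocks explicitly, which is exactly why their behavior inside a DLL is easy to overlook.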

So, this was the root of our problem.

We had code that, when compiled and run as part of a standalone EXE or a BPL (Borland Package Library), worked like a charm. Programs started, used the code with no issue and unloaded themselves with no problems. BUT... when the same modules were linked as part of a DLL, the program simply "locked" when trying to unload, and it was necessary to kill the process from the outside.

Because of this, we ended up relying on all kinds of dirty tricks, from leaking memory by not freeing the resources that "seemed" to cause the freeze, to incorporating "auto-kill" code in DLLs that, upon detecting that an app was trying to shut down, would simply kill the process from the inside.

A while ago I read an article by Chris Wenham ( Signs you are a bad programmer ) and decided that it was time to clean house and be less of a "bad programmer" according to his definitions. I took on an old thread-based timer I wrote many years ago, when there was no such facility on Windows; over the years Microsoft has added decent timer support. The refactoring resulted in the dreaded DLL deadlock upon unloading a library that contained the newly refactored timer code.

After some time researching and scratching my head, I came across this article:

Exit thread upon deleting static object during unload DLL causes deadlock?

I decided to go to MSDN to read more about DLL entry point mechanics ( DllMain entry point ) and found this piece of text:
Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.
Now, that was the first glaring warning sign that I was trying to do something I was not supposed to. The fact that I was trying to terminate a job thread in the finalization section of the unit "smelled" to me like "communicating" with another thread in some way.

First thing I tried after that was leaving a resource leak (the actual job threads used by the timer) and that proved to prevent the deadlock from happening.

A second finding, as I was experimenting with the finalization section of this unit, was that a call to CoUninitialize() causes the same deadlock behavior. And if you read the specs of the function you find this:
Because there is no way to control the order in which in-process servers are loaded or unloaded, do not call CoInitialize, CoInitializeEx, or CoUninitialize from the DllMain function.
Sadly, without relying on an explicit call from the host application, the only "solution" for the CoUninitialize() limitation is to detect that FreeLibrary() is being called and skip the call in the finalization section.

The Solution

So, what to do here? One recommendation would be to make every module or DLL export a couple of procedures to initialize and finalize the global resources of the DLL. Then you can free your global threads in those procedures. The problem with this approach is that in many cases it might not be possible to modify the host application to adhere to this new protocol. The second issue I see is that, for probably all Delphi developers, it's good practice to free global resources in the finalization section without any special consideration for the limitations that DllMain imposes. As said before, in a standalone EXE or when using BPLs, there are no limitations on the finalization code that I'm aware of.
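For reference, the explicit-protocol approach could look roughly like the sketch below. The library name, the TWorker thread and the exported names InitLibrary/FinalizeLibrary are all hypothetical. The point is only that the worker thread is stopped while the DLL is still fully loaded, not from DllMain.

```pascal
library TimerLib;

uses
  Windows, Classes;

type
  // Hypothetical background thread standing in for the timer's job thread
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Worker : TWorker = nil;

procedure TWorker.Execute;
begin
  while not Terminated do
    Sleep(50);
end;

// The host calls this right after LoadLibrary()
procedure InitLibrary; stdcall;
begin
  if Worker = nil then
    Worker := TWorker.Create(False);
end;

// The host calls this BEFORE FreeLibrary(), outside of DllMain,
// so waiting for the thread to exit cannot deadlock on the loader
procedure FinalizeLibrary; stdcall;
begin
  if Worker <> nil then
  begin
    Worker.Terminate;
    Worker.WaitFor;
    Worker.Free;
    Worker := nil;
  end;
end;

exports
  InitLibrary,
  FinalizeLibrary;

begin
end.
```

This only helps when the host application can be changed to make both calls, which is the limitation described above.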

So, I decided to go for something I consider an "elegant hack" (it's open to interpretation whether there's such a thing as an "elegant hack"...). It makes code that tries to finish a thread in the finalization section of a unit compatible with DLLs, with no modification other than linking in the unit that contains the hack.

The Code


The first thing I did was to create my own installable custom DllMain handler. This is easy to do with Delphi using the global variable DllProc.

interface
...

var
  ShuttingDownDll : Boolean; // Use this flag to know when a DLL is in DETACH mode
implementation
...

{$IFNDEF DELPHI2007}
type
  THookedDllProc = procedure (Reason: DWORD);
{$ENDIF}

var
  {$IFDEF DELPHI2007}
  OldDllProc : TDLLProc;
  {$ELSE}
  OldDllProc : Pointer;
  {$ENDIF}

// Hooked DllProc used to flag when DLL is being detached
procedure HookedDllProc(Reason: DWORD);
begin
  if not ShuttingDownDll then
    ShuttingDownDll := Reason = DLL_PROCESS_DETACH;
  if assigned(OldDllProc) then
    {$IFNDEF DELPHI2007}THookedDllProc({$ENDIF}OldDllProc{$IFNDEF DELPHI2007}){$ENDIF}(Reason);
end;
initialization
  OldDllProc := DllProc;
  DllProc := @HookedDllProc;
...
finalization
  ...
  DllProc := OldDllProc;
end.

With this in place, the flag ShuttingDownDll is set to True when the DLL receives the DLL_PROCESS_DETACH notification.
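The same flag can also be used for the CoUninitialize() limitation mentioned earlier. The sketch below is one possible way to do it. The unit name ComCleanup and the SafeCoUninitialize helper are hypothetical, and the flag is passed in as a parameter so the example doesn't depend on the name of the unit that declares it.

```pascal
unit ComCleanup;

interface

// Hypothetical helper: DllDetaching is expected to be the ShuttingDownDll flag
procedure SafeCoUninitialize(DllDetaching : Boolean);

implementation

uses
  ActiveX;

procedure SafeCoUninitialize(DllDetaching : Boolean);
begin
  // Inside a DLL that is being detached we are running under DllMain,
  // so skip the call: COM stays initialized, but the deadlock is avoided
  if IsLibrary and DllDetaching then
    Exit;
  CoUninitialize;
end;

end.
```

A finalization section would then call SafeCoUninitialize(ShuttingDownDll) instead of calling CoUninitialize() directly. This assumes the hooked DllProc runs before the finalization sections do.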

The second part is the real hack. Here I decided to change the semantics of Delphi's EndThread() system procedure, using a simple hack well known in the Delphi community: overwriting the first bytes of the actual code with a relative JMP to the new code.

For this to work you need this:
implementation
...
type
  PJump = ^TJump;
  TJump = packed record
    OpCode:byte;
    Distance:integer;
  end;

var
  OldCode : TJump;
  NewCode : TJump;

procedure HookedEndThread(ExitCode: Integer);
begin
  {$IFDEF DELPHI2007}
  if Assigned(SystemThreadEndProc) then
    SystemThreadEndProc(ExitCode);
  {$ENDIF}
  if (not IsLibrary) or (not ShuttingDownDll) then
    ExitThread(ExitCode)
  else TerminateThread(GetCurrentThread, ExitCode); // Forceful termination of thread if library mode and DLL_PROCESS_DETACH mode
end;

procedure PatchEndThread;
begin
  NewCode.Distance := Integer(@HookedEndThread) - (Integer(@EndThread) + 5);
  PatchMemory (@EndThread, 5, @NewCode, @OldCode);
  FlushInstructionCache (GetCurrentProcess, @EndThread, 5);
end;

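procedure UnPatchEndThread;
var
  Discarded : TJump; // receives the patched bytes, which are no longer needed
begin
  // PatchMemory takes four parameters; write back the original bytes saved in OldCode
  PatchMemory (@EndThread, 5, @OldCode, @Discarded);
  FlushInstructionCache (GetCurrentProcess, @EndThread, 5);
end;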
initialization
  ...
  NewCode.OpCode := $E9;
  NewCode.Distance := 0;
  PatchEndThread;
  ...
finalization
  ...
  UnPatchEndThread;
  ...
end.

Something to note here is that TerminateThread() doesn't cause the deadlock that ExitThread() causes. It can be assumed that TerminateThread() doesn't attempt to "communicate" with the thread being terminated. Of course TerminateThread() is not the same as a clean ExitThread() call, but at least we get as far as possible along the normal path of execution of the program.

Finally, this is a typical implementation of PatchMemory():


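procedure PatchMemory(p : Pointer; DataSize : Integer; Data : Pointer; OldData : Pointer);
{$IFNDEF DELPHIXE2}
type
  SIZE_T = DWORD;
{$ENDIF}
var
  OldProtect : DWORD;
  BytesWritten : SIZE_T;
begin
  // Make the target pages writable while keeping them executable
  if not VirtualProtect (p, DataSize, PAGE_EXECUTE_READWRITE, OldProtect) then
    Exit;
  // Save the original bytes so the patch can be undone later
  if OldData <> nil then
    Move (p^, OldData^, DataSize);
  // Overwrite the code with the new bytes
  WriteProcessMemory(GetCurrentProcess, p, Data, DataSize, BytesWritten);
  // Restore the original page protection
  VirtualProtect (p, DataSize, OldProtect, OldProtect);
end;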


Notice the call to WriteProcessMemory() instead of Delphi's standard Move() procedure. This is key to avoid being caught by Windows DEP protection. Even though we called VirtualProtect() to make the memory writable, in my experience DEP doesn't like a process writing anything to the code segment unless it's done using WriteProcessMemory(). If you use that function, you can pretty much overwrite any piece of the code segment as long as you unprotect the memory first.

A caveat with this example is that it's not 64-bit compatible. The obvious things that need to be adjusted are the pointer arithmetic and potentially the relative jump used to overwrite EndThread() ( JMP - Jump ).
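As a starting point for a 64-bit port, one common alternative to the 5-byte relative jump is an absolute jump through a register, which works no matter how far apart the two routines are. The sketch below only builds the bytes. The TJump64 record and the MakeJump64 helper are hypothetical, and this is not something we run in production. Note that it needs 12 bytes instead of 5 and clobbers RAX.

```pascal
program Jump64Sketch;
{$APPTYPE CONSOLE}

type
  // x64 sequence: mov rax, imm64 / jmp rax (12 bytes in total)
  TJump64 = packed record
    MovRax : array[0..1] of Byte;  // $48 $B8 = mov rax, imm64
    Target : UInt64;               // absolute address of the new code
    JmpRax : array[0..1] of Byte;  // $FF $E0 = jmp rax
  end;

// Hypothetical helper: builds an absolute jump to ToAddr
function MakeJump64(ToAddr : Pointer) : TJump64;
begin
  Result.MovRax[0] := $48;
  Result.MovRax[1] := $B8;
  Result.Target := UInt64(NativeUInt(ToAddr));
  Result.JmpRax[0] := $FF;
  Result.JmpRax[1] := $E0;
end;

var
  J : TJump64;
begin
  J := MakeJump64(@MakeJump64);
  Writeln(SizeOf(J));  // 12
end.
```

With a record like this, the calls to PatchMemory() and FlushInstructionCache() would have to use SizeOf(TJump64) instead of the hard-coded 5, and OldCode would have to hold 12 bytes.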

Happy coding!