Friday, July 19, 2013

DLLs deadlocking when getting unloaded if attempting to exit threads

The Problem

At Convey, we use many different languages to construct our solutions; one of them, and probably the most commonly used today for our back-end services, is Delphi.
As any developer with even minor knowledge of Delphi knows, applications written in it are broken up into units, and each unit can have an "initialization" block of code and a "finalization" block of code.
Typically these two blocks are used to initialize and finalize the globals used by each module. They are guaranteed to be called on startup/first use of a module and when the module is going out of scope, either because the containing library is being unloaded or because the program is finishing.
Traditionally, developers using other languages such as C rely on explicit calls to initialize or finalize a module's resources... but not in Delphi.
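
For readers less familiar with Delphi, here is a minimal sketch of what such a unit looks like. The unit name and the global are made up for illustration; they are not taken from our code base.

```pascal
unit GlobalCache; // hypothetical unit name, for illustration only

interface

uses
  Classes;

var
  SharedNames: TStringList; // a global owned by this unit

implementation

initialization
  // Runs once when the module containing this unit starts up
  SharedNames := TStringList.Create;

finalization
  // Runs when the program ends or the containing library is unloaded
  SharedNames.Free;

end.
```

Nothing in the host program calls these blocks explicitly; the Delphi runtime does, which is exactly why the code ends up running inside DllMain when the unit is linked into a DLL.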

So, this was the root of our problem.

We had code that, when compiled and run as part of a standalone EXE or a BPL (Borland Package Library), worked like a charm. Programs started, used the code with no issues and unloaded themselves with no problems. BUT... when the same modules were linked as part of a DLL, the program simply "locked" when trying to unload it, and it was necessary to kill the process from the outside.

Because of this, we ended up relying on all kinds of dirty tricks, from deliberately leaking the resources that "seemed" to cause the freeze, to incorporating "auto-kill" code in DLLs that, on detecting that the application was trying to shut down, simply killed the process from the inside.

A while ago I read an article by Chris Wenham ( Signs you are a bad programmer ) and decided that it was time to clean house and be less of a "bad programmer" according to his definitions. I took on an old thread-based timer I wrote many years ago, when there was no such facility on Windows; over the years Microsoft has added decent timer support. The refactoring resulted in the dreaded DLL deadlock upon unloading a library that contained the newly refactored timer code.

After some time researching and scratching my head, I came across this article:

Exit thread upon deleting static object during unload DLL causes deadlock?

I decided to go to MSDN to read more about DLL entry point mechanics ( DllMain entry point ) and found this piece of text:
Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.
Now, that was the first glaring warning sign that I was trying to do something I was not supposed to. The fact that I was trying to terminate a job thread in the finalization section of the unit "smelled" to me like "communicating" with another thread in some way.

The first thing I tried after that was deliberately leaking a resource (the actual job threads used by the timer), and that did prevent the deadlock from happening.

A second finding, as I was experimenting with the finalization section of this unit, was that a call to CoUninitialize() causes the same deadlock behavior. And if you read the documentation for the function you find this:
Because there is no way to control the order in which in-process servers are loaded or unloaded, do not call CoInitialize, CoInitializeEx, or CoUninitialize from the DllMain function.
Unfortunately, without relying on an explicit call from the host application, the only "solution" for the CoUninitialize() limitation is to detect that FreeLibrary() is being called and skip the call in the finalization section.

The Solution

So, what to do here? One recommendation would be to give every module or DLL a pair of exported procedures to initialize and finalize its global resources, and to free your global threads in those procedures. This approach has two problems. First, in many cases it might not be possible to modify the host application to adhere to the new protocol. Second, for probably all Delphi developers it is normal practice to free global resources in the finalization section without giving special consideration to the limitations that DllMain imposes. As said before, in a standalone EXE or when using BPLs, there are no limitations on finalization code that I'm aware of.
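
For reference, the exported Init/Finalize pair mentioned above could look roughly like the following sketch. The library name, the export names and the TWorker class are all hypothetical stand-ins, not code we actually shipped.

```pascal
library TimerHost; // hypothetical library name

uses
  Windows, // Sleep
  Classes; // TThread

type
  // Stand-in for the real job thread; it only waits to be told to stop
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

var
  Worker: TWorker;

procedure TWorker.Execute;
begin
  while not Terminated do
    Sleep(50);
end;

// Hypothetical export: the host calls this right after LoadLibrary
procedure InitLibrary; stdcall;
begin
  if Worker = nil then
    Worker := TWorker.Create(False);
end;

// Hypothetical export: the host calls this BEFORE FreeLibrary, so the
// thread exits outside of DllMain and cannot hit the deadlock
procedure FinalizeLibrary; stdcall;
begin
  if Worker <> nil then
  begin
    Worker.Terminate;
    Worker.WaitFor;
    Worker.Free;
    Worker := nil;
  end;
end;

exports
  InitLibrary,
  FinalizeLibrary;

begin
end.
```

The catch is the one described above: it only works if every host application can be changed to call both exports at the right time.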

So, I decided to go for something I considered an "elegant hack" (whether there is such a thing as an "elegant hack" is open to interpretation...) that makes code which ends a thread in the finalization section of a unit compatible with DLLs, with no modification other than linking in the unit that contains the hack.

The Code


The first thing I did was to create my own installable custom DllMain handler. This is easy to do with Delphi using the global variable DllProc.

interface
...

var
  ShuttingDownDll : Boolean; // Use this flag to know when a DLL is in DETACH mode
implementation
...

{$IFNDEF DELPHI2007}
type
  THookedDllProc = procedure (Reason: DWORD);
{$ENDIF}

var
  {$IFDEF DELPHI2007}
  OldDllProc : TDLLProc;
  {$ELSE}
  OldDllProc : Pointer;
  {$ENDIF}

// Hooked DllProc used to flag when DLL is being detached
procedure HookedDllProc(Reason: DWORD);
begin
  if not ShuttingDownDll then
    ShuttingDownDll := Reason = DLL_PROCESS_DETACH;
  if assigned(OldDllProc) then
    {$IFNDEF DELPHI2007}THookedDllProc({$ENDIF}OldDllProc{$IFNDEF DELPHI2007}){$ENDIF}(Reason);
end;
initialization
  OldDllProc := DllProc;
  DllProc := @HookedDllProc;
...
finalization
  ...
  DllProc := OldDllProc;
end.

With this in place, the flag ShuttingDownDll is set to True when the DLL receives the DLL_PROCESS_DETACH notification.
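
As a usage example, the same flag can guard the CoUninitialize() call discussed earlier. This is only a sketch: the unit name DllShutdownHook is a placeholder for whichever unit declares ShuttingDownDll, EnsureCom is a hypothetical helper, and it assumes COM is initialized and released on the same thread.

```pascal
unit ComWorker; // hypothetical unit name

interface

// Hypothetical helper: call it from regular code (not from DllMain) before using COM
procedure EnsureCom;

implementation

uses
  Windows,          // Succeeded
  ActiveX,          // CoInitialize, CoUninitialize
  DllShutdownHook;  // placeholder name for the unit that declares ShuttingDownDll

var
  ComInitialized: Boolean = False;

procedure EnsureCom;
begin
  if not ComInitialized then
    ComInitialized := Succeeded(CoInitialize(nil));
end;

initialization
  // Nothing to do here; COM is initialized lazily by EnsureCom

finalization
  // Skip CoUninitialize when running as a DLL that is being detached,
  // because at that point this code runs inside DllMain
  if ComInitialized and not (IsLibrary and ShuttingDownDll) then
    CoUninitialize;

end.
```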

The second part is the real hack: changing the semantics of Delphi's EndThread() system procedure. I did this with a simple trick well known in the Delphi community, which involves overwriting the first bytes of the actual code with a relative JMP to the new code.

For this to work you need this:
implementation
...
type
  PJump = ^TJump;
  TJump = packed record
    OpCode:byte;
    Distance:integer;
  end;

var
  OldCode : TJump;
  NewCode : TJump;

procedure HookedEndThread(ExitCode: Integer);
begin
  {$IFDEF DELPHI2007}
  if Assigned(SystemThreadEndProc) then
    SystemThreadEndProc(ExitCode);
  {$ENDIF}
  if (not IsLibrary) or (not ShuttingDownDll) then
    ExitThread(ExitCode)
  else TerminateThread(GetCurrentThread, ExitCode); // Forceful termination of thread if library mode and DLL_PROCESS_DETACH mode
end;

procedure PatchEndThread;
begin
  NewCode.Distance := Integer(@HookedEndThread) - (Integer(@EndThread) + 5);
  PatchMemory (@EndThread, 5, @NewCode, @OldCode);
  FlushInstructionCache (GetCurrentProcess, @EndThread, 5);
end;

procedure UnPatchEndThread;
begin
  PatchMemory (@EndThread, 5, @OldCode);
  FlushInstructionCache (GetCurrentProcess, @EndThread, 5);
end;
initialization
  ...
  NewCode.OpCode := $E9;
  NewCode.Distance := 0;
  PatchEndThread;
  ...
finalization
  ...
  UnPatchEndThread;
  ...
end.

Something to note here is that TerminateThread() doesn't cause the deadlock that ExitThread() causes. The likely reason is that ExitThread() notifies every attached DLL that the thread is detaching, and those notifications are serialized behind the DLL_PROCESS_DETACH that is already in progress, while TerminateThread() skips the notifications altogether. Of course TerminateThread() is not the same as a clean ExitThread() call, but at least we can get as far as possible following the normal path of execution of the program.

Finally, this is a typical implementation of PatchMemory():


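procedure PatchMemory(p : Pointer; DataSize : Integer; Data : Pointer; OldData : Pointer = nil);
{$IFNDEF DELPHIXE2}
type
  SIZE_T = DWORD;
{$ENDIF}
var
  OldProtect : DWORD;
  BytesWritten : SIZE_T;
begin
  // Make the target bytes writable; give up if the protection change fails
  if not VirtualProtect (p, DataSize, PAGE_EXECUTE_READWRITE, OldProtect) then
    Exit;
  // Save the original bytes only when the caller asked for them
  // (UnPatchEndThread calls this without OldData)
  if OldData <> nil then
    Move (p^, OldData^, DataSize);
  WriteProcessMemory(GetCurrentProcess, p, Data, DataSize, BytesWritten);
  // Restore the original page protection
  VirtualProtect (p, DataSize, OldProtect, OldProtect);
end;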


Notice the call to WriteProcessMemory() instead of Delphi's standard Move() procedure. In our experience this was key to avoid being caught by Windows DEP. Even though we call VirtualProtect() to make the memory writable, writing to the code segment only worked reliably for us when done through WriteProcessMemory(). With that function you can overwrite pretty much any piece of the code segment, as long as you unprotect the memory first.

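A caveat with this example is that it's not 64-bit compatible. The obvious things that need adjusting are the pointer arithmetic (the Integer casts truncate 64-bit addresses) and potentially the relative jump used to overwrite EndThread() ( JMP - Jump ), since its 32-bit displacement can only reach targets within about 2 GB of the patched code.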
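
As a starting point for a 64-bit port, the distance calculation could be done in 64-bit arithmetic with an explicit range check. This is an untested sketch, not part of the original hack; it assumes Delphi XE2 or later (where NativeInt is pointer-sized), and the unit and function names are made up.

```pascal
unit RelJump; // hypothetical unit name

interface

type
  TJump = packed record
    OpCode: Byte;      // $E9 = JMP rel32
    Distance: Integer; // signed 32-bit displacement
  end;

// Fills Jump and returns True only if Target is reachable from Source
// with a 5-byte relative JMP
function TryMakeJump(Source, Target: Pointer; out Jump: TJump): Boolean;

implementation

function TryMakeJump(Source, Target: Pointer; out Jump: TJump): Boolean;
var
  Delta: Int64;
begin
  // The displacement is relative to the instruction that follows the JMP,
  // hence the SizeOf(TJump) (5 bytes) added to the source address
  Delta := Int64(NativeInt(Target)) - (Int64(NativeInt(Source)) + SizeOf(TJump));
  Result := (Delta >= Low(Integer)) and (Delta <= High(Integer));
  if Result then
  begin
    Jump.OpCode := $E9;
    Jump.Distance := Integer(Delta);
  end;
end;

end.
```

PatchEndThread would then call something like TryMakeJump(@EndThread, @HookedEndThread, NewCode) and only patch when it returns True; when the target is out of range, a different kind of jump is needed.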

Happy coding!
