2 years ago

#75783

test-img

Sébastien Emond

How can two threads execute code inside the same critical section, when this should never happen?

I am analyzing an application crash dump on Windows 7 SP1. I can see two threads that are running code inside the same CRITICAL_SECTION at the time of the crash, including the crashing thread. As far as I know, this is never supposed to happen.

Here is what I have been able to determine so far:

  • The critical section in question is a data member in one of our C++ classes. It is initialized in the constructor with InitializeCriticalSection() (first operation in the constructor code block), and destroyed in the destructor with DeleteCriticalSection() (last operation in the destructor).
  • Both threads are running code on the same instance of the class, therefore using the same critical section.
  • We use two-step initialization for this object, with an initialize method separate from the constructor that can be called at a later time. Both threads are inside that initialization method at the time of the crash. See the code sample below.
  • Looking at the CRITICAL_SECTION data in the debugger, it appears that it is NOT locked (LockCount equal to -2, which is consistent with an unlocked critical section since the behavor was redesiged in Windows Vista). I don't see how that could be.
  • I have also confirmed that all calls to EnterCriticalSection() and LeaveCriticalSection() in the class are balanced; there is no possibility that one thread would have left the critical section early.

Here is what the initialization method looks like (not the actual code, but the same relevant structure):

bool MyClass::Initialize() 
{
    EnterCriticalSection(&m_Lock);

    if (!m_Initialized)
    {

        // Lengthy initialization code here...
        // Both threads are somewhere in here when the crash happens.

        m_Initialized = true;
    }

    LeaveCriticalSection(&m_Lock);

    return m_Initialized;
}

This code has not changed in the past 15 years, and was running fine until that recent crash.

A few things that may be different on the crashing systems compared to the other systems where it usually runs:

  • The crash occurred on a virtual machine. Systems using that code are usually deployed on a dedicated physical machine.
  • The code itself is in an unmanaged dll, and the crashed happened in a managed application that uses it; i.e. in a mixed environment (managed code calling unmanaged code). This is fairly recent (a few years at most), for most of its life, that code was used only by unmanaged applications.

EDIT: As requested in the comment, here is what the constructor and destructor look like:

MyClass::MyClass()
{
    InitializeCriticalSection(&m_Lock);
}

MyClass::~MyClass()
{
    Uninitialize();
    DeleteCriticalSection(&m_Lock);
}

Note also that m_Lock is a data member of MyClass (not static).

I have also looked at the registers for the stack frame of the call to the initialize method, and traced the values back to the method entry point in disassembly view. I was able to confirm that the value of the "this" pointer, which appears in RDI at the point of the crash, was moved there from RCX at the beginning of the method. This is consistent with "this" being the implicit first argument passed to the method according to the x64 __fastcall calling convention.

I also traced the point where the Initialize method is called in both threads (this is in another project that uses our dlls as dependencies, so I had to find the pdbs and pulled the code from another repository).

It is done directly into a thread function, i.e. passed as the thread entry point to a CreateThread() call. It looks like this:

static DWORD MyThread(void* a_pParam)
{
    ThreadParamContainer* pContainer = static_cast<ReturnContainer*>(a_pParam);

    pContainer->pReturnObject = new MyClass;
    
    if (pContainer->pReturnObject->Initialize() == false)
    {
        delete pContainer->pReturnObject;
        pContainer->pReturnObject = NULL;
    }

    return 0;
}

So the goal of this thread is simply to create the object, initialize it, and return it to the thread's creator. As the initialization of that particular object can take some time (it involves connecting to a remote server, among other things), the code that spawns the thread can do other things while this initialization takes place.

If two threads were created with the same ThreadParamContainer, there could be a memory leak, since the second thread to call "new" would overwrite the pointer. But this still does not explain how both threads could end up in the same critical section.

This is still all in unmanaged code.

c++

windows

multithreading

winapi

clr

0 Answers

Your Answer

Accepted video resources