Microsoft’s Tlbimp creates leaky BSTR signatures


This one confounded me when I first discovered it, and I’ve recently been reminded of it. For the sake of remembering the details, and hopefully helping someone else out, I’m going to document it here.

The problem is this. When you have a COM library that you need to use from a C# app, you import it as a reference. In the background, the Microsoft .NET wizards do their magic by running Tlbimp.exe to generate a managed DLL with all of the objects and interfaces from the COM library. You proceed to use the code that Microsoft so conveniently converted for you, fully confident that all is well.

But it’s not. See, suppose your COM library has a method that returns a BSTR via an [out] parameter, or perhaps it defines an interface for a listener your managed code must implement. Suddenly there is the potential for a serious memory leak!

See, a BSTR in unmanaged code isn’t just any normal string. BSTRs are allocated by calling SysAllocString and subsequently released by calling SysFreeString. This poses a problem for managed code if you aren’t careful. Take the following listener interface for example.

*Interface names and GUIDs changed to protect the innocent

interface IImplementMeListener : IDispatch{
[
id(0x000000C9)
]
HRESULT _stdcall notify([in] TEventType eventType, [in] BSTR data );
};

Tlbimp.exe generates a managed assembly with the following signature for this same method.

[ComImport, TypeLibType((short) 0x10c0), Guid("00000000-0000-0000-0000-000000000000")]
public interface IImplementMeListener
{
    [MethodImpl(MethodImplOptions.InternalCall, MethodCodeType=MethodCodeType.Runtime), DispId(0xc9)]
    void notify([In, ComAliasName("ExampleLib.TEventType")] TEventType eventType,
        [In, MarshalAs(UnmanagedType.BStr)] string data);
}

Which means when you implement this interface in your managed class, you’ll just have String as the data type for the second parameter. Which might look like this.

class MyManagedListener : ExampleLib.IImplementMeListener{
    public void notify(ExampleLib.TEventType eventType, String data)
    {
        DoSomethingWithData(data);
    }
}

So this is what happens.

1) Your COM library allocates the string to pass into your listener using SysAllocString.

2) Your COM library passes the newly allocated string into your managed app by calling the notify method of your listener.

3) Your managed app does whatever it’s going to do with the string, then returns.

Normally in a fully managed app this would be no problem: once nothing references the string any more, the garbage collector sweeps it up and the memory is reclaimed. However, in this case we have a problem. The String your method receives is only a managed copy of the BSTR, so the garbage collector reclaims that copy, not the original. The COM library allocated the BSTR and passed it into your managed app, and its responsibility for that string ends there. The library expects the client to release the BSTR by calling SysFreeString. Clearly we can’t explicitly do that to the managed String type.

So what do we do? We rewrite the part of the assembly that Tlbimp.exe made for us, and adjust our listener implementation slightly.

This is how I did it, though there may be better ways.

1) Use a disassembler to view the code of the Tlbimp.exe generated assembly for your COM library. I used Lutz’s Reflector.

2) Copy the code for the entire library into a *.cs file, then change just the signature of the method you’re concerned with.

The new signature should look like this.

[ComImport, Guid("00000000-0000-0000-0000-000000000000"), TypeLibType((short) 0x10c0)]
public interface IImplementMeListener
{
    [MethodImpl(MethodImplOptions.InternalCall, MethodCodeType=MethodCodeType.Runtime), DispId(0xc9)]
    void notify([In, ComAliasName("ExampleLib.TEventType")] TEventType eventType,
        [In] IntPtr data);
}

And your implementing class changes to this.

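class MyManagedListener : ExampleLib.IImplementMeListener{
    public void notify(ExampleLib.TEventType eventType, IntPtr data)
    {
        try
        {
            // Copy the BSTR into a managed string before freeing it.
            String dataStr = Marshal.PtrToStringBSTR(data);
            DoSomethingWithData(dataStr);
        }
        finally
        {
            // Free the BSTR even if DoSomethingWithData throws.
            Marshal.FreeBSTR(data);
        }
    }
}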

This is not particularly tricky wizardry. All we’re doing is marshaling the input value from the library as an IntPtr instead of a managed String. This allows us to explicitly release it using the System.Runtime.InteropServices.Marshal.FreeBSTR method, just like the library expects us to do.
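If you want to see the two Marshal calls in isolation, here is a minimal sketch that needs no COM library at all. The class name BstrLifetimeSketch and the sample text are made up for illustration, and Marshal.StringToBSTR is only standing in for the SysAllocString call that the real library would make on its side.

```csharp
using System;
using System.Runtime.InteropServices;

static class BstrLifetimeSketch
{
    static void Main()
    {
        // Stand-in for the COM library: allocate a BSTR in unmanaged memory.
        IntPtr bstr = Marshal.StringToBSTR("event payload");
        try
        {
            // What the listener does: copy the characters into a managed string.
            string copy = Marshal.PtrToStringBSTR(bstr);
            Console.WriteLine(copy);
        }
        finally
        {
            // The managed copy is independent of the BSTR, so the BSTR
            // can be released here without affecting 'copy'.
            Marshal.FreeBSTR(bstr);
        }
    }
}
```

The point of the sketch is that the managed string and the BSTR have separate lifetimes: the garbage collector only ever deals with the copy, so somebody has to free the original by hand.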

Hopefully, this will save you some hassle, and avoid a potentially large memory leak.


  • sinaweChange

    This is so not true. What would happen if there is more than one client (listener) and one of them randomly releases the memory for one of its arguments? What would happen for the next listener?

    Responsibility for releasing the allocated memory in this case is 100% on the caller. Furthermore, the caller should have used something like CComBSTR to get RAII resource management rather than directly calling SysAlloc and Dealloc.

  • http://www.nslms.com RyanG

    And that’s what I told the developer that made the COM library I was consuming. He disagreed adamantly. ;-)

    Since I couldn’t change the library I was consuming, I had to resolve the issue on my side, and this is how I did it.

    If I owned both the library, and the consumer I’d make the change in the library as well, but I just didn’t have that luxury. You do what ya gotta do, eh?

  • Luke

    Although this is very old: I agree with the first comment. This is not a problem of tlbimp, it’s a problem of the design of your library.

    The memory leak would as well occur in C++ / COM if the client of the library does not know that he is responsible for releasing the memory. Microsoft cannot fix other people’s design errors.