Introduction
In our Datadog continuous .NET profiler implementation, we collect the call stack of a thread when something interesting happens, such as an exception being thrown. In addition to the method name, we would like to figure out in which source code file and at what line this method is implemented.
This information is usually stored in the program database (.pdb) file that is generated by the compiler when the assembly is built from the source code. The type and the name of the method are stored in the metadata of the assembly itself, but I have already told that story. The .NET compilers support two .pdb formats: the Portable format for .NET Core and the Windows format for .NET Framework.
I’ve explained how to use the DIA API and it is now time to show how to leverage the DbgHelp API that is available on all Windows machines (even though it is recommended to always install the latest release via the Debugging Tools for Windows).
This time, my goal is to extract from a Windows .pdb file the source file and line information for a given managed method. You can find the corresponding source code of this DumpLine tool in my GitHub repository.
Starting with DbgHelp
There are two major ways to get access to symbols with DbgHelp: either from a running process (i.e. to map the currently loaded .dlls to their associated .pdb files) or from a tool that explicitly loads a .dll or a .pdb file.
Before anything, you tell DbgHelp which options you want by calling SymSetOptions:
| |
This is where I ask for line number information to be collected.
The next step is to call SymInitialize to set up the DbgHelp environment. The first parameter expects a process handle (returned by GetCurrentProcess in my case). As a second parameter, you could pass a path where the .pdb files for your dlls can be found. In my case, since I will provide a .pdb file path, I don’t need it, and NULL will be passed. It means that, if needed, DbgHelp will use the current folder and the paths set in the _NT_SYMBOL_PATH and _NT_ALTERNATE_SYMBOL_PATH environment variables.
The last boolean parameter tells DbgHelp whether you want SymLoadModule64 to be called for each and every .dll loaded in the given process. Definitely not what I want, so I’m passing FALSE.
| |
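As a rough sketch of the two calls described above (not necessarily the exact code of the tool), the initialization could look like this. InitializeSymbols is a hypothetical helper name, and keeping the options that are already set via SymGetOptions is my own assumption; only SYMOPT_LOAD_LINES is required by the text.

```cpp
#ifdef _WIN32  // DbgHelp is Windows-only; the guard just keeps the snippet compilable elsewhere
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")  // MSVC-specific way to link against DbgHelp

// Hypothetical helper: configures DbgHelp and initializes it for the given process.
bool InitializeSymbols(HANDLE hProcess)
{
    // Ask for line number information, keeping whatever options are already set
    // (preserving the existing options is an assumption, not something the tool must do).
    ::SymSetOptions(::SymGetOptions() | SYMOPT_LOAD_LINES);

    // NULL search path: DbgHelp falls back to the current folder and the
    // _NT_SYMBOL_PATH / _NT_ALTERNATE_SYMBOL_PATH environment variables.
    // FALSE: do not load symbols for every module of the process.
    return ::SymInitialize(hProcess, nullptr, FALSE) != FALSE;
}
#endif
```

When the symbols are no longer needed, SymCleanup should be called with the same process handle.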
At that point, I’m ready to load a .pdb file.
Loading a .pdb file
The API is straightforward: just call SymLoadModuleEx:
| |
The important parameters are the process handle (same as for SymInitialize) and the path of the .pdb file. I lost some time trying to understand why my code was not working, due to a weird behavior of this function. You would expect that it succeeds when the returned address is not 0. Well… This is not 100% correct. If the path you provide does not exist, you won’t get 0 but the base address that you provided. Even worse, when you call the functions I’ll detail later on, no error will happen but nothing will work as expected. So, I simply check that the file exists:
| |
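A minimal sketch of that guard, under a few assumptions: PdbFileExists and LoadPdb are hypothetical names, the existence check simply tries to open the file, and the ANSI flavor of SymLoadModuleEx is used. The base address and size are chosen by the caller; according to the SymLoadModuleEx documentation, neither can be zero when the image is a .pdb file.

```cpp
#include <fstream>
#include <string>
#ifdef _WIN32  // DbgHelp is Windows-only; the guard keeps the snippet compilable elsewhere
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")
#endif

// Hypothetical helper: true if the file can be opened for reading.
bool PdbFileExists(const std::string& pdbPath)
{
    std::ifstream file(pdbPath, std::ios::binary);
    return file.is_open();
}

#ifdef _WIN32
// Hypothetical wrapper: returns 0 on failure, the module base address otherwise.
DWORD64 LoadPdb(HANDLE hProcess, const std::string& pdbPath,
                DWORD64 baseAddress, DWORD size)
{
    // Without this check, a missing file would still "succeed"
    // and return the base address that was passed in.
    if (!PdbFileExists(pdbPath))
        return 0;

    return ::SymLoadModuleEx(
        hProcess,
        nullptr,            // no file handle
        pdbPath.c_str(),    // path of the .pdb file
        nullptr,            // no explicit module name
        baseAddress,        // must not be 0 for a .pdb file
        size,               // must not be 0 for a .pdb file
        nullptr,            // no MODLOAD_DATA
        0);                 // no flags
}
#endif
```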
Note that it is possible to unload the symbols of a given loaded module by calling SymUnloadModule with the same process handle and its base address: this will reduce the memory consumption if you don’t need the symbols anymore.
If symbols are loaded in deferred mode, you need to call SymGetModuleInfo64 before trying to access the symbols:
| |
In addition, this will fill up an IMAGEHLP_MODULE64 structure with possibly interesting details:

The .pdb signature and age could be useful to build URLs to communicate with symbol servers, but this is another story:
| |
You also know if line numbers are available or not thanks to the LineNumbers field.
Note that if you asked for the deferred symbols option, you won’t get any interesting details:

Only the module name and path are provided but nothing else.
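To make the module details more concrete, here is a hedged sketch of how the IMAGEHLP_MODULE64 structure could be filled and dumped. DumpModuleInfo is a hypothetical name and the output layout is purely illustrative; the fields themselves (LoadedPdbName, PdbSig70, PdbAge, LineNumbers) come from the DbgHelp headers.

```cpp
#ifdef _WIN32  // DbgHelp is Windows-only; the guard keeps the snippet compilable elsewhere
#include <windows.h>
#include <dbghelp.h>
#include <cstdio>
#pragma comment(lib, "dbghelp.lib")

// Hypothetical helper: prints a few details about the module loaded at baseAddress.
bool DumpModuleInfo(HANDLE hProcess, DWORD64 baseAddress)
{
    IMAGEHLP_MODULE64 info = {};
    info.SizeOfStruct = sizeof(info);  // must be set before the call

    if (!::SymGetModuleInfo64(hProcess, baseAddress, &info))
        return false;

    const GUID& sig = info.PdbSig70;
    std::printf("pdb file     : %s\n", info.LoadedPdbName);
    std::printf("signature    : %08lX-%04hX-%04hX-%02X%02X-%02X%02X%02X%02X%02X%02X\n",
                sig.Data1, sig.Data2, sig.Data3,
                sig.Data4[0], sig.Data4[1], sig.Data4[2], sig.Data4[3],
                sig.Data4[4], sig.Data4[5], sig.Data4[6], sig.Data4[7]);
    std::printf("age          : %lu\n", info.PdbAge);
    std::printf("line numbers : %s\n", info.LineNumbers ? "yes" : "no");
    return true;
}
#endif
```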
Enumerating the methods
It is now time to iterate over the symbols in the loaded .pdb thanks to SymEnumSymbols:
| |
In addition to the obvious parameters, this function expects a callback function that will be called for each symbol in the module specified by the process handle and the base address. Note that you can pass any context as the last parameter. In my case, the instance of my DbgHelpParser class is passed to be able to store the methods in a dedicated _methods field:
| |
The “*!*” mask tells DbgHelp to look for symbols in all modules. This might sound counterintuitive, but the syntax is similar to what you find in WinDbg or Visual Studio:
The job of the callback function is to detect the symbols you are interested in from the SYMBOL_INFO structure passed for each matching symbol:
| |
The Tag field contains a value from SymTagEnum but, for a managed .pdb file, you will only get SymTagFunction. Also, the Flags field should contain SYMFLAG_CLR_TOKEN and SYMFLAG_METADATA because we are only interested in managed methods.
Next, you get the name, address and size from other fields before looking for the source file and line details by calling SymGetLineFromAddr64:
| |
The callback returns TRUE to continue the enumeration. You could return FALSE to stop early if you were looking for a specific symbol and wanted to speed up the processing.
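Putting the enumeration and the callback together, a sketch could look like the following. It is not the code of the tool: MethodInfo, EnumContext, OnSymbol and EnumerateMethods are hypothetical names, and a plain struct is used as context instead of the DbgHelpParser instance. Defining _NO_CVCONST_H lets dbghelp.h provide SymTagEnum without the cvconst.h header from the DIA SDK.

```cpp
#ifdef _WIN32  // DbgHelp is Windows-only; the guard keeps the snippet compilable elsewhere
#define _NO_CVCONST_H  // dbghelp.h then defines SymTagEnum itself
#include <windows.h>
#include <dbghelp.h>
#include <string>
#include <utility>
#include <vector>
#pragma comment(lib, "dbghelp.lib")

// Hypothetical record for one managed method found in the .pdb.
struct MethodInfo
{
    std::string name;
    DWORD64 address;
    ULONG size;
    std::string sourceFile;  // empty when no line information is found
    DWORD line;
};

// Hypothetical context passed through SymEnumSymbols to the callback.
struct EnumContext
{
    HANDLE hProcess;
    std::vector<MethodInfo> methods;
};

BOOL CALLBACK OnSymbol(PSYMBOL_INFO pSymInfo, ULONG /*symbolSize*/, PVOID userContext)
{
    auto* ctx = static_cast<EnumContext*>(userContext);

    // Only keep functions that are flagged as managed methods.
    const ULONG managedFlags = SYMFLAG_CLR_TOKEN | SYMFLAG_METADATA;
    if (pSymInfo->Tag != SymTagFunction ||
        (pSymInfo->Flags & managedFlags) != managedFlags)
        return TRUE;  // skip this symbol but continue the enumeration

    MethodInfo method{ pSymInfo->Name, pSymInfo->Address, pSymInfo->Size, "", 0 };

    IMAGEHLP_LINE64 line = {};
    line.SizeOfStruct = sizeof(line);
    DWORD displacement = 0;
    if (::SymGetLineFromAddr64(ctx->hProcess, pSymInfo->Address, &displacement, &line))
    {
        method.sourceFile = line.FileName;
        method.line = line.LineNumber;
    }

    ctx->methods.push_back(std::move(method));
    return TRUE;  // returning FALSE would stop the enumeration
}

bool EnumerateMethods(HANDLE hProcess, DWORD64 baseAddress, EnumContext& ctx)
{
    return ::SymEnumSymbols(hProcess, baseAddress, "*!*", OnSymbol, &ctx) != FALSE;
}
#endif
```

Passing a pointer to a small context struct keeps the callback free of global state, which is the same idea as passing the parser instance.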
The managed side of the story
When the tool is run on a managed assembly, you get the following kind of output:

The names of the methods do not match at all with the names in my test assembly! Why do all methods have this generic Method.# name?
Well… the number corresponds to the RID of the corresponding method in the metadata of the assembly. Let’s have a look at what the MethodDef metadata table of this assembly looks like in ILSpy:

The RID column corresponds to the number in the name in the tool output. So, Method.#3 is the get_Records property getter in PDBFormatReaderTest.cs:28. And this is exactly what I can see in the test source code:

You can also check that the lines look correct compared to what is listed by the tool:
- FilePath getter at line 19
- FilePath setter at line 20
- Records getter at line 28
- Records setter at line 29
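Since the number in the symbol name is the RID in the MethodDef table, it can be turned into a metadata token: in ECMA-335, a token is the table index in the top byte (0x06 for MethodDef) and the RID in the lower 24 bits. Here is a small sketch of that conversion. MethodTokenFromSymbolName is a hypothetical helper, and it assumes that the number after "Method.#" is written in decimal, which I have only checked against small values like the ones shown above.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Index of the MethodDef table in ECMA-335 metadata.
constexpr uint32_t kMethodDefTable = 0x06;

// Hypothetical helper: returns the MethodDef token for a "Method.#<RID>" symbol name,
// or 0 if the name does not follow that pattern.
// Assumption: the RID is written in decimal.
uint32_t MethodTokenFromSymbolName(const std::string& name)
{
    const std::string prefix = "Method.#";
    if (name.compare(0, prefix.size(), prefix) != 0)
        return 0;

    const std::string digits = name.substr(prefix.size());
    if (digits.empty())
        return 0;
    for (char c : digits)
    {
        if (c < '0' || c > '9')
            return 0;
    }

    const unsigned long rid = std::strtoul(digits.c_str(), nullptr, 10);
    if (rid == 0 || rid > 0x00FFFFFFul)  // a RID must fit in 24 bits
        return 0;

    return (kMethodDefTable << 24) | static_cast<uint32_t>(rid);
}
```

For example, "Method.#3" maps to the token 0x06000003, which could then be resolved to a method name with the metadata API.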
Feel free to look at the source code in my GitHub repository.
DbgHelp provides many more services to look into symbols but that’s all for today!
