[{"content":" In my previous posts, I explained how to use DIA and DbgHelp to map a method to its line in source code. I forgot to mention that it was correct for .NET Core but not for the “old” .NET Framework Windows PDB format. Instead of encoding the method token in the name, the symbol file contains the name of the methods. So, how to do the mapping for .NET Framework assemblies? You will find the answer (plus some tricks) in this article.\nWhen I started to work on the support of the old Windows PDB format, I looked at what existed to parse the raw format and… I decided to try DbgHelp instead. With this first implementation, I realized that some tokens were missing and source code information was not retrieved for most of the methods.\nSo, I looked for another API to use and I found ISymUnmanagedReader. The usage philosophy is totally different from DIA or DbgHelp.\nA little bit of magic This interface is implemented in diasymreader.dll that comes with every .NET Framework installation. But you need to do COM magic to get it. After having called CoInitialize to set up COM, you ask for an instance of ISymUnmanagedBinder from CLSID_CorSymBinder_SxS:\nCComPtr\u0026lt;ISymUnmanagedBinder\u0026gt; pBinder; hr = CoCreateInstance( CLSID_CorSymBinder_SxS, NULL, CLSCTX_INPROC_SERVER, IID_ISymUnmanagedBinder, (void**)\u0026amp;pBinder ); From the binder, you can get the ISymUnmanagedReader interface corresponding to the assembly you are interested in with GetReaderForFile. However, there are two tiny details to consider.\nFirst, one parameter expects the path to the assembly, not to the .pdb file. The symbol file has to be stored in the same folder. The documentation states that ISymUnmanagedBinder2::GetReaderForFile2 allows a more flexible search, but I did not test it.\nThe second detail is the first parameter: an instance of IMetaDataImport for the same assembly. 
The steps to get it are… complicated.\nHosting the CLR The idea is to host the .NET Framework and get the corresponding ICLRMetaHost interface:\nCComPtr\u0026lt;ICLRMetaHost\u0026gt; pMetaHost; HRESULT hr = CLRCreateInstance(CLSID_CLRMetaHost, IID_ICLRMetaHost, (void**)\u0026amp;pMetaHost); Calling the CLRCreateInstance API allows you to get an instance of ICLRMetaHost from which you could enumerate the installed versions of .NET Framework. In my case, I know which version I want:\n// Get the installed .NET Framework runtime (v4.0+) CComPtr\u0026lt;ICLRRuntimeInfo\u0026gt; pRuntimeInfo; hr = pMetaHost-\u0026gt;GetRuntime(L\u0026#34;v4.0.30319\u0026#34;, IID_ICLRRuntimeInfo, (void**)\u0026amp;pRuntimeInfo); The ICLRRuntimeInfo interface allows you to get access to runtime services via GetInterface:\nCComPtr\u0026lt;IMetaDataDispenser\u0026gt; pDispenser; hr = pRuntimeInfo-\u0026gt;GetInterface(CLSID_CorMetaDataDispenser, IID_IMetaDataDispenser, (void**)\u0026amp;pDispenser); The service I’m interested in is the IMetaDataDispenser interface that allows you to “open a scope” on the assembly you are interested in:\nhr = pDispenser-\u0026gt;OpenScope( wModulePath.c_str(), ofRead, IID_IMetaDataImport, (IUnknown**)\u0026amp;_pMetaDataImport ); Note that the first parameter is the path to the assembly, not to the .pdb file. The scope is abstracted by the IMetaDataImport interface I have already described, which is needed to call GetReaderForFile and get the ISymUnmanagedReader:\nhr = pBinder-\u0026gt;GetReaderForFile(_pMetaDataImport, wModulePath.c_str(), nullptr, \u0026amp;_pReader);\nThe road to get symbol details for a method The ISymUnmanagedReader interface implements GetMethod to get details about a given method token via an ISymUnmanagedMethod interface. So, the next question is how to get these tokens. 
If you remember the previous article, these tokens are from the 06 MethodDef table in the assembly metadata; starting from 06000001 to the last one.\nThis means that you could write a simple loop starting from 1 up to a hardcoded maximum value, call TokenFromRid(index, mdtMethodDef) to get the corresponding token. However, since you are a professional developer, you would search for the exact number of tokens from IMetadataTables retrieved from IMetadataImport:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ULONG cRows = 0; // Get IMetaDataTables interface to query the MethodDef table CComPtr\u0026lt;IMetaDataTables\u0026gt; pTables; hr = _pMetaDataImport-\u0026gt;QueryInterface(IID_IMetaDataTables, (void**)\u0026amp;pTables); if (FAILED(hr) || pTables == nullptr) { cRows = LAST_METHODDEF_TOKEN; } else { // Get the number of rows in the MethodDef table (table index 0x06 = Method) hr = pTables-\u0026gt;GetTableInfo( 0x06, // MethodDef table NULL, // cbRow (not needed) \u0026amp;cRows, // pcRows (number of methods) NULL, // pcCols (not needed) NULL, // piKey (not needed) NULL // ppName (not needed) ); if (FAILED(hr)) { cRows = LAST_METHODDEF_TOKEN; } } Now that you have the number of rows (i.e. number of methods defined in the metadata), it is easy and safe to get method information from symbols:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 for (uint32_t i = 1; i \u0026lt;= cRows; i++) { mdMethodDef token = TokenFromRid(i, mdtMethodDef); CComPtr\u0026lt;ISymUnmanagedMethod\u0026gt; pMethod; hr = _pReader-\u0026gt;GetMethod(token, \u0026amp;pMethod); if (SUCCEEDED(hr) \u0026amp;\u0026amp; pMethod != nullptr) { MethodInfo info; if (GetMethodInfoFromSymbol(pMethod, info)) { _methods.push_back(info); } } } Note that GetMethod might fail (returning E_FAIL) for P/Invoked functions, abstract methods, or methods decorated with DebuggerHidden attribute.\nGive me line and source code! 
For the other methods with symbol information, you can get its token via the GetToken method. The ISymUnmanagedMethod interface allows low level access to line/column mapping that is beyond the scope of this article. At a high level, positions in source file are named sequence points. Call GetSequencePointCount to get… the number of sequence points for a given method.\nThe next step is to call GetSequencePoints with the number of points you want and the corresponding arrays of offsets, lines, columns, end lines, end columns and ISymUnmanagedDocument. In my case, I’m only interested in where the method starts so the first sequence point is good enough:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 // Get sequence points (source line information) ULONG32 cPoints = 0; hr = pMethod-\u0026gt;GetSequencePointCount(\u0026amp;cPoints); if (SUCCEEDED(hr) \u0026amp;\u0026amp; (cPoints \u0026gt; 0)) { cPoints = 1; // We only need the first sequence point for start line std::vector\u0026lt;ULONG32\u0026gt; offsets(cPoints); std::vector\u0026lt;ULONG32\u0026gt; lines(cPoints); std::vector\u0026lt;ULONG32\u0026gt; columns(cPoints); std::vector\u0026lt;ULONG32\u0026gt; endLines(cPoints); std::vector\u0026lt;ULONG32\u0026gt; endColumns(cPoints); std::vector\u0026lt;ISymUnmanagedDocument*\u0026gt; documents(cPoints); ULONG32 actualCount = 0; hr = pMethod-\u0026gt;GetSequencePoints( cPoints, \u0026amp;actualCount, \u0026amp;offsets[0], \u0026amp;documents[0], \u0026amp;lines[0], \u0026amp;columns[0], \u0026amp;endLines[0], \u0026amp;endColumns[0] ); if (SUCCEEDED(hr) \u0026amp;\u0026amp; (actualCount \u0026gt; 0)) { The source file is described by ISymUnmanagedDocument that provides its name when GetURL is called:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 // Get the first sequence point\u0026#39;s document and line ISymUnmanagedDocument* pDoc = documents[0]; if (pDoc != nullptr) { // Get document URL (file path) ULONG32 
urlLen = 0; hr = pDoc-\u0026gt;GetURL(0, \u0026amp;urlLen, NULL); if (SUCCEEDED(hr) \u0026amp;\u0026amp; (urlLen \u0026gt; 0)) { std::vector\u0026lt;WCHAR\u0026gt; url(urlLen); hr = pDoc-\u0026gt;GetURL(urlLen, \u0026amp;urlLen, \u0026amp;url[0]); if (SUCCEEDED(hr)) { // Convert wide string to narrow string int len = WideCharToMultiByte(CP_UTF8, 0, \u0026amp;url[0], urlLen, NULL, 0, NULL, NULL); std::string narrowUrl(len, \u0026#39;\\0\u0026#39;); WideCharToMultiByte(CP_UTF8, 0, \u0026amp;url[0], urlLen, \u0026amp;narrowUrl[0], len, NULL, NULL); info.sourceFile = narrowUrl; } } // NOTE: 0xFEEFEE is a special value indicating hidden lines info.lineNumber = lines[0]; pDoc-\u0026gt;Release(); } } } The final interesting trick is that the line number might have the special 0xFEEFEE value. It means that the line is hidden. I have seen it for methods generated by the C# compiler such as MoveNext for async state machines or anonymous methods:\nThe source code is available from my Github repository.\nHappy coding!\nReferences Archived Microsoft-pdb repository DIA implementation article DbgHelp implementation article ","cover":"https://chrisnas.github.io/posts/2026-02-11_how-to-support-net/1_ZBXLm1oWWKp4pacMcX3g5Q.png","date":"2026-02-11","permalink":"https://chrisnas.github.io/posts/2026-02-11_how-to-support-net/","summary":"\u003chr\u003e\n\u003cp\u003eIn my previous posts, I explained how to use \u003ca href=\"/posts/2025-12-08_how-to-dump-function/\"\u003eDIA\u003c/a\u003e and \u003ca href=\"/posts/2026-01-16_but-where-is-my/\"\u003eDbgHelp\u003c/a\u003e to map a method to its line in source code. I forgot to mention that it was correct for .NET Core but not for the “old” .NET Framework Windows PDB format. Instead of encoding the method token in the name, the symbol file contains the name of the methods. So, how to do the mapping for .NET Framework assemblies? 
You will find the answer (plus some tricks) in this article.\u003c/p\u003e","title":"How to support .NET Framework PDB format and source line with ISymUnmanagedReader"},{"content":" Introduction In our Datadog continuous .NET profiler implementation, we collect the call stack of a thread when something interesting happens, such as an exception being thrown. In addition to the method name, we would like to figure out at what line in which source code file this method is implemented.\nThis information is usually stored in the program database (.pdb) file that is generated by the compiler when the assembly is generated from the source code. The type and the name of the method are stored in the metadata of the assembly itself but I already told this story before. The .NET compilers support two formats of .pdb: the Portable format for .NET Core and the Windows format for .NET Framework.\nI’ve explained how to use the DIA API and it is now time to show how to leverage the DbgHelp API that is available on all Windows machines (even though it is recommended to always install the latest release via the Debugging Tools for Windows).\nThis time, my goal is to extract from a Windows .pdb file the source code and line information for a given managed method. You can find the corresponding source code of this DumpLine tool in my GitHub repository.\nStarting with DbgHelp There are two major ways to get access to symbols with DbgHelp: either from a running process (i.e. 
to map the currently loaded .dll to their associated .pdb files) or from a tool that would explicitly load a .dll or a .pdb file.\nBefore anything, you tell DbgHelp which options you want by calling SymSetOptions:\n1 2 3 4 5 6 7 8 DWORD options = SymGetOptions(); options |= SYMOPT_DEBUG; options |= SYMOPT_LOAD_LINES; // Load line number information options |= SYMOPT_UNDNAME; // Undecorate symbol names //options |= SYMOPT_DEFERRED_LOADS; // Defer symbol loading options |= SYMOPT_EXACT_SYMBOLS; // Require exact symbol match options |= SYMOPT_FAIL_CRITICAL_ERRORS; // Don\u0026#39;t show error dialogs SymSetOptions(options); This is where I ask that line number information should be collected.\nThe next step is to call SymInitialize to setup DbgHelp environment. The first parameter expects a process handle (returned by GetCurrentProcess in my case). You could pass a path where to find the .pdb files for your dlls as a second parameter. In my case, since I will provide a .pdb file path, I don’t need it, and NULL will be passed. It means that, if needed, DbgHelp will use the current folder and the path set in _NT_SYMBOL_PATH and _NT_ALTERNATE_SYMBOL_PATH environment variables.\nThe last boolean parameter tells DbgHelp if you want that SymLoadModule64 to be called for each and every loaded .dll in the given process. Definitively not what I want so I’m passing FALSE.\n1 2 3 4 5 _hProcess = GetCurrentProcess(); if (!SymInitialize(_hProcess, NULL, FALSE)) { _hProcess = NULL; } At that point, I’m ready to load a .pdb file.\nLoading a .pdb file The API is straightforward: just call SymLoadModuleEx :\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 _baseAddress = SymLoadModuleEx( _hProcess, NULL, pdbFilePath.c_str(), NULL, 0x10000000, // arbitrary base address 0, NULL, 0 ); if (_baseAddress == 0) { return false; } The important parameters are the process handle (same as for SymInitialize) and the path of the .pdb file. 
I lost some time trying to understand why my code was not working due to a weird behavior of this function. You know that it succeeds when the returned address is not 0. Well… This is not 100% correct. If the path you provide does not exist, you won’t get 0 but the base address that you also provide. Even worse, when you call the functions I’ll detail later on, no error will happen but nothing will work as expected. So, I simply check that the file exists:\n// BUG? : dbghelp does not fail if the .pdb file does not exist... if (GetFileAttributesA(pdbFilePath.c_str()) == INVALID_FILE_ATTRIBUTES) { return false; } Note that it is possible to unload the symbols of a given loaded module by calling SymUnloadModule with the same process handle and its base address: this will reduce the memory consumption if you don’t need the symbols anymore.\nWhen symbols are loaded lazily (deferred loads), you need to call SymGetModuleInfo64 before trying to access the symbols:\nIMAGEHLP_MODULE64 moduleInfo = { 0 }; moduleInfo.SizeOfStruct = sizeof(IMAGEHLP_MODULE64); if (!SymGetModuleInfo64(_hProcess, _baseAddress, \u0026amp;moduleInfo)) { return false; } In addition, this will fill up an IMAGEHLP_MODULE64 structure with possibly interesting details:\nThe .pdb signature and age could be useful to build URLs to communicate with symbol servers, but this is another story:\n_age = moduleInfo.PdbAge; GUID guid = moduleInfo.PdbSig70; char strGUID[80]; sprintf_s(strGUID, 80, \u0026#34;%08x%04x%04x%02x%02x%02x%02x%02x%02x%02x%02x\u0026#34;, guid.Data1, guid.Data2, guid.Data3, guid.Data4[0], guid.Data4[1], guid.Data4[2], guid.Data4[3], guid.Data4[4], guid.Data4[5], guid.Data4[6], guid.Data4[7] ); _guid = strGUID; You also know if line numbers are available or not thanks to the LineNumbers field.\nNote that if you asked for the deferred symbols option, you won’t get any interesting details:\nOnly the module name and path are provided but nothing else.\nEnumerating the 
methods It is now time to iterate on the symbols in the loaded .pdb thanks to SymEnumSymbols:\n1 2 3 4 5 6 7 8 9 10 if (!SymEnumSymbols( _hProcess, _baseAddress, \u0026#34;*!*\u0026#34;, // Mask (all symbols) EnumMethodSymbolsCallback, this // User context to store the methods in _methods instance field )) { return false; } In addition to the obvious parameters, this function expects a callback function that will be called for each symbol in the module specified by the process handle and the base address. Note that you can pass any context as the last parameter. In my case, the instance of my DbgHelpParser class is passed to be able to store the methods in a dedicated _methods field:\n1 2 3 4 5 6 7 8 9 struct MethodInfo { std::string name; uint64_t address; uint32_t size; std::string sourceFile; uint32_t lineNumber; }; std::vector\u0026lt;MethodInfo\u0026gt; _methods; The “*!*” mask tells DbgHelp to look for symbols in all modules. This might sound counter intuitive, but the syntax is similar to what you find in WinDBG or Visual Studio: !. This could be useful if you load more than one .pdb.\nThe job of the callback function is to detect the symbols you are interested in from the SYMBOL_INFO structure passed for each matching symbol:\n1 2 3 4 5 6 7 8 9 BOOL CALLBACK DbgHelpParser::EnumMethodSymbolsCallback(PSYMBOL_INFO pSymInfo, ULONG SymbolSize, PVOID UserContext) { DbgHelpParser* parser = reinterpret_cast\u0026lt;DbgHelpParser*\u0026gt;(UserContext); if ( (pSymInfo-\u0026gt;Tag == SymTagFunction) \u0026amp;\u0026amp; ((pSymInfo-\u0026gt;Flags \u0026amp; (SYMFLAG_CLR_TOKEN | SYMFLAG_METADATA)) == (SYMFLAG_CLR_TOKEN | SYMFLAG_METADATA)) ) { The Tag field contains a value from SymTagEnum but, for a managed .pdb file, you will only get SymTagFunction. 
Also, the Flags field should contain SYMFLAG_CLR_TOKEN and SYMFLAG_METADATA because we are only interested in managed methods.\nNext, you get the name, address and size from other fields before looking for the source file and line details by calling SymGetLineFromAddr64:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 MethodInfo info; info.name = pSymInfo-\u0026gt;Name; info.address = pSymInfo-\u0026gt;Address; info.size = pSymInfo-\u0026gt;Size; // Try to get source file and line information IMAGEHLP_LINE64 line = { 0 }; line.SizeOfStruct = sizeof(IMAGEHLP_LINE64); DWORD displacement = 0; if (SymGetLineFromAddr64(parser-\u0026gt;_hProcess, pSymInfo-\u0026gt;Address, \u0026amp;displacement, \u0026amp;line)) { info.sourceFile = line.FileName ? line.FileName : \u0026#34;\u0026#34;; info.lineNumber = line.LineNumber; } else { info.sourceFile = \u0026#34;\u0026#34;; info.lineNumber = 0; } parser-\u0026gt;_methods.push_back(info); } return TRUE; // Continue enumeration } The callback returns TRUE to continue the enumeration. You could return FALSE if you would look for specific symbols and wanted to speed up the processing.\nThe managed side of the story When the tool is run on a managed assembly, you get the following kind of output:\nThe name of the method does not match at all with the names in my test assembly! Why do all methods have this generic Method.# format?\nWell… the number corresponds to the RID of the corresponding method in the metadata of the assembly. Let’s have a look at what the MethodDef metadata table of this assembly looks like in ILSpy:\nThe RID column corresponds to the number in the name in the tool output. So, the Method.#3 is the get_Records property getter in PDBFormatReaderTest.cs:28. 
And this is exactly what I can see in the test source code:\nYou can also check that the lines look correct compared to what is listed by the tool:\nFilePath getter at line 19, FilePath setter at line 20, Records getter at line 28, Records setter at line 29.\nFeel free to look at the source code from my GitHub repository.\nDbgHelp provides many more services to look into symbols but that’s all for today!\n","cover":"https://chrisnas.github.io/posts/2026-01-16_but-where-is-my/1_KW21HTdZwwDwwKI69wlvUg.png","date":"2026-01-16","permalink":"https://chrisnas.github.io/posts/2026-01-16_but-where-is-my/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn our Datadog continuous .NET profiler implementation, we collect the call stack of a thread when something interesting happens, such as an exception being thrown. In addition to the method name, we would like to figure out at what line in which source code file this method is implemented.\u003c/p\u003e\n\u003cp\u003eThis information is usually stored in the \u003cem\u003eprogram database\u003c/em\u003e (.pdb) file that is generated by the compiler when the assembly is generated from the source code. The type and the name of the method are stored in the metadata of the assembly itself but \u003ca href=\"/posts/2021-09-06_dealing-with-modules-assemblie/\"\u003eI already told this story before\u003c/a\u003e. The .NET compilers support two formats of .pdb: the Portable format for .NET Core and the Windows format for .NET Framework.\u003c/p\u003e","title":"But where is my method code? DbgHelp comes to the rescue"},{"content":" During this R\u0026amp;D week at Datadog, I wanted to implement a tool that accepts a .pdb file and generates a .sym file listing function symbols with their address, size, name with signature, and whether they are public or private. 
This post digs into the implementation details of using Microsoft Debug Interface Access (DIA) COM API to achieve these objectives. If you want to see what my vibe coding experience in Cursor was, read this other post instead.\nOne self-contained tool please! I would like the tool to be self-contained but since DIA is based on a COM server, it would require registering msdia140.dll on the machine. Not a good idea. In case the dll is in the same folder as the tool, one could “emulate” the magic done by CoCreateInstance to get an instance of IDiaDataSource (more on this interface soon) by following these steps:\nCall LoadLibrary to load the dll in memory\nCall GetProcAddress to get the DllGetClassObject implementation\nCall this function to get the IClassFactory implementation\nCall its CreateInstance method to get an object implementing IDiaDataSource\nHere is the corresponding code (without error checking for readability):\n// Create DIA data source without registration using DLL loading HRESULT PdbSymbolExtractor::NoRegCoCreate(const std::wstring\u0026amp; dllPath, REFCLSID rclsid, REFIID riid, void** ppv) { HMODULE hDll = LoadLibraryW(dllPath.c_str()); typedef HRESULT(__stdcall* DllGetClassObjectFunc)(REFCLSID, REFIID, LPVOID*); DllGetClassObjectFunc pDllGetClassObject = (DllGetClassObjectFunc)GetProcAddress(hDll, \u0026#34;DllGetClassObject\u0026#34;); CComPtr\u0026lt;IClassFactory\u0026gt; pClassFactory; HRESULT hr = pDllGetClassObject(rclsid, IID_IClassFactory, (void**)\u0026amp;pClassFactory); hr = pClassFactory-\u0026gt;CreateInstance(NULL, riid, ppv); // Note: We intentionally don\u0026#39;t call FreeLibrary here because the DLL needs to stay loaded // The COM object references will keep it alive return S_OK; } This function is called with the path of msdia140.dll and the UUID of the expected IDiaDataSource:\n// Create DIA data source without registration hr = NoRegCoCreate(dllPath, CLSID_DiaSource, __uuidof(IDiaDataSource), 
(void**)\u0026amp;_pDiaDataSource); But still, I don’t want to have two binaries!\nThe trick is to embed msdia140.dll inside the tool as a Windows resource. In the .rc file, add an RCDATA entry that points to the dll:\nIDR_MSDIA_DLL RCDATA \u0026#34;x64\\\\Release\\\\msdia140.dll\u0026#34; You should see it in the Resource View in Visual Studio:\nThe code that extracts it as a file on disk is straightforward (error checking has been removed for readability):\n// Extract embedded msdia140.dll from resources bool PdbSymbolExtractor::ExtractEmbeddedDll(const std::wstring\u0026amp; outputPath) { // Find the resource HMODULE hModule = GetModuleHandle(NULL); HRSRC hResource = FindResource(hModule, MAKEINTRESOURCE(IDR_MSDIA_DLL), RT_RCDATA); // Load the resource HGLOBAL hLoadedResource = LoadResource(hModule, hResource); // Lock the resource to get a pointer to the data LPVOID pResourceData = LockResource(hLoadedResource); // Get the size of the resource DWORD resourceSize = SizeofResource(hModule, hResource); // Write the DLL to disk std::ofstream outFile(outputPath, std::ios::binary); outFile.write(static_cast\u0026lt;const char*\u0026gt;(pResourceData), resourceSize); outFile.close(); return true; } Where are my function symbols? After calling NoRegCoCreate(), _pDiaDataSource stores a reference to the entry point into the DIA APIs. 
Here are the steps to follow before being able to list the symbols:\n1 2 3 4 5 6 7 8 9 10 11 12 HRESULT PdbSymbolExtractor::ExtractSymbolsFromPdb(const std::wstring\u0026amp; pdbPath, std::vector\u0026lt;FunctionSymbol\u0026gt;\u0026amp; symbols) { // Load the PDB file HRESULT hr = _pDiaDataSource-\u0026gt;loadDataFromPdb(pdbPath.c_str()); // Open a session CComPtr\u0026lt;IDiaSession\u0026gt; pSession; hr = _pDiaDataSource-\u0026gt;openSession(\u0026amp;pSession); // Get the global scope CComPtr\u0026lt;IDiaSymbol\u0026gt; pGlobal; hr = pSession-\u0026gt;get_globalScope(\u0026amp;pGlobal); Now, you have the global scope of the symbols, you can ask for an enumerator for the type of symbols you are interested in; SymTagFunction in my case:\n1 2 3 4 5 6 7 // Enumerate all function symbols CComPtr\u0026lt;IDiaEnumSymbols\u0026gt; pEnumSymbols; hr = pGlobal-\u0026gt;findChildren(SymTagFunction, NULL, nsNone, \u0026amp;pEnumSymbols); LONG count = 0; pEnumSymbols-\u0026gt;get_Count(\u0026amp;count); std::wcout \u0026lt;\u0026lt; L\u0026#34;Found \u0026#34; \u0026lt;\u0026lt; count \u0026lt;\u0026lt; L\u0026#34; function symbols\u0026#34; \u0026lt;\u0026lt; std::endl; The pEnumSymbols iterator allows you to loop on each SymTagFunction symbol and get its name:\n1 2 3 4 5 6 7 8 9 while (SUCCEEDED(pEnumSymbols-\u0026gt;Next(1, \u0026amp;pSymbol, \u0026amp;celt)) \u0026amp;\u0026amp; celt == 1) { FunctionSymbol func; // Get function name BSTR bstrName; if (pSymbol-\u0026gt;get_name(\u0026amp;bstrName) == S_OK) { func.name = bstrName; SysFreeString(bstrName); } Note that each symbol details are stored in a FunctionSymbol instance:\n1 2 3 4 5 6 7 8 struct FunctionSymbol { std::wstring name; ... 
std::wstring signature; // Function signature (parameters only, no return type) DWORD rva; // Relative Virtual Address ULONGLONG length; bool isPublic; }; with the rest of the code in the while() loop:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 // Get function signature (parameters only, no return type) func.signature = ExtractFunctionSignature(pSymbol); // Get relative virtual address DWORD rva; if (pSymbol-\u0026gt;get_relativeVirtualAddress(\u0026amp;rva) == S_OK) { func.rva = rva; } // Get function length ULONGLONG length; if (pSymbol-\u0026gt;get_length(\u0026amp;length) == S_OK) { func.length = length; } // Determine if function is public or private // Check access level - default to private func.isPublic = false; DWORD access; if (pSymbol-\u0026gt;get_access(\u0026amp;access) == S_OK) { func.isPublic = (access == CV_public); } pSymbol.Release(); I did not have the time to do more trial for private/public state, but I should have tried by enumerating SymTagPublicSymbol or SymTagExport that could be considered as public.\nBetter with a signature The final step is to figure out the signature of each function. This is where the genericity of DIA could be confusing because so many things are represented by IDiaSymbol: a symbol, a function, the type of a function, or the type of a parameter…\nSo, the type of the function is retrieved as an IDiaSymbol by calling getType() on the function symbol. 
From that IDiaSymbol, findChildren() lets you iterate on the parameters:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 // Extract function signature (parameters only, no return type) std::wstring PdbSymbolExtractor::ExtractFunctionSignature(IDiaSymbol* pSymbol) { if (!pSymbol) { return L\u0026#34;()\u0026#34;; } // Get function type CComPtr\u0026lt;IDiaSymbol\u0026gt; pFunctionType; if (pSymbol-\u0026gt;get_type(\u0026amp;pFunctionType) != S_OK || !pFunctionType) { return L\u0026#34;()\u0026#34;; } // Enumerate function arguments CComPtr\u0026lt;IDiaEnumSymbols\u0026gt; pEnumArgs; if (FAILED(pFunctionType-\u0026gt;findChildren(SymTagFunctionArgType, NULL, nsNone, \u0026amp;pEnumArgs))) { return L\u0026#34;()\u0026#34;; } LONG argCount = 0; pEnumArgs-\u0026gt;get_Count(\u0026amp;argCount); if (argCount == 0) { return L\u0026#34;()\u0026#34;; } Now, the same Next() method is called on the enumerator to iterate on each parameter:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // Build signature string std::wstring signature = L\u0026#34;(\u0026#34;; CComPtr\u0026lt;IDiaSymbol\u0026gt; pArg; ULONG argCelt = 0; bool first = true; while (SUCCEEDED(pEnumArgs-\u0026gt;Next(1, \u0026amp;pArg, \u0026amp;argCelt)) \u0026amp;\u0026amp; argCelt == 1) { if (!first) { signature += L\u0026#34;, \u0026#34;; } first = false; // Get the argument type CComPtr\u0026lt;IDiaSymbol\u0026gt; pArgType; if (pArg-\u0026gt;get_type(\u0026amp;pArgType) == S_OK \u0026amp;\u0026amp; pArgType) { signature += GetTypeName(pArgType); } else { signature += L\u0026#34;?\u0026#34;; } pArg.Release(); } signature += L\u0026#34;)\u0026#34;; return signature; } The final step is to get the name of the type from the IDiaSymbol returned by get_type(). If it is a custom type, call get_name() like any other symbol. 
Otherwise, for basic types, call get_baseType() and get_length() as shown by the code below:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 std::wstring PdbSymbolExtractor::GetTypeName(IDiaSymbol* pType) { if (!pType) { return L\u0026#34;?\u0026#34;; } // Try to get type name directly BSTR bstrTypeName; if (pType-\u0026gt;get_name(\u0026amp;bstrTypeName) == S_OK \u0026amp;\u0026amp; bstrTypeName \u0026amp;\u0026amp; wcslen(bstrTypeName) \u0026gt; 0) { std::wstring typeName = bstrTypeName; SysFreeString(bstrTypeName); return typeName; } // For basic types or unnamed types, try getting basic type info DWORD baseType = 0; ULONGLONG length = 0; if (pType-\u0026gt;get_baseType(\u0026amp;baseType) == S_OK) { pType-\u0026gt;get_length(\u0026amp;length); // Map basic types to names switch (baseType) { case btVoid: return L\u0026#34;void\u0026#34;; case btChar: return L\u0026#34;char\u0026#34;; case btWChar: return L\u0026#34;wchar_t\u0026#34;; case btBool: return L\u0026#34;bool\u0026#34;; case btInt: case btLong: if (length == 1) return L\u0026#34;char\u0026#34;; else if (length == 2) return L\u0026#34;short\u0026#34;; else if (length == 4) return L\u0026#34;int\u0026#34;; else if (length == 8) return L\u0026#34;__int64\u0026#34;; else return L\u0026#34;int\u0026#34; + std::to_wstring(length * 8); case btUInt: case btULong: if (length == 1) return L\u0026#34;unsigned char\u0026#34;; else if (length == 2) return L\u0026#34;unsigned short\u0026#34;; else if (length == 4) return L\u0026#34;unsigned int\u0026#34;; else if (length == 8) return L\u0026#34;unsigned __int64\u0026#34;; else return L\u0026#34;uint\u0026#34; + std::to_wstring(length * 8); case btFloat: if (length == 4) return L\u0026#34;float\u0026#34;; else if (length == 8) return L\u0026#34;double\u0026#34;; else return L\u0026#34;float\u0026#34; + std::to_wstring(length * 8); default: return 
L\u0026#34;?\u0026#34;; } } return L\u0026#34;?\u0026#34;; } This is a “simple” implementation that does not take pointers, addresses, arrays, and more into account. For a more complete solution, I would recommend looking at the PrintType() implementation in the DIA2Dump code sample that is installed with Visual Studio.\nI hope this will get your foot in the door of symbol parsing and make you want to dig further into DIA.\nReferences Corresponding source code is available in my GitHub repository. Archived Microsoft documentation/implementation of .pdb format including symbol dumper code. DIA2Dump Visual Studio code sample. ","cover":"https://chrisnas.github.io/posts/2025-12-08_how-to-dump-function/1_-GYj2ovADruJxXqRMF930A.png","date":"2025-12-08","permalink":"https://chrisnas.github.io/posts/2025-12-08_how-to-dump-function/","summary":"\u003chr\u003e\n\u003cp\u003eDuring this R\u0026amp;D week at Datadog, I wanted to implement a tool that accepts a .pdb file and generates a .sym file listing function symbols with their address, size, name with signature, and whether they are public or private. This post digs into the implementation details of using \u003ca href=\"https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk??WT.mc_id=DT-MVP-5003325\"\u003eMicrosoft Debug Interface Access (DIA) COM API\u003c/a\u003e to achieve these objectives. If you want to see what my vibe coding experience in Cursor was, read \u003ca href=\"/posts/2025-12-08_vibe-coding-pdb-dumper/\"\u003ethis other post\u003c/a\u003e instead.\u003c/p\u003e","title":"How to dump function symbols from a .pdb file"},{"content":" During this R\u0026amp;D week at Datadog, I wanted to implement a tool that accepts a .pdb file and generates a .sym file listing function symbols with their address, size, name with signature, and whether they are public or private. 
This post digs into the implementation details of using the Microsoft Debug Interface Access (DIA) COM API to achieve these objectives. If you want to see what my vibe coding experience in Cursor was, read this other post instead.\nOne self-contained tool please! I would like the tool to be self-contained but, since DIA is based on a COM server, it would require registering msdia140.dll on the machine. Not a good idea. If the dll is in the same folder as the tool, one can “emulate” the magic done by CoCreateInstance to get an instance of IDiaDataSource (more on this interface soon) in four steps: (1) call LoadLibrary to load the dll in memory, (2) call GetProcAddress to get the DllGetClassObject implementation, (3) call this function to get the IClassFactory implementation, (4) call its CreateInstance method to get an object implementing IDiaDataSource.\nHere is the corresponding code (without error checking for readability):\n// Create DIA data source without registration using DLL loading HRESULT PdbSymbolExtractor::NoRegCoCreate(const std::wstring\u0026amp; dllPath, REFCLSID rclsid, REFIID riid, void** ppv) { HMODULE hDll = LoadLibraryW(dllPath.c_str()); typedef HRESULT(__stdcall* DllGetClassObjectFunc)(REFCLSID, REFIID, LPVOID*); DllGetClassObjectFunc pDllGetClassObject = (DllGetClassObjectFunc)GetProcAddress(hDll, \u0026#34;DllGetClassObject\u0026#34;); CComPtr\u0026lt;IClassFactory\u0026gt; pClassFactory; HRESULT hr = pDllGetClassObject(rclsid, IID_IClassFactory, (void**)\u0026amp;pClassFactory); hr = pClassFactory-\u0026gt;CreateInstance(NULL, riid, ppv); // Note: We intentionally don\u0026#39;t call FreeLibrary here because the DLL needs to stay loaded // The COM object references will keep it alive return hr; } This function is called with the path of msdia140.dll and the UUID of the expected IDiaDataSource:\n// Create DIA data source without registration hr = NoRegCoCreate(dllPath, CLSID_DiaSource, __uuidof(IDiaDataSource), 
(void**)\u0026amp;_pDiaDataSource); But still, I don’t want to have two binaries!\nThe trick is to embed msdia140.dll inside the tool as a Windows resource. In the .rc file, add an RCDATA entry that points to the dll:\nIDR_MSDIA_DLL RCDATA \u0026#34;x64\\\\Release\\\\msdia140.dll\u0026#34; You should see it in the Resource View in Visual Studio:\nThe code that extracts it as a file on disk is straightforward (error checking has been removed for readability):\n// Extract embedded msdia140.dll from resources bool PdbSymbolExtractor::ExtractEmbeddedDll(const std::wstring\u0026amp; outputPath) { // Find the resource HMODULE hModule = GetModuleHandle(NULL); HRSRC hResource = FindResource(hModule, MAKEINTRESOURCE(IDR_MSDIA_DLL), RT_RCDATA); // Load the resource HGLOBAL hLoadedResource = LoadResource(hModule, hResource); // Lock the resource to get a pointer to the data LPVOID pResourceData = LockResource(hLoadedResource); // Get the size of the resource DWORD resourceSize = SizeofResource(hModule, hResource); // Write the DLL to disk std::ofstream outFile(outputPath, std::ios::binary); outFile.write(static_cast\u0026lt;const char*\u0026gt;(pResourceData), resourceSize); outFile.close(); return true; } Where are my function symbols? After calling NoRegCoCreate(), _pDiaDataSource stores a reference to the entry point into the DIA APIs. 
Here are the steps to follow before being able to list the symbols:\n1 2 3 4 5 6 7 8 9 10 11 12 HRESULT PdbSymbolExtractor::ExtractSymbolsFromPdb(const std::wstring\u0026amp; pdbPath, std::vector\u0026lt;FunctionSymbol\u0026gt;\u0026amp; symbols) { // Load the PDB file HRESULT hr = _pDiaDataSource-\u0026gt;loadDataFromPdb(pdbPath.c_str()); // Open a session CComPtr\u0026lt;IDiaSession\u0026gt; pSession; hr = _pDiaDataSource-\u0026gt;openSession(\u0026amp;pSession); // Get the global scope CComPtr\u0026lt;IDiaSymbol\u0026gt; pGlobal; hr = pSession-\u0026gt;get_globalScope(\u0026amp;pGlobal); Now, you have the global scope of the symbols, you can ask for an enumerator for the type of symbols you are interested in; SymTagFunction in my case:\n1 2 3 4 5 6 7 // Enumerate all function symbols CComPtr\u0026lt;IDiaEnumSymbols\u0026gt; pEnumSymbols; hr = pGlobal-\u0026gt;findChildren(SymTagFunction, NULL, nsNone, \u0026amp;pEnumSymbols); LONG count = 0; pEnumSymbols-\u0026gt;get_Count(\u0026amp;count); std::wcout \u0026lt;\u0026lt; L\u0026#34;Found \u0026#34; \u0026lt;\u0026lt; count \u0026lt;\u0026lt; L\u0026#34; function symbols\u0026#34; \u0026lt;\u0026lt; std::endl; The pEnumSymbols iterator allows you to loop on each SymTagFunction symbol and get its name:\n1 2 3 4 5 6 7 8 9 while (SUCCEEDED(pEnumSymbols-\u0026gt;Next(1, \u0026amp;pSymbol, \u0026amp;celt)) \u0026amp;\u0026amp; celt == 1) { FunctionSymbol func; // Get function name BSTR bstrName; if (pSymbol-\u0026gt;get_name(\u0026amp;bstrName) == S_OK) { func.name = bstrName; SysFreeString(bstrName); } Note that each symbol details are stored in a FunctionSymbol instance:\n1 2 3 4 5 6 7 8 struct FunctionSymbol { std::wstring name; ... 
std::wstring signature; // Function signature (parameters only, no return type) DWORD rva; // Relative Virtual Address ULONGLONG length; bool isPublic; }; with the rest of the code in the while() loop:\n// Get function signature (parameters only, no return type) func.signature = ExtractFunctionSignature(pSymbol); // Get relative virtual address DWORD rva; if (pSymbol-\u0026gt;get_relativeVirtualAddress(\u0026amp;rva) == S_OK) { func.rva = rva; } // Get function length ULONGLONG length; if (pSymbol-\u0026gt;get_length(\u0026amp;length) == S_OK) { func.length = length; } // Determine if function is public or private // Check access level - default to private func.isPublic = false; DWORD access; if (pSymbol-\u0026gt;get_access(\u0026amp;access) == S_OK) { func.isPublic = (access == CV_public); } pSymbol.Release(); } I did not have time to experiment further with the private/public state; enumerating SymTagPublicSymbol or SymTagExport symbols, which could be considered public, would be worth trying.\nBetter with a signature The final step is to figure out the signature of each function. This is where the genericity of DIA can be confusing because so many things are represented by IDiaSymbol: a symbol, a function, the type of a function, or the type of a parameter…\nSo, the type of the function is retrieved as an IDiaSymbol by calling get_type() on the function symbol. 
From that IDiaSymbol, findChildren() lets you iterate on the parameters:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 // Extract function signature (parameters only, no return type) std::wstring PdbSymbolExtractor::ExtractFunctionSignature(IDiaSymbol* pSymbol) { if (!pSymbol) { return L\u0026#34;()\u0026#34;; } // Get function type CComPtr\u0026lt;IDiaSymbol\u0026gt; pFunctionType; if (pSymbol-\u0026gt;get_type(\u0026amp;pFunctionType) != S_OK || !pFunctionType) { return L\u0026#34;()\u0026#34;; } // Enumerate function arguments CComPtr\u0026lt;IDiaEnumSymbols\u0026gt; pEnumArgs; if (FAILED(pFunctionType-\u0026gt;findChildren(SymTagFunctionArgType, NULL, nsNone, \u0026amp;pEnumArgs))) { return L\u0026#34;()\u0026#34;; } LONG argCount = 0; pEnumArgs-\u0026gt;get_Count(\u0026amp;argCount); if (argCount == 0) { return L\u0026#34;()\u0026#34;; } Now, the same Next() method is called on the enumerator to iterate on each parameter:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // Build signature string std::wstring signature = L\u0026#34;(\u0026#34;; CComPtr\u0026lt;IDiaSymbol\u0026gt; pArg; ULONG argCelt = 0; bool first = true; while (SUCCEEDED(pEnumArgs-\u0026gt;Next(1, \u0026amp;pArg, \u0026amp;argCelt)) \u0026amp;\u0026amp; argCelt == 1) { if (!first) { signature += L\u0026#34;, \u0026#34;; } first = false; // Get the argument type CComPtr\u0026lt;IDiaSymbol\u0026gt; pArgType; if (pArg-\u0026gt;get_type(\u0026amp;pArgType) == S_OK \u0026amp;\u0026amp; pArgType) { signature += GetTypeName(pArgType); } else { signature += L\u0026#34;?\u0026#34;; } pArg.Release(); } signature += L\u0026#34;)\u0026#34;; return signature; } The final step is to get the name of the type from the IDiaSymbol returned by get_type(). If it is a custom type, call get_name() like any other symbol. 
Otherwise, for basic types, call get_baseType() and get_length() as shown by the code below:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 std::wstring PdbSymbolExtractor::GetTypeName(IDiaSymbol* pType) { if (!pType) { return L\u0026#34;?\u0026#34;; } // Try to get type name directly BSTR bstrTypeName; if (pType-\u0026gt;get_name(\u0026amp;bstrTypeName) == S_OK \u0026amp;\u0026amp; bstrTypeName \u0026amp;\u0026amp; wcslen(bstrTypeName) \u0026gt; 0) { std::wstring typeName = bstrTypeName; SysFreeString(bstrTypeName); return typeName; } // For basic types or unnamed types, try getting basic type info DWORD baseType = 0; ULONGLONG length = 0; if (pType-\u0026gt;get_baseType(\u0026amp;baseType) == S_OK) { pType-\u0026gt;get_length(\u0026amp;length); // Map basic types to names switch (baseType) { case btVoid: return L\u0026#34;void\u0026#34;; case btChar: return L\u0026#34;char\u0026#34;; case btWChar: return L\u0026#34;wchar_t\u0026#34;; case btBool: return L\u0026#34;bool\u0026#34;; case btInt: case btLong: if (length == 1) return L\u0026#34;char\u0026#34;; else if (length == 2) return L\u0026#34;short\u0026#34;; else if (length == 4) return L\u0026#34;int\u0026#34;; else if (length == 8) return L\u0026#34;__int64\u0026#34;; else return L\u0026#34;int\u0026#34; + std::to_wstring(length * 8); case btUInt: case btULong: if (length == 1) return L\u0026#34;unsigned char\u0026#34;; else if (length == 2) return L\u0026#34;unsigned short\u0026#34;; else if (length == 4) return L\u0026#34;unsigned int\u0026#34;; else if (length == 8) return L\u0026#34;unsigned __int64\u0026#34;; else return L\u0026#34;uint\u0026#34; + std::to_wstring(length * 8); case btFloat: if (length == 4) return L\u0026#34;float\u0026#34;; else if (length == 8) return L\u0026#34;double\u0026#34;; else return L\u0026#34;float\u0026#34; + std::to_wstring(length * 8); default: return 
L\u0026#34;?\u0026#34;; } } return L\u0026#34;?\u0026#34;; } This is a “simple” implementation that does not take pointers, addresses, arrays, and more into account. For a more complete solution, I would recommend looking at the PrintType() implementation in the DIA2Dump code sample that is installed with Visual Studio.\nI hope this will get your foot in the door of symbol parsing and make you want to dig further into DIA.\nReferences Corresponding source code is available in my github repository. Archived Microsoft documentation/implementation of .pdb format including a symbol dumper code. DIA2Dump Visual Studio code sample. ","cover":"https://chrisnas.github.io/posts/2025-12-08_vibe-coding-pdb-dumper/1_sSP8l9m3GJ4PDA3FPRywww.png","date":"2025-12-08","permalink":"https://chrisnas.github.io/posts/2025-12-08_vibe-coding-pdb-dumper/","summary":"\u003chr\u003e\n\u003cp\u003eDuring this R\u0026amp;D week at Datadog, I wanted to implement a tool accepting a .pdb file and generate a .sym file listing functions symbols with their address, size, name with signature and if they are public or private. This post dig into the implementation details of using \u003ca href=\"https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk??WT.mc_id=DT-MVP-5003325\"\u003eMicrosoft Debug Interface Access (DIA) COM API\u003c/a\u003e to achieve these objectives. If you want to see what my vibe coding experience in Cursor was, read \u003ca href=\"/posts/2025-12-08_vibe-coding-pdb-dumper/\"\u003ethis other post\u003c/a\u003e instead.\u003c/p\u003e","title":"Vibe coding a .pdb dumper or how I became a Product Manager"},{"content":" In the previous article, I presented what is needed (i.e. 
listen to WaitHandleWait events) to compute lock/wait durations and call stacks for the Mutex, Semaphore, SemaphoreSlim, Manual/AutoResetEvent, ManualResetEventSlim and ReaderWriterLockSlim .NET synchronization constructs in a running process.\nHowever, since the application is already running, some JIT-related events are missing, and some frames of the call stacks cannot be symbolized. Also, it would be great to monitor an application’s startup to see if it could be faster.\nThis post details how to monitor a .NET application from the very beginning of its life and the issues you might face.\nPreparing a new .NET process to be monitored Since .NET 5, the dotnet-trace CLI tool allows you to pass a command line to execute and trace it from startup. In a very interesting article, Olivier Coanet presented the gory details about how to tell the .NET runtime to start an application in a pseudo-suspended mode as shown in the following diagram:\nThe first step is to create a ReverseDiagnosticsServer instance with a specific port (e.g. dotnet-wait_1234 in the diagram). Next, the process to monitor is spawned with the DOTNET_DiagnosticPorts environment variable set to the same port (e.g. dotnet-wait_1234). See the diagnostics documentation on diagnostic ports and the DOTNET_DiagnosticPorts environment variable for more details. The .NET runtime in the new process will listen to this port and… wait.\nWhen the tool is ready, it sends a resume command via a DiagnosticsClient: from that point in time, the CLR executes the normal flow of actions to run the application and… you will receive all events without missing one!\nGet my command line please Following the dotnet-trace example, my dotnet-wait tool accepts the command line of the child process in its final arguments, after the -- separator. For example, dotnet-wait -- dotnet foo.dll will start the program in foo.dll by using dotnet.exe. 
I’m reusing the code in ReversedServerHelper.cs to deal with arguments containing spaces:\nelse if (current == \u0026#34;--\u0026#34;) // this is supposed to be the last one { i++; if (i \u0026lt; args.Length) { parameters.pathName = args[i]; // use the remaining arguments as the arguments for the child app to spawn i++; if (i \u0026lt; args.Length) { parameters.arguments = \u0026#34;\u0026#34;; for (int j = i; j \u0026lt; args.Length; j++) { if (args[j].Contains(\u0026#39; \u0026#39;)) { parameters.arguments += $\u0026#34;\\\u0026#34;{args[j].Replace(\u0026#34;\\\u0026#34;\u0026#34;, \u0026#34;\\\\\\\u0026#34;\u0026#34;)}\\\u0026#34;\u0026#34;; } else { parameters.arguments += args[j]; } if (j != args.Length - 1) { parameters.arguments += \u0026#34; \u0026#34;; } } } // no need to look for more arguments break; } else { throw new InvalidOperationException($\u0026#34;Missing path name value...\u0026#34;); } } The code to spawn the child process is simple:\n// start the monitored app var psi = new ProcessStartInfo(pathName); if (!string.IsNullOrEmpty(arguments)) { psi.Arguments = arguments; } psi.EnvironmentVariables[\u0026#34;DOTNET_DiagnosticPorts\u0026#34;] = port; psi.UseShellExecute = false; var process = System.Diagnostics.Process.Start(psi); Here is an example with the following command line:\n-- dotnet \u0026#34;C:\\CommandLineTest.dll\u0026#34; one two \u0026#34;t h r e e\u0026#34; four \u0026#39;five six\u0026#39; that generates the output (the test application is just listing its arguments):\ndotnet-wait v1.0.0.0 - List wait duration by Christophe Nasarre Press ENTER to exit... 6 arguments 1 | one 2 | two 3 | t h r e e 4 | four 5 | \u0026#39;five 6 | six\u0026#39; This test reminded me to never use single quotes in command lines :^)\nIt’s my console! 
Once I implemented these steps, I immediately faced a very simple problem: my dotnet-wait tool and the test application are both console applications. It means that they share the same console for both input and output. For example, both are waiting for the RETURN key, the tool to stop and the test application to start: too bad for me because the tool will stop as soon as the application starts…\nGoing back in time in my Windows memories, I remembered that the Win32 CreateProcess API accepts CREATE_NEW_CONSOLE as a creation flag to automagically start the child process in its own new console. Unfortunately, it is not possible to pass this flag in .NET; maybe a limitation due to Linux support.\nOne simple solution could be to redirect the output of the tool or the application to a file: that would avoid mixing them in the console. Note that, by default, dotnet-trace discards output from the child process (by setting RedirectStandardOutput, RedirectStandardError and RedirectStandardInput to true and by ignoring the error and output streams) unless you pass --show-child-io on the command line. In that case, dotnet-trace itself produces no output.\nI decided to do the opposite for dotnet-wait: by default, you also get the child output but you can redirect the output of the tool to a file with -o . Still, this does not solve the input problem when both processes expect the same keys.\nIf you remember the interactions between the tool and the monitored application, the latter is suspended until DiagnosticsClient::ResumeRuntime is called. So, why not start the tool that spawns the application in one console and another instance of the tool in a new console that will resume the application? 
This is exactly what my friend Kevin Gosse imagined and how dotnet-wait works.\nAfter the timeout that you give to diagnosticsServer.AcceptAsync(cancellation.Token) has elapsed, the runtime in the child process will display the following message:\nThe runtime has been configured to pause during startup and is awaiting a Diagnostics IPC ResumeStartup command from a Diagnostic Port. DOTNET_DiagnosticPorts=\u0026#34;dotnet-wait_34296\u0026#34; DOTNET_DefaultDiagnosticPortSuspend=0 And this is exactly what the -r 34296 parameter will do!\nYou can now install dotnet wait and monitor the lock and wait contentions of your .NET9+ applications.\n","cover":"https://chrisnas.github.io/posts/2025-03-13_how-to-monitor-net/1_u9iVEg9L5Z_DVpn6jtcM2w.png","date":"2025-03-13","permalink":"https://chrisnas.github.io/posts/2025-03-13_how-to-monitor-net/","summary":"\u003chr\u003e\n\u003cp\u003eIn \u003ca href=\"/posts/2025-01-13_measuring-the-impact-of/\"\u003ethe previous article\u003c/a\u003e, I presented what is needed (i.e. listen to \u003cstrong\u003eWaitHandleWait\u003c/strong\u003e events) to compute lock/wait durations and call stacks for \u003cstrong\u003eMutex\u003c/strong\u003e, \u003cstrong\u003eSemaphore\u003c/strong\u003e, \u003cstrong\u003eSemaphoreSlim\u003c/strong\u003e, \u003cstrong\u003eManual\u003c/strong\u003e/\u003cstrong\u003eAutoResetEvent\u003c/strong\u003e, \u003cstrong\u003eManualResetEventSlim\u003c/strong\u003e, \u003cstrong\u003eReaderWriterLockSlim\u003c/strong\u003e .NET synchronization constructs for a running process.\u003c/p\u003e\n\u003cp\u003eHowever, since the application is already running, some JIT-related events are missing, and some frames of the call stacks cannot be symbolized. 
Also, it would be great to monitor an application’s startup to see if it could be faster.\u003c/p\u003e","title":"How to monitor .NET applications startup"},{"content":" Introduction In an old post, I detailed how to use ContentionStart and ContentionStop events to measure the lock contentions duration for a .NET application. In a .NET 9 pull request, a former Criteo’s colleague Grégoire Verdier has added new events to be notified when wait time similar to lock contention is happening for Mutex, Semaphore, Manual/AutoResetEvent. Read his post for more details about what he was trying to investigate.\nWith asynchronous and multi-threaded algorithms, it is essential to detect unexpected wait/locks in our applications. This post shows you how to leverage these events to measure the duration of these waits and get the call stack when the wait started:\nNew WaitHandleWait events These new events are emitted by the Microsoft-Windows-DotNETRuntime CLR provider when you enable the WaitHandle (= 0x40000000000) keyword with Verbose verbosity. Each time WaitOne is called on a waitable object and this object is already owned, a WaitHandleWaitStart event is emitted. 
When the object is released, a WaitHandleWaitStop event is emitted.\nFor example, the following code:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 static Mutex mutex = new Mutex(); static void Main() { var owningThread = new Thread(OwningThread); owningThread.Start(); var mutexThread = new Thread(MutexThread); mutexThread.Start(); owningThread.Join(); mutexThread.Join(); } static void OwningThread() { Console.WriteLine($\u0026#34; [{GetCurrentThreadId(), 8}] Start to hold resources\u0026#34;); Console.WriteLine(\u0026#34;___________________________________________\u0026#34;); mutex.WaitOne(); Thread.Sleep(3000); // the wait should last ~3 seconds Console.WriteLine(\u0026#34; Release resources\u0026#34;); mutex.ReleaseMutex(); } static void MutexThread() { Console.WriteLine($\u0026#34; [{GetCurrentThreadId(), 8}] waiting for Mutex...\u0026#34;); mutex.WaitOne(); // events are emitted in the implementation when a contention happens mutex.ReleaseMutex(); Console.WriteLine(\u0026#34; \u0026lt;-- Mutex\u0026#34;); } generates a Start and Stop events pair:\n1 2 125980 | 00000000-0000-0000-0000-000000000000 \u0026gt; event 301 __ [ 1| Start] WaitHandleWait/Start 125980 | 00000000-0000-0000-0000-000000000000 \u0026gt; event 302 __ [ 2| Stop] WaitHandleWait/Stop There is no associated activity ID so you rely on the fact that the same waiter thread (125980 in the previous example) is emitting for both events.\nListening to the new Wait events As usual, you should rely on the TraceEvent nuget to start an EventPipe session with an already running .NET application. The last version already contains the definition of the keyword:\n1 keywords |= ClrTraceEventParser.Keywords.WaitHandle; // .NET 9 WaitHandle kind of contention and the C# events for Start and Stop:\n1 2 source.Clr.WaitHandleWaitStart += OnWaitHandleWaitStart; source.Clr.WaitHandleWaitStop += OnWaitHandleWaitStop; The handler’s implementation is straightforward. 
The start of the wait is recorded for the current thread:\nprivate void OnWaitHandleWaitStart(WaitHandleWaitStartTraceData data) { // get the contention info for the current thread ContentionInfo info = _contentionStore.GetContentionInfo(data.ProcessID, data.ThreadID); if (info == null) return; // keep track of the wait start info.ContentionStartRelativeMSec = data.TimeStampRelativeMSec; } When the wait ends, the duration is computed from the recorded wait start because, unlike for ContentionStop, it is not provided in the payload:\nprivate void OnWaitHandleWaitStop(WaitHandleWaitStopTraceData data) { ContentionInfo info = _contentionStore.GetContentionInfo(data.ProcessID, data.ThreadID); if (info == null) return; // unlucky case when we start to listen just after the WaitHandleWaitStart event if (info.ContentionStartRelativeMSec == 0) { return; } // Too bad the duration is not provided in the payload like in ContentionStop... var contentionDurationMSec = data.TimeStampRelativeMSec - info.ContentionStartRelativeMSec; info.ContentionStartRelativeMSec = 0; var duration = TimeSpan.FromMilliseconds(contentionDurationMSec); Console.WriteLine($\u0026#34;{data.ThreadID,7} | {duration.TotalMilliseconds} ms\u0026#34;); } This is nice but it would be more useful if we could get the call stack of long waits.\nCall stacks with EventPipe In a previous post, I explained that it is possible to get the call stack when an event is emitted thanks to the ClrStackWalk event that follows the event you are interested in. Unfortunately, this is no longer the case for .NET 5+, which uses EventPipe instead of ETW.\nAs Olivier Coanet presents in his post, you can get the call stack as an array of addresses from the hidden event record that is mapped by the TraceEvent parameter passed to each event handler. 
This EVENT_RECORD structure contains an ExtendedData field that is an array of EVENT_HEADER_EXTENDED_DATA_ITEM:\npublic struct EVENT_HEADER_EXTENDED_DATA_ITEM { public ushort Reserved1; public ushort ExtType; public ushort Reserved2; public ushort DataSize; public ulong DataPtr; } If the ExtType value is EVENT_HEADER_EXT_TYPE_STACK_TRACE64 (=6) then DataPtr points to an EVENT_EXTENDED_ITEM_STACK_TRACE64 structure:\npublic struct EVENT_EXTENDED_ITEM_STACK_TRACE64 { public ulong MatchId; public unsafe fixed ulong Address[1]; } that contains an array of 64-bit addresses. The number of addresses in this array is (DataSize - sizeof(ulong)) / sizeof(ulong), where the subtracted sizeof(ulong) accounts for the MatchId field.\nFor 32-bit applications, you will get EVENT_HEADER_EXT_TYPE_STACK_TRACE32 (=5) as ExtType value and DataPtr will point to EVENT_EXTENDED_ITEM_STACK_TRACE32:\npublic struct EVENT_EXTENDED_ITEM_STACK_TRACE32 { public ulong MatchId; public unsafe fixed uint Address[1]; } that stores an array of 32-bit addresses.\nKnowing that, writing the code to get the call stack as an array of 64-bit addresses (also used for 32-bit applications, for simplicity’s sake) is pretty straightforward:\npublic static EventPipeUnresolvedStack ReadStackUsingUnsafeAccessor(TraceEvent traceEvent) { return GetFromEventRecord(traceEvent.eventRecord); } private static EventPipeUnresolvedStack GetFromEventRecord(TraceEventNativeMethods.EVENT_RECORD* eventRecord) { if (eventRecord == null) return null; var extendedDataCount = eventRecord-\u0026gt;ExtendedDataCount; for (var dataIndex = 0; dataIndex \u0026lt; extendedDataCount; dataIndex++) { var extendedData = eventRecord-\u0026gt;ExtendedData[dataIndex]; if (extendedData.ExtType == TraceEventNativeMethods.EVENT_HEADER_EXT_TYPE_STACK_TRACE64) { var stackRecord = (TraceEventNativeMethods.EVENT_EXTENDED_ITEM_STACK_TRACE64*)extendedData.DataPtr; var addresses 
= \u0026amp;stackRecord-\u0026gt;Address[0]; var addressCount = (extendedData.DataSize - sizeof(UInt64)) / sizeof(UInt64); if (addressCount == 0) return null; var callStackAddresses = new ulong[addressCount]; for (var index = 0; index \u0026lt; addressCount; index++) { callStackAddresses[index] = addresses[index]; } return new EventPipeUnresolvedStack(callStackAddresses); } else if (extendedData.ExtType == TraceEventNativeMethods.EVENT_HEADER_EXT_TYPE_STACK_TRACE32) { var stackRecord = (TraceEventNativeMethods.EVENT_EXTENDED_ITEM_STACK_TRACE32*)extendedData.DataPtr; var addresses = \u0026amp;stackRecord-\u0026gt;Address[0]; /* MatchId is a 64-bit field even in the 32-bit layout */ var addressCount = (extendedData.DataSize - sizeof(UInt64)) / sizeof(UInt32); if (addressCount == 0) return null; var callStackAddresses = new ulong[addressCount]; /* store the 32-bit addresses as 64-bit addresses */ for (var index = 0; index \u0026lt; addressCount; index++) { callStackAddresses[index] = addresses[index]; } return new EventPipeUnresolvedStack(callStackAddresses); } } return null; } Note that the latest version of the TraceEvent nuget provides public access to the eventRecord field, so it is no longer necessary to use the UnsafeAccessor attribute used by Olivier.\nSymbolize the call stack addresses An address is good but the corresponding method name is better. I won’t repeat what I’ve already detailed in an older post that shows how to get the name of a native or managed method from an instruction pointer address. Instead, I want to pinpoint a big limitation of the solution that listens to the CLR provider MethodLoadVerbose/MethodDCStartVerboseV2 events. If the methods you are interested in are jitted BEFORE your tool attaches to the application, you will never get these events.\nYou could get the same address span/method name mapping via the other “Microsoft-Windows-DotNETRuntimeRundown” provider and its MethodDCEndVerbose event that contains the expected MethodStartAddress, MethodSize and MethodName in its payload. 
But I need this information before the end of the application…\nLooking at the documentation, it seems that the rundown provider accepts the StartRundownKeyword value to emit the DCStart events when the provider is enabled! Since .NET 9, it is possible to pass the keywords you want (before, the default value did not contain StartRundownKeyword) when creating the EventPipe session\n1 2 3 4 5 6 7 8 9 10 11 // V-- this is the default rundown keyword rundownKeywords = 0x80020139 | (long)ClrTraceEventParser.Keywords.StartEnumeration; var config = new EventPipeSessionConfiguration(GetProviders(), 256, rundownKeywords, true); using (var session = client.StartEventPipeSession(config)) { var source = new EventPipeEventSource(session.EventStream); RegisterListeners(source); // this is a blocking call source.Process(); } Note that you should not add the rundown provider to the list passed as parameter.\nUnfortunately, there is currently an issue in the runtime since September 2020 that pinpoints this exact problem. I even tried to create and close a session to get the DCStop events before recreating a new one, but I failed.\nThe next episode will talk about how it is possible to start a .NET application and get the events since its startup… with the problems that are happening.\n","cover":"https://chrisnas.github.io/posts/2025-01-13_measuring-the-impact-of/1_OTO7qWO5aYvNXprhaPvRoA.png","date":"2025-01-13","permalink":"https://chrisnas.github.io/posts/2025-01-13_measuring-the-impact-of/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn an \u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003eold post\u003c/a\u003e, I detailed how to use \u003cstrong\u003eContentionStart\u003c/strong\u003e and \u003cstrong\u003eContentionStop\u003c/strong\u003e events to measure the lock contentions duration for a .NET application. 
In a \u003ca href=\"https://github.com/DataDog/dd-trace-dotnet/issues/5814\"\u003e.NET 9 pull request\u003c/a\u003e, a former Criteo’s colleague \u003ca href=\"https://www.linkedin.com/in/gregoire-verdier\"\u003eGrégoire Verdier\u003c/a\u003e has added new events to be notified when wait time similar to lock contention is happening for Mutex, Semaphore, Manual/AutoResetEvent. Read \u003ca href=\"https://techblog.criteo.com/a-perfview-alternative-in-webassembly-f6833820b699\"\u003ehis post\u003c/a\u003e for more details about what he was trying to investigate.\u003c/p\u003e\n\u003cp\u003eWith asynchronous and multi-threaded algorithms, it is essential to detect unexpected wait/locks in our applications. This post shows you how to leverage these events to measure the duration of these waits and get the call stack when the wait started:\u003c/p\u003e","title":"Measuring the impact of locks and waits on latency in your .NET apps"},{"content":" In the previous post, I detailed how I used the undocumented events from the BCL to create the dotnet-http CLI tool to monitor your outgoing HTTP requests. After testing with older versions of .NET, I realized that the code needed to be updated and I’m sharing my findings in this post.\nThe main point is that url redirections could have a major impact on requests latency:\nAlways test supported versions… When I wrote the initial version of dotnet-http, I only tested it with .NET 8 and .NET 9 with limited formats of urls. Unfortunately, things went bad when I tried to monitor applications running on .NET 5 and .NET 6: no events are emitted by these versions of the BCL.\nSo, the next step was to test .NET 7 and the result was simple: crash! 
After investigating, I realized that some events I looked at in the .NET 8 source code did not have the same payload in .NET 7, or even had no payload at all:\nEven though there is no version field in the event payload, it is easy to check its size, as in the following code:\nprivate void OnConnectionEstablished( DateTime timestamp, int threadId, Guid activityId, Guid relatedActivityId, byte[] eventData ) { ... EventSourcePayload payload = new EventSourcePayload(eventData); var versionMajor = payload.GetByte(); var versionMinor = payload.GetByte(); // in .NET 7, nothing else is available Int64 connectionId = 0; var scheme = \u0026#34;\u0026#34;; var host = \u0026#34;\u0026#34;; UInt32 port = 0; var path = \u0026#34;\u0026#34;; if (eventData.Length \u0026gt; 2) { connectionId = payload.GetInt64(); scheme = payload.GetString(); host = payload.GetString(); port = payload.GetUInt32(); path = payload.GetString(); } ... Another difference is that one event is not even emitted in .NET 7:\nExplain what a redirection is please! Because I based the code on this Redirect event, I needed to find another way to support .NET 7 even though I would not have a redirected url to display. But first, let’s see what I’m talking about in terms of HTTP communication.\nWhen you try to get the content / status code behind a url, the code goes through different phases, each with its related events:\n1. Start: RequestStart\n2. DNS resolution: ResolutionStart, ResolutionStop/Fail\n3. Socket connection: ConnectStart, ConnectStop\n4. Security handshake (HTTPS only): HandshakeStart, HandshakeStop/Failed\n5. Request/response: RequestHeadersStart, RequestHeadersStop, ResponseHeadersStart, ResponseHeadersStop, Redirect (.NET 8+), ResponseContentStart, ResponseContentStop\n6. Request stop: RequestStop/Failed\nBased on the received url, a server can decide to answer that another url should be used instead. 
For example, if you call github with http:// instead of https://, such a redirection will happen. Without invasive tools such as Wireshark, these redirections are easy to miss and can cause unnecessary delays.\nFrom the client perspective, this can be detected in the ResponseHeadersStop event payload that provides a status code. If its value is 301, then it is a redirection. The other effect of a redirection is that the BCL code will change the flow of events because it needs to start over with the new url from step 2 to step 6. As you can see, instead of paying the cost of just one request, two are actually emitted and processed.\nImpact on the implementation In addition to adding the payload size checks, my initial implementation was not properly handling the redirection because the values (timestamps and durations) were overwritten by the events related to the second, redirected url.\nThe new implementation splits the request details into two classes. A base class contains the fields common to both parts of the request in case of redirection:\nprivate class HttpRequestInfoBase { public HttpRequestInfoBase(DateTime timestamp, string scheme, string host, uint port, string path) { StartTime = timestamp; if (scheme == string.Empty) { Url = string.Empty; } else { if (port != 0) { Url = $\u0026#34;{scheme}://{host}:{port}{path}\u0026#34;; } else { Url = $\u0026#34;{scheme}://{host}:{path}\u0026#34;; } } } public string Url { get; set; } public DateTime StartTime { get; set; } public DateTime ReqRespStartTime { get; set; } public double ReqRespDuration { get; set; } // DNS public double DnsWait { get; set; } public DateTime DnsStartTime { get; set; } public double DnsDuration { get; set; } // HTTPS public double HandshakeWait { get; set; } public DateTime HandshakeStartTime { get; set; } public double HandshakeDuration { get; 
set; } // socket connection public DateTime SocketConnectionStartTime { get; set; } public double SocketWait { get; set; } public double SocketDuration { get; set; } public DateTime QueueuingEndTime { get; set; } public double QueueingDuration { get; set; } } The second one inherits from the base class and contains additional details, including the details of the redirected url if any:\nprivate class HttpRequestInfo : HttpRequestInfoBase { public HttpRequestInfo(DateTime timestamp, string scheme, string host, uint port, string path) : base(timestamp, scheme, host, port, path) { } public HttpRequestInfoBase Redirect { get; set; } public UInt32 StatusCode { get; set; } // HTTPS public string HandshakeErrorMessage { get; set; } public string Error { get; set; } } A new instance of HttpRequestInfoBase is created when a 301 status code is received in the HttpResponseHeaderStop handler:\nprivate void OnHttpResponseHeaderStop(object sender, HttpRequestStatusEventArgs e) { // used to detect redirection in .NET 8+ if (e.StatusCode != 301) { return; } // create a new request info for the redirected request // because .NET 7 does not emit a Redirect event, we need to create a new request info here // --\u0026gt; it means that the redirect url will be empty in .NET 7 var root = GetRoot(e.ActivityId); if (_requests.TryGetValue(root, out HttpRequestInfo info)) { info.Redirect = new HttpRequestInfoBase(e.Timestamp, \u0026#34;\u0026#34;, \u0026#34;\u0026#34;, 0, \u0026#34;\u0026#34;); // if you really want to have the duration of both original request + redirected request, // then do the following: // info.ReqRespDuration = (e.Timestamp - info.ReqRespStartTime).TotalMilliseconds; // However, I prefer to show the duration of the redirected request only to more easily // compute the cost of the initial redirected request = total duration - other durations } } For .NET 8+, the redirected 
url is provided to the Redirect handler and stored in the Url field of the instance created in the previous handler:\nprivate void OnHttpRedirect(object sender, HttpRedirectEventArgs e) { // since this is an Info event, the activityID is the root var root = ActivityHelpers.ActivityPathString(e.ActivityId); if (_requests.TryGetValue(root, out HttpRequestInfo info)) { info.Redirect.Url = e.RedirectUrl; } } In each handler of events that could be received for both initial and redirected requests, the presence of a Redirect instance determines which part of the request is updated:\nprivate void OnHttpResponseContentStop(object sender, EventPipeBaseArgs e) { var root = GetRoot(e.ActivityId); if (!_requests.TryGetValue(root, out HttpRequestInfo info)) { return; } if (info.Redirect == null) { info.ReqRespDuration = (e.Timestamp - info.ReqRespStartTime).TotalMilliseconds; } else { info.Redirect.ReqRespDuration = (e.Timestamp - info.Redirect.ReqRespStartTime).TotalMilliseconds; } } The wait times and durations are now computed as the events are received and aggregated at the end of the request by adding the values of both parts (initial and redirected, if any):\ndouble dnsDuration = info.DnsDuration + ((info.Redirect != null) ? info.Redirect.DnsDuration : 0); if (dnsDuration \u0026gt; 0) { double dnsWait = info.DnsWait + ((info.Redirect != null) ? info.Redirect.DnsWait : 0); Console.Write($\u0026#34;{dnsWait,9:F3} | {dnsDuration,9:F3} | \u0026#34;); } else { Console.Write($\u0026#34; | | \u0026#34;); } In conclusion, you should try to monitor these redirections by using my dotnet-http CLI tool. 
Feel free to download it or install it with the following command line: dotnet tool install -g dotnet-http or update to the latest version with: dotnet tool update -g dotnet-http\nYou could also integrate some event listening code into your framework that simply handles the ResponseHeadersStop/Redirect events.\n","cover":"https://chrisnas.github.io/posts/2024-12-13_monitor-http-redirects-to/1_w_2f8F9E6hGmoiGLBCMqfw.png","date":"2024-12-13","permalink":"https://chrisnas.github.io/posts/2024-12-13_monitor-http-redirects-to/","summary":"\u003chr\u003e\n\u003cp\u003eIn the \u003ca href=\"/posts/2024-11-13_implementing-dotnet-http-to/\"\u003eprevious post\u003c/a\u003e, I detailed how I used the undocumented events from the BCL to create the \u003ca href=\"https://www.nuget.org/packages/dotnet-http\"\u003edotnet-http CLI tool\u003c/a\u003e to monitor your outgoing HTTP requests. After testing with older versions of .NET, I realized that the code needed to be updated and I’m sharing my findings in this post.\u003c/p\u003e\n\u003cp\u003eThe main point is that url redirections could have a major impact on requests latency:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2024-12-13_monitor-http-redirects-to/1_w_2f8F9E6hGmoiGLBCMqfw.png\"\u003e\u003c/p\u003e\n\u003ch2 id=\"always-test-supported-versions\"\u003eAlways test supported versions…\u003c/h2\u003e\n\u003cp\u003eWhen I wrote the initial version of dotnet-http, I only tested it with .NET 8 and .NET 9 with limited formats of urls. Unfortunately, things went bad when I tried to monitor applications running on .NET 5 and .NET 6: no events are emitted by these versions of the BCL.\u003c/p\u003e","title":"Monitor HTTP redirects to reduce unexpected latency"},{"content":" The previous episode detailed how to find the events dealing with network requests that are emitted by the BCL classes with their undocumented payload. 
It is now time to see how to listen to them and extract valuable insights such as what is happening when an HTTP request is sent to a server as an example. This is how my new dotnet-http CLI tool is implemented.\nYou are now able to see the cost of DNS, socket connection, security and redirection as shown in the following screenshot.\nAs you can see, the cost of an unexpected redirection could be high and hides security handshakes. In this example, using https://github.com/Maoni0 instead of http://github.com/Maoni0 halves the request duration!\nOnce the DNS checks are done, the socket connections are established, and the security handshakes are validated, the corresponding phases are no longer needed for the next requests.\nListen to custom EventSource events As explained in previous posts, it is easy to listen to events emitted by a .NET application thanks to the Microsoft TraceEvent nuget. You also need to use EventPipeClient from the Microsoft.Diagnostics.NETCore.Client nuget to connect to the running application. In my case, I’m relying on an older version that supports both ETW and EventPipe:\nvar configuration = new SessionConfiguration( circularBufferSizeMB: 2000, format: EventPipeSerializationFormat.NetTrace, providers: GetProviders() ); var binaryReader = EventPipeClient.CollectTracing(_processId, configuration, out var sessionId); EventPipeEventSource source = new EventPipeEventSource(binaryReader); The configuration contains a list of providers that are emitting the events you are interested in with the right keyword and verbosity. 
Here is what is needed for the HTTP requests monitoring :\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 private static IReadOnlyCollection\u0026lt;Provider\u0026gt; GetProviders() { var providers = new Provider[] { new Provider( name: \u0026#34;System.Net.Http\u0026#34;, keywords: (ulong)(1), eventLevel: EventLevel.Verbose), new Provider( name: \u0026#34;System.Net.Sockets\u0026#34;, keywords: (ulong)(0xFFFFFFFF), eventLevel: EventLevel.Verbose), new Provider( name: \u0026#34;System.Net.NameResolution\u0026#34;, keywords: (ulong)(0xFFFFFFFF), eventLevel: EventLevel.Verbose), new Provider( name: \u0026#34;System.Net.Security\u0026#34;, keywords: (ulong)(0xFFFFFFFF), eventLevel: EventLevel.Verbose), new Provider( name: \u0026#34;System.Threading.Tasks.TplEventSource\u0026#34;, keywords: (ulong)(0x80), eventLevel: EventLevel.Verbose), }; return providers; } You might be surprised by the presence of the TplEventSource provider but it is required to get a correct ActivityID. 
This major subject is detailed later.\nThere are so many events to listen to that it is easier to listen to the source AllEvents C# event:\n1 source.AllEvents += OnEvents; This global handler passes the events to ParseEvent that forwards the important data to handlers for each provider:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 private static Guid NetSecurityEventSourceProviderGuid = Guid.Parse(\u0026#34;7beee6b1-e3fa-5ddb-34be-1404ad0e2520\u0026#34;); private static Guid DnsEventSourceProviderGuid = Guid.Parse(\u0026#34;4b326142-bfb5-5ed3-8585-7714181d14b0\u0026#34;); private static Guid SocketEventSourceProviderGuid = Guid.Parse(\u0026#34;d5b2e7d4-b6ec-50ae-7cde-af89427ad21f\u0026#34;); private static Guid HttpEventSourceProviderGuid = Guid.Parse(\u0026#34;d30b5633-7ef1-5485-b4e0-94979b102068\u0026#34;); private void ParseEvent( DateTime timestamp, int threadId, Guid activityId, Guid relatedActivityId, Guid providerGuid, string taskName, Int64 keywords, UInt16 id, byte[] eventData ) { if (providerGuid == NetSecurityEventSourceProviderGuid) { HandleNetSecurityEvent(timestamp, threadId, activityId, relatedActivityId, id, taskName, eventData); } else if (providerGuid == DnsEventSourceProviderGuid) { HandleDnsEvent(timestamp, threadId, activityId, relatedActivityId, id, taskName, eventData); } else if (providerGuid == SocketEventSourceProviderGuid) { HandleSocketEvent(timestamp, threadId, activityId, relatedActivityId, id, taskName, eventData); } else if (providerGuid == HttpEventSourceProviderGuid) { HandleHttpEvent(timestamp, threadId, activityId, relatedActivityId, id, taskName, eventData); } else { WriteLogLine(); } } The next step is to handle each event based on its id:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 private void HandleNetSecurityEvent( DateTime timestamp, int threadId, Guid activityId, Guid relatedActivityId, ushort id, string taskName, byte[] 
eventData ) { switch (id) { case 1: // HandshakeStart OnHandshakeStart(timestamp, threadId, activityId, relatedActivityId, eventData); break; case 2: // HandshakeStop OnHandshakeStop(timestamp, threadId, activityId, relatedActivityId, eventData); break; case 3: // HandshakeFailed OnHandshakeFailed(timestamp, threadId, activityId, relatedActivityId, eventData); break; default: WriteLogLine(); break; } } Extract information from an event payload The payload of each event has been detailed in the previous post. For example, I need to extract the following fields from the RequestStart event payload:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 private void OnRequestStart( DateTime timestamp, int threadId, Guid activityId, Guid relatedActivityId, byte[] eventData ) { ... // string scheme // string host // int port // string path // byte versionMajor // byte versionMinor // enum HttpVersionPolicy I’ve implemented the EventSourcePayload class that provides strongly typed helpers to get the different fields one after the other:\n1 2 3 4 5 6 7 8 9 public class EventSourcePayload { private byte[] _payload; private int _pos = 0; public EventSourcePayload(byte[] payload) { _payload = payload; } It accepts the payload as the array of bytes received by each handler.\nA string information is serialized as a list of UTF-16 Unicode characters; each one stored in 2 bytes:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 public string GetString() { StringBuilder builder = new StringBuilder(); while (_pos \u0026lt; _payload.Length) { var characters = UnicodeEncoding.Unicode.GetString(_payload, _pos, 2); _pos += 2; if (characters == \u0026#34;\\0\u0026#34;) { break; } builder.Append(characters); } return builder.ToString(); } The current position in the array is incremented character by character up to the final ‘\\0’.\nThe other helpers implementation is straightforward thanks to the BitConverter class:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 
35 36 37 38 39 40 public byte GetByte() { return _payload[_pos++]; } public UInt16 GetUnit16() { UInt16 value = BitConverter.ToUInt16(_payload, _pos); _pos += sizeof(UInt16); return value; } public UInt32 GetUInt32() { UInt32 value = BitConverter.ToUInt32(_payload, _pos); _pos += sizeof(UInt32); return value; } public UInt64 GetUInt64() { UInt64 value = BitConverter.ToUInt64(_payload, _pos); _pos += sizeof(UInt64); return value; } public Int64 GetInt64() { Int64 value = BitConverter.ToInt64(_payload, _pos); _pos += sizeof(UInt64); return value; } public double GetDouble() { double value = BitConverter.ToDouble(_payload, _pos); _pos += sizeof(double); return value; } } Again, the position in the array is incremented to reflect the size of the field being read.\nIf you look at the rest of the OnRequestStart handler, you will see how each field is extracted:\n1 2 3 4 5 6 7 EventSourcePayload payload = new EventSourcePayload(eventData); var scheme = payload.GetString(); var host = payload.GetString(); var port = payload.GetUInt32(); var path = payload.GetString(); var versionMajor = payload.GetByte(); var versionMinor = payload.GetByte(); My activity or not my activity: that is the question In the previous episode, I forgot (on purpose) to mention that the EventSource is keeping track of “activities” when emitting events:\n1 2 3 4 protected unsafe void WriteEventCore(int eventId, int eventDataCount, EventData* data) { WriteEventWithRelatedActivityIdCore(eventId, null, eventDataCount, data); } The WriteEventWithRelatedActivityIdCore method looks at the event metadata and if its opcode is Start then a new activity is created; if it is Stop then the current activity ends:\n1 2 3 4 5 6 7 8 if (opcode == EventOpcode.Start) { m_activityTracker.OnStart(m_name, metadata.Name, metadata.Descriptor.Task, ref activityId, ref relActivityId, metadata.ActivityOptions); } else if (opcode == EventOpcode.Stop) { m_activityTracker.OnStop(m_name, metadata.Name, metadata.Descriptor.Task, 
ref activityId); } These OnStart and OnStop methods are doing nothing if the TplEventSource is not enabled with Keywords.TasksFlowActivityIds (= 0x80) set. This explains the code in GetProviders listed earlier where this non-HTTP provider is enabled.\nWhen a request is created, the current global count managed by the ActivityTracker is incremented and it becomes the id of the current activity. Note that there is a “root” identifier before any activity gets created corresponding to the current AppDomain ID; starting at 1. If you think of an HTTP request, after a RequestStart event, each phase starts a new activity with, for example, ResolutionStart or ConnectStart. Informative events are emitted with the current request activity such as Redirect or ConnectionEstablished.\nHere is a simplified view of the events emitted for an HTTP request (without DNS nor security events):\nThread +-- Path \u0026gt;------- ID ---- Opcode -- Event ---------------------------------- 78568 | 1/1 \u0026gt; event 1 __ [ 1| Start] RequestStart | 32388 | 1/1/1 \u0026gt; event 1 __ [ 1| Start] ResolutionStart | 32388 | 1/1/1 \u0026gt; event 2 __ [ 2| Stop] ResolutionStop | 32388 | 1/1/2 \u0026gt; event 1 __ [ 1| Start] ConnectStart | 32388 | 1/1/2 \u0026gt; event 2 __ [ 2| Stop] ConnectStop | 32388 | 1/1 \u0026gt; event 4 __ [ 0| Info] ConnectionEstablished | 53324 | 1/1 \u0026gt; event 6 __ [ 0| Info] RequestLeftQueue | 53324 | 1/1/3 \u0026gt; event 7 __ [ 1| Start] RequestHeadersStart | 53324 | 1/1/3 \u0026gt; event 8 __ [ 2| Stop] RequestHeadersStop | 68024 | 1/1/4 \u0026gt; event 11 __ [ 1| Start] ResponseHeadersStart | 68024 | 1/1/4 \u0026gt; event 12 __ [ 2| Stop] ResponseHeadersStop | 68024 | 1/1/5 \u0026gt; event 13 __ [ 1| Start] ResponseContentStart | 68024 | 1/1/5 \u0026gt; event 14 __ [ 2| Stop] ResponseContentStop | 68024 | 1/1 \u0026gt; event 2 __ [ 2| Stop] RequestStop 200 \u0026lt;| As you can see, different threads are emitting events associated to the same request. 
At the ActivityTracker level, an ActivityInfo is stored in an async local so that each thread has its own storage that will be propagated by the methods of the Task Parallel Library (a.k.a. TPL) from task to task. This is why the very asynchronous code of the HTTP client implementation can go back to the current activity from different threads.\nThe activities are encoded into the 16 bytes of a GUID. In fact, only the first 12 bytes are used, and the final 4 bytes contain a checksum that includes the current process ID. Since the activity identifiers in the path do, most of the time, have a small value, the encoding is using 4 bits, also known as a nibble, to encode each of them. There was a bug in ActivityTracker.AddIdToGuid that will be fixed in .NET 10. Unfortunately, the decoding code in Perfview needs to take it into account so the last activity is not lost.\nComputing the different durations With this infrastructure in place, it is now possible to extract the “root” activity path corresponding to an HTTP request during each phase and update the corresponding state that is used to output the details when a request ends:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 private class HttpRequestInfo { public HttpRequestInfo(DateTime timestamp, Guid activityId, string scheme, string host, uint port, string path) { Root = ActivityHelpers.ActivityPathString(activityId); if (port != 0) { Url = $\u0026#34;{scheme}://{host}:{port}{path}\u0026#34;; } else { Url = $\u0026#34;{scheme}://{host}:{path}\u0026#34;; } StartTime = timestamp; } public string Root { get; set; } public string Url { get; set; } public string RedirectUrl { get; set; } public DateTime StartTime { get; set; } public DateTime ReqRespStartTime { get; set; } public double ReqRespDuration { get; set; } public UInt32 StatusCode { get; set; } // DNS public DateTime DnsStartTime { get; set; } public double DnsDuration { get; set; } // HTTPS public 
DateTime HandshakeStartTime { get; set; } public double HandshakeDuration { get; set; } public string HandshakeErrorMessage { get; set; } // socket connection public DateTime SocketConnectionStartTime { get; set; } public double SocketDuration { get; set; } } As you have seen earlier, the different phases of a request may be processed by different threads. It means that an available thread needs to be found in order to execute them. If the thread pool is busy, you can expect some wait time. This is shown in the different wait sections between the phases. They are easily computed using the timestamp of each event. Having long wait durations might be reduced by increasing the number of threads in the thread pool.\nFeel free to use the new dotnet-http CLI tool available from nuget.org or via dotnet tool install -g dotnet-http.\nThe corresponding sources are available from my github repository in case you would like to integrate this kind of analysis directly in your code or your monitoring pipeline.\nHappy monitoring!\n","cover":"https://chrisnas.github.io/posts/2024-11-13_implementing-dotnet-http-to/1_Xx8OW9oP34V6fVKKdxqZQA.png","date":"2024-11-13","permalink":"https://chrisnas.github.io/posts/2024-11-13_implementing-dotnet-http-to/","summary":"\u003chr\u003e\n\u003cp\u003eThe \u003ca href=\"/posts/2024-10-13_digging-into-the-undocumented/\"\u003eprevious episode\u003c/a\u003e detailed how to find the events dealing with network requests that are emitted by the BCL classes with their undocumented payload. It is now time to see how to listen to them and extract valuable insights such as what is happening when an HTTP request is sent to a server as an example. 
This is how my new \u003ca href=\"https://www.nuget.org/packages/dotnet-http\"\u003edotnet-http CLI tool\u003c/a\u003e is implemented.\u003c/p\u003e\n\u003cp\u003eYou are now able to see the cost of DNS, socket connection, security and redirection as shown in the following screenshot.\u003c/p\u003e","title":"Implementing dotnet-http to monitor your HTTP requests"},{"content":" I’ve presented in depth the events emitted by the CLR in many posts to get insightful details about how the .NET runtime is working (lock contention, GC, allocations, …). Some .NET features are not implemented at the runtime level but at the Base Class Library (a.k.a. BCL) level. For example, if you are using HttpClient, you might want to measure how long it takes to get the response to your HTTP requests.\nIn this new series, I will describe how the BCL is using EventSource to emit the events and how you can listen to them with TraceEvent, focusing on HTTP requests.\nWhere do these events come from? When you look for these BCL events in the documentation, you end up on the “Well-known event providers in .NET” page, in its Framework libraries section. It lists the providers with their name and the emitted events with their keyword and verbosity. However, there is no detail about their payload! For example, for the “System.Net.Http” provider, you know that the RequestStart informational event is emitted each time “an HTTP request has started”. But you don’t know the corresponding url… Let me tell you how to get these tiny details.\nWithin the BCL, the classes responsible for emitting events derive from EventSource with the “Telemetry” suffix as a naming convention. They are decorated with an EventSource attribute to define their name (used as provider name by a listener) and their Guid that will be provided with each event. 
For example, here is the declaration of the class responsible for events related to sending HTTP requests:\n[EventSource(Name = \u0026#34;System.Net.Http\u0026#34;)] internal sealed partial class HttpTelemetry : EventSource { If a Guid is not provided, one is automatically computed (see the corresponding code in Perfview).\nNext, public helper methods decorated with a NonEvent attribute are provided to be used in the BCL code whenever events need to be emitted. These methods are calling private methods decorated with an Event attribute to define their unique ID and their verbosity level:\n[NonEvent] public void RequestStart(HttpRequestMessage request) { ... RequestStart( request.RequestUri.Scheme, request.RequestUri.IdnHost, request.RequestUri.Port, request.RequestUri.PathAndQuery, (byte)request.Version.Major, (byte)request.Version.Minor, request.VersionPolicy); } [Event(1, Level = EventLevel.Informational)] private void RequestStart(string scheme, string host, int port, string pathAndQuery, byte versionMajor, byte versionMinor, HttpVersionPolicy versionPolicy) { ... WriteEvent(eventId: 1, scheme, host, port, pathAndQuery, versionMajor, versionMinor, versionPolicy); } The different WriteEvent overloads are responsible for filling an array of EventData elements (each one contains a pointer to the data and its size) that is passed to EventSource.WriteEventCore. This helper method dispatches the event to EventPipe/ETW pipelines.\nDocument the undocumented Unlike CLR events, their payload is not explicitly described in a text file but, instead, you have to look at the implementation of the different WriteEvent overloads. 
The rest of this section provides the details of the sources I’ve looked at and the corresponding events payload.\nHTTP Sources: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Http/src/System/Net/Http/HttpTelemetry.cs Name: System.Net.Http Guid: d30b5633-7ef1-5485-b4e0-94979b102068\nSockets Sources: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketsTelemetry.cs\nName: System.Net.Sockets Guid: d5b2e7d4-b6ec-50ae-7cde-af89427ad21f\nDNS Sources: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionTelemetry.cs\nName: System.Net.NameResolution Guid: 4b326142-bfb5-5ed3-8585-7714181d14b0\nNetwork Security Sources: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Security/src/System/Net/Security/NetSecurityTelemetry.cs\nName: System.Net.Security Guid: 7beee6b1-e3fa-5ddb-34be-1404ad0e2520\nThe next episode will describe how to listen to these events and extract their payload in C#.\n","cover":"https://chrisnas.github.io/posts/2024-10-13_digging-into-the-undocumented/1_3M-B2oQSv9q67xj-wkb_HA.png","date":"2024-10-13","permalink":"https://chrisnas.github.io/posts/2024-10-13_digging-into-the-undocumented/","summary":"\u003chr\u003e\n\u003cp\u003eI’ve presented in depth the events emitted by the CLR \u003ca href=\"https://github.com/chrisnas/ClrEvents\"\u003ein many posts\u003c/a\u003e to get insightful details about how the .NET runtime is working (lock contention, GC, allocations, …). Some .NET features are not implemented at the runtime level but at the Base Class Library (a.k.a. BCL) level. 
For example, if you are using \u003ca href=\"https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?WT.mc_id=DT-MVP-5003325\"\u003e\u003cstrong\u003eHttpClient\u003c/strong\u003e\u003c/a\u003e, you might want to measure how long it takes to get the response to your HTTP requests.\u003c/p\u003e","title":"Digging into the undocumented .NET events from the BCL"},{"content":" Testing the statistical results In parallel with measuring the performance impact, it is important to validate the expected statistical distribution of the sampled allocations. Basically, I need to execute the same run of allocations multiple times in a row. Each run allocates the same number of instances of different types. For example, it is interesting to know if sampling instances of types with sizes proportional to a base value gives good results. Same question for types of totally different sizes or with Finalizers.\nI would like to pass the number of runs to execute and a given scenario to a C# runner program and listen to the emitted events in another C# listener.\nI’m facing 3 issues here:\nHow many instances are allocated, to validate the upscaling algorithm (sampled vs real count)?\nWhich types do I want to focus on? I don’t want to hard code them in the listener application.\nWhen does each run start?\nIt would be great if I could send the answer to these questions via events so the listener would know at runtime. 
Well… This is exactly what a class inherited from EventSource allows you to do!\nIn the runner application, I’ve defined the AllocationsRunEventSource that is decorated with the EventSource attribute to set its name that will be used as a provider name like Microsoft-Windows-DotNETRuntime for the .NET runtime provider.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 [EventSource(Name = \u0026#34;Allocations-Run\u0026#34;)] public class AllocationsRunEventSource : EventSource { public static readonly AllocationsRunEventSource Log = new AllocationsRunEventSource(); The four implemented methods are defining which event ID for which verbosity will be emitted with which payload: [Event(600, Level = EventLevel.Informational)] public void StartRun(int iterationsCount, int allocationCount, string listOfTypes) { WriteEvent(eventId: 600, iterationsCount, allocationCount, listOfTypes); } [Event(601, Level = EventLevel.Informational)] public void StopRun() { WriteEvent(eventId: 601); } [Event(602, Level = EventLevel.Informational)] public void StartIteration(int iteration) { WriteEvent(eventId: 602, iteration); } [Event(603, Level = EventLevel.Informational)] public void StopIteration(int iteration) { WriteEvent(eventId: 603, iteration); } } To make the payload serialization and parsing easy, the list of types that will be allocated is passed as a string with the following format allocatedTypes = “Object24;Object48;Object72;Object32;Object64;Object96”.\nThe code of the runner calls these methods as expected at different moment of the execution:\n1 2 3 4 5 6 7 8 AllocationsRunEventSource.Log.StartRun(iterations, allocationsCount, allocatedTypes); for (int i = 0; i \u0026lt; iterations; i++) { AllocationsRunEventSource.Log.StartIteration(i); allocationsRun.Allocate(allocationsCount); AllocationsRunEventSource.Log.StopIteration(i); } AllocationsRunEventSource.Log.StopRun(); Instead of recording the events with dotnet-trace, this time I’m using 
TraceEvent and Microsoft.Diagnostics.NETCore.Client to code a listener application. The code is very similar to what was presented for my dotnet-fullgc CLI tool except that I’m enabling the AllocationsRun provider corresponding to the event source of the runner in addition to the .NET runtime one:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 public static void PrintEventsLive(int processId) { var providers = new List\u0026lt;EventPipeProvider\u0026gt;() { new EventPipeProvider( \u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, EventLevel.Verbose, // verbose is required for AllocationTick (long)0x80000000001 // new AllocationSamplingKeyword + GCKeyword ), new EventPipeProvider( \u0026#34;Allocations-Run\u0026#34;, EventLevel.Informational ), }; The custom events from that provider are received via the source.Dynamic.All C# event:\n1 2 3 4 5 6 7 8 9 10 11 12 13 var client = new DiagnosticsClient(processId); using (var session = client.StartEventPipeSession(providers, false)) { Console.WriteLine(); Task streamTask = Task.Run(() =\u0026gt; { var source = new EventPipeEventSource(session.EventStream); _source = source; ClrTraceEventParser clrParser = new ClrTraceEventParser(source); clrParser.GCAllocationTick += OnAllocationTick; source.Dynamic.All += OnEvents; ... Because TraceEvent is not already aware of the new AllocationSampled event emitted by the PR code, it will also be received via the same OnEvent handler:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 private static void OnEvents(TraceEvent eventData) { if (eventData.ID == (TraceEventID)303) { // AllocationSampled parsing ... return; } if (eventData.ID == (TraceEventID)600) // Start run { // keep track of the expected types and the number of allocated instances ... return; } if (eventData.ID == (TraceEventID)601) // Stop run { // show the results of the run ... 
return; } if (eventData.ID == (TraceEventID)602) // Start an iteration in a run { // reset for a new iteration ... return; } if (eventData.ID == (TraceEventID)603) // Stop an iteration in a run { // Show iteration results ... return; } } The parsing of the payload of the run related events is done the same way as for AllocationSampled by a dedicated xxxData class:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 class AllocationsRunData { const int EndOfStringCharLength = 2; private TraceEvent _payload; public AllocationsRunData(TraceEvent payload) { _payload = payload; ComputeFields(); } public int Iterations; public int Count; public string AllocatedTypes; private void ComputeFields() { int offsetBeforeString = 4 + 4; Span\u0026lt;byte\u0026gt; data = _payload.EventData().AsSpan(); Iterations = BitConverter.ToInt32(data.Slice(0, 4)); Count = BitConverter.ToInt32(data.Slice(4, 4)); AllocatedTypes = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength)); } } By keeping track of this data, it is possible to show each iteration results:\n1 2 3 4 5 6 7 8 9 10 \u0026gt; starts 100 iterations allocating 1000000 instances 0| Tag SCount TCount SSize TSize UnitSize UpscaledSize UpscaledCount Name -------------------------------------------------------------------------------------------------- ST 247 384 5928 9216 24 24702711 1029279 Object24 ST 322 106 10304 3392 32 32205122 1006410 Object32 ST 435 509 20880 24432 48 43510266 906463 Object48 ST 587 776 37568 49664 64 58718825 917481 Object64 ST 747 481 53784 34632 72 74726662 1037870 Object72 ST 958 916 91968 87936 96 95845392 998389 Object96 that integrates AllocationTick numbers too.\nAt the end of the run, error distribution per type is also computed over the iterations:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Object72 ------------------------- 1 -10.5 % 2 -8.9 % 3 -8.4 % 4 -7.8 % 5 -6.4 % ... 
49 0.2 % 50 0.2 % 51 0.2 % ... 96 6.8 % 97 6.8 % 98 7.7 % 99 8.6 % 100 10.0 % You could use the same mechanisms (a custom EventSource to emit additional information and an EventPipe listener to aggregate the data) for your own usage. This is a different way to use EventSource rather than emitting events for monitoring like what is done by the BCL.\nTesting the standalone GC In addition to the usual .NET GC, it is needed to validate that the changes are also working for the standalone GC. Long story short, it is possible to replace the existing .NET garbage collector by your own implementation. For the test, I needed to check that the standalone GC clrgcexp.dll generated by the .NET compilation generates the expected AllocationSampled events when the corresponding keyword with informational verbosity is enabled.\nDebugging NativeAOT scenarios The final step was to implement the feature for the NativeAOT scenario. When you build your C# application for NativeAOT, a lot happens behind the scenes, based on compilers known by Visual Studio corresponding to the official released version of the .NET runtime. In my case, I needed to use the brand-new code of my local branch and debug some simple C# applications. The steps to reach that goal are not that simple.\nFirst, you follow the steps given by the documentation to build AOT CLR in debug and libs in release:\n1 2 build clr.aot+lib -rc debug -lc release build -c release Then, open src\\coreclr\\tools\\aot\\ilc.sln\nthe repro project contains a program.cs file where you write the C# code you want to test and debug.\nWhen you build a NativeAOT application, you need to select if you want the runtime that emits events or not. 
This is done by setting the EventSourceSupport to true in the .csproj:\nAdd the following in the .csproj to get events:\n1 \u0026lt;EventSourceSupport\u0026gt;true\u0026lt;/EventSourceSupport\u0026gt; However, with the repro project, you need to change ILCompiler.csproj in a different way:\n1 \u0026lt;ReproResponseLines Include=\u0026#34;--feature:System.Diagnostics.Tracing.EventSource.IsSupported=true\u0026#34; /\u0026gt; Also, change the reproNative.vcxproj file to bind to eventpipe-enabled.lib instead of eventpipe-disabled.lib for the platform/configuration you want to debug:\n1 2 3 4 5 6 \u0026lt;ItemDefinitionGroup Condition=\u0026#34;\u0026#39;$(Configuration)|$(Platform)\u0026#39;==\u0026#39;Debug|x64\u0026#39;\u0026#34;\u0026gt; ... \u0026lt;Link\u0026gt; ... \u0026lt;AdditionalDependencies\u0026gt;...$(ArtifactsRoot)bin\\coreclr\\windows.x64.Debug\\aotsdk\\eventpipe-enabled.lib;...\u0026lt;/AdditionalDependencies\u0026gt; \u0026lt;/Link\u0026gt; Then, build repro in Debug x64\nNext, change the target for the ILCompiler project:\nBuild and run it to generate the .obj file corresponding to the repro project\nFinally, open src\\coreclr\\tools\\aot\\ILCompiler\\reproNative\\reproNative.vcxproj that will allow you to debug the program.cs you’ve just built!\n","cover":"https://chrisnas.github.io/posts/2024-09-13_unexpected-usage-of-eventsourc/1_GD25I4_RbCY1-6vEV7zetw.png","date":"2024-09-13","permalink":"https://chrisnas.github.io/posts/2024-09-13_unexpected-usage-of-eventsourc/","summary":"\u003chr\u003e\n\u003ch2 id=\"testing-the-statistical-results\"\u003eTesting the statistical results\u003c/h2\u003e\n\u003cp\u003eIn parallel of the performance impact, it is important to validate the expected statistical distribution of the sampled allocations. Basically, I need to execute the same run of allocations multiple times in a row. Each run allocates the same number of instances of different types. 
For example, it is interesting to know if sampling instances of types with sizes proportional to a base value gives good results. Same question for totally different sized types or with Finalizers.\u003c/p\u003e","title":"Unexpected usage of EventSource or how to test statistical results in CLR pull request"},{"content":" Introduction During the implementation of our .NET allocation profiler, we realized that the current sampling mechanism based on a fixed threshold did not provide a good enough statistical distribution. With the help of Noah Falk from the CLR Diagnostics team, I started to implement randomized sampling based on a Bernoulli distribution model for .NET.\nWith this kind of change, you need to ensure that you don’t break any existing code, that the impact on performance is limited, and that the measured results match the expected mathematical distribution.\nThe rest of this blog series details the different tests I wrote and the corresponding tips and tricks that could be reused when you write C# code.\nTesting the basics From a high-level view, the code change does something simple: each time an allocation context is needed to fulfill an allocation, the code checks if it should be sampled. If so, a new AllocationSampled event is emitted with the same information as the existing AllocationTick event plus an additional field. So, the first level of testing is to validate that the events are emitted when the keyword and verbosity are enabled for the .NET runtime provider.\nThe runtime already has some tests in place, under the \\src\\tests\\tracing\\eventpipe folder, to validate that some events are emitted.
Here is the code of my XUnit test that mimics the existing ones such as simpleruntimeeventvalidation:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 [Fact] public static int TestEntryPoint() { // check that AllocationSampled events are generated and size + type name are correct var ret = IpcTraceTest.RunAndValidateEventCounts( new Dictionary\u0026lt;string, ExpectedEventCount\u0026gt;() { { \u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, -1 } }, _eventGeneratingActionForAllocations, // AllocationSamplingKeyword (0x80000000000): 0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000 new List\u0026lt;EventPipeProvider\u0026gt;() { new EventPipeProvider(\u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, EventLevel.Informational, 0x80000000000) }, 1024, _DoesTraceContainEnoughAllocationSampledEvents, enableRundownProvider: false); if (ret != 100) return ret; return 100; } The IpcTraceTest.RunAndValidateEventCounts helper method accepts:\nThe list of providers to enable with which keyword and verbosity level. 
How many events are expected (using -1 in my case because I can’t predict how many random events will be generated). A callback with the code that will generate events (allocating a lot of instances of a custom type in my case). A callback that looks at emitted events. The last callback code relies on TraceEvent to listen to emitted events:\nprivate static Func\u0026lt;EventPipeEventSource, Func\u0026lt;int\u0026gt;\u0026gt; _DoesTraceContainEnoughAllocationSampledEvents = (source) =\u0026gt; { int AllocationSampledEvents = 0; int Object128Count = 0; source.Dynamic.All += (eventData) =\u0026gt; { if (eventData.ID == (TraceEventID)303) // AllocationSampled is not defined in TraceEvent yet { AllocationSampledEvents++; AllocationSampledData payload = new AllocationSampledData(eventData, source.PointerSize); // uncomment to see the allocation events payload // Logger.logger.Log($\u0026#34;{payload.HeapIndex} - {payload.AllocationKind} | ({payload.ObjectSize}) {payload.TypeName} = 0x{payload.Address}\u0026#34;); if (payload.TypeName == \u0026#34;Tracing.Tests.SimpleRuntimeEventValidation.Object128\u0026#34;) { Object128Count++; } } }; return () =\u0026gt; { Logger.logger.Log(\u0026#34;AllocationSampled counts validation\u0026#34;); Logger.logger.Log(\u0026#34;Nb events: \u0026#34; + AllocationSampledEvents); Logger.logger.Log(\u0026#34;Nb object128: \u0026#34; + Object128Count); return (AllocationSampledEvents \u0026gt;= MinExpectedEvents) \u0026amp;\u0026amp; (Object128Count != 0) ? 100 : -1; }; }; In my case, I’m adding a new event that is emitted when a new keyword is enabled. It means that TraceEvent does not yet know its ID (hence the 303 hardcoded value) or how to unpack the new event payload.
This is why I created the AllocationSampledData type to expose the payload as public fields:\nclass AllocationSampledData { const int EndOfStringCharLength = 2; private TraceEvent _payload; private int _pointerSize; public AllocationSampledData(TraceEvent payload, int pointerSize) { _payload = payload; _pointerSize = pointerSize; TypeName = \u0026#34;?\u0026#34;; ComputeFields(); } public GCAllocationKind AllocationKind; public int ClrInstanceID; public UInt64 TypeID; public string TypeName; public int HeapIndex; public UInt64 Address; public long ObjectSize; public long SampledByteOffset; ... } The extraction of each field from the payload is done in the ComputeFields method: // The payload of AllocationSampled is not defined in TraceEvent yet // // \u0026lt;data name=\u0026#34;AllocationKind\u0026#34; inType=\u0026#34;win:UInt32\u0026#34; map=\u0026#34;GCAllocationKindMap\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;ClrInstanceID\u0026#34; inType=\u0026#34;win:UInt16\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;TypeID\u0026#34; inType=\u0026#34;win:Pointer\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;TypeName\u0026#34; inType=\u0026#34;win:UnicodeString\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;HeapIndex\u0026#34; inType=\u0026#34;win:UInt32\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;Address\u0026#34; inType=\u0026#34;win:Pointer\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;ObjectSize\u0026#34; inType=\u0026#34;win:UInt64\u0026#34; outType=\u0026#34;win:HexInt64\u0026#34; /\u0026gt; // \u0026lt;data name=\u0026#34;SampledByteOffset\u0026#34; inType=\u0026#34;win:UInt64\u0026#34; outType=\u0026#34;win:HexInt64\u0026#34; /\u0026gt; // private void ComputeFields() { int offsetBeforeString = 4 + 2 + _pointerSize; This offsetBeforeString value is computed from the size of UInt32 (=4 bytes), UInt16
(=2 bytes) and a Pointer (depends on 32 bit=4 or 64 bit=8) fields before the string. A Span\u0026lt;byte\u0026gt; wraps the binary payload provided by TraceEvent:\nSpan\u0026lt;byte\u0026gt; data = _payload.EventData().AsSpan(); Since I know the size of each field from the payload definition in ClrEtwAll.man, the numeric fields are extracted thanks to the BitConverter methods:\nAllocationKind = (GCAllocationKind)BitConverter.ToInt32(data.Slice(0, 4)); ClrInstanceID = BitConverter.ToInt16(data.Slice(4, 2)); Things get more complicated when you need to get the value of an address. Its size is 4 bytes in 32 bit and 8 bytes in 64 bit:\nif (_pointerSize == 4) { TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize)); } else { TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize)); } The bitness of the monitored application is given by the EventPipeEventSource’s PointerSize property that is passed to the AllocationSampledData constructor.\nFor the string case, you need to know that it is stored as UTF16 (so each character requires 2 bytes) with the trailing \\0 and its length is the total size of the payload minus the size of the other fields.
That way, you can slice the Span to properly read the characters:\n1 2 // \\0 should not be included for GetString to work TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, _payload.EventDataLength - offsetBeforeString - EndOfStringCharLength - 4 - _pointerSize - 8)); The rest of the fields are extracted with BitConverter helpers taking into account the size of the string:\n1 2 3 4 5 6 7 8 9 10 HeapIndex = BitConverter.ToInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength, 4)); if (_pointerSize == 4) { Address = BitConverter.ToUInt32(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); } else { Address = BitConverter.ToUInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4, _pointerSize)); } ObjectSize = BitConverter.ToInt64(data.Slice(offsetBeforeString + TypeName.Length * 2 + EndOfStringCharLength + 4 + 8, 8)); } The name of the sampled allocated type from the parsed payload is used to ensure that the expected allocations are indeed emitted when the keyword/verbosity are enabled for the .NET provider.\nTesting the performance impact The next step was to validate the impact of the changes on the GC performance. The baseline was the .NET 9 branch before the changes and in Release. The GCPerfSim library from the performance repository was used to allocate 500 GB of mixed size objects on 4 threads with a 50MB live object size. 
From the output, the seconds_taken line provides the duration to allocate these objects.\nTo ensure that you run with the rebuilt branch, you need to use the following commands:\nbuild.cmd clr+libs -c release src\\tests\\build.cmd generatelayoutonly Release The next step is to use \\artifacts\\tests\\coreclr\\windows.x64.Release\\Tests\\Core_Root\\corerun.exe instead of the usual dotnet.exe like the following:\n\u0026lt;clr repo\u0026gt;\\artifacts\\tests\\coreclr\\windows.x64.Release\\Tests\\Core_Root\\corerun \u0026lt;performance repo\u0026gt;\\artifacts\\bin\\GCPerfSim\\Release\\net7.0\\GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05 -lohar 0 -sohsi 0 -lohsi 0 -pohsi 0 -sohpi 0 -lohpi 0 -sohfi 0 -lohfi 0 -pohfi 0 -allocType reference -testKind time I run this scenario 10 times to compute the median and the average. I’m doing the same for the PR branch. So far so good. Now, how to do the same but to measure the impact of the random sampling? Remember that the code only triggers if the .NET provider is enabled with a certain keyword and verbosity. It means that you have to use a tool such as dotnet-trace to start an event pipe session but you would need the process id. I could have changed the code of GCPerfSim to show the process id but I would still need to wait for the session to have been created before starting the seconds_taken computation. Not really easy to script 10 runs that way…\nDon’t worry! dotnet-trace can launch the process itself: the command given after -- is started with the session already attached, --show-child-io true displays the output of that child process (needed here to read seconds_taken) and --providers allows you to enable a provider the way you want. Here is an example of the command line used for the performance runs:\ndotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun \u0026lt;performance repo\u0026gt;\\artifacts\\bin\\GCPerfSim\\Release\\net7.0\\GCPerfSim.dll … These dotnet-trace features are very handy for any scripting scenario unrelated to testing the CLR.
For example, you could use Perfview to later on analyze how an application behaves thanks to the emitted events stored in the generated .nettrace file!\nThe next episode will describe unexpected usage of EventSource and debugging NativeAOT scenario.\n","cover":"https://chrisnas.github.io/posts/2024-08-13_tips-and-tricks-from/1_x2tSxxCnXqoEO8rW9nHW0A.png","date":"2024-08-13","permalink":"https://chrisnas.github.io/posts/2024-08-13_tips-and-tricks-from/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eDuring the implementation of our .NET allocation profiler, we realized that the current sampling mechanism based on a fixed threshold did not provide a good enough statistical distribution. With the help of \u003ca href=\"https://x.com/noahsfalk\"\u003eNoah Falk\u003c/a\u003e from the CLR Diagnostics team, I started to implement a randomized sampling based on a \u003ca href=\"https://github.com/dotnet/runtime/blob/ce40d3df8fb2d13750acfb075acc2c2adb3c8812/docs/design/features/RandomizedAllocationSampling.md#the-sampling-model\"\u003eBernoulli distribution model\u003c/a\u003e for .NET.\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2024-08-13_tips-and-tricks-from/1_x2tSxxCnXqoEO8rW9nHW0A.png\"\u003e\u003c/p\u003e\n\u003cp\u003eWith this kind of changes, you need to ensure that you don’t break any existing code, the impact on performance is limited and the mathematical results map the expected mathematical distribution.\u003c/p\u003e","title":"Tips and tricks from validating a Pull Request in .NET CLR"},{"content":" Introduction If you have read Microsoft documentation, you probably know that it is not recommended to trigger a garbage collection in your application code. However, in some troubleshooting cases, you might want to trigger a GC. For example, you don’t want to wait for a full gen2 compacting GC to figure out if your application is really leaking memory. 
For web applications, you can imagine having a hidden HTTP endpoint that simply calls GC.Collect. What if you could simply call a command line tool to trigger a GC in any .NET application? This is exactly what my new dotnet-fullgc CLI tool does!\nThis Is The Way If you have read a few of my past posts about the .NET diagnostics mechanisms, you know that you can send commands to another .NET process via EventPipes. Well… there is no explicit command to trigger a GC.\nI have also explained how you could listen to events emitted by the CLR by enabling a provider with a set of keywords and a verbosity corresponding to the events you are interested in. This is how dotnet-trace and Perfview collect these events. If you want to trigger a GC, you simply need to enable the Microsoft-Windows-DotNETRuntime provider with the GCHeapCollect (= 0x800000L) keyword and the Informational verbosity. Yes: it is as simple as that, and it also works for .NET Framework!\nSo, you could trigger a GC with dotnet-trace via the following command line:\ndotnet trace collect -p \u0026lt;process id\u0026gt; --providers Microsoft-Windows-DotNETRuntime:0x800000:4 --duration 00:00:01 However, a .nettrace file would be generated and it is not possible to pass parameters (more on this later).\nThe rest of this post shows the C# code to obtain the same result.
With Microsoft.Diagnostics.NETCore.Client and TraceEvent, you create a DiagnosticsClient with the ID of the process you are interested in:\nvar client = new DiagnosticsClient(processId); The next step is to start an EventPipe session with the right provider, keyword and verbosity:\nvar providers = new List\u0026lt;EventPipeProvider\u0026gt;() { new EventPipeProvider( \u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, EventLevel.Informational, (long)ClrTraceEventParser.Keywords.GCHeapCollect, Arguments // more on this later ), }; using (var session = client.StartEventPipeSession(providers, false)) { The source must be processed on another thread to avoid blocking the main thread:\nTask streamTask = Task.Run(() =\u0026gt; { // without source to process, session.Stop() will not return var source = new EventPipeEventSource(session.EventStream); source.Process(); }); The question to answer is how to stop the session: just create another task that waits for a second before stopping the session to exit from the Process call:\nTask inputTask = Task.Run(() =\u0026gt; { Thread.Sleep(1000); session.Stop(); }); Task.WaitAny(streamTask, inputTask); } That’s all!\nPitfalls Unfortunately, I faced a couple of issues during the implementation of the tool.\nWhat is your number? If you look at the TraceEvent implementation, you could imagine that it is possible to pass a GC ID as a parameter to the .NET provider:\n// // Summary: // Triggers a GC. Can pass a 64 bit value that will be logged with the GC Start // event so you know which GC you actually triggered.
GCHeapCollect = 0x800000L, Unfortunately, this is not supported and the reasons are explained below.\nIf you check the .NET source code and look at EtwCallbackCommon(), you can indeed see that a numeric ID can be passed to ETW::GCLog::ForceGC(l64ClientSequenceNumber) and passed as ID in GCStart event instead of the one incrementally increased collection after collection.\nAt the diagnostics client level, you have the opportunity to pass a dictionary of key/value string pairs. This dictionary is used when defining the EventPipeProvider to be enabled:\n1 2 3 4 5 6 7 8 9 10 11 Dictionary\u0026lt;string, string\u0026gt; arguments = new Dictionary\u0026lt;string, string\u0026gt;(); arguments.Add(\u0026#34;Id\u0026#34;, \u0026#34;42\u0026#34;); var providers = new List\u0026lt;EventPipeProvider\u0026gt;() { new EventPipeProvider( \u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, EventLevel.Informational, (long)ClrTraceEventParser.Keywords.GCHeapCollect, arguments ), }; The diagnostics client transforms the dictionary into a string with key=value pairs separated by ‘;’ such as “Id=42;AnotherId=AnotherValue;…” and serializes it as is in the payload when sending a CollectTraces command.\nThe question is what identifier is expected by the CLR? To answer this question, you need to look at the code of provider_invoke_callback. The “key=value;…” string is stored into a buffer and each = and ; characters are transformed into \\0. 
So, “Id=42” is transformed into Id\\042\\0.\nThe next step is done by ep_event_filter_desc_init() that uses that buffer to fill up the 3 fields of an EventFilterDescriptor:\n1 2 3 uint64_t ptr = address of the buffer uint32_t size = size of the buffer (=6 for id\\042\\0) uint32_t type = 0 And finally, EtwCallbackCommon receives the filterdata and tries the following to get the collection id:\n1 2 3 4 5 6 7 PEVENT_FILTER_DESCRIPTOR FilterData = (PEVENT_FILTER_DESCRIPTOR)pFilterData; if ((FilterData != NULL) \u0026amp;\u0026amp; (FilterData-\u0026gt;Type == 1) \u0026amp;\u0026amp; (FilterData-\u0026gt;Size == sizeof(l64ClientSequenceNumber))) { l64ClientSequenceNumber = *(LONGLONG *) (FilterData-\u0026gt;Ptr); } As you can see, it will fail because:\nthe received type value is 0 the received size is not the size of a 64 bit number the ptr field does not point to the value as 64 bit number (but as a string) An issue has been filed with a possible fix.\nOnly one collection please! When I test my dotnet-fullgc with dotnet-gcstats, I always see 2 collections!\nWe investigated with my colleague Kevin Gosse and he created an issue for that. The EtwCallback() function is called whenever a session is enabled or disabled. 
Unfortunately, the call to ForceGC is made in both cases: so, when the session is stopped, a second garbage collection is triggered.\nNext steps Feel free to install dotnet-fullgc on your machine with dotnet tool install -g dotnet-fullgc.\nNext, use dotnet fullgc to trigger two gen2 full garbage collections in your running .NET processes.\n","cover":"https://chrisnas.github.io/posts/2024-05-22_trigger-your-gcs-with/1_OjENjdzCIyyrPNUiGrQEVw.png","date":"2024-05-22","permalink":"https://chrisnas.github.io/posts/2024-05-22_trigger-your-gcs-with/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIf you have read Microsoft documentation, you probably know that it is not recommended to trigger a garbage collection in your application code. However, in some troubleshooting cases, you might want to trigger a GC. For example, you don’t want to wait for a full gen2 compacting GC to figure out if your application is really leaking memory. For web applications, you can imagine having a hidden HTTP endpoint that simply calls \u003cstrong\u003eGC.Collect\u003c/strong\u003e. What if you could simply call a command line tool to trigger a GC in any .NET application? This is exactly what my new dotnet-fullgc CLI tool does!\u003c/p\u003e","title":"Trigger your GCs with dotnet-fullgc!"},{"content":" Introduction While working on the second edition of Pro .NET Memory Management, I needed to get statistics about each garbage collection to explain the condemned generation and other decisions taken by the GC. This post explains the different internal data structures used by the GC and how to get their values for each collection. Some require debugging the CLR and others are emitted via events.
For the latter, I will show how I wrote the new dotnet-gcstats CLI tool to collect them and a personal Perfview GCStats displaying live data, garbage collection after garbage collection.\nHigh level view of GC Internals With regions, the GC keeps track of managed memory allocated by your application in instances of the gc_heap class. In Workstation mode, only 1 instance exists and in Server mode, by default, 1 instance is created per core. Each gc_heap keeps track of its 5 generations (gen0, gen1, gen2, Large Object Heap and Pinned Object Heap) in an array of 5 generation instances. Each generation references its dedicated regions wrapped by instances of heap_segment. These regions are reserved from a giant part of the process address space and committed as needed.\nDuring a garbage collection, the GC code relies on global fields per gc_heap:\n1 2 3 4 static gc_mechanisms settings; gc_history_global gc_data_global; // for non background GC including foreground GC during a background gc_history_global bgc_data_global; // for background GC only static dynamic_data dynamic_data_table[total_generation_count = 5]; The settings field contains a few interesting fields:\n1 2 3 4 5 6 7 8 9 10 11 12 class gc_mechanisms { public: gc_index; // starts from 1 for the first GC int condemned_generation; // generation to collect BOOL compaction; // true when compaction instead of sweep BOOL loh_compaction; // true when LOH needs compaction uint32_t concurrent; // 1 = concurrent/background GC gc_reason reason; // trigger reason gc_pause_mode pause_mode; // see GCSettings.LatencyMode ... }; including the trigger reason that will tell if your code called GC.Collect (i.e. induced), or if it was due to a LOH or SOH allocation for example. 
If compaction is true, a compacting GC will happen (instead of a sweeping one).\nThe gc/bgc_data_global contains almost the same information:\n1 2 3 4 5 6 7 8 9 struct gc_history_global { uint32_t num_heaps; // number of gc_heap instances int condemned_generation; gc_reason reason; int pause_mode; uint32_t mem_pressure; uint32_t global_mechanisms_p; }; Most of the fields are available from different events:\nGCStart: gc_index in Count, condemned_generation in Depth, reason in Reason GCGlobalHeapHistory: pause_mode in PauseMode and some others in GlobalMechanisms Which generation to collect = condemned generation The computation of the condemned_generation is complicated and relies on many factors including metrics stored for each “generation” (gen0, gen1, gen2, LOH and POH) in an array of dynamic_data called dynamic_data_table. The dynamic_data class contains a few fields used by the GC to take decisions such as when a collection should be triggered and which generation to condemn:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class dynamic_data { public: ptrdiff_t new_allocation; // remaining budget = budget - allocated size_t desired_allocation; // budget to trigger a GC // # of bytes taken by survived objects after mark. size_t survived_size; // # of bytes taken by survived pinned plugs after mark. size_t pinned_survived_size; // total object size after a GC, ie, doesn\u0026#39;t include fragmentation size_t current_size; size_t promoted_size; size_t fragmentation; }; Most of these fields are found in the payload of GCPerHeapHistory or GCHeapStat events. However, the most interesting one, new_allocation is not available. Why is it interesting? Because it would give you which generation had its budget exceeded. It is initialized with the generation budget at the end of a GC and then, each time an allocation context gets created, its size is deducted from it. 
When it reaches 0, it means that the budget is exceeded, and a collection should happen.\nSince I needed to debug the CLR to better understand all these algorithms, I added a breakpoint at the beginning of gc_heap::garbage_collect with the following action:\n#{settings.gc_index}[{gc_trigger_reason}]{\u0026#34;\\n\u0026#34;,s8b} new_allocation(0) = {dynamic_data_table[0].new_allocation}{\u0026#34;\\n\u0026#34;,s8b} desired_allocation(0) = {dynamic_data_table[0].desired_allocation}{\u0026#34;\\n\u0026#34;,s8b} begin_data_size(0) = {dynamic_data_table[0].begin_data_size}{\u0026#34;\\n\u0026#34;,s8b} promoted_size(0) = {dynamic_data_table[0].promoted_size}{\u0026#34;\\n\u0026#34;,s8b}-{\u0026#34;\\n\u0026#34;,s8b} new_allocation(1) = ... {dynamic_data_table[4].new_allocation}{\u0026#34;\\n\u0026#34;,s8b} desired_allocation(4) = {dynamic_data_table[4].desired_allocation}{\u0026#34;\\n\u0026#34;,s8b} begin_data_size(4) = {dynamic_data_table[4].begin_data_size}{\u0026#34;\\n\u0026#34;,s8b} promoted_size(4) = {dynamic_data_table[4].promoted_size}{\u0026#34;\\n\u0026#34;,s8b}__________{\u0026#34;\\n\u0026#34;,s8b} And now, each time a GC happens, I get the corresponding log in my Output pane in Visual Studio:\n#2[reason_alloc_soh (0)] new_allocation(0) = -22728 desired_allocation(0) = 134217728 begin_data_size(0) = 8391376 promoted_size(0) = 8383432 - new_allocation(1) = -5910416 desired_allocation(1) = 2473016 begin_data_size(1) = 375528 promoted_size(1) = 353288 - new_allocation(2) = -91144 desired_allocation(2) = 262144 begin_data_size(2) = 0 promoted_size(2) = 0 - new_allocation(3) = 28000088 desired_allocation(3) = 28000088 begin_data_size(3) = 8000024 promoted_size(3) = 8000024 - new_allocation(4) = 3145728 desired_allocation(4) = 3145728 begin_data_size(4) = 32712 promoted_size(4) = 32712 As you can see, gen0, gen1 and gen2 have all their budget exceeded (i.e. 
their new_allocation is negative) and it explains why a simple gen0 collection (from allocation in SOH = gen0) becomes a gen2 collection. If you wonder how gen1 and gen2 budgets can be exceeded when your application is only allocating in gen0, you need to understand that when a GC copies surviving objects from a younger generation to the older one, they are counted as allocations in the older generation and subtracted from its new_allocation metric.\nThe GC encodes the different steps leading to the final condemned generation in a 32-bit value stored in a gen_to_condemn_tuning field that allows you to get:\ninitial condemned generation,\nfinal generation to condemn,\nwhich generation’s budget is exceeded.\nThe value of the last one corresponds to the highest generation for which new_allocation was negative.\nThis information is available in the CondemnReasons0 field of the GCPerHeapHistory event, and you need some arithmetic to get the generation you want:\nprivate const int gen_initial = 0; // indicates the initial gen to condemn.\nprivate const int gen_final_per_heap = 1; // indicates the final gen to condemn per heap.\nprivate const int gen_alloc_budget = 2; // indicates which gen\u0026#39;s budget is exceeded.\nprivate const int InitialGenMask = 0x0 + 0x1 + 0x2;\nstatic int GetGen(int val, int reason)\n{\n    int gen = (val \u0026gt;\u0026gt; 2 * reason) \u0026amp; InitialGenMask;\n    return gen;\n}\nBuilding your own tool Even though I could dig into the different matrices available in Perfview’s GCStats view or its export to Excel, I decided to write dotnet-gcstats. 
This CLI tool listens to the CLR events emitted by a .NET application thanks to Microsoft.Diagnostics.NETCore.Client (connect to the application EventPipe) and TraceEvent (receive and analyze the CLR events).\nThe code is amazingly simple:\nvar providers = new List\u0026lt;EventPipeProvider\u0026gt;()\n{\n    new EventPipeProvider(\u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;, EventLevel.Informational, (long)ClrTraceEventParser.Keywords.GC),\n};\nvar client = new DiagnosticsClient(processId);\nusing (var session = client.StartEventPipeSession(providers, false))\n{\n    Console.WriteLine();\n    Task streamTask = Task.Run(() =\u0026gt;\n    {\n        var source = new EventPipeEventSource(session.EventStream);\n        ClrTraceEventParser clrParser = new ClrTraceEventParser(source);\n        clrParser.GCPerHeapHistory += OnGCPerHeapHistory;\n        clrParser.GCStart += OnGCStart;\n        clrParser.GCGlobalHeapHistory += OnGCGlobalHeapHistory;\n        try\n        {\n            source.Process();\n        }\n        catch (Exception e)\n        {\n            ShowError($\u0026#34;Error encountered while processing events: {e.Message}\u0026#34;);\n        }\n    });\nEach event handler is responsible for extracting and translating the interesting fields of its event payload with a few color enhancements:\nGCStart: collection count and reason (highlight induced collections).\nGCGlobalHeapHistory: condemned generation, pause mode and memory pressure.\nGCPerHeapHistory: starting -\u0026gt; final condemned generation and for each heap, budget, begin size, begin obj size, final size, promoted size and fragmentation.\nThe final step was to transform a simple console application into a .NET CLI tool that everyone will be able to install with dotnet tool install -g dotnet-gcstats and use with dotnet gcstats . 
I followed the documentation by adding the following to the project file:\n\u0026lt;PropertyGroup\u0026gt;\n    \u0026lt;PackAsTool\u0026gt;true\u0026lt;/PackAsTool\u0026gt;\n    \u0026lt;ToolCommandName\u0026gt;dotnet-gcstats\u0026lt;/ToolCommandName\u0026gt;\n    \u0026lt;PackageOutputPath\u0026gt;./nupkg\u0026lt;/PackageOutputPath\u0026gt;\n    \u0026lt;GeneratePackageOnBuild\u0026gt;true\u0026lt;/GeneratePackageOnBuild\u0026gt;\n\u0026lt;/PropertyGroup\u0026gt;\nIn addition, I provided a few additional details:\n\u0026lt;PropertyGroup\u0026gt;\n    \u0026lt;PackageId\u0026gt;dotnet-gcstats\u0026lt;/PackageId\u0026gt;\n    \u0026lt;PackageVersion\u0026gt;1.0.0\u0026lt;/PackageVersion\u0026gt;\n    \u0026lt;Title\u0026gt;dotnet-gcstats\u0026lt;/Title\u0026gt;\n    \u0026lt;Authors\u0026gt;christophe Nasarre\u0026lt;/Authors\u0026gt;\n    \u0026lt;Owners\u0026gt;chrisnas\u0026lt;/Owners\u0026gt;\n    \u0026lt;RepositoryUrl\u0026gt;https://github.com/chrisnas\u0026lt;/RepositoryUrl\u0026gt;\n    \u0026lt;RepositoryType\u0026gt;git\u0026lt;/RepositoryType\u0026gt;\n    \u0026lt;PackageProjectUrl\u0026gt;https://github.com/chrisnas/GCStats\u0026lt;/PackageProjectUrl\u0026gt;\n    \u0026lt;PackageLicenseFile\u0026gt;LICENSE\u0026lt;/PackageLicenseFile\u0026gt;\n    \u0026lt;Description\u0026gt;Global CLI tool to display live statistics during .NET garbage collections\u0026lt;/Description\u0026gt;\n    \u0026lt;PackageReleaseNotes\u0026gt;Initial release\u0026lt;/PackageReleaseNotes\u0026gt;\n    \u0026lt;Copyright\u0026gt;Copyright Christophe Nasarre 2024-$([System.DateTime]::UtcNow.ToString(yyyy))\u0026lt;/Copyright\u0026gt;\n    \u0026lt;PackageTags\u0026gt;.NET TraceEvent CLR GC\u0026lt;/PackageTags\u0026gt;\n\u0026lt;/PropertyGroup\u0026gt;\nOnce built, I simply uploaded the generated package to nuget.org et voilà!\nNow, you should be able to better understand why some collections are triggered:\nAnd if it is not enough, wait for the second edition of Pro .NET Memory Management 
;^)\n","cover":"https://chrisnas.github.io/posts/2024-03-01_view-your-gcs-statistics/1_3oHvH2Vxb3PgW46khVMSMQ.png","date":"2024-03-01","permalink":"https://chrisnas.github.io/posts/2024-03-01_view-your-gcs-statistics/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhile working on the second edition of \u003ca href=\"https://www.amazon.com/Pro-NET-Memory-Management-Performance/dp/148424026X\"\u003ePro .NET Memory Management\u003c/a\u003e, it was needed to get statistics about each garbage collection to explain the condemned generation and other decisions taken by the GC. This post explains the different internal data structures used by the GC and how to get their value for each collection. Some require debugging the CLR and others are emitted via events. For the latter, I will show how I wrote the new \u003cstrong\u003edotnet-gcstats\u003c/strong\u003e CLI tool to collect them and a personal Perfview GCStats displaying live data, garbage collection after garbage collection.\u003c/p\u003e","title":"View your GCs statistics live with dotnet-gcstats!"},{"content":" Introduction During the Datadog R\u0026amp;D week, my goal is to mimic the generation of a .gcdump from our .NET profiler. 
I’ve already written most of the code for a previous post and after changing the required plumbing, it is time to test the workflow.\nUnfortunately, I’m facing the dreaded stack corruption dialog:\nThe rest of the post explains the different steps I’m following to investigate this issue.\nTrying to understand the problem This stack check is done by the debug version of the C Runtime library by basically adding some special bytes on the stack before calling a function and checking that these bytes have not been tampered with when returning from the call.\nSo, the next step is to debug the application to get more details and at least find where in the code the problem happens:\nThe failed check occurs at the end of a function that looks like the following:\nbool GcDumpProvider::Get(IGcDumpProvider::gcdump_t\u0026amp; gcDump)\n{\n    // trigger the GC and get the dump\n    GcDump gcd(::GetCurrentProcessId());\n    gcd.TriggerDump();\n    auto const\u0026amp; dump = gcd.GetGcDumpState();\n    auto\u0026amp; types = dump._types;\n    for (auto\u0026amp; type : types)\n    {\n        auto\u0026amp; typeInfo = type.second;\n        uint64_t instancesCount = typeInfo._instances.size();\n        uint64_t instancesSize = 0;\n        for (size_t i = 0; i \u0026lt; instancesCount; i++)\n        {\n            instancesSize += typeInfo._instances[i]._size;\n        }\n        gcDump.push_back({typeInfo._name, instancesCount, instancesSize});\n    }\n    return true;\n}\nThere is much more code behind this, especially in the TriggerDump() function. I have already tested this code many times when I dug into the .gcdump generation process without facing this stack corruption. I’m spending a few hours digging back into the code because:\nI’m now running inside the profiled process and not outside like in the blog post\nI’m introducing a “slight” change because I need to exit the communication with the CLR when the GC ends. 
I need to mention that Visual Studio is refusing to debug (Step Over or Step Into) and only Run to Cursor is possible due to mixed mode (managed and native) debugging. So, I created a simple native console application with my updated code for easier and faster debugging. After a couple of hours, it is time to go back to the Get() implementation because I do not find anything obviously wrong.\nMake it simpler and simpler and simpler again In that type of situation, I recommend the “remove code and debug” strategy (from “divide and conquer” attributed to Julius Caesar). From the simplified console application, the code now looks like:\nbool GetGcDump(int pid, IGcDumpProvider::gcdump_t\u0026amp; gcDump)\n{\n    GcDump gcd(pid);\n    // no more gcd.TriggerDump()\n    // no more for (auto\u0026amp; type : types)\n    return true;\n}\nNo more complicated call or iteration over the vector of results. Guess what? Same stack corruption.\nIt is time to go one level deeper: what does this GcDump class look like?\nclass GcDump\n{\npublic:\n    GcDump(int pid);\n    ~GcDump();\n    ...\nprivate:\n    int _pid;\n    DiagnosticsClient* _pClient;\n    EventPipeSession* _pSession;\n    HANDLE _hListenerThread;\n    GcDumpState _gcDumpState;\n};\nThe constructor sets the fields to zero/nullptr and the destructor cleans up these fields if necessary. Since TriggerDump() is no longer called, these fields never change.\nI’m commenting out all fields until only _gcDumpState remains and it continues to crash. When it is commented out, it no longer crashes.\nUse your debugger Luke! 
Let’s turn to the GcDumpState class that is even simpler:\nclass GcDumpState\n{\npublic:\n    GcDumpState();\n    ~GcDumpState();\npublic:\n    // fields removed for brevity\nprivate:\n    bool _isStarted;\n    bool _hasEnded;\n    uint32_t _collectionIndex;\n};\nThe code of the destructor only sends a trace to the console (removing it completely does not fix the issue) and here is the constructor code:\nGcDumpState::GcDumpState()\n{\n    _isStarted = false;\n    _hasEnded = false;\n    _collectionIndex = 0;\n}\nAgain, same strategy: remove one field after the other. This time, the code stops crashing if the two Boolean fields are removed or if the 32-bit index is removed. If the index field is not set, no more corruption!\nHow could this assignment corrupt the stack? It is time to use the Visual Studio debugger to better understand what is going on.\nFirst, set a breakpoint on the assignment line and click Debug | Disassembly to see the corresponding assembly code:\nThe two lines of assembly code are easy to understand:\nmov rax,qword ptr [this]\nmov dword ptr [rax+4],0\nThe this pointer is stored in the rax register\nThe 32-bit (mov dword) memory starting 4 bytes after the beginning of the object pointed to by this is set to 0\nI enter “this” in a Memory panel (Debug | Windows | Memory xx)\nAnd Visual Studio gives me the corresponding address and the content of the memory there:\nPressing F10 twice to Step Over the two assembly instructions confirms it:\nInstead of storing the 32-bit 0 value just after the two bytes corresponding to the bool fields, it is stored 2 bytes away. It looks like padding is added on my behalf.\nI change the build settings for the GcDumpState.cpp file to enable all warnings:\nThe compilation confirms what has been seen in the memory:\nGcDumpState.h(41,14): warning C4820: \u0026#39;GcDumpState\u0026#39;: \u0026#39;2\u0026#39; bytes padding added after data member \u0026#39;GcDumpState::_hasEnded\u0026#39;\nWhat’s next? 
My understanding is that the compiler is:\nadding a 2-byte padding to align the 32-bit index field\ngenerating the constructor code based on that padding\nbut the stack corruption checking code does not take it into account\nThe solution is to either add a uint16_t field after the 2 bool fields as an explicit padding or use #pragma pack(1) to decorate the class definition.\nHowever, this looks really weird to me. We should have faced this issue a long time ago because we were never cautious about alignment in all the classes and structures that we allocate in our code. To validate the assumption, I’m writing a small reproduction code outside of all the .gcdump complexity. And guess what? I’m not able to reproduce the stack corruption. Another mystery of the C++ compilation optimizations probably…\nThis is the end of my debugging Friday at Datadog :^)\n","cover":"https://chrisnas.github.io/posts/2023-12-11_be-aligned-or-how/1_XEEcEc2kLKmy8xRDF2e_xA.png","date":"2023-12-11","permalink":"https://chrisnas.github.io/posts/2023-12-11_be-aligned-or-how/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eDuring the Datadog R\u0026amp;D week, my goal is to mimic the generation of a .gcdump from our .NET profiler. 
I’ve already written most of the code \u003ca href=\"/posts/2023-08-11_net-gcdump-internals/\"\u003efor a previous post\u003c/a\u003e and after changing the required plumbing, it is time to test the workflow.\u003c/p\u003e\n\u003cp\u003eUnfortunately, I’m facing the dreaded stack corruption dialog:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2023-12-11_be-aligned-or-how/1_5ydnBBOQBccS0f016OUDIw.png\"\u003e\u003c/p\u003e\n\u003cp\u003eThe rest of the post explains the different steps I’m following to investigate this issue.\u003c/p\u003e\n\u003ch2 id=\"trying-to-understand-theproblem\"\u003eTrying to understand the problem\u003c/h2\u003e\n\u003cp\u003eThis stack check is done by the debug version of the C Runtime library by basically adding some special bytes on the stack before calling a function and checking that these bytes have not been tampered with when returning from the call.\u003c/p\u003e","title":"Be Aligned! Or how to investigate a stack corruption"},{"content":" Introduction When I started to work on the second edition of Pro .NET Memory Management : For Better Code, Performance, and Scalability by Konrad Kokosa, I had already spent some time in the CLR code for a couple of pull requests related to the garbage collector. However, updating the book to cover 5 new versions of .NET requires looking at new APIs but also digging deep into the hundreds of thousands of lines of code of the CLR (and especially the GC)!\nThe first step is to install Visual Studio 2022 Preview that allows you to compile and run projects targeting .NET 8. 
Then, go to https://github.com/dotnet/runtime and git clone the tag of the .NET 8 preview version you have installed.\nThat way, you will be able to directly run the same version that you will debug.\nAnd now, what are the next steps?\nThe goal of this post is to share with you the tips and tricks I used to navigate the CLR implementation so you can better understand how things work.\nFrom C# to C++ As a .NET developer, I’m used to the APIs provided by the Base Class Library built on top of the CLR. Let’s take as an example the following code that uses the GC.AllocateArray method, available since .NET 5, which allows you to allocate an array pinned in memory.\nusing System;\ninternal class Program\n{\n    static void Main(string[] args)\n    {\n        byte[] pinned = GC.AllocateArray\u0026lt;byte\u0026gt;(90000, true);\n        Console.WriteLine($\u0026#34;generation = {GC.GetGeneration(pinned)}\u0026#34;);\n    }\n}\nWhen you Ctrl+click the method name (or use F12), thanks to Source Link integration, you go to its implementation where you can even set breakpoints:\nIf you don’t use Visual Studio, you could open the generated assembly in a decompiler such as ILSpy or DnSpy. The latter even allows you to set breakpoints and debug the disassembled IL without any source.\nIn both cases, only the managed implementation will be available: you soon end up at an “internal call” corresponding to a native function implemented by the CLR. The managed methods are decorated with the MethodImplOptions.InternalCall attribute.\nFor the garbage collector code, you can look into the GC.CoreCLR.cs file where these methods are defined. You can note some methods decorated with the DllImport attribute to bind to native functions exported by a “QCall” library. The JIT takes an optimized P/Invoke path for these calls instead of the usual LoadLibrary/GetProcAddress you could expect. 
Instead, they will be routed to the methods exported by coreclr.dll and defined in the s_QCall array in qcallentrypoints.cpp. But where to look further for the native implementation?\nInstead of searching among the thousands of files, focus on comutilnative.h that defines the signature of most exported functions. The implementation of the exported native functions is found in comutilnative.cpp. This is where you should start your journey in the native implementation of the CLR. For the list of all functions called by the libraries in the runtime, look at the ecalllist.h file (around gGCInterfaceFuncs and gGCSettingsFuncs specifically for the GC).\nNote that you might also find some implementations under the classlibnative folder like in the system.cpp file for GCSettings.IsServerGC.\nCLR Source code debugging It is nice to know that the implementation of most CLR exported native functions used by the BCL is in comutilnative.cpp. For the GC, the functions are either statics from the GCInterface class or static functions prefixed by GCInterface_; I don’t know why they are not all part of GCInterface…\nWhen you look at the implementation of the GC-related methods, a lot of them call methods on the instance returned by GCHeapUtilities::GetGCHeap() that corresponds to the static g_pGCHeap global variable. It is interesting to follow the threads of calls like that, but I have to admit that, after a few hops, I’m starting to get lost. So, I’m drawing boxes for types on a piece of paper and arrows from their fields to the boxes of other types.\nHowever, with a code base that big, I definitely prefer to set breakpoints and write a small C# application to call the methods I’m interested in and see what data structures are used in the different layers of implementation. Don’t be scared: WinDBG is not required to achieve this goal. 
As this page explains, you need to type the following commands in a shell at the root of the repo:\n.\\build.cmd -s clr -c Debug\n.\\build.cmd clr.nativeprereqs -a x64 -c debug\n.\\build.cmd -msbuild\nThe last command generates a CoreCLR.sln solution file (in artifacts\\obj\\coreclr\\windows.x64.Debug\\ide) that you can open in Visual Studio 2022 Preview.\nIn VS, right-click the INSTALL project, select Properties and set up the Debugging properties:\nHere are the details of each property:\nIt could be interesting to set some environment variables such as DOTNET_gcServer to 1 for a GC Server configuration instead of workstation. In that case, click the \u0026lt;Edit..\u0026gt; choice in the combo-box:\nAnd update the textbox at the top:\nThe final step is to set this project as the startup project:\nYou are now able to set the breakpoints you want in the native code of the CLR and press F5 (Debug) in Visual Studio to step into the code!\nAnd what about the assembly code? Some specific data structures, such as the NonGC Heap, are used by the JIT compiler when generating the assembly code from the IL compiled from your C# code. It means that you need to look at that JITted code to fully understand what is going on.\nA first way to get it is to use https://sharplab.io/, type your C# code and select x64 for Core or x86/x64 for Framework:\nBut as you can see from this screenshot, it is using the .NET 7 compiler. What if you would like to see the .NET 8 compilation result just in case something changed?\nThe solution I’m using is to generate a memory dump of a test application with procdump -ma. Before opening the dump in WinDBG, there is something you should be aware of: with tiered compilation, you will need to call a method several times before the final optimized assembly code gets JITted. 
Or… decorate the method you are interested in with the [MethodImpl(MethodImplOptions.AggressiveOptimization)] attribute to instruct the JIT to directly generate the most optimized tier.\nOnce the dump is loaded in WinDBG, the first step is to get the MethodDesc pointer corresponding to the method you are interested in. For that, use the name2ee SOS command:\nClick the link corresponding to MethodDesc to run the dumpmd SOS command:\nThe last step is to click the link corresponding to CodeAddr to run the U command and see the JITted assembly code:\nIf you compare this code to get the “Hello, World!” string, with the one shown by sharplab,\nProgram.Hello()\n    L0000: mov rcx, 0x257f7cbc368\n    L000a: mov rcx, [rcx]\n    L000d: jmp qword ptr [0x7ff9c9bd7f48]\nyou might notice a tiny difference: there is one less indirection in .NET 8! But this is another story that will be told in the second edition of the “Pro .NET Memory Management: For Better Code, Performance, and Scalability” book ;^)\n","cover":"https://chrisnas.github.io/posts/2023-11-12_how-to-dig-into/1_4kg2jOleRs06edogUKEgQA.png","date":"2023-11-12","permalink":"https://chrisnas.github.io/posts/2023-11-12_how-to-dig-into/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhen I started to work on the second edition of \u003cem\u003ePro .NET Memory Management : For Better Code, Performance, and Scalability\u003c/em\u003e by Konrad Kokosa, I had already spent some time in the CLR code for a couple of pull requests related to the garbage collector. However, updating the book to cover 5 new versions of .NET requires looking at new APIs but also digging deep into the hundreds of thousands of lines of code of the CLR (and especially the GC)!\u003c/p\u003e","title":"How to dig into the CLR"},{"content":" Introduction When you have a call with a customer who explains to you that his application is crashing when your profiler is enabled, it is never a great experience. 
This post lists the steps followed to investigate such an issue I faced last week, from the basics up to the final analysis of memory dumps in WinDbg.\nGet as many setup details as possible The situation was the following:\nA web application was running fine with our Datadog .NET profiler on some non-production servers with less traffic.\nThe same application was crashing on production servers with more traffic.\nWe were lucky to be able to remotely access both machines.\nA lot of time was spent checking the setup that is based on environment variables. Basically, for our profiler to be loaded by a .NET application, a few Microsoft related environment variables need to be set. Then, you enable the Datadog profiler by setting DD_PROFILING_ENABLED to 1 in order to get the profiling details available in our UI. Since the web application is running in IIS, things get more complicated because some environment variables must be set in… the Registry.\nSo, we checked the environment variables set at the machine level with the set command in a prompt and those for IIS with the Registry Editor. However, we got some inconsistencies, and we needed a way to validate which environment variables were really seen by the web application! The Process Explorer tool from Sysinternals was downloaded and launched. After finding the process ID of the running w3wp.exe corresponding to the web application, a simple right-click to get the Properties and selecting the Environment Tab gave us the truth:\n(This screenshot shows the results for one of our test applications on my development machine).\nGetting a memory dump Once the setup was checked on both machines without any overly weird issues, the next step was to figure out why the application was randomly crashing. Even if the machines received different traffic loads, since applications running without our profiler enabled were not crashing, the chances were high that our C++ code was at the source of the problem. 
But the crashes were random… And you can’t install Visual Studio on a production server and attach to the process hoping that it will crash and start a debugging session there!\nWindows Error Reporting generates mini dumps when applications crash but they are usually not enough to start an investigation. Again, another Sysinternals tool, procdump, was installed as a global crash handler with procdump -i c:\\dumps -ma. The next time the application crashed, a memory dump was generated in the c:\\dumps folder. Don’t forget to create it manually if it does not exist.\nFrom addresses to source code To play with a memory dump, WinDbg is my preferred toy. I opened the memory dump and, in the case of a crash, the stack panel automatically displayed the call stack of the faulted thread:\nThe last frame triggering the issue (i.e., before KiUserExceptionDispatch) is Datadog_Profiler_Native!DllCanUnloadNow+0x2954b. Knowing that WinDbg transforms . in file names into _ leads to Datadog.Profiler.Native.dll which is the file where our profiler is implemented. However, WinDbg was not able to find the name of the function and only looked at the exported public symbols. With the lm command, you can see how WinDbg gets the symbols for this dll:\nWith DllCanUnloadNow, you could tell that we are dealing with some COM stuff but it did not really help me for the investigation: I needed to know which function was running and which part of its code. Fortunately, for each release of the .NET profiler on GitHub, in addition to the .msi installer, the symbols and the source code are also provided.\nBoth files were unzipped in the folder where the dumps were copied. Then, I changed the Debugging Settings in WinDbg to point to these folders:\nLet’s start with the symbols to let WinDbg match an instruction pointer to a function name. I asked WinDbg to provide details about the symbol resolution with !sym noisy. 
Then I forced the symbols for my module to get reloaded with .reload /f “Datadog.Profiler.Native.dll”. In the flow of errors, I found out where the .pdb file should be stored so that WinDbg would find it:\nSo the problem is triggered somewhere in our Windows64BitStackFramesCollector::CollectStackSampleImplementation function. By simply double-clicking this frame, WinDbg automagically found the corresponding source file and pinpointed the culprit line:\nA bit of WinDbg magic To follow me a bit further, you need to understand what this code is doing: it is walking the stack of a thread to find the instruction pointers of each called function. This line 260 is dereferencing the address contained in context.Rsp. I looked at the Locals panel to get its value:\nThe !address command told me which module this code was executed from:\nIt looked like a valid page with executable code…\nI wanted to see why our stack walking code would break here. What if I asked WinDbg to show me this stack? To do that, I first needed to know which thread our code was trying to stack walk. I knew that Windows64BitStackFramesCollector was keeping track of the currently walked thread in a ManagedThreadInfo instance pointed to by its _pCurrentCollectionThreadInfo field:\nThis instance stores the thread ID in its _osThreadId field: now let’s ask WinDbg to switch to this thread.\nThe ~ command lists all threads:\nA quick CTRL+F with “27600” stopped on the thread #72. 
Threads have a lot of identifiers in WinDbg and the first one allowed me to switch with ~72s.\nThe Stack panel was almost empty:\nTo be sure, I used the kp command… that told me that WinDbg was not really happy either:\nI was kind of stuck but my colleague Kevin Gosse mentioned that I could use r rip to see what would be the next instruction to be executed by this thread:\nThen, the ln command (close to the !address command I used just before) allowed me to click the Browse Module link and see that, again, some code from Sentinel One was ready to execute.\nThis agent is part of an anti-virus (and more) solution that seems to hijack the stack of threads and our code was not dealing properly with this kind of situation. The fix was to protect our dereferencing code against access violations and stop walking the stack in that case.\nAnother debugging day at Datadog :^)\n","cover":"https://chrisnas.github.io/posts/2023-10-02_crap-the-application-is/1_ayVTO6s3e0jSFwVeGbl3zQ.png","date":"2023-10-02","permalink":"https://chrisnas.github.io/posts/2023-10-02_crap-the-application-is/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhen you have a call with a customer who explains to you that his application is crashing when your profiler is enabled, it is never a great experience. This post lists the steps followed to investigate such an issue I faced last week, from the basics up to the final analysis of memory dumps in WinDbg.\u003c/p\u003e\n\u003ch2 id=\"get-as-many-setup-details-aspossible\"\u003eGet as many setup details as possible\u003c/h2\u003e\n\u003cp\u003eThe situation was the following:\u003c/p\u003e","title":"Crap: the application is randomly crashing!"},{"content":" The .NET runtime (both .NET Framework and .NET Core) allows you to generate a lightweight dump containing the allocated type instances count and references including roots. 
They are usually generated into .gcdump files by tools such as Perfview or dotnet-gcdump and can also be viewed in Visual Studio. In addition to a view of the allocated types in the managed heap, these files are often used during memory leak investigations because they are much smaller than full memory dumps and they contain explicit dependency information between types up to their roots.\nThe goal of this document is to dig into their generation and see how to leverage the same mechanisms from the .NET CLR for live heap profiling and memory leak detection.\nSimply listening to CLR events Both .NET Framework and .NET Core work the same way to allow a tool to generate a .gcdump file:\nIt seems that the type table needs to be flushed for .NET Core by first creating and immediately closing an event session with the Microsoft-DotNETCore-SampleProfiler provider.\nYou create an event session (either through ETW for Framework or EventPipe for Core) where the Microsoft-Windows-DotNETRuntime provider is enabled with a verbose level for a long list of keywords corresponding to the 0x1980001 value:\nThis will trigger an induced non-concurrent gen2 garbage collection during which the GC will walk the remaining live objects with their size and emit tons of events, most of them not documented:\nGCStart: wait for the first induced gen2 foreground GC (Depth = 2, Type = GCType.NonConcurrentGC and Reason = GCReason.Induced)\nGCStop: detect when the heap walk is over\nBulkType: type blocks are enqueued\nGCBulkNode: node blocks are enqueued\nGCBulkEdge: edge blocks are enqueued\nGCBulkRootEdge: enqueue non-weak reference roots ((GCRootFlag \u0026amp; GCRootFlags.WeakRef) == 0) based on their GCRootKind\nGCBulkRootStaticVar: static variable blocks\nGCBulkRCW and GCBulkRootCCW for Runtime Callable Wrappers and COM Callable Wrappers COM-based roots\nGCBulkRootConditionalWeakTableElementEdge: ??\n
GCGenerationRange: one for each managed heap segment with generations boundary (not really needed to build the dependency graph but interesting to figure out objects in generation 2) The payload of most of these events contains arrays of instances with their dependencies. For Perfview and dotnet-gcdump, the whole graph is built in the ConvertHeapDataToGraph method after the garbage collection ends.\nDeciphering CLR events payload When you look at the dotnet-gcdump implementation, you realize that most of the complex code to compute the .gcdump files is physically copied from the Perfview repository.\nThe GCBulkXXX events payload contains an array of elements; each element being different. The common Count field contains the number of elements. If the element contains a string such as for BulkType, it means that each one has a different size and the string must be read entirely from the payload before accessing the next element.\nType definition To avoid sending expensive type names all the time, each type will have an identifier that will be used in the GCBulkXXX-nodes related events.\nThe BulkType events contain an array of types definition elements with the following layout:\nTypeID: id of the type (i.e. pointer to the Method Table) ModuleID: id of the module where the type is defined TypeNameID: if Name is empty, use this address as a name Flags: if this bitset contains 0x8, it is an array so append “[]” to the name in that case CorElementType:? 
Name: Unicode string corresponding to the name of the type where the `xxx suffix needs to be removed in case of generics TypeParameterCount: for generics but not used Array of type parameters: for generics but not used Once the type mappings are known, it becomes possible to build the graph of live type instances instead of just nodes with IDs.\nListing Live Objects and References The live objects are sent in the GCBulkNode events payload:\nIndex: incrementing index of the bulk starting from 0 Count: number of objects in the array followed by an array of Values\nAddress: address in memory where the object is stored Size: size of the object (including for arrays) TypeID: identifier of the object type, as defined by the BulkType events EdgeCount: number of objects pointed to by this object (i.e. non-null reference type fields) Each event contains an array of live objects identified by their address. The Size and TypeID fields are easy to understand but what does the EdgeCount field represent? This is the number of objects that are referenced by this object. At the code level, this is the count of non-null reference type fields. For example, if a class A defines one integer field and a second one as a reference to an instance of type B, the EdgeCount of an A instance would be 1 when that reference is not null (because an integer is not a reference type).\nSo the next question is: where do you get the instances referenced by the objects received in the GCBulkNode payload? Since the referenced objects are also in memory, they are part of the GCBulkNode events payload too, but where is the relationship between the EdgeCount value and the corresponding objects? 
You will have to rebuild this relationship because the referenced objects are received in the GCBulkEdge events payload:\nIndex: incrementing index of the bulk starting from 0 Count: number of objects in the array followed by an array of Values\nValue: address in memory where the object is stored ReferencingFieldID: this is not used and is always 0 Again, an array of elements is received as payload and the only interesting information is the address of the object.\nBut the magic is that the node and edge event payloads are “in sync”: when an object is read from a GCBulkNode array with let’s say 2 as EdgeCount value, the next 2 elements in the array of the GCBulkEdge payload will contain the addresses of the 2 objects it references. If the next object in the GCBulkNode array has 1 as EdgeCount value, the next element in the array of the GCBulkEdge payload will be the address of the object it references, as shown in the following figure:\nIt means that both payloads must be iterated in sync.\nJust with these two events, it is possible to get a detailed view of the objects still used in memory with their size and their type like what you get with dotnet-gcdump report or !sos.dumpheap -stat.\nWith the nodes (live objects) and the edges (objects referenced by each object), it is now possible to build the reference graph of live objects.\nListing Roots In addition to the object-related events, the GC also emits events to list the roots that are referencing objects in the managed heap from the stack, statics, handles or other weird places.\nThe most interesting roots are available thanks to the following events:\nGCBulkRootEdge\nIndex: incrementing index of the bulk starting from 0 Count: number of roots in the array followed by an array of Values\nRootedNodeAddress: address in memory of the root object GCRootKind: is Stack for local variables GCRootFlag: if not a local variable, could be RefCounted, Finalizer, strong/pinning handles, or other handles GCRootID: address of the handle that 
points to the root object The static ones are given by the following events:\nGCBulkRootStaticVar\nCount: number of static roots in the array AppDomainID: app domain in which the static variable is stored followed by an array of Values\nGCRootID: address of the handle that points to the root object ObjectID: address of the root object TypeID: type identifier of the root object Flags: could be ThreadLocal or not FieldName: Unicode string corresponding to the name of the field in the type corresponding to the root Other rarely used roots are available from the GCBulkRootConditionalWeakTableElementEdge events, and COM-related ones from GCBulkRootCCW/GCBulkRCW (with the ref count, for example).\nSo, each of these events provides arrays of root object addresses. These can be used in conjunction with the reference graph built from the previous node/edge events to identify the reason why objects stay in memory. As with the ICorProfilerCallback::ObjectReferences usage previously described, you need to rebuild the inverse reference chain from the reference graph:\nand deal with cycles:\nThese roots could be an expected cache or a memory leak. For the memory leak scenario, filtering on objects in gen2 could definitely help. This is where the GCGenerationRange events come in because their payload contains the ranges of memory addresses in each segment with the corresponding generation:\nGCGenerationRange\nGeneration: generation of the segment RangeStart: address of the start of the segment RangeUsedLength: size of the committed part of the segment RangeReservedLength: size of the reserved part of the segment When an address falls between RangeStart and RangeStart + RangeUsedLength, it is part of this segment. 
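As a minimal sketch of that range check, the generation of an object address could be looked up as follows. The GenerationRange struct and the FindGeneration helper are hypothetical names introduced for this example (they are not part of Perfview or dotnet-gcdump), and the end of the used range is assumed to be exclusive:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical in-memory copy of one GCGenerationRange event payload
struct GenerationRange
{
    uint32_t generation;      // Generation field
    uint64_t rangeStart;      // RangeStart field
    uint64_t rangeUsedLength; // RangeUsedLength field
};

// Returns the generation of the range that contains the address,
// or -1 if no range contains it
int FindGeneration(const std::vector<GenerationRange>& ranges, uint64_t address)
{
    for (const auto& range : ranges)
    {
        // subtract first to avoid overflowing rangeStart + rangeUsedLength
        if ((address >= range.rangeStart) &&
            (address - range.rangeStart < range.rangeUsedLength))
        {
            return static_cast<int>(range.generation);
        }
    }

    return -1;
}
```

With such a helper, keeping only the nodes for which FindGeneration returns 2 would restrict a leak investigation to gen2 objects.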
The generation of the segment could be 0, 1 or 2 for the ephemeral segments, 3 for the Large Object Heap, and 4 for the Pinned Object Heap.\nIntegration with a .NET Profiler As a .NET Profiler, it is possible to listen to CLR events via ICorProfilerCallback::EventPipeEventDelivered. If the same keywords as 0x1980001 have been enabled thanks to ICorProfilerInfo12::EventPipeStartSession, the corresponding messages will be received and you have to keep track of the fact that a gcdump is in progress. This should not be a big deal because only the GC keyword (0x1) might be already used and events describing the collections will be processed anyway. There won’t be duplication of events in that case.\nHowever, since it is needed to start a session with the right keyword to trigger the special garbage collection, the ICorProfilerCallback mechanism cannot be used to continuously process the corresponding specific messages. This one time EventPipe session should be started independently by manually connecting to the EventPipe of the currently running CLR as described in details in this blog series.\nYou are now ready to integrate this feature of the CLR without the need to install a tool!\n","cover":"https://chrisnas.github.io/posts/2023-08-11_net-gcdump-internals/1_nMFTtE3rNI50uxIs7qMUtw.png","date":"2023-08-11","permalink":"https://chrisnas.github.io/posts/2023-08-11_net-gcdump-internals/","summary":"\u003chr\u003e\n\u003cp\u003eThe .NET runtime (both .NET Framework and .NET Core) allows you to generate a lightweight dump containing the allocated type instances count and references including roots. They are usually generated into .gcdump files by tools such as \u003ca href=\"https://github.com/microsoft/perfview\"\u003ePerfview\u003c/a\u003e or \u003ca href=\"https://github.com/dotnet/diagnostics/blob/main/documentation/dotnet-gcdump-instructions.md\"\u003edotnet-gcdump\u003c/a\u003e and can also be viewed in Visual Studio. 
In addition to a view of the allocated types in the managed heap, these files are often used during memory leak investigations because they are much smaller than full memory dump and they contain explicit dependency information between types up to their roots.\u003c/p\u003e","title":".NET .gcdump Internals"},{"content":" Introduction It’s been almost 12 years since I wrote LeakShell to help me automate the search of memory leaks in .NET. The idea was simple: compare 2 memory dumps of a leaking .NET application to show the types with increasing instances count.\nToday, you could use the Visual Studio Memory Usage tool to do the same but with a much better user interface! The additional killer feature is the ability to see the references chain that explains why a “leaky” object stays in memory.\nMy previous series about building your own .NET memory profiler in C# is based on CLR events and does not allow you to get the references chain. This post explains how you could write your own memory profiler based on .NET profiler APIs in C++. Refer to this post for an introduction of how to implement ICorProfilerCallback to be loaded by the CLR in a .NET process.\nHow to detect memory leaks From a high-level view, detecting a memory leak means being able to know which objects stay alive garbage collection after garbage collection:\nimplement ICorProfilerCallback::ObjectAllocated to keep track of ALL objects in the heap, use ICorProfilerCallback4::MovedReferences2 to fix up the addresses when the live objects are moved during compacting garbage collections, since ICorProfilerCallback::ObjectReferences is called for each surviving object, clean your list of live objects. The first drawback of this solution is that the CLR has to disable concurrent GC to call these functions, with a probable impact on performance. However, if you can’t find a leak in production that leads to out-of-memory crashes, running one instance in this mode is perfectly acceptable. 
The second drawback is the complexity of keeping track of objects through compacting GCs.\nThis is why I implemented in .NET 7 a new set of functions in ICorProfilerInfo13 to mimic what you can do in C# with a weak reference:\nCreateHandle: create a weak handle to wrap an object, GetObjectIDFromHandle: get the address of the wrapped object or null if the object is no longer in the heap, DestroyHandle: clean up the weak handle. Creating such a weak handle for allocated objects gets rid of the address fixup complexity. However, you should not create a handle for ALL allocated objects because it will slow down the garbage collections. So the next step is to listen to the AllocationTick CLR event and create a weak handle for each sampled allocation. Even though the statistical distribution of such 100 KB threshold-based sampling is not perfect, leaking objects should appear.\nAfter each garbage collection detected in ICorProfilerCallback2::GarbageCollectionFinished or via specific GC events, you could clean up this list of allocated objects by removing those for which GetObjectIDFromHandle returns null.\nFeel free to look at the corresponding implementation in the Datadog .NET profiler code.\nRebuild references chain up to a root Even though it is possible to get the call stack that led to allocating a leaking object thanks to the AllocationTick event, it would be better to know why it stays in memory. So the next step is to rebuild the references chain up to the root.\nAs explained in a previous post, it is possible, for a given object, to get the list of its fields and build a graph of dependencies from a parent to its children. However, you are interested in the opposite direction and it would require getting these parent/children references for ALL objects in the heap. 
And this is not possible with the sampled AllocationTick event…\nThis is where ICorProfilerCallback::ObjectReferences shines:\nHRESULT ObjectReferences( ObjectID objectId, ClassID classId, ULONG cObjectRefs, ObjectID objectRefIds[] ); This method is called during a garbage collection for each object still alive (i.e. the objectId first parameter) and lists the objects in the heap referenced by its fields (i.e. the objectRefIds last parameter).\nYou could store each object as an ObjectNode:\nstruct ObjectNode { public: ObjectNode(ObjectID objectId); public: ObjectID instance; std::vector\u0026lt;ObjectNode*\u0026gt; rootRefs; }; in a vector that represents the heap.\nFor each referenced object, look for its corresponding node in the vector and add the node of its parent (given by objectId) to its rootRefs vector of parents. That way, you are building a back reference graph:\nThe small blue arrows show the parent/children references given by ObjectReferences and the large purple ones are kept to build the reverse references graph you are interested in.\nYou know that all live objects in the heap have been listed when ICorProfilerCallback2::GarbageCollectionFinished is called. It is now time to build the references chain for all sampled objects still alive (thanks to GetObjectIDFromHandle returning a non-null address).\nIt is important to understand that it is a graph and not a tree because cycles exist in .NET.\nThis is common in situations where objects need to keep a reference to their “parents”. It means that these cycles should be detected when looking for the list of references of a given live object to avoid infinite recursion:\nbool DumpNode(ObjectNode* node, std::vector\u0026lt;ObjectID\u0026gt;\u0026amp; referenceStack)\nThe traversing DumpNode method takes a node (i.e. 
an object of the heap) and a stack where the parents will be added as we dig into the graph.\nbool DumpNode(ObjectNode* node, std::vector\u0026lt;ObjectID\u0026gt;\u0026amp; referenceStack) { // end of recursion: the node is a root if (node-\u0026gt;rootRefs.size() == 0) { // dump the root std::cout \u0026lt;\u0026lt; std::endl; std::cout \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; node-\u0026gt;instance \u0026lt;\u0026lt; std::dec; COR_PRF_GC_ROOT_KIND kind; COR_PRF_GC_ROOT_FLAGS flags; if (FindRoot(_roots, node-\u0026gt;instance, kind, flags)) { std::cout \u0026lt;\u0026lt; \u0026#34; | \u0026#34;; DumpKind(kind); std::cout \u0026lt;\u0026lt; \u0026#34; - \u0026#34;; DumpFlags(flags); } else { std::cout \u0026lt;\u0026lt; \u0026#34; | ?\u0026#34;; } std::cout \u0026lt;\u0026lt; \u0026#34; = \u0026#34;; DumpObjectType(node-\u0026gt;instance, _pCorProfilerInfo, _pFrameStore); std::cout \u0026lt;\u0026lt; std::endl; // dump the references from the root (size_t index avoids overflow on deep stacks) for (size_t i = referenceStack.size(); i-- \u0026gt; 0;) { ObjectID reference = referenceStack[i]; std::cout \u0026lt;\u0026lt; \u0026#34; --\u0026gt; \u0026#34;; std::cout \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; reference \u0026lt;\u0026lt; std::dec; std::cout \u0026lt;\u0026lt; \u0026#34; = \u0026#34;; DumpObjectType(reference, _pCorProfilerInfo, _pFrameStore); std::cout \u0026lt;\u0026lt; std::endl; } return true; } // detect cycles if (Find(referenceStack, node-\u0026gt;instance)) { return false; } // go up into the reference chain referenceStack.push_back(node-\u0026gt;instance); for (auto\u0026amp; parentNode : node-\u0026gt;rootRefs) { if (DumpNode(parentNode, referenceStack)) { return true; } } referenceStack.pop_back(); return false; } If a parent node is already in the stack, a cycle is detected and that path is not used. 
Once a root is reached, the stack is dumped as shown in the following output:\nOnGarbageCollectionFinished: 3859 objects in the heap. OnRootReferences2: 90/109 roots. stack:40 finalizer:1 handle:49 other:0 ------------------ 21266c00020 | H - 0 = Object[] --\u0026gt; 21268c092c8 = NativeRuntimeEventSource --\u0026gt; 21268c3fbe8 = EventSource.EventMetadata[] --\u0026gt; 21268c18010 = ParameterInfo[] --\u0026gt; 21268c17ef0 = RuntimeParameterInfo --\u0026gt; 21268c0ed58 = RuntimeMethodInfo --\u0026gt; 21268c0e708 = RuntimeType.RuntimeTypeCache --\u0026gt; 21268c0e840 = RuntimeType.RuntimeTypeCache.MemberInfoCache\u0026lt;System.Reflection.RuntimeMethodInfo\u0026gt; --\u0026gt; 21268c14978 = RuntimeMethodInfo[] --\u0026gt; 21268c10d70 = RuntimeMethodInfo --\u0026gt; 21268c29720 = Signature --\u0026gt; 21268c29770 = RuntimeType[] ===================================== As shown in the output, it is possible to provide details about the kind of root that is keeping the references chain alive thanks to ICorProfilerCallback2::RootReferences2:\nHRESULT RootReferences2( ULONG cRootRefs, ObjectID rootRefIds[], COR_PRF_GC_ROOT_KIND rootKinds[], COR_PRF_GC_ROOT_FLAGS rootFlags[], UINT_PTR rootIds[] ); This function is called with synchronized arrays, each cRootRefs long, that contain for each root:\nthe address (rootRefIds ObjectID), the kind (rootKinds for stack, finalizer, handle and other) and flags (rootFlags for pinned, weak reference, interior or ref counted). These are stored in a vector of ObjectRoot:\nGoodies: how to get arrays type name I did not mention how to get the type name of either an ObjectID or a ClassID because it is explained in a previous post. 
However, I forgot to explain how to deal with the different kinds of arrays: single dimension (ex: byte[]), multidimensional (ex: byte[,]) or jagged (ex: byte[][]).\nWhen you call ICorProfilerInfo::GetClassIDInfo on a ClassID corresponding to an array,\nModuleID moduleId; mdTypeDef typeDefToken; hr = _pCorProfilerInfo-\u0026gt;GetClassIDInfo(classId, \u0026amp;moduleId, \u0026amp;typeDefToken); it won’t fail but the module id and the metadata token will both be set to 0.\nInstead, you have to call ICorProfilerInfo::IsArrayClass to get the rank and the item class ID of the array. This call is then repeated on the item class ID until it returns S_FALSE or fails:\nstd::string arrayBuilder; CorElementType baseElementType; ClassID itemClassId; ULONG rank = 0; if (_pCorProfilerInfo-\u0026gt;IsArrayClass(classId, \u0026amp;baseElementType, \u0026amp;itemClassId, \u0026amp;rank) == S_OK) { classId = itemClassId; isArray = true; AppendArrayRank(arrayBuilder, rank); // in case of matrices, it is needed to look for the last \u0026#34;good\u0026#34; item class ID // because all others might be array of array of ... 
for (size_t i = 0; i \u0026lt; rank; i++) { HRESULT hr = _pCorProfilerInfo-\u0026gt;IsArrayClass(classId, \u0026amp;baseElementType, \u0026amp;itemClassId, \u0026amp;rank); if ((hr == S_FALSE) || FAILED(hr)) { itemClassId = classId; break; } AppendArrayRank(arrayBuilder, rank); classId = itemClassId; } } Notice that the way to concatenate the possible [] / [,] / [][] is the opposite of how the array type is defined in C#:\nvoid AppendArrayRank(std::string\u0026amp; arrayBuilder, ULONG rank) { if (rank == 1) { arrayBuilder = \u0026#34;[]\u0026#34; + arrayBuilder; } else { std::stringstream builder; builder \u0026lt;\u0026lt; \u0026#34;[\u0026#34;; for (size_t i = 0; i \u0026lt; rank - 1; i++) { builder \u0026lt;\u0026lt; \u0026#34;,\u0026#34;; } builder \u0026lt;\u0026lt; \u0026#34;]\u0026#34;; arrayBuilder = builder.str() + arrayBuilder; } } For example, a byte[][,] is defined as a rank 2 array of array of byte.\nReferences Automate the search of memory leaks with LeakShell Building your own .NET memory profiler in C# Introduction to .NET Profiling with ICorProfilerCallback Pull request in .NET 7 for ICorProfilerInfo13 to create weak handles Datadog .NET Live Heap Profiler implementation ","cover":"https://chrisnas.github.io/posts/2023-05-08_raiders-of-the-lost/1_GXNAUQtq1-moMncqOM_yHw.png","date":"2023-05-08","permalink":"https://chrisnas.github.io/posts/2023-05-08_raiders-of-the-lost/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIt’s been almost 12 years since I wrote \u003ca href=\"https://github.com/chrisnas/DebuggingExtensions/tree/master/src/LeakShell\"\u003eLeakShell\u003c/a\u003e to help me \u003ca href=\"https://codenasarre.wordpress.com/2011/05/18/leakshell-or-how-to-automatically-find-managed-leaks/\"\u003eautomate the search of memory leaks\u003c/a\u003e in .NET. 
The idea was simple: compare 2 memory dumps of a leaking .NET application to show the types with increasing instances count.\u003c/p\u003e\n\u003cp\u003eToday, you could use \u003ca href=\"https://learn.microsoft.com/en-us/visualstudio/profiling/memory-usage-without-debugging2?view=vs-2022?WT.mc_id=DT-MVP-5003325\"\u003eVisual Studio Memory Usage\u003c/a\u003e tool to do the same but with a much better user interface! The additional killer feature is the ability to see the references chain that explains why a “leaky” object stays in memory.\u003c/p\u003e","title":"Raiders of the lost root: looking for memory leaks in .NET"},{"content":" The previous episodes started the parsing of the “nettrace” format used when contacting the .NET Diagnostics IPC server, initiate the protocol to receive CLR events and start to parse stacks. This last episode covers the Metadata and Event blocks.\nIn terms of format, both Metadata and Event blocks share the same memory layout:\nThe common EventBlockHeader starts the block:\n1 2 3 4 5 6 7 8 9 10 #pragma pack(1) struct EventBlockHeader { uint16_t HeaderSize; uint16_t Flags; uint64_t MinTimestamp; uint64_t MaxTimestamp; // some optional reserved space might be following }; The timestamp fields give the time of the first and last event in the block. The HeaderSize fields is important because additional information can be stored in the header. Since I have no idea what could be stored there, I simply skip it:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 bool EventParserBase::OnParse() { // read event block header EventBlockHeader ebHeader = {}; if (!Read(\u0026amp;ebHeader, sizeof(ebHeader))) { return false; } // skip any optional content if any if (ebHeader.HeaderSize \u0026gt; sizeof(EventBlockHeader)) { uint8_t optionalSize = ebHeader.HeaderSize - sizeof(EventBlockHeader); if (!SkipBytes(optionalSize)) { return false; } } The important piece of information to figure out how to unpack the rest of the block is kept in the Flags field. 
If the lowest bit is set, it means that the blobs header will be compressed:\n1 2 3 4 5 6 // the rest of the block is a list of Event blobs // DWORD blobSize = 0; DWORD totalBlobSize = 0; DWORD remainingBlockSize = _blockSize - ebHeader.HeaderSize; bool isCompressed = ((ebHeader.Flags \u0026amp; 1) == 1); The rest of the code iterates on each blob:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 // Note: in order to gain space, some fields of the header could be \u0026#34;inherited\u0026#34; // from the header of the previous blob --\u0026gt; need to pass it from blob to blob EventBlobHeader header = {}; while (OnParseBlob(header, isCompressed, blobSize)) { totalBlobSize += blobSize; blobSize = 0; if (totalBlobSize \u0026gt;= remainingBlockSize - 1) // try to detect last blob { // don\u0026#39;t forget to check the end of block tag uint8_t tag; if (!ReadByte(tag) || (tag != NettraceTag::EndObject)) { std::cout \u0026lt;\u0026lt; \u0026#34;Missing end of block tag\\n\u0026#34;; return false; } return true; } } return true; } Here is the tricky part: to gain space, each blob starts with a header that could be “compressed”. The compression mechanism is simple: the first byte is a bitfield value that indicates which fields are present (i.e. their value should be read from the memory block) or skipped (i.e. their value is the same as the previous blob header). Therefore, an EventBlobHeader is passed by reference to the OnParseBlob function. 
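To illustrate the inheritance idea only, here is a minimal sketch based on a made-up header with two fields. The SketchHeader struct, the flag values, the fixed field widths and the ReadSketchHeader name are all assumptions for this example: they do not match the real nettrace bit layout or encoding, which is what the TraceEvent code implements.

```cpp
#include <cstddef>
#include <cstdint>

// Made-up header with only two fields (the real EventBlobHeader has many more)
struct SketchHeader
{
    uint32_t MetadataId;
    uint64_t ThreadId;
};

// Hypothetical flag values: one bit per field present in the stream
enum SketchFlags : uint8_t
{
    HasMetadataId = 0x1,
    HasThreadId = 0x2
};

// Reads a little-endian integer at 'pos' and moves 'pos' forward
template <typename T>
bool ReadLittleEndian(const uint8_t* buffer, size_t size, size_t& pos, T& value)
{
    if (size - pos < sizeof(T)) return false;

    T result = 0;
    for (size_t i = 0; i < sizeof(T); i++)
    {
        result |= static_cast<T>(buffer[pos + i]) << (8 * i);
    }
    value = result;
    pos += sizeof(T);
    return true;
}

// 'header' still holds the values of the previous blob when the function is called.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
size_t ReadSketchHeader(const uint8_t* buffer, size_t size, SketchHeader& header)
{
    if (size < 1) return 0;

    size_t pos = 0;
    uint8_t flags = buffer[pos++];

    // a field is read only if its bit is set; otherwise the previous value is kept
    if ((flags & HasMetadataId) && !ReadLittleEndian(buffer, size, pos, header.MetadataId))
    {
        return 0;
    }
    if ((flags & HasThreadId) && !ReadLittleEndian(buffer, size, pos, header.ThreadId))
    {
        return 0;
    }

    return pos;
}
```

Because the same header instance is reused from blob to blob, a blob whose flags only announce MetadataId keeps the ThreadId of the previous blob, which is the reason why the header has to be passed by reference.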
My MetadataParser and EventParser implementations of OnParseBlob both start with the same code to read the header:\nbool XXXParser::OnParseBlob(EventBlobHeader\u0026amp; header, bool isCompressed, DWORD\u0026amp; blobSize) { if (isCompressed) { if (!ReadCompressedHeader(header, blobSize)) { return false; } } else { if (!ReadUncompressedHeader(header, blobSize)) { return false; } } The implementation to read the compressed and uncompressed versions of the header is a direct translation of the TraceEvent C# code into C++.\nThe EventBlobHeader contains details of events:\n#pragma pack(1) struct EventBlobHeader_V4 { uint32_t EventSize; uint32_t MetadataId; uint32_t SequenceNumber; uint64_t ThreadId; uint64_t CaptureThreadId; uint32_t ProcessorNumber; uint32_t StackId; uint64_t Timestamp; GUID ActivityId; GUID RelatedActivityId; uint32_t PayloadSize; }; The “identity” of an event is given by the MetadataId field that refers to information defined in Metadata “object” (for which MetadataId is 0). The SequenceNumber field is incremented on a per thread basis each time an event is emitted. This could be used to detect if some events have been dropped (for a given CaptureThreadId, two consecutive events have a SequenceNumber incremented by more than 1; more on dropped events in the forthcoming SequencePoint “object” description). Its value is 0 for a Metadata “object”. The ThreadId and CaptureThreadId fields always have the same value for Event “object”; CaptureThreadId is 0 for Metadata “object”. In case of Event “object”, the StackId field refers to one of the stacks extracted from a Stack “object”. Its value is 0 for Metadata “object”. The Metadata “object” As the documentation states, each MetadataBlock holds a set of metadata records. Each metadata record has an ID and it describes one type of event. 
Each event has a metadataId field which will indicate the ID of the metadata record which describes that event.\nThe resulting mapping is stored in EventPipeSession class:\n1 2 // per metadataID event metadata description std::unordered_map\u0026lt;uint32_t, EventCacheMetadata\u0026gt; _metadata; However, the rest of the documentation is partially right in the case of nettrace stream received through EventPipe: Metadata includes an event name, provider name, and the layout of fields that are encoded in the event’s payload section.\nFirst, the fields layout is simply not there. In addition, for some providers (dotnet runtime, private and rundown), the event names are empty strings. So, the data structure filled from the MetadataBlock will most of the time have an empty EventName field. Note that the “Microsoft-DotNETCore-EventPipe” provider (i.e. command events for that specific provider) and EventSource-derived classes written in C# provide the events name:\n1 2 3 4 5 6 7 8 9 10 11 class EventCacheMetadata { public: uint32_t MetadataId; std::wstring ProviderName; uint32_t EventId; std::wstring EventName; // empty most of the time uint64_t Keywords; uint32_t Version; uint32_t Level; }; In addition to the provider’s name serialized as a UTF16 string (including last ‘\\0’ wide character), the EventId field is the key used to identify an event.\nAfter these details, you will find a 4 bytes value corresponding to the number of fields in the event payload. As already mentioned, this value is always 0 so my code is skipping the rest of the metadata block payload.\nThe Event “object” And at last, here comes the time to parse Event “object” payload! The MetadataId field of the EventBlobHeader is used to find the provider’s name and event id:\n1 2 3 4 5 bool EventParser::OnParseBlob(EventBlobHeader\u0026amp; header, bool isCompressed, DWORD\u0026amp; blobSize) { ... 
auto\u0026amp; metadataDef = _metadata[header.MetadataId]; So, the rest of the function reads the payload based on the expected event id:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 switch (metadataDef.EventId) { case EventIDs::AllocationTick: if (!OnAllocationTick(header.PayloadSize, metadataDef)) { return false; } break; case ... break; case EventIDs::ExceptionThrown: if (!OnExceptionThrown(header.PayloadSize, metadataDef)) { return false; } break; default: // skip events we are not interested in { SkipBytes(header.PayloadSize); } } blobSize += header.PayloadSize; return true; } The format of each event payload is usually given by the Microsoft documentation. If not, you should look into the ClrEtwall.man file where the payload of ALL events are defined. For example, the AllocationTick event payload provides the name of the last allocated type to reach the 100 KB threshold (read this blog post for more details about how to use this event):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 // AllocationAmount UInt32 The allocation size, in bytes. // This value is accurate for allocations that are less than the length of a ULONG(4,294,967,295 bytes). // If the allocation is greater, this field contains a truncated value. // Use AllocationAmount64 for very large allocations. // AllocationKind UInt32 0x0 - Small object allocation(allocation is in small object heap). // 0x1 - Large object allocation(allocation is in large object heap). // ClrInstanceID UInt16 Unique ID for the instance of CLR or CoreCLR. // AllocationAmount64 UInt64 The allocation size, in bytes.This value is accurate for very large allocations. // TypeId Pointer The address of the MethodTable.When there are several types of objects that were allocated during this event, // this is the address of the MethodTable that corresponds to the last object allocated (the object that caused the 100 KB threshold to be exceeded). 
// TypeName UnicodeString The name of the type that was allocated.When there are several types of objects that were allocated during this event, // this is the type of the last object allocated (the object that caused the 100 KB threshold to be exceeded). // HeapIndex UInt32 The heap where the object was allocated.This value is 0 (zero)when running with workstation garbage collection. // Address Pointer The address of the last allocated object. // Based on this fields definition, the EventParser::OnAllocationTick function is reading each field after the other thanks to the ReadWord, ReadDWord, ReadLong and ReadWString :\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 bool EventParser::OnAllocationTick(DWORD payloadSize, EventCacheMetadata\u0026amp; metadataDef) { DWORD readBytesCount = 0; DWORD size = 0; std::cout \u0026lt;\u0026lt; \u0026#34;\\nAllocation Tick:\\n\u0026#34;; // get common fields uint32_t dword = 0; if (!ReadDWord(dword)) { return false; } readBytesCount += sizeof(dword); std::cout \u0026lt;\u0026lt; \u0026#34; Amount = \u0026#34; \u0026lt;\u0026lt; dword \u0026lt;\u0026lt; \u0026#34; bytes\\n\u0026#34;; if (!ReadDWord(dword)) { return false; } readBytesCount += sizeof(dword); std::cout \u0026lt;\u0026lt; \u0026#34; Kind = \u0026#34; \u0026lt;\u0026lt; ((dword == 1) ? 
\u0026#34;LOH\u0026#34; : \u0026#34;small\u0026#34;) \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; uint16_t word = 0; if (!ReadWord(word)) { return false; } readBytesCount += sizeof(word); std::cout \u0026lt;\u0026lt; \u0026#34; CLR ID = \u0026#34; \u0026lt;\u0026lt; word \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; uint64_t ulong = 0; if (!ReadLong(ulong)) { return false; } readBytesCount += sizeof(ulong); std::cout \u0026lt;\u0026lt; \u0026#34; Amount64 = \u0026#34; \u0026lt;\u0026lt; ulong \u0026lt;\u0026lt; \u0026#34; bytes\\n\u0026#34;; The bitness of the monitored application is important when “pointers” need to be read from the payload: use ReadDWord for 32-bit and ReadLong for 64-bit:\n// skip useless MT address // Note: handle 32/64 bit difference if (_is64Bit) { if (!ReadLong(ulong)) { return false; } readBytesCount += sizeof(ulong); } else { if (!ReadDWord(dword)) { return false; } readBytesCount += sizeof(dword); } And if you don’t need the rest of the payload, SkipBytes is your friend:\n// skip the rest of the payload return SkipBytes(payloadSize - readBytesCount); } I had some issues when dealing with the ExceptionThrown event payload:\n// Type wstring Exception type // Message wstring Exception message // EIPCodeThrow win:Pointer Instruction pointer where exception occurred. // ExceptionHR win:UInt32 Exception HRESULT. // ExceptionFlags win:UInt16 // 0x01: HasInnerException (see CLR ETW Events in the Visual Basic documentation). // 0x02: IsNestedException. // 0x04: IsRethrownException. // 0x08: IsCorruptedStateException (indicates that the process state is corrupt). // 0x10: IsCLSCompliant (an exception that derives from Exception is CLS-compliant). // ClrInstanceID win:UInt16 Unique ID for the instance of CLR or CoreCLR. In case of an empty message, the field itself was not even there! 
Not even 0 for a ‘\\0’ wide character… In fact, there is a bug in the serialization code that skips the field in that case. This has been fixed in .NET 6 by storing “NULL” as the serialized string: I would have preferred ‘\\0’ but it seems to be compatible with the ETW implementation.\nTo support .NET Core 3+ and .NET 5, my code compares the size of the remainder of the payload after reading the exception type with the expected size of the 4 remaining fields after the exception message. If it is greater, the payload contains a string for the message. If not, I know that the message is empty:\nbool EventParser::OnExceptionThrown(DWORD payloadSize, EventCacheMetadata\u0026amp; metadataDef) { DWORD readBytesCount = 0; DWORD size = 0; // read exception type ... // Size of the ExceptionThrown payload AFTER the Message field uint16_t exceptionRemainingPayloadSize = (_is64Bit ? 8 : 4) + 4 + 2 + 2; // In case of \u0026#34;empty\u0026#34; message, it might not be even visible as \u0026#34;\\0\u0026#34; before .NET 6 (and after, will be \u0026#34;NULL\u0026#34;) // so it is needed to check if the remaining payload contains such a string if ((payloadSize - readBytesCount) == exceptionRemainingPayloadSize) { std::wcout \u0026lt;\u0026lt; L\u0026#34; message = \u0026#39;\u0026#39;\\n\u0026#34;; } else { if (!ReadWString(strBuffer, size)) { return false; } readBytesCount += size; // handle empty string case (check for \u0026#34;NULL\u0026#34; in case of .NET 6+) if (strBuffer.empty() || (wcscmp(strBuffer.c_str(), L\u0026#34;NULL\u0026#34;) == 0)) std::wcout \u0026lt;\u0026lt; L\u0026#34; message = \u0026#39;\u0026#39;\\n\u0026#34;; else { std::wcout \u0026lt;\u0026lt; L\u0026#34; message = \u0026#34; \u0026lt;\u0026lt; strBuffer.c_str() \u0026lt;\u0026lt; L\u0026#34;\\n\u0026#34;; } } // skip the rest of the payload return SkipBytes(payloadSize - readBytesCount); 
} The SequencePointBlock “object” The last “object” type is the sequence point block that contains the following fields:\nIn addition to these fields, it also implicitly tells you that new stack “objects” will be received (with stack ids restarting from 1) to match the next Event “objects”. For example, the following trace shows how a sequence point block resets the stacks by restarting at 1:\nEvent block (140 bytes) blob header: StackId = 3 Contention blob header: StackId = 4 Event = 81 blob header: StackId = 3 Contention ... ------------------------------------------------ ________________________________________________ SequencePoint block (217 bytes) ... ------------------------------------------------ ________________________________________________ Stack block (105 bytes) Stack block header: FirstID: 1 Count : 2 ------------------------------------------------ ________________________________________________ Event block (92 bytes) blob header: StackId = 1 Contention blob header: StackId = 2 Event = 81 blob header: StackId = 1 Contention: ------------------------------------------------ So, the stacks you might have cached from the stack “objects” already received should now be invalidated, as I do in SequencePointParser::OnParse:\nbool SequencePointParser::OnParse() { // reset stack caches _stacks32.clear(); _stacks64.clear(); ... You now have all the elements you need to listen to CLR events on Windows and Linux for .NET Core 3+ and .NET 5+. If you are still running applications with .NET Framework, you will need to use ETW but this is another story.\nResources Episode 1 — Digging into the CLR Diagnostics IPC Protocol in C# Episode 2 — .NET Diagnostic IPC protocol: the C++ way Episode 3 — CLR events: go for the nettrace file format! (/posts/2022-10-23_clr-events-go-for/) 
Episode 4 — Parsing the “nettrace” stream (/posts/2022-11-27_parsing-the-nettrace-stream/) Episode 5 — Reading “object” in memory — starting with stacks (/posts/2023-01-15_reading-object-in-memory/) Source code for the C++ implementation of CLR events listener Diagnostics IPC protocol documentation ","cover":"https://chrisnas.github.io/posts/2023-03-10_from-metadata-to-event/1_8U7zPxOVCe2Bws5g5TkX_A.png","date":"2023-03-10","permalink":"https://chrisnas.github.io/posts/2023-03-10_from-metadata-to-event/","summary":"\u003chr\u003e\n\u003cp\u003eThe previous episodes started the parsing of the “nettrace” format used when \u003ca href=\"/posts/2022-09-18_net-diagnostic-ipc-protocol/\"\u003econtacting the .NET Diagnostics IPC server\u003c/a\u003e, \u003ca href=\"/posts/2022-10-23_clr-events-go-for/\"\u003einitiating the protocol to receive CLR events\u003c/a\u003e and starting to \u003ca href=\"/posts/2023-01-15_reading-object-in-memory/\"\u003eparse stacks\u003c/a\u003e. This last episode covers the Metadata and Event blocks.\u003c/p\u003e\n\u003cp\u003eIn terms of format, both Metadata and Event blocks share the same memory layout:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2023-03-10_from-metadata-to-event/1_8U7zPxOVCe2Bws5g5TkX_A.png\"\u003e\u003c/p\u003e\n\u003cp\u003eThe common \u003cstrong\u003eEventBlockHeader\u003c/strong\u003e starts the block:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cdiv class=\"chroma\"\u003e\n\u003ctable class=\"lntable\"\u003e\u003ctr\u003e\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode\u003e\u003cspan class=\"lnt\"\u003e 1\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 2\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 3\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 4\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 5\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 6\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 7\n\u003c/span\u003e\u003cspan 
class=\"lnt\"\u003e 8\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 9\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e10\n\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\n\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-cpp\" data-lang=\"cpp\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"cp\"\u003e#pragma pack(1)\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003estruct\u003c/span\u003e \u003cspan class=\"nc\"\u003eEventBlockHeader\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"kt\"\u003euint16_t\u003c/span\u003e \u003cspan class=\"n\"\u003eHeaderSize\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"kt\"\u003euint16_t\u003c/span\u003e \u003cspan class=\"n\"\u003eFlags\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"kt\"\u003euint64_t\u003c/span\u003e \u003cspan class=\"n\"\u003eMinTimestamp\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"kt\"\u003euint64_t\u003c/span\u003e \u003cspan class=\"n\"\u003eMaxTimestamp\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan 
class=\"cl\"\u003e    \u003cspan class=\"c1\"\u003e// some optional reserved space might be following\n\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e};\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\u003cp\u003eThe timestamp fields give the time of the first and last event in the block. The \u003cstrong\u003eHeaderSize\u003c/strong\u003e fields is important because additional information can be stored in the header. Since I have no idea what could be stored there, I simply skip it:\u003c/p\u003e","title":"From Metadata to Event block in nettrace format"},{"content":" The previous episodes started the parsing of the “nettrace” format used when contacting the .NET Diagnostics IPC server and initiate the protocol to receive CLR events. It is now time to see how to get the payload of each “object” type, especially how stacks are stored.\nWe have seen that the stream starts with a TraceObject that describes the rest of the stream followed by a sequence of “object”:\nThe remaining of each “object” is a 32 bit block size followed by the payload.\nWell… not only. One thing I missed when I started to work on the nettrace format is the fact that all “object” payloads must be 4-bytes aligned on the beginning of the stream!\nThis is why I’m keeping track of the current position in the EventPipeSession class:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 private: IIpcEndpoint* _pEndpoint; bool _stopRequested; // parsers MetadataParser _metadataParser; EventParser _eventParser; StackParser _stackParser; SequencePointParser _sequencePointParser; // Keep track of the position since the beginning of the \u0026#34;file\u0026#34; // i.e. starting at 0 from the first character of the NettraceHeader // Nettrace uint64_t _position; ... 
}; So each ParseXXXBlock function checks the minimum reader version in the header before reading the “object” payload as a memory block. The idea is being able to support backward compatibility:\n1 2 3 4 5 6 7 8 9 10 11 12 13 bool EventPipeSession::ParseMetadataBlock(ObjectHeader\u0026amp; header) { if (header.MinReaderVersion != 2) return false; uint32_t blockSize = 0; // read the block and send it to the corresponding parser uint64_t blockOriginInFile = 0; if (!ExtractBlock(\u0026#34;Metadata\u0026#34;, blockSize, blockOriginInFile)) return false; return _metadataParser.Parse(_pBlock, blockSize, blockOriginInFile); } The ExtractBlock function reads the size of the payload (and skips the padding if any) with ReadBlockSize:\n1 2 3 4 5 6 7 8 9 bool EventPipeSession::ExtractBlock(const char* blockName, uint32_t\u0026amp; blockSize, uint64_t\u0026amp; blockOriginInFile) { // get the block size if (!ReadBlockSize(blockName, blockSize)) return false; // skip the block + final EndOfObject tag blockSize++; ... The block name is only used for error messages if needed.\nThe next step is to read the payload in a memory block using these two EventPipeSession fields:\n1 2 3 4 5 ... // buffer used to read each block that will be then parsed uint8_t* _pBlock; uint32_t _blockSize; ... In the session constructor, _blockSize is set to 4 KB and _pBlock points to an allocated memory buffer of that size.\nThe rest of ExtractBlock deals with payload size: if the current payload to parse is larger than _blockSize, then these fields are updated up to a maximum of 100 KB (i.e. 
max block size sent by the CLR).\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // check if it is needed to resize the block buffer if (_blockSize \u0026lt; blockSize) { // don\u0026#39;t expect blocks larger than 100KB if (blockSize \u0026gt; MAX_BLOCK_SIZE) return false; delete [] _pBlock; _pBlock = new uint8_t[blockSize]; ::ZeroMemory(_pBlock, blockSize); _blockSize = blockSize; } // keep track of the current position in file for padding blockOriginInFile = _position; if (!Read(_pBlock, blockSize)) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while extracting \u0026#34; \u0026lt;\u0026lt; blockName \u0026lt;\u0026lt; \u0026#34; block: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error \u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } std::cout \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34; \u0026lt;\u0026lt; blockName \u0026lt;\u0026lt; \u0026#34; block (\u0026#34; \u0026lt;\u0026lt; blockSize \u0026lt;\u0026lt; \u0026#34; bytes)\\n\u0026#34;; DumpBuffer(_pBlock, blockSize); return true; } For debugging sake, I’m displaying each “object” payload\nthanks to the DumpBuffer helper.\nTo ease the memory access to the memory block content, my BlockParser will be used as a base class for each dedicated parsers:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 class BlockParser { public: BlockParser(); bool Parse(uint8_t* pBlock, uint32_t bytesCount, uint64_t blockOriginInFile); void SetPointerSize(uint8_t pointerSize); public: uint8_t PointerSize; protected: virtual bool OnParse() = 0; // Access helpers bool Read(LPVOID buffer, DWORD bufferSize); bool ReadByte(uint8_t\u0026amp; byte); bool ReadWord(uint16_t\u0026amp; word); bool ReadDWord(uint32_t\u0026amp; dword); bool ReadLong(uint64_t\u0026amp; ulong); bool ReadDouble(double\u0026amp; d); bool ReadVarUInt32(uint32_t\u0026amp; val, DWORD\u0026amp; size); bool 
ReadVarUInt64(uint64_t\u0026amp; val, DWORD\u0026amp; size); bool ReadWString(std::wstring\u0026amp; wstring, DWORD\u0026amp; bytesRead); bool SkipBytes(uint32_t byteCount); // shared fields protected: bool _is64Bit; uint32_t _blockSize; uint32_t _pos; private: uint8_t* _pBlock; uint64_t _blockOriginInFile; }; The Parse function accepts the memory buffer containing an “object” payload, its size and its position since the beginning of the stream. The derived class will have to implement the OnParse function using the ReadXXX helpers.\nThe two ReadVarUintXXX functions are different from the other direct read helpers because they deal with some simple compression mechanisms used by the serialization of 32-bit and 64-bit numbers.\nIn the different types of “object” payloads, the strings are serialized as UTF16 strings ending with a “\\0” wide character. Here is the implementation of the helper function used to read a std::wstring from a memory block:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 bool BlockParser::ReadWString(std::wstring\u0026amp; wstring, DWORD\u0026amp; bytesRead) { uint16_t character; bytesRead = 0; // in case of empty string while (true) { if (!ReadWord(character)) { return false; } // protect against invalid UNICODE character (due to missing fields in ExceptionThrown event) if (character \u0026gt; 256) { // rewind the character _pos = _pos - sizeof(character); // this is only covering a missing string return (bytesRead == 0); } bytesRead += sizeof(character); // Note that an empty string contains only that \\0 character if (character == 0) // \\0 final character of the string return true; wstring.push_back(character); } } Note the check for character content in the loop: this is due to a serialization issue I will discuss later when the event “object” block will be detailed.\nThe Stack “object” If you remember my previous post about retrieving call stacks for CLR events with TraceEvent, you might be wondering 
why there is a specific stack object since a ClrStackWalk event should contain the frames if the Stack keyword is enabled for the .NET provider. In fact, the current TraceEvent implementation is not using the stack object sent by the CLR (maybe to have the same code between ETW and EventPipe).\nOne stack “object” received in a nettrace stream contains one or more stacks. Each stack is identified by an id (more about this soon) and contains a list of instruction pointer addresses.\nIn the previous screenshot, the id of the first stack is 1 and the second is 2. In the next stack “object”, the FirstId field will be 3 and so on. This avoids storing the id in each call stack and saves space.\nNote that even if this does not seem to make any sense, it might happen that the addresses list is empty.\nThese call stacks are stored in EventPipeSession as a per id cache:\n1 2 3 4 // per stackID stack // only one will be used depending on the bitness of the monitored application std::unordered_map\u0026lt;uint32_t, EventCacheStack32\u0026gt; _stacks32; std::unordered_map\u0026lt;uint32_t, EventCacheStack64\u0026gt; _stacks64; The frames are stored as addresses in a vector:\n1 2 3 4 5 6 class EventCacheStack32 { public: uint32_t Id; std::vector\u0026lt;uint32_t\u0026gt; Frames; }; The CLR is sending one stack per unique callstack (i.e. at least one frame is different). As you will soon see, each event “object” contains a stack id corresponding to the chain of code from which it is sent.\nThe next episode will detail the Metadata and Event blocks to end the series.\nResources Episode 1 — Digging into the CLR Diagnostics IPC Protocol in C# Episode 2 — .NET Diagnostic IPC protocol: the C++ way [Episode 3 ](/posts/2022-10-23_clr-events-go-for/ CLR events: go for the nettrace file format! 
Episode 4 — Parsing the “nettrace” stream (/posts/2022-11-27_parsing-the-nettrace-stream/) Source code for the C++ implementation of CLR events listener Diagnostics IPC protocol documentation ","cover":"https://chrisnas.github.io/posts/2023-01-15_reading-object-in-memory/1_H_HbR0-xWzR3SWV2KEimpQ.png","date":"2023-01-15","permalink":"https://chrisnas.github.io/posts/2023-01-15_reading-object-in-memory/","summary":"\u003chr\u003e\n\u003cp\u003eThe previous episodes started the parsing of the “nettrace” format used when \u003ca href=\"/posts/2022-09-18_net-diagnostic-ipc-protocol/\"\u003econtacting the .NET Diagnostics IPC server\u003c/a\u003e and \u003ca href=\"/posts/2022-10-23_clr-events-go-for/\"\u003einitiating the protocol to receive CLR events\u003c/a\u003e. It is now time to see how to get the payload of each “object” type, especially how stacks are stored.\u003c/p\u003e\n\u003cp\u003eWe have seen that the stream starts with a \u003cstrong\u003eTraceObject\u003c/strong\u003e that describes the rest of the stream followed by a sequence of “objects”:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2023-01-15_reading-object-in-memory/1_E9Rq89JSc_OIfW9ooEfm1A.png\"\u003e\u003c/p\u003e\n\u003cp\u003eThe remainder of each “object” is a 32-bit block size followed by the payload.\u003c/p\u003e","title":"Reading “object” in memory — starting with stacks"},{"content":" The previous episodes explained how to contact the .NET Diagnostics IPC server and initiate the protocol to receive CLR events. It is now time to dig into the “nettrace” stream format!\nAs the IPC command documentation states, the response to the CollectTracing command is followed by an Optional Continuation of a nettrace format stream of events. 
In fact, before .NET Core 3, the netperf format was used but I will focus on the nettrace format also used in .NET 5+.\nFrom a high-level view, it is a header followed by a stream of “objects”, each described by a header and ending with a byte with 6 as value:\nLet’s start with the nettrace header:\n#pragma pack(1) struct NettraceHeader { uint8_t Magic[8]; // \u0026#34;Nettrace\u0026#34; with no \u0026#39;\\0\u0026#39; uint32_t FastSerializationLen; // 20 uint8_t FastSerialization[20]; // \u0026#34;!FastSerialization.1\u0026#34; with no \u0026#39;\\0\u0026#39; }; const char* NettraceHeaderMagic = \u0026#34;Nettrace\u0026#34;; const char* FastSerializationMagic = \u0026#34;!FastSerialization.1\u0026#34;; It can be used to check the format and version of the received data (in case of format evolution over time):\nbool CheckNettraceHeader(NettraceHeader\u0026amp; header) { if (!IsSameAsString(header.Magic, sizeof(header.Magic), NettraceHeaderMagic)) return false; if (header.FastSerializationLen != strlen(FastSerializationMagic)) return false; if (!IsSameAsString(header.FastSerialization, sizeof(header.FastSerialization), FastSerializationMagic)) return false; return true; }; In memory, the “strings” are stored as an array of UTF8 characters without a trailing ‘\\0’.\nThis drives the implementation of the comparison helper:\nbool IsSameAsString(uint8_t* bytes, uint16_t length, const char* characters) { return memcmp(bytes, characters, length) == 0; } Everything is an “object” After the header, data is represented as “objects” whose description is stored in an ObjectHeader:\n#pragma pack(1) struct ObjectHeader { NettraceTag TagTraceObject; // 5 NettraceTag TagTypeObjectForTrace; // 5 NettraceTag TagType; // 1 uint32_t Version; // uint32_t MinReaderVersion; // uint32_t NameLength; // length of UTF8 name that follows }; followed by the name of the object type in UTF8. 
Note that, as for “!FastSerialization.1” in NettraceHeader, its length is provided in the NameLength field of the ObjectHeader. For example, here is how the initial TraceObject header is stored in memory:\nand its equivalent in code:\n#pragma pack(1) struct TraceObjectHeader : ObjectHeader { //NettraceTag TagTraceObject; // 5 //NettraceTag TagTypeObjectForTrace; // 5 //NettraceTag TagType; // 1 //uint32_t Version; // 4 //uint32_t MinReaderVersion; // 4 //uint32_t NameLength; // 5 uint8_t Name[5]; // \u0026#39;Trace\u0026#39; NettraceTag TagEndTraceObject; // 6 }; Note that after the “object” type name, an “end of object” byte (i.e. = 6) appears before the payload.\nSo, each kind of “object” shares the same ObjectHeader followed by a UTF8 type name:\n“EventBlock” : contains one or more events “MetadataBlock” : contains partial description of events (no name nor payload fields) “StackBlock” : contains call stacks (i.e. arrays of instruction pointers) “SPBlock” : contains a checkpoint inside the stream (used for dropped message detection and call stack cache invalidation) It means that you need to compare the strings to figure out the “object” type:\nconst char* EventBlockName = \u0026#34;EventBlock\u0026#34;; const char* MetadataBlockName = \u0026#34;MetadataBlock\u0026#34;; const char* StackBlockName = \u0026#34;StackBlock\u0026#34;; const char* SequencePointBlockName = \u0026#34;SPBlock\u0026#34;; ObjectType EventPipeSession::GetObjectType(ObjectHeader\u0026amp; header) { // check validity if (header.TagTraceObject != NettraceTag::BeginPrivateObject) return ObjectType::Unknown; if (header.TagTypeObjectForTrace != NettraceTag::BeginPrivateObject) return ObjectType::Unknown; if (header.TagType != NettraceTag::NullReference) return ObjectType::Unknown; // figure out which 
type it is based on the name: // EventBlock -\u0026gt; \u0026#34;EventBlock\u0026#34; (size = 10) // MetadataBlock -\u0026gt; \u0026#34;MetadataBlock\u0026#34; (size = 13) // StackBlock -\u0026gt; \u0026#34;StackBlock\u0026#34; (size = 10) // SequencePointBlock -\u0026gt; \u0026#34;SPBlock\u0026#34; (size = 7) if (header.NameLength == 13) { uint8_t buffer[13]; if (!Read(buffer, 13)) return ObjectType::Unknown; if (IsSameAsString(buffer, 13, MetadataBlockName)) return ObjectType::MetadataBlock; return ObjectType::Unknown; } else if (header.NameLength == 10) { uint8_t buffer[10]; if (!Read(buffer, 10)) return ObjectType::Unknown; if (IsSameAsString(buffer, 10, EventBlockName)) return ObjectType::EventBlock; else if (IsSameAsString(buffer, 10, StackBlockName)) return ObjectType::StackBlock; return ObjectType::Unknown; } else if (header.NameLength == 7) { uint8_t buffer[7]; if (!Read(buffer, 7)) return ObjectType::Unknown; if (IsSameAsString(buffer, 7, SequencePointBlockName)) return ObjectType::SequencePointBlock; return ObjectType::Unknown; } return ObjectType::Unknown; } The first object that appears in the stream contains details about the whole stream:\nAfter the header, some fields follow as a payload:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 #pragma pack(1) struct ObjectFields { uint16_t Year; uint16_t Month; uint16_t DayOfWeek; uint16_t Day; uint16_t Hour; uint16_t Minute; uint16_t Second; uint16_t Millisecond; uint64_t SyncTimeQPC; uint64_t QPCFrequency; uint32_t PointerSize; uint32_t ProcessId; uint32_t NumProcessors; uint32_t ExpectedCPUSamplingRate; }; Beyond the timestamp information and the pointer size (required when call stack instruction pointers will be read as 8 (64-bit) or 4 (32-bit) bytes addresses) the other fields are not really interesting.\nLet’s go back to a higher-level view Here is the code to listen to the nettrace stream:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 bool 
EventPipeSession::Listen() { if (!ReadHeader()) return false; if (!ReadTraceObjectHeader()) return false; ObjectFields ofTrace; if (!ReadObjectFields(ofTrace)) return false; // use the \u0026#34;trace object\u0026#34; fields to figure out the bitness of the application Is64Bit = ofTrace.PointerSize == 8; _stackParser.SetPointerSize(ofTrace.PointerSize); _metadataParser.SetPointerSize(ofTrace.PointerSize); _eventParser.SetPointerSize(ofTrace.PointerSize); // don\u0026#39;t forget to check the end object tag uint8_t tag; if (!ReadByte(tag) || (tag != NettraceTag::EndObject)) return false; // read one \u0026#34;object\u0026#34; after the other while (ReadNextObject()) { std::cout \u0026lt;\u0026lt; \u0026#34;------------------------------------------------\\n\u0026#34;; std::cout \u0026lt;\u0026lt; \u0026#34;\\n________________________________________________\\n\u0026#34;; } return _stopRequested; } I will come back to the different _XXXparser fields soon.\nThe ReadNextObject helper is responsible for reading the expected ObjectHeader and the string that follows to figure out what is the type of this “object” and what payload to expect:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 bool EventPipeSession::ReadNextObject() { // get the type of object from the header ObjectHeader header; if (!Read(\u0026amp;header, sizeof(ObjectHeader))) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while reading Object header: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error \u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } ObjectType ot = GetObjectType(header); if (ot == ObjectType::Unknown) { std::cout \u0026lt;\u0026lt; \u0026#34;Invalid object header type:\\n\u0026#34;; DumpObjectHeader(header); return false; } // don\u0026#39;t forget to check the end object tag uint8_t tag; if (!ReadByte(tag) || (tag != NettraceTag::EndObject)) { std::cout \u0026lt;\u0026lt; 
\u0026#34;Missing end of object tag: \u0026#34; \u0026lt;\u0026lt; (uint8_t)tag \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } ... The GetObjectType function checks the header validity and extracts the “object” type name:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 ObjectType EventPipeSession::GetObjectType(ObjectHeader\u0026amp; header) { // check validity if (header.TagTraceObject != NettraceTag::BeginPrivateObject) return ObjectType::Unknown; if (header.TagTypeObjectForTrace != NettraceTag::BeginPrivateObject) return ObjectType::Unknown; if (header.TagType != NettraceTag::NullReference) return ObjectType::Unknown; // figure out which type it is based on the name: // EventBlock -\u0026gt; \u0026#34;EventBlock\u0026#34; (size = 10) // MetadataBlock -\u0026gt; \u0026#34;MetadataBlock\u0026#34; (size = 13) // StackBlock -\u0026gt; \u0026#34;StackBlock\u0026#34; (size = 10) // SequencePointBlock -\u0026gt; \u0026#34;SPBlock\u0026#34; (size = 7) if (header.NameLength == 13) { uint8_t buffer[13]; if (!Read(buffer, 13)) return ObjectType::Unknown; if (IsSameAsString(buffer, 13, MetadataBlockName)) return ObjectType::MetadataBlock; return ObjectType::Unknown; } else if (header.NameLength == 10) { uint8_t buffer[10]; if (!Read(buffer, 10)) return ObjectType::Unknown; if (IsSameAsString(buffer, 10, EventBlockName)) return ObjectType::EventBlock; else if (IsSameAsString(buffer, 10, StackBlockName)) return ObjectType::StackBlock; return ObjectType::Unknown; } else if (header.NameLength == 7) { uint8_t buffer[7]; if (!Read(buffer, 7)) return ObjectType::Unknown; if (IsSameAsString(buffer, 7, SequencePointBlockName)) return ObjectType::SequencePointBlock; return ObjectType::Unknown; } return ObjectType::Unknown; } The same IsSameAsString helper is used to compare the read “object” name with the known types.\nThe end of ReadNextObject simply parses the 
“object” payload as a memory block based on its type:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ... switch (ot) { case ObjectType::EventBlock: return ParseEventBlock(header); case ObjectType::MetadataBlock: return ParseMetadataBlock(header); case ObjectType::StackBlock: return ParseStackBlock(header); case ObjectType::SequencePointBlock: return ParseSequencePointBlock(header); default: return false; } } The next step will be to look into each “object” type payload.\nResources Episode 1 — Digging into the CLR Diagnostics IPC Protocol in C# Episode 2 — .NET Diagnostic IPC protocol: the C++ way [Episode 3 ](/posts/2022-10-23_clr-events-go-for/ CLR events: go for the nettrace file format! Source code for the C++ implementation of CLR events listener Diagnostics IPC protocol documentation ","cover":"https://chrisnas.github.io/posts/2022-11-27_parsing-the-nettrace-stream/1_wgeRsZ8QOX-IeBluAI6nmg.png","date":"2022-11-27","permalink":"https://chrisnas.github.io/posts/2022-11-27_parsing-the-nettrace-stream/","summary":"\u003chr\u003e\n\u003cp\u003eThe previous episodes explained how to \u003ca href=\"/posts/2022-09-18_net-diagnostic-ipc-protocol/\"\u003econtact the .NET Diagnostics IPC server\u003c/a\u003e and \u003ca href=\"/posts/2022-10-23_clr-events-go-for/\"\u003einitiate the protocol to receive CLR events\u003c/a\u003e. It is now time to dig into the “nettrace” stream format!\u003c/p\u003e\n\u003cp\u003eAs the \u003ca href=\"https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md\"\u003eIPC command documentation\u003c/a\u003e states, the response to the \u003cstrong\u003eCollectTracing\u003c/strong\u003e command is \u003cem\u003efollowed by an Optional Continuation of a nettrace format stream of events\u003c/em\u003e. 
In fact, before .NET Core 3, the \u003ca href=\"https://github.com/microsoft/perfview/blob/main/src/TraceEvent/EventPipe/NetPerfFormat.md\"\u003enetperf format\u003c/a\u003e was used but I will focus on the nettrace format also used in .NET 5+.\u003c/p\u003e","title":"Parsing the “nettrace” stream of (not only) events"},{"content":" As shown in the previous post, the processing of ProcessInfo diagnostic commands is easy because you send a request and read the different fields from the response. This is different if you want to receive events from the CLR via EventPipe. In C#, the TraceEvent NuGet package wraps everything under a nice event handler based model as shown in many of my previous posts.\nBehind the scenes, a StartSession command is sent (more details about the parameters later) and the response contains the numeric ID of the session. Then, the events will be read from the IPC channel as a binary stream of data with the “nettrace” file format. The collection ends when the StopTracing command is sent.\nThe source code is available from my GitHub repository.\nHiding the transport layer: IIpcEndpoint Unlike the previous post, to send the command and read the response back from the CLR, I’m wrapping the transport layer with the IIpcEndpoint interface:\nclass IIpcEndpoint { public: virtual bool Write(LPCVOID buffer, DWORD bufferSize, DWORD* writtenBytes) = 0; virtual bool Read(LPVOID buffer, DWORD bufferSize, DWORD* readBytes) = 0; virtual bool ReadByte(uint8_t\u0026amp; byte) = 0; virtual bool ReadWord(uint16_t\u0026amp; word) = 0; virtual bool ReadDWord(uint32_t\u0026amp; dword) = 0; virtual bool ReadLong(uint64_t\u0026amp; ulong) = 0; virtual bool Close() = 0; virtual ~IIpcEndpoint() = default; }; It abstracts the write and read accesses to the underlying transport layer. 
In addition, the base class accepts a “recorder” that allows me to store what is received from the CLR into any kind of storage (today only a file-based recorder that helped a lot to reproduce specific situations without the need to have a running process to connect to):\nThe PidEndpoint class accepts the process id of the running .NET application to monitor its CLR events and an optional recorder implementing the IIpcRecorder interface. The Create static factory implementation creates the expected named pipe on Windows (or the domain socket on Linux) and stores the handle into its _handle field:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 PidEndpoint* PidEndpoint::CreateForWindows(int pid, IIpcRecorder* pRecorder) { PidEndpoint* pEndpoint = new PidEndpoint(pRecorder); // build the pipe name as described in the protocol wchar_t pszPipeName[256]; int nCharactersWritten = -1; nCharactersWritten = wsprintf( pszPipeName, L\u0026#34;\\\\\\\\.\\\\pipe\\\\dotnet-diagnostic-%d\u0026#34;, pid ); // check that CLR has created the diagnostics named pipe if (!::WaitNamedPipe(pszPipeName, 200)) { auto error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Diagnostics named pipe is not available for process #\u0026#34; \u0026lt;\u0026lt; pid \u0026lt;\u0026lt; \u0026#34; (\u0026#34; \u0026lt;\u0026lt; error \u0026lt;\u0026lt; \u0026#34;)\u0026#34; \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return nullptr; } // connect to the named pipe HANDLE hPipe; hPipe = ::CreateFile( pszPipeName, // pipe name GENERIC_READ | // read and write access GENERIC_WRITE, 0, // no sharing NULL, // default security attributes OPEN_EXISTING, // opens existing pipe 0, // default attributes NULL); // no template file if (hPipe == INVALID_HANDLE_VALUE) { std::cout \u0026lt;\u0026lt; \u0026#34;Impossible to connect to \u0026#34; \u0026lt;\u0026lt; pszPipeName \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return nullptr; } 
pEndpoint-\u0026gt;_handle = hPipe; return pEndpoint; } The next step is to open a tracing session by sending the StartSession command.\nThe Trace diagnostic commands Following the same object model provided by the Microsoft.Diagnostics.NETCore.Client nuget, my DiagnosticsClient class hides the transport layer. It also exposes high-level functions such as OpenEventPipeSession to initiate a trace event session with the CLR:\n EventPipeSession* OpenEventPipeSession(uint64_t keywords, EventVerbosityLevel verbosity); If you remember from TraceEvent, you need a few parameters to create a session:\nsize of circular buffers used by the CLR to cache events (same as PerfView, use 16 MB as default); nettrace format (i.e. value of 1); if rundown events are needed; a list of providers (“Microsoft-Windows-DotNETRuntime” for the CLR in my case); keywords; verbosity level; possible arguments (none here). Here is the corresponding C++ description of the command type:\n const uint8_t DotnetProviderMagicLength = 32; struct MagicProvider { wchar_t Magic[DotnetProviderMagicLength]; }; // 32 wchar_t (including \\0) const MagicProvider DotnetProviderMagic = { L\u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34; }; const uint32_t CircularBufferMBSize = 16; const uint32_t NetTraceFormat = 1; #pragma pack(1) struct StartSessionMessage : public IpcHeader { uint32_t CircularBufferMB; // 16 MB uint32_t Format; // 1 for NetTrace format uint8_t RequestRundown; // 0 because don\u0026#39;t want rundown // array of provider configuration uint32_t ProviderCount; // 1 only: Microsoft-Windows-DotNETRuntime uint64_t Keywords; // from EventKeyword uint32_t Verbosity; // from EventPipeEventLevel uint32_t ProviderStringLen; // number of UTF16 characters = 32 (including last \\0) union // dotnet provider name { MagicProvider _magic; uint8_t Provider[2 * DotnetProviderMagicLength]; }; uint32_t Arguments; // 0 for empty string (no 
argument) }; The code to fill up the command is straightforward:\n StartSessionMessage* CreateStartSessionMessage(uint64_t keywords, EventVerbosityLevel verbosity) { auto message = new StartSessionMessage(); ::ZeroMemory(message, sizeof(*message)); memcpy(message-\u0026gt;Magic, \u0026amp;DotnetIpcMagic_V1, sizeof(message-\u0026gt;Magic)); message-\u0026gt;Size = sizeof(StartSessionMessage); message-\u0026gt;CommandSet = (uint8_t)DiagnosticServerCommandSet::EventPipe; message-\u0026gt;CommandId = (uint8_t)EventPipeCommandId::CollectTracing2; message-\u0026gt;Reserved = 0; message-\u0026gt;CircularBufferMB = CircularBufferMBSize; message-\u0026gt;Format = NetTraceFormat; message-\u0026gt;RequestRundown = 0; message-\u0026gt;ProviderCount = 1; message-\u0026gt;Keywords = (uint64_t)keywords; message-\u0026gt;Verbosity = (uint32_t)verbosity; message-\u0026gt;ProviderStringLen = DotnetProviderMagicLength; memcpy(message-\u0026gt;Provider, \u0026amp;DotnetProviderMagic, sizeof(message-\u0026gt;Provider)); message-\u0026gt;Arguments = 0; return message; } The number of providers is given by the ProviderCount field, and the length-prefixed string naming the provider (only one here) follows the Verbosity field. 
To start the session, it is needed to send the StartSession message and read the session id from the response:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 bool EventPipeStartRequest::Process(IIpcEndpoint* pEndpoint, uint64_t keywords, EventVerbosityLevel verbosity) { // send an StartSessionMessage and parse the response StartSessionMessage* pMessage = CreateStartSessionMessage(keywords, verbosity); DWORD writtenBytes = 0; if (!pEndpoint-\u0026gt;Write(pMessage, pMessage-\u0026gt;Size, \u0026amp;writtenBytes)) { return false; } // analyze the response IpcHeader response = {}; DWORD bytesReadCount = 0; if (!pEndpoint-\u0026gt;Read(\u0026amp;response, sizeof(response), \u0026amp;bytesReadCount)) { return false; } if (response.CommandId != (uint8_t)DiagnosticServerResponseId::OK) { return false; } // get the session ID from the payload uint16_t payloadSize = response.Size - sizeof(response); if (payloadSize \u0026lt; sizeof(uint64_t)) { return false; } if (!pEndpoint-\u0026gt;ReadLong(SessionId)) { return false; } return true; } Once the StartSession command has been sent, the events corresponding to the given provider/keywords/verbosity (here the CLR runtime/gc+exception+contention/verbose)\n1 2 3 4 5 6 auto pSession = pClient-\u0026gt;OpenEventPipeSession( EventKeyword::gc | EventKeyword::exception | EventKeyword::contention, EventVerbosityLevel::Verbose // required for AllocationTick ); will be read from the event pipe. 
Since reading from the pipe is synchronous, it is recommended to dedicate a thread to read and process the events:\n if (pSession != nullptr) { DWORD tid = 0; auto hThread = ::CreateThread(nullptr, 0, ListenToEvents, pSession, 0, \u0026amp;tid); std::cout \u0026lt;\u0026lt; \u0026#34;Press ENTER to stop listening to events...\\n\\n\u0026#34;; std::string line; std::getline(std::cin, line); std::cout \u0026lt;\u0026lt; \u0026#34;Stopping session\\n\\n\u0026#34;; pSession-\u0026gt;Stop(); std::cout \u0026lt;\u0026lt; \u0026#34;Session stopped\\n\\n\u0026#34;; // test if it works ::Sleep(1000); ::CloseHandle(hThread); } The ListenToEvents callback executed by the new thread is “simply” listening to the event pipe of the session:\n DWORD WINAPI ListenToEvents(void* pParam) { EventPipeSession* pSession = static_cast\u0026lt;EventPipeSession*\u0026gt;(pParam); pSession-\u0026gt;Listen(); return 0; } Before describing how to read the events, it is important to understand how to stop the flow. First, inside the EventPipeSession, the internal loop that reads events needs to exit thanks to the _stopRequested boolean:\n bool EventPipeSession::Stop() { _stopRequested = true; if (_pid == -1) return true; // a different IPC connection is needed to stop the session DiagnosticsClient* pStopClient = DiagnosticsClient::Create(_pid, nullptr); pStopClient-\u0026gt;StopEventPipeSession(SessionId); delete pStopClient; return true; } In addition, a message with the StopTracing command ID from the EventPipe command set needs to be sent to tell the CLR to stop sending the events. This message must be sent through a different IPC channel (hence the pStopClient variable used in the previous code). 
The StopEventPipeSession helper function uses the EventPipeStopRequest wrapper:\n bool DiagnosticsClient::StopEventPipeSession(uint64_t sessionId) { EventPipeStopRequest request; return request.Process(_pEndpoint, sessionId); } The StopSession command accepts the session ID as its single parameter:\n #pragma pack(1) struct StopSessionMessage : public IpcHeader { uint64_t SessionId; }; Processing the stop request consists of creating such a message and sending it through the IPC channel:\n StopSessionMessage* CreateStopMessage(uint64_t sessionId) { StopSessionMessage* message = new StopSessionMessage(); ::ZeroMemory(message, sizeof(*message)); memcpy(message-\u0026gt;Magic, \u0026amp;DotnetIpcMagic_V1, sizeof(message-\u0026gt;Magic)); message-\u0026gt;Size = sizeof(StopSessionMessage); message-\u0026gt;CommandSet = (uint8_t)DiagnosticServerCommandSet::EventPipe; message-\u0026gt;CommandId = (uint8_t)EventPipeCommandId::StopTracing; message-\u0026gt;Reserved = 0; message-\u0026gt;SessionId = sessionId; return message; } bool EventPipeStopRequest::Process(IIpcEndpoint* pEndpoint, uint64_t sessionId) { StopSessionMessage* pMessage = CreateStopMessage(sessionId); DWORD writtenBytes = 0; if (!pEndpoint-\u0026gt;Write(pMessage, pMessage-\u0026gt;Size, \u0026amp;writtenBytes)) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while sending EventPipe Stop message to the CLR: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error \u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; delete pMessage; return false; } delete pMessage; ... // handle the response return true; } When the stop command is received by the CLR, the remaining “data” (more on this in the next episode) is sent through the first IPC channel before that channel is closed. 
This is how the code knows that the session can stop listening to the EventPipe.\nThe next episode will start to parse the nettrace stream of events.\nResources Episode 1 — Digging into the CLR Diagnostics IPC Protocol in C# Episode 2 — .NET Diagnostic IPC protocol: the C++ way Source code for the C++ implementation of CLR events listener Diagnostics IPC protocol documentation Microsoft.Diagnostics.NETCore.Client source code ","cover":"https://chrisnas.github.io/posts/2022-10-23_clr-events-go-for/1_ugj8AdZBJZv4qyi-VfTeog.png","date":"2022-10-23","permalink":"https://chrisnas.github.io/posts/2022-10-23_clr-events-go-for/","summary":"\u003chr\u003e\n\u003cp\u003eAs shown in the \u003ca href=\"/posts/2022-09-18_net-diagnostic-ipc-protocol/\"\u003eprevious post\u003c/a\u003e, the processing of \u003cstrong\u003eProcessInfo\u003c/strong\u003e diagnostic commands is easy because you send a request and read the different fields from the response. This is different if you want to receive events from the CLR via EventPipe. In C#, the \u003ca href=\"https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent/\"\u003eTraceEvent nuget package\u003c/a\u003e wraps everything under a nice event handler based model as shown in many of my \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eprevious posts\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eBehind the scene, a \u003cstrong\u003eStartSession\u003c/strong\u003e command is sent (more details about the parameters later) and the response contains the numeric ID of the session. Then, the events will be read from the IPC channel as a binary stream of data with the \u003ca href=\"https://github.com/microsoft/perfview/blob/main/src/TraceEvent/EventPipe/EventPipeFormat.md\"\u003e“nettrace“ file format\u003c/a\u003e. 
The collection ends when the \u003cstrong\u003eStopTracing\u003c/strong\u003e command is sent.\u003c/p\u003e","title":"CLR events: go for the nettrace file format!"},{"content":" The previous post was describing the C# helpers to communicate with the diagnostic server in the CLR of a running .NET application.\nIf, like me, you must write native code (i.e not in C#), you will need to implement the transport and protocol yourself. And, as you will see, it is not that complicated thanks to the documentation but also by using the available C# code of the Microsoft.Diagnostics.NETCore.Client implementation as a guide.\nEventPipe transport layer The first step is to connect to the CLR of a running .NET process. On Linux, you connect to a domain socket named “{$TMPDIR}/dotnet-diagnostic-{%d:PID}-{%llu:disambiguation key}-socket”. For Windows, a named pipe called \\.\\pipe\\dotnet-diagnostic-{%d:PID} needs to be accessed.\nHere is the Windows implementation code to connect to the IPC named pipe:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 int BasicConnection(DWORD pid) { wchar_t pszPipeName[256]; // build the pipe name as described in the protocol int nCharactersWritten = -1; nCharactersWritten = wsprintf( pszPipeName, L\u0026#34;\\\\\\\\.\\\\pipe\\\\dotnet-diagnostic-%d\u0026#34;, pid ); // check that CLR has created the diagnostics named pipe if (!::WaitNamedPipe(pszPipeName, 200)) { auto error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Diagnostics named pipe is not available for process #\u0026#34; \u0026lt;\u0026lt; pid \u0026lt;\u0026lt; \u0026#34; (\u0026#34; \u0026lt;\u0026lt; error \u0026lt;\u0026lt; \u0026#34;)\u0026#34; \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return -1; } // connect to the named pipe HANDLE hPipe; hPipe = ::CreateFile( pszPipeName, // pipe name GENERIC_READ | // read and write access GENERIC_WRITE, 0, // no sharing NULL, // default security 
attributes OPEN_EXISTING, // opens existing pipe 0, // default attributes NULL); // no template file if (hPipe == INVALID_HANDLE_VALUE) { std::cout \u0026lt;\u0026lt; \u0026#34;Impossible to connect to \u0026#34; \u0026lt;\u0026lt; pszPipeName \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return -2; } // ... send a command... // don\u0026#39;t forget to close the named pipe ::CloseHandle(hPipe); return 0; } Dig into the Command protocol Once the connection is created, a command can be sent by writing to the pipe (or socket on Linux). The answer will be received by reading from the pipe (or socket on Linux). There is something important to remember: if you need to send different commands, it is needed to create one connection for each. You should not try to reuse a given connection that might also be closed after a command has been processed.\nLet’s start with the IpcHeader Both the command and the response share the same header format:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 #pragma pack(1) struct IpcHeader { union { MagicVersion _magic; uint8_t Magic[14]; // Magic Version number; a 0 terminated ANSI char array }; uint16_t Size; // The size of the incoming packet, size = header + payload size uint8_t CommandSet; // The scope of the Command. uint8_t CommandId; // The command being sent uint16_t Reserved; // reserved for future use }; const MagicVersion DotnetIpcMagic_V1 = { \u0026#34;DOTNET_IPC_V1\u0026#34; }; The Size field stores the size of the header (= 20) plus the size of the payload (if any). 
Next comes the CommandSet field that identifies the group to which the command belongs:\n enum class DiagnosticServerCommandSet : uint8_t { // reserved = 0x00, Dump = 0x01, EventPipe = 0x02, Profiler = 0x03, Process = 0x04, Server = 0xFF, }; It is then followed by the ID of a command in that CommandSet:\nFor example, here is the memory layout of a ProcessInfo command:\nSince there is no additional parameter that would need to be encoded in a payload, the Size field is set to 20 (= 0x14 in hexadecimal) which is the size of the header alone.\nTo make command handling easier, I have defined the corresponding headers:\n const IpcHeader ProcessInfoMessage = { { DotnetIpcMagic_V1 }, (uint16_t)sizeof(IpcHeader), (uint8_t)DiagnosticServerCommandSet::Process, (uint8_t)DiagnosticServerResponseId::OK, (uint16_t)0x0000 }; All that makes the code to send such a ProcessInfo command straightforward:\n bool ProcessInfoRequest::Send(HANDLE hPipe) { // send the request IpcHeader message = ProcessInfoMessage; DWORD bytesWrittenCount = 0; if (!::WriteFile(hPipe, \u0026amp;message, sizeof(message), \u0026amp;bytesWrittenCount, nullptr)) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while sending ProcessInfo message to the CLR: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error \u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } The next step is to read from the pipe to get the answer from the CLR. As mentioned earlier, the same IpcHeader is received; always with the Server (0xFF) CommandSet value. 
The CommandId field is 0 for a success and 0xFF in case of an error.\n1 2 3 4 5 6 enum class DiagnosticServerResponseId : uint8_t { OK = 0x00, // future Error = 0xFF, }; For example, here is the memory layout of a ProcessInfo answer:\nNote that numbers are little-endian encoded (hence the 0001 for the Size field: it means 0x0100 = 256 bytes)\nThe code to analyze the response follows:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 // analyze the response // 1. get the header to know how large the buffer should be to get the payload message = {}; DWORD bytesReadCount = 0; if (!::ReadFile(hPipe, \u0026amp;message, sizeof(message), \u0026amp;bytesReadCount, nullptr)) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while getting ProcessInfo response from the CLR: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error\u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } if (message.CommandId != (uint8_t)DiagnosticServerResponseId::OK) { Error = message.CommandId; std::cout \u0026lt;\u0026lt; \u0026#34;Error returned by the CLR in ProcessInfo response: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error\u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } In case of success, the size of the response payload is obtained from the Size field by subtracting the size of the header. 
The next step is to allocate a buffer and read the payload from the pipe into that buffer:\n1 2 3 4 5 6 7 8 9 uint16_t payloadSize = message.Size - sizeof(message); _buffer = new uint8_t[payloadSize]; if (!::ReadFile(hPipe, _buffer, payloadSize, \u0026amp;bytesReadCount, nullptr)) { Error = ::GetLastError(); std::cout \u0026lt;\u0026lt; \u0026#34;Error while getting ProcessInfo payload: 0x\u0026#34; \u0026lt;\u0026lt; std::hex \u0026lt;\u0026lt; Error \u0026lt;\u0026lt; std::dec \u0026lt;\u0026lt; \u0026#34;\\n\u0026#34;; return false; } // Note: bytesReadCount == payloadSize How to access the response fields Now that the response payload has been read into a memory buffer, it is time to look at the expected fields as described in the documentation. Here is the encoding of the different field types:\nTo continue with the ProcessInfo example, here are the expected fields:\nSo here is the memory layout for the response payload for my monitored Windows 64-bit simulator.exe test application:\nTo simplify the implementation, the class in charge of a command allocates a buffer corresponding to the payload and provides fields with values either copied from it or, in case of strings, pointing to the buffer (just after the 32-bit length):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 bool ProcessInfoRequest::ParseResponse(DWORD payloadSize) { uint32_t index = 0; memcpy(\u0026amp;Pid, \u0026amp;_buffer[index], sizeof(Pid)); index += sizeof(Pid); if (payloadSize \u0026lt; index) return false; memcpy(\u0026amp;RuntimeCookie, \u0026amp;_buffer[index], sizeof(RuntimeCookie)); index += sizeof(RuntimeCookie); if (payloadSize \u0026lt; index) return false; PointToString(_buffer, index, CommandLine); if (payloadSize \u0026lt; index) return false; PointToString(_buffer, index, OperatingSystem); if (payloadSize \u0026lt; index) return false; PointToString(_buffer, index, Architecture); return true; } The PointToString helper code is straightforward:\n1 2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 void PointToString(uint8_t* buffer, uint32_t\u0026amp; index, wchar_t*\u0026amp; string) { // strings are stored as: // - characters count as uint32_t // - array of UTF16 characters followed by \u0026#34;\\0\u0026#34; // Note that the last L\u0026#34;\\0\u0026#34; IS COUNTED uint32_t count; memcpy(\u0026amp;count, \u0026amp;buffer[index], sizeof(count)); // skip characters count index += sizeof(count); // empty string case // Note: could make it point to the \u0026#34;count\u0026#34; which is 0 in the buffer // instead of returning nullptr if (count == 0) { string = nullptr; return; } string = (wchar_t*)\u0026amp;buffer[index]; // skip the whole string (including last UTF16 \u0026#39;\\0\u0026#39;) index += count * (uint32_t)sizeof(wchar_t); } You now know how to send a command without parameter and analyze the expected response. The next post will show you how to listen to CLR events like dotnet trace.\nResources Episode 1 — Digging into the CLR Diagnostics IPC Protocol in C# Diagnostics IPC protocol documentation Microsoft.Diagnostics.NETCore.Client source code ","cover":"https://chrisnas.github.io/posts/2022-09-18_net-diagnostic-ipc-protocol/1_STn1QLFwnmlgIZBXFuTvhA.png","date":"2022-09-18","permalink":"https://chrisnas.github.io/posts/2022-09-18_net-diagnostic-ipc-protocol/","summary":"\u003chr\u003e\n\u003cp\u003eThe \u003ca href=\"/posts/2022-07-28_digging-into-the-clr/\"\u003eprevious post\u003c/a\u003e was describing the C# helpers to communicate with the diagnostic server in the CLR of a running .NET application.\u003c/p\u003e\n\u003cp\u003eIf, like me, you must write native code (i.e not in C#), you will need to implement the transport and protocol yourself. 
And, as you will see, it is not that complicated thanks to the \u003ca href=\"https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md\"\u003edocumentation\u003c/a\u003e but also by using the \u003ca href=\"https://github.com/dotnet/diagnostics/tree/main/src/Microsoft.Diagnostics.NETCore.Client\"\u003eavailable C# code\u003c/a\u003e of the Microsoft.Diagnostics.NETCore.Client implementation as a guide.\u003c/p\u003e","title":".NET Diagnostic IPC protocol: the C++ way"},{"content":" Introduction As I explained during a DotNext conference session, the .NET CLI tools such as dotnet-trace, dotnet-counter or dotnet-dump are communicating with the CLR thanks to Named Pipe on Windows and Domain Socket on Linux. Within the CLR, a diagnostic server thread is responsible for answering requests. A communication protocol allows a tool to send commands and expect responses. This Diagnostic IPC Protocol is pretty well documented in the dotnet Diagnostics repository.\nBefore going into the protocol details, here is a list of the available commands and their effect:\nThis series will detail how to communicate with a CLR using this protocol both in C# and in C++. Also note that processing CLR events thanks to EventPipe will also be covered.\nMake it simple: use Microsoft.Diagnostics.NETCore.Client nuget With TraceEvent nugget package, Microsoft provided a great library to easily listen to CLR events in C#. If you want to easily send CLR diagnostic IPC protocol commands to a CLR in a .NET process, Microsoft.Diagnostics.NETCore.Client nuget package is for you. Remember that EventPipe is implemented by .NET Core and .NET 5+ (so no .NET Framework support)\nThe Swiss knife class DiagnosticsClient gives you access to most of the commands plus a way to list .NET processes as a bonus:\nIf you want to get the pid of all supported running .NET applications, call the static GetPublishedProcesses() method. 
Beware that the pid of your own application will also be included.\n1 2 3 4 5 6 7 8 9 private static void ListProcesses() { var selfPid = Process.GetCurrentProcess().Id; foreach (var pid in DiagnosticsClient.GetPublishedProcesses()) { var process = Process.GetProcessById(pid); Console.WriteLine($\u0026#34;{pid,6}{GetSeparator(pid == selfPid)}{process.ProcessName}\u0026#34;); } } Otherwise, create an instance passing the process ID of the .NET application you are interested in. With this object, call the method corresponding to the command you want to send. For example, the following code is calling GetProcessEnvironment() to list the environment variables:\n1 2 3 4 5 6 7 8 9 10 private static void ListEnvironmentVariables(int pid) { // get environment variables via existing wrapper in DiagnosticsClient var client = new DiagnosticsClient(pid); var envVariables = client.GetProcessEnvironment(); foreach (var variable in envVariables.Keys.OrderBy(k =\u0026gt; k)) { Console.WriteLine($\u0026#34;{variable,26} = {envVariables[variable]}\u0026#34;); } } Note that the value “ExitCode=00000000” is associated to the “” (empty) key for reason unknown to me…\nEven though the undocumented command to set an environment variable is available via the SetEnvironmentVariable() method, there is no helper method wrapping the ProcessInfo command. In fact, a GetProcessInfo() method exists but it is internal! The PidIpcEndpoint type in charge of the transport and the IpcMessage, IpcResponse and IpcClient types dealing with commands are also internal. It means that the nuget will not help if you need to send the ProcessInfo command.\nStill easy: use Microsoft.Diagnostics.NETCore.Client source code The .NET team spends some extra time testing, documenting, and verifying they are happy with the APIs in NetCore.Client before making them public, so sometimes you will see types that they used in their own tools that are still internal. 
But wait, if the CLI tools need some of these types, how will it work? Well…\nThe C# project corresponding to the Microsoft.Diagnostics.NETCore.Client assembly is part of the dotnet Diagnostics repository where the tools are implemented. If you look at the .csproj file, you will see InternalsVisibleTo entries that allow the tools to access the internal types:\n \u0026lt;ItemGroup\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;dotnet-counters\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;dotnet-dsrouter\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;dotnet-monitor\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;dotnet-trace\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;Microsoft.Diagnostics.Monitoring\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;Microsoft.Diagnostics.Monitoring.EventPipe\u0026#34; /\u0026gt; \u0026lt;!-- Temporary until Diagnostic Apis are finalized--\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;Microsoft.Diagnostics.Monitoring.WebApi\u0026#34; /\u0026gt; \u0026lt;InternalsVisibleTo Include=\u0026#34;Microsoft.Diagnostics.NETCore.Client.UnitTests\u0026#34; /\u0026gt; \u0026lt;/ItemGroup\u0026gt; The great thing about OSS is that you can compile your own fork to make these types public. 
Of course you will be on your own to support these custom builds of the library and it is possible there will be changes to the API before .NET makes it public.\nSo what you could do to use these internal types in your code is the following:\ncopy the folder from the Diagnostics repository add the name of your assembly that needs to access the internal types and members into the .csproj replace the reference to the nuget package by a project reference to the copied project And now GetProcessInfo and the other internal types are public for you:\n1 2 3 4 5 6 7 8 9 private static void ListProcessInfo(int pid) { var client = new DiagnosticsClient(pid); var info = client.GetProcessInfo(); // this method is internal Console.WriteLine($\u0026#34; Command Line = {info.CommandLine}\u0026#34;); Console.WriteLine($\u0026#34; Architecture = {info.ProcessArchitecture}\u0026#34;); Console.WriteLine($\u0026#34; Entry point assembly = {info.ManagedEntrypointAssemblyName}\u0026#34;); Console.WriteLine($\u0026#34; CLR Version = {info.ClrProductVersionString}\u0026#34;); } Note that during my tests, I was able to get a value for the ManagedEntrypointAssemblyName or ClrProductVersionString properties only with .NET 6+: the ProcessInfo2 (0x404) command does not seem to be implemented in previous versions.\nThe next episode of the series will start to explain the EventPipe IPC protocol from a native C++ developer perspective.\nResources Microsoft.Diagnostics.NETCore.Client nuget package TraceEvent nugget package Diagnostics IPC protocol documentation ","cover":"https://chrisnas.github.io/posts/2022-07-28_digging-into-the-clr/1_lbYwy45LUJHmX-WsdrGJYw.png","date":"2022-07-28","permalink":"https://chrisnas.github.io/posts/2022-07-28_digging-into-the-clr/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAs I explained during \u003ca href=\"https://www.youtube.com/watch?v=Jpoy3O6x-wM\u0026amp;t=1530s\"\u003ea DotNext conference 
session\u003c/a\u003e, the .NET CLI tools such as \u003cstrong\u003edotnet-trace\u003c/strong\u003e, \u003cstrong\u003edotnet-counter\u003c/strong\u003e or \u003cstrong\u003edotnet-dump\u003c/strong\u003e are communicating with the CLR thanks to Named Pipe on Windows and Domain Socket on Linux. Within the CLR, a \u003ca href=\"https://github.com/dotnet/coreclr/blob/release/3.1/src/vm/diagnosticserver.cpp#24\"\u003ediagnostic server thread\u003c/a\u003e is responsible for answering requests. A communication protocol allows a tool to send \u003cem\u003ecommands\u003c/em\u003e and expect \u003cem\u003eresponses\u003c/em\u003e. This Diagnostic IPC Protocol is \u003ca href=\"https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md\"\u003epretty well documented\u003c/a\u003e in the dotnet Diagnostics repository.\u003c/p\u003e\n\u003cp\u003eBefore going into the protocol details, here is a list of the available commands and their effect:\u003c/p\u003e","title":"Digging into the CLR Diagnostics IPC Protocol in C#"},{"content":" Introduction With the new 2.10 release of the Datadog .NET Tracer and Continuous Profiler available, it is time to update some investigation workflows I already introduced. New features have been added to help you diagnose performance issues in your .NET applications:\nLinux support! Code Hotspots: allow you to automatically navigate from lengthy spans and requests to profiles CPU profiling: pinpoint high CPU consuming methods Exceptions profiling: identify exceptions distributions Profile sequence: easily profile an application startup The goal of this post is to show you how all these features make your investigations easier. I would recommend reading the previous post; especially for the environment setup that I won’t repeat here.\nIt’s Linux showtime! The .NET Continuous Profiler is now available for Linux. 
The only limitation is the presence of glibc 2.18+ in the distribution; for example, CentOS 7 is not supported. Beyond that, we provide features parity between Linux and Windows.\nIn terms of installation, download the .NET Tracer package that supports your operating system and architecture. Go to the documentation for the additional configuration steps.\nFrom spans to profiles When analysing lengthy requests, you usually start from looking at the corresponding spans in the APM Traces part of the UI. It is now possible to view the corresponding profiles by clicking the “View Profile” button in the “Code Hotspots” tab:\nBefore digging into the profiling information, you are already able to see that more than half of the time is spent in Buffer._Memmove that is called by Buggybits ProductsController.Index method:\nFrom the profile view, it is also possible to come back to the traces:\nLet’s see now what new features are available at the profiling side.\nCPU profiling The most demanded feature was the ability to analyse CPU consumption (a.k.a. CPU profiling). The idea is to be able to identify code that really consumes CPU usage and optimize it. This is particularly important in the context of cloud-based computing where what you pay is related to the consumed CPU.\nIn term of implementation, unlike Wall Time profiling, we look at the time spent by a thread on a CPU core and not the elapsed time since the last time we checked (every ~10ms). We also collect the call stack of a thread only if it is currently running on a core. Why? Because we want to only record call stacks corresponding to code paths that are consuming CPU. 
For example, ThreadPool threads are usually waiting (not interesting call stack) for a work item to process (interesting call stack).\nIn the last blog post, the code responsible for lengthy requests is doing too many string concatenations (look for += in the following code):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 public IActionResult Index() { var sw = new Stopwatch(); sw.Start(); var products = dataLayer.GetAllProducts(); var productsTable = \u0026#34;\u0026lt;table\u0026gt;\u0026lt;tr\u0026gt;\u0026lt;th\u0026gt;Product Name\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Description\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Price\u0026lt;/th\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; foreach (var product in products) { productsTable += $\u0026#34;\u0026lt;tr\u0026gt;\u0026lt;td\u0026gt;{product.ProductName}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Description}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Price}\u0026lt;/td\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; } productsTable += \u0026#34;\u0026lt;/table\u0026gt;\u0026#34;; sw.Stop(); ViewData[\u0026#34;ElapsedTimeInMs\u0026#34;] = sw.ElapsedMilliseconds; ViewData[\u0026#34;ProductsTable\u0026#34;] = productsTable; return View(); } The Wall Time view was very explicit about ProductsController.Index() calling String.Concat() culprit:\nA simple solution is to use a StringBuilder to optimize the concatenations:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public IActionResult Builder() { var sw = new Stopwatch(); sw.Start(); var products = dataLayer.GetAllProducts(); var productsTable = new StringBuilder(1000 * 80); productsTable.Append(\u0026#34;\u0026lt;table\u0026gt;\u0026lt;tr\u0026gt;\u0026lt;th\u0026gt;Product Name\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Description\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Price\u0026lt;/th\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;); foreach (var product in products) { 
productsTable.Append($\u0026#34;\u0026lt;tr\u0026gt;\u0026lt;td\u0026gt;{product.ProductName}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Description}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Price}\u0026lt;/td\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;); } productsTable.Append(\u0026#34;\u0026lt;/table\u0026gt;\u0026#34;); sw.Stop(); ViewData[\u0026#34;ElapsedTimeInMs\u0026#34;] = sw.ElapsedMilliseconds; ViewData[\u0026#34;ProductsTable\u0026#34;] = productsTable; return View(\u0026#34;Index\u0026#34;); } The Wall Time view of the profile with the fix does not provide anything useful…\nIf you want to go deeper, you need to look at the CPU consumption:\nThe ProductsController.Builder method is handling the request (like Index() in the String.Concat case) and calls DataLayer.GetAllProducts() where most of the CPU-related work is done.\nWould it be interesting to continue optimizing the code? Notice that GetAllProducts() is “only” consuming 94 ms and the other Number.* and String.Concat methods around 350 ms. So any gain might be negligible compared to the total ~3 seconds of CPU usage.\nRemember that you should not optimize for the sake of “optimizing”: you should have metrics that tell you when to start (too lengthy request processing) and when to stop.\nExceptions profiling In the .NET world, exceptions are at the center of error handling. It is now possible to get a sampled view of the exceptions that happened during an application lifetime; by type:\nand by message:\nAt the implementation level, the new exceptions profiler is notified by the CLR when an exception is thrown. Since in special cases (such as a network issue or invalid parsed data for example), an application could trigger thousands of exceptions in a very short period, they need to be sampled. 
Otherwise, the impact on performance would be severe; especially if the call stack needs to be rebuilt for each exception.\nFirst, at least one exception per type is kept, ensuring that rare, specific exceptions are not lost in the flow. Second, exceptions are sampled over time based on a fixed number of exceptions per profile and the rate of appearance. To know the exact number of exceptions, feel free to leverage the Runtime Metrics package as explained in the previous blog post.\nProfiling the application bootstrap In some situations, you are interested in analysing an application bootstrap. In the Datadog APM Profile Search UI, it means finding the first profile of the given service execution. However, even with the date and time column, it is not easy to find the right one:\nTo help you find the initial profile of a service execution, a new “profile_seq” tag has been added to the HTTP request used to upload the profiles. It contains the count of generated profiles for a given execution of a service, starting from 0.\nSo now, in the Options of the profile list, add a “profile_seq” column:\nThe first profile is then easy to spot with a 0 value:\nIn the future, a more visual hint might be added to identify it without the need to add the column.\nMajor implementation refactoring Finally, our implementation has benefited from a large code refactoring. As the previous post explained, the generation of .pprof files and their upload was done in C#. This has been replaced by a Rust library shared amongst different profiler libraries (native, Ruby, .NET).\nIt means that you should no longer see these frames in the application call stacks:\nIt does not mean that one third of the processing has been removed! It only means that this C# code no longer runs, which brings performance gains. First, the managed implementation was allocating objects managed by the garbage collector; adding pressure that might trigger more collections. 
Second, with the native rust implementation, there is no need to duplicate data between the collecting native part of the continuous profiler and the managed code used to serialize it.\nIn addition, several optimizations have been done in the symbol’s resolution (i.e., type and method names) part of the code that also reduce memory consumption and CPU usage.\nHappy profiling!\nReferences Datadog Tracer \u0026amp; Continuous Profiler .msi Installer and Linux tar.gz Datadog Continuous Profiler documentation Datadog Tracer documentation Datadog Runtime metrics documentation Tess Ferrandez repository for BuggyBits labs ","cover":"https://chrisnas.github.io/posts/2022-06-09_troubleshooting-cpu-and-except/1_uNg9te1UxHzFUzCCwJN3PQ.png","date":"2022-06-09","permalink":"https://chrisnas.github.io/posts/2022-06-09_troubleshooting-cpu-and-except/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWith the new 2.10 release of the Datadog .NET Tracer and Continuous Profiler available, it is time to update some investigation workflows \u003ca href=\"/posts/2022-01-28_troubleshooting-net-performanc/\"\u003eI already introduced\u003c/a\u003e. New features have been added to help you diagnose performance issues in your .NET applications:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eLinux support!\u003c/li\u003e\n\u003cli\u003eCode Hotspots: allow you to automatically navigate from lengthy spans and requests to profiles\u003c/li\u003e\n\u003cli\u003eCPU profiling: pinpoint high CPU consuming methods\u003c/li\u003e\n\u003cli\u003eExceptions profiling: identify exceptions distributions\u003c/li\u003e\n\u003cli\u003eProfile sequence: easily profile an application startup\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe goal of this post is to show you how all these features make your investigations easier. 
I would recommend reading \u003ca href=\"/posts/2022-01-28_troubleshooting-net-performanc/\"\u003ethe previous post\u003c/a\u003e; especially for the environment setup that I won’t repeat here.\u003c/p\u003e","title":"Troubleshooting CPU and exceptions issues with Datadog toolbox"},{"content":" Here comes the end of the series about .NET profiling APIs. This final episode describes how to get fields of a value type instance and how to deal with exceptions.\nGetting fields of a value type instance The case of a value type is very similar to a reference type except that the address you receive points directly to the beginning of the fields value; instead of the type MethodTable (or ObjectID if you prefer).\ncase ELEMENT_TYPE_VALUETYPE: { // same as reference type except that the received address points to the beginning of the value type instance fields byte* managedReference = (byte*)address; It means that you won’t be able to call ICorProfilerInfo::GetClassFromObject to get its ClassID and start the field enumeration like for a reference type. Note that despite its name, ICorProfilerInfo2::GetClassLayout is perfectly capable of providing fields offset for a value type.\nInstead, you will have to use the metadata token (extracted from the method signature) corresponding to the parameter. If the type is defined in the same assembly as the method, it is just a matter of calling ICorProfilerInfo::GetClassFromToken with the same moduleID as the method:\nif (TypeFromToken(elementTypeToken) == mdtTypeDef) { hr = _pProfilerInfo-\u0026gt;GetClassFromToken(moduleId, elementTypeToken, \u0026amp;classID); } If the type is defined in another assembly (i.e. TypeFromToken() will return mdtTypeRef), the metadata keeps track of the relationships:\nThe ResolutionScope (i.e. 
assembly where the typeref is defined) is given by IMetaDataImport::GetTypeRefProps:\nWCHAR szName[MAX_CLASS_NAME]; ULONG chName = MAX_CLASS_NAME-1; mdToken resolutionScope = mdTokenNil; hr = pMetaDataImport-\u0026gt;GetTypeRefProps(elementTypeToken, \u0026amp;resolutionScope, szName, chName, \u0026amp;chName); Unfortunately, I did not find any direct API call (either from IMetaDataImport or ICorProfilerInfo) to find the ModuleID where a typeref is defined in another assembly. The only link is the IMetaDataImport corresponding to the module implementing the typeref that is available via the not recommended IMetaDataImport::ResolveTypeRef:\nIMetaDataImport* pMetaDataImportRef = NULL; mdToken referencedElementTypeToken = mdTokenNil; hr = pMetaDataImport-\u0026gt;ResolveTypeRef(elementTypeToken, IID_IMetaDataImport, (IUnknown**)\u0026amp;pMetaDataImportRef, \u0026amp;referencedElementTypeToken); This looks like a dead end: the metadata API knows about tokens (i.e. values generated by the C# compiler) and the profiling API knows about IDs (i.e. pointers to internal data structures).\nRemember that ICorProfilerInfo::GetModuleMetaData returns the IMetaDataImport corresponding to a given ModuleID. So the idea is to be able to identify a ModuleID by its IMetaDataImport counterpart, enumerate the modules loaded by the profiler and get their “identifier” to compare with the one implementing the type we are interested in. This identifier could be the mdModule token returned by IMetaDataImport::GetModuleFromScope:\nmdModule module = mdModuleNil; hr = pMetaDataImport-\u0026gt;GetModuleFromScope(\u0026amp;module); Well… not really because I always got 0x1 in my test. This value could be the index of the module in the assembly and I only tested single-module assemblies generated by Visual Studio. Fortunately, each module is labelled by a unique “mvid” (i.e. 
a GUID identifying each module) returned by IMetaDataImport::GetScopeProps:\nGUID refMvid; hr = pMetaDataImport-\u0026gt;GetScopeProps(szName, chName, \u0026amp;chName, \u0026amp;refMvid); Here is the code to enumerate profiled modules and check for the given refMvid:\nICorProfilerModuleEnum* pEnumModule = NULL; hr = _pProfilerInfo-\u0026gt;EnumModules(\u0026amp;pEnumModule); ModuleID enumeratedModuleId = NULL; ModuleID refModuleId = NULL; GUID mvid; IMetaDataImport* pEnumeratedModuleMetadata = NULL; mdModule enumeratedModuleToken = mdModuleNil; ULONG fetchedModulesCount = 0; do { hr = pEnumModule-\u0026gt;Next(1, \u0026amp;enumeratedModuleId, \u0026amp;fetchedModulesCount); if (FAILED(hr)) break; if (fetchedModulesCount == 0) break; // get the IMetadataImport corresponding to this module hr = _pProfilerInfo-\u0026gt;GetModuleMetaData(enumeratedModuleId, ofRead, IID_IMetaDataImport, (IUnknown**)\u0026amp;pEnumeratedModuleMetadata); // get the module token hr = pEnumeratedModuleMetadata-\u0026gt;GetModuleFromScope(\u0026amp;enumeratedModuleToken); hr = pEnumeratedModuleMetadata-\u0026gt;GetScopeProps(szName, chName, \u0026amp;chName, \u0026amp;mvid); pEnumeratedModuleMetadata-\u0026gt;Release(); if (refMvid == mvid) { refModuleId = enumeratedModuleId; } } while (TRUE); pEnumModule-\u0026gt;Release(); // this is the one! moduleId = refModuleId; For the sake of performance, it would be better to build (in your ICorProfilerCallback implementation of ModuleLoadFinished and ModuleUnloadFinished) a map between the loaded modules and their mvid. This map could then be used when a ModuleID is needed while only the metadata side is known.\nWhat has been returned? The final step of our journey is to figure out what is returned by a method. 
The leave callback executed each time a method returns receives a FunctionID and a COR_PRF_ELT_INFO as parameters:\n1 2 3 4 PROFILER_STUB LeaveStub(FunctionID functionId, COR_PRF_ELT_INFO eltInfo) { ... } The signature parsing for a FunctionID already shown tells whether it returns void or an instance of a type identified by an element type and a metadata token.\n1 2 3 4 5 6 7 8 9 void CorProfilerHelpers::DumpLeaveReturnValue(FunctionID functionId, FunctionSignature* pSignature, COR_PRF_ELT_INFO eltInfo) { char value[128]; value[0] = \u0026#39;\\0\u0026#39;; if (_stricmp(pSignature-\u0026gt;pszReturnType, \u0026#34;void\u0026#34;) == 0) { strcpy_s(value, ARRAY_LEN(value) - 1, \u0026#34;void\u0026#34;); } The COR_PRF_ELT_INFO parameter is the key to get the address of the returned instance thanks to ICorProfilerInfo3::GetFunctionLeave3Info:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 else { ULONG pcbArgumentInfo = 0; COR_PRF_FRAME_INFO frameInfo; COR_PRF_FUNCTION_ARGUMENT_RANGE returnValueInfo; HRESULT hr = _pProfilerInfo-\u0026gt;GetFunctionLeave3Info(functionId, eltInfo, \u0026amp;frameInfo, \u0026amp;returnValueInfo); UINT_PTR pStartValue = returnValueInfo.startAddress; ULONG length = returnValueInfo.length; const FunctionParameter* pReturnParameter = pSignature-\u0026gt;GetReturnParameter(); GetObjectValue(pStartValue, length, pReturnParameter-\u0026gt;ElementType, pReturnParameter-\u0026gt;TypeToken, pSignature-\u0026gt;ModuleId, value, sizeof(value) / sizeof(value[0]) - 1); } ... } To get meaningful information from this API, the COR_PRF_ENABLE_FUNCTION_RETVAL flag must be set when ICorProfilerInfo::SetEventMask is called during ICorProfilerCallback::Initialize. The returned COR_PRF_FUNCTION_ARGUMENT_RANGE contains the address of the returned instance in its startAddress field.\nThe same GetObjectValue helper function already used to get parameters’ value is still valid here.\nAnd what about exceptions? 
I discussed how to follow the normal flow of execution by entering and exiting a method. When an exception is thrown in a method and not caught, you won’t get notified by the Leave callback. Instead, other methods of ICorProfilerCallback are called if you pass COR_PRF_MONITOR_EXCEPTIONS to ICorProfilerInfo::SetEventMask.\nLet’s take the following C# example to understand which callbacks are executed and when:\nThe blue arrows show the flow of execution from Throws to ThrowLevel3. When the InvalidOperationException is thrown, ExceptionSearchFunctionXXX callbacks are executed “backward” to find the first catch block that will match the exception (i.e. up to Throws). It is now time to run the finally blocks (if any) starting from where the exception was thrown (i.e. ThrowLevel3) up to the catch block in Throws.\nThe object corresponding to the exception is passed to ExceptionThrown and ExceptionCatcherEnter as an ObjectID. Feel free to use the code that has been presented earlier to get the type of the exception. However, getting interesting fields such as _message or _innerException requires figuring out the ClassID of the System.Exception base class.\nAs already mentioned, the ICorProfilerInfo2::GetClassIDInfo2 function returns the ClassID of the parent type. 
Here is the code to search a parent type in a type hierarchy:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 HRESULT GetExceptionBaseClass(ICorProfilerInfo8* pInfo, ClassID classId, ClassID* baseClassId) { ModuleID moduleId; ClassID parentClassId; HRESULT hr = pInfo-\u0026gt;GetClassIDInfo2(classId, \u0026amp;moduleId, nullptr, \u0026amp;parentClassId, 0, nullptr, nullptr); if (FAILED(hr)) return hr; WCHAR szName[260]; hr = CorProfilerHelpers::GetTypeName(pInfo, classId, moduleId, szName, ARRAY_LEN(szName)-1); if (wcscmp(L\u0026#34;System.Exception\u0026#34;, szName) == 0) { *baseClassId = classId; return S_OK; } return GetExceptionBaseClass(pInfo, parentClassId, baseClassId); } The FunctionID corresponding to the method is passed as a parameter to ExceptionSearchFunctionEnter, ExceptionSearchFilterEnter, ExceptionSearchCatcherFound, ExceptionUnwindFunctionEnter, ExceptionUnwindFinallyEnter, and ExceptionCatcherEnter. (i.e. not to the xxxLeave callbacks)\nConclusion This series of articles introduced the .NET native profiling API in the context of method enter/leave tracing. The relationships between its metadata counterpart has also been detailed. You should now be able to implement other overrides of ICorProfilerCallback to get details about allocations for example.\nReferences Episode 1: Start a journey into the .NET Profiling APIs Episode 2: Dealing with Modules, Assemblies and Types with CLR profiling API Episode 3: Decyphering methods signature with .NET Profiling APIs Episode 4: Reading parameters value with the .NET Profiling APIs Episode 5: Accessing arrays and class fields with .NET profiling APIs ","cover":"https://chrisnas.github.io/posts/2022-03-14_value-types-and-exceptions/1_lIP5rxv4DVU5zs22mVWSXA.png","date":"2022-03-14","permalink":"https://chrisnas.github.io/posts/2022-03-14_value-types-and-exceptions/","summary":"\u003chr\u003e\n\u003cp\u003eHere comes the end of the series about .NET profiling APIs. 
This final episode describes how to get fields of a value type instance and how to deal with exceptions.\u003c/p\u003e\n\u003ch2 id=\"getting-fields-of-a-value-typeinstance\"\u003eGetting fields of a value type instance\u003c/h2\u003e\n\u003cp\u003eThe case of a value type is very similar to a reference type except that the address you receive points directly to the beginning of the fields value; instead of the type MethodTable (or \u003cstrong\u003eObjectID\u003c/strong\u003e if you prefer).\u003c/p\u003e","title":"Value types and exceptions in .NET profiling"},{"content":" Introduction The beta of Datadog .NET Continuous Profiler is available!\nThis is a great opportunity to show how to use the different tools provided by Datadog to troubleshoot .NET applications facing performance issues. Tess Ferrandez updated her famous BuggyBits application to .NET Core. Among the different available scenarios, let’s see how to investigate the Lab 4 — High CPU Hang with Datadog. It will be completely different from Tess way: no need to analyze memory dump anymore.\nSetup the environment First, you need to download and run the .msi from our Tracer repository: it will install both the Tracer and the Profiler. The former allows you, among other things, to see how long it takes to process ASP.NET Core requests. The latter is in Beta today and provides wall time duration of your threads (more on this later). Look at the corresponding documentations for the details of enabling tracing and profiling once installed.\nNext, ensure that .NET Runtime Metrics are installed for your organization:\nAdd DD_RUNTIME_METRICS_ENABLED=true environment variable for the application/service you want to monitor. 
Once enabled for your application, this package allows you to see the evolution of important metrics including some that you won’t find anywhere else such as GC pause time, thread contention time or count of exceptions per type.\nEnsure that DogstatsD is setup for the Agent\nLooking at the symptoms In my example, the buggybits application is running under the datadog.demos.buggybits service name. This is how I can filter the related traces in the APM/Traces part of the Datadog portal:\nIn this screenshot, the Products/Index requests duration is around 6 seconds; which is way too long!\nWhen clicking such a request, the details panel provides the exact URL in the Tags tab:\nThe Metrics tab shows CPU usage and other few metrics around the trace time:\nSo, in addition to having slow requests, the CPU usage seems to increase.\nIt is now time to go to the .NET runtime metrics dashboard and look at what is going on in more details. The first graph that shows up is the number of gen2 collections:\nIt means that during the 10 minutes test where very few requests are processed, almost 800 gen2 GC are happening every 10s (all runtime metrics are computed every 10 seconds).\nThe load test corresponding to these requests lasted 10 minutes between 4pm and 6+pm. Each time the requests were processed:\nthe CPU usage increased the number of gen2 collections increased the duration of pauses due to garbage collections increased the threads contention time increased In addition to being slow, it seems that the code that processes products/index HTTP requests has also an impact on the CPU (i.e. on the overall application and machine performances).\nIt would be great if we could see the callstacks corresponding to this processing. 
This is exactly what the .NET Wall time Continuous Profiler is all about: looking at the duration of methods through a flamegraph representation.\nHere comes the profiler Today, there is no direct way to jump from a trace to the profile containing the callstacks captured while the corresponding request was processed. We are currently working on this new feature called Code Hotspots.\nHowever, it is easy to use the service name to filter the profiles and select the period of time from APM/Profile Search:\nWhen you click a one-minute profile (the callstacks are gathered and sent every minute), a panel appears with the Performance tab selected. It shows a flamegraph on the left and a list on the right.\nGetting used to flamegraph When you look at the wall time flamegraph, you see everything that happened during a single minute:\nThe previous screenshot highlights the groups of callstacks corresponding to the different threads of execution (from left to right):\nthe Main() entry point of the application the code in the CLR responsible for sending counters the code in the Profiler in charge of generating and sending (the very thin spike) the profile every minute the Tracer code the code that listens to the CLR events to generate the runtime metrics …and the application code that processes the requests! In the flamegraph, the width of each frame on a row represents the relative time during which the frame was found on a callstack. For example, in our tests, we have 4 threads simply calling Thread.Sleep; one for 10 seconds, one for 20 seconds, one for 30 seconds and a last one for 40 seconds. This is the expected result in a flamegraph (i.e. the widths are consistent with the 1/2/3/4 ratio):\nThis also applies to CPU-bound threads. For example, if 3 threads are computing the sum of numbers in a tight loop, this is the expected result (i.e. 
all OnCPUxxx have the same width)\nThese explanations should stop the fear that started to crawl inside your head about the “visible cost” of the Datadog Tracer and Profiler based on the previous screenshot. The large width of the Datadog threads frames is all about wall time, not CPU time: we are mostly sleeping or waiting but we don’t stop :^)\nInvestigate the performance issue The next step is to focus on the stack frames corresponding to the request processing to better understand what is going on.\nBasically, you would like to either remove a branch or keep only a branch. You simply have to move the mouse over a frame (i.e. ThreadPoolWorkQueue in the previous screenshot) and click the three dots that just appeared. Next, select Show From to keep only that branch in the flamegraph:\nNow, scroll-down into the flamegraph and the flow of execution corresponding to processing the Products/Index request becomes more visible:\nIt seems that the Index() method of the ProductsController is spending most of its time calling String.Concat().\nLet’s have a look at the source code:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 // GET: Products public IActionResult Index() { var products = dataLayer.GetAllProducts(); var productsTable = \u0026#34;\u0026lt;table\u0026gt;\u0026lt;tr\u0026gt;\u0026lt;th\u0026gt;Product Name\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Description\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Price\u0026lt;/th\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; foreach (var product in products) { productsTable += $\u0026#34;\u0026lt;tr\u0026gt;\u0026lt;td\u0026gt;{product.ProductName}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Description}\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;{product.Price}\u0026lt;/td\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; } productsTable += \u0026#34;\u0026lt;/table\u0026gt;\u0026#34;; ViewData[\u0026#34;ProductsTable\u0026#34;] = productsTable; return View(); } But still no sign of String.Concat()… Well, this is because the C# compiler is hiding 
it from you with the += syntactic sugar. Let’s have a look at the decompiled code as shown by ILSpy (without the string.Concat transformation):\npublic Microsoft.AspNetCore.Mvc.IActionResult Index() { var products = dataLayer.GetAllProducts(); string productsTable = \u0026#34;\u0026lt;table\u0026gt;\u0026lt;tr\u0026gt;\u0026lt;th\u0026gt;Product Name\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Description\u0026lt;/th\u0026gt;\u0026lt;th\u0026gt;Price\u0026lt;/th\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; var enumerator = products.GetEnumerator(); try { while (enumerator.MoveNext()) { BuggyBits.Models.Product product = enumerator.Current; string[] array = new string[8]; array[0] = productsTable; array[1] = \u0026#34;\u0026lt;tr\u0026gt;\u0026lt;td\u0026gt;\u0026#34;; array[2] = product.ProductName; array[3] = \u0026#34;\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;\u0026#34;; array[4] = product.Description; array[5] = \u0026#34;\u0026lt;/td\u0026gt;\u0026lt;td\u0026gt;\u0026#34;; array[6] = product.Price; array[7] = \u0026#34;\u0026lt;/td\u0026gt;\u0026lt;/tr\u0026gt;\u0026#34;; productsTable = string.Concat(array); } } finally { ((System.IDisposable)enumerator).Dispose(); } productsTable = string.Concat(productsTable, \u0026#34;\u0026lt;/table\u0026gt;\u0026#34;); base.ViewData[\u0026#34;ProductsTable\u0026#34;] = productsTable; return View(); } So now we can see the call to string.Concat() at the end of each while loop iteration.\nBehind the scenes, since a string object is immutable, string.Concat() creates a new string each time it is called; the previous string referenced by productsTable is no longer rooted and becomes garbage, which puts more pressure on the GC. 
If I tell you that dataLayer.GetAllProducts() returns 10,000 products, it means that string.Concat gets called 10,000 times.\nAs the string grows, it will reach the 85,000-byte threshold and start to be allocated in the Large Object Heap (LOH), adding more pressure on the GC that will trigger gen2 collections; hence the high number of gen2 collections seen in the runtime metrics dashboard.\nNote that if the native frames were visible in the flamegraph (by the way, let me know if this is a feature that would make sense to add), you would see the methods of the CLR responsible for the GC.\nLook at Tess Ferrandez’s post for a possible solution to this expensive code pattern (i.e. calling string.Concat in a large tight loop).\nDifferent types of filters Before leaving, I would like to quickly talk about the list shown on the right-hand side of the UI.\nIt allows you to see a wall time summary per method name, namespace (i.e. sum of methods from types in the same namespace), thread ID (i.e. internal Datadog unique ID), thread name or AppDomain name.\nFirst, the only methods listed here are “leaf” methods: they appear at the top of at least one callstack. If you would like to visually see some specific frames, you should use the filter box:\nAll other frames are faded out.\nSecond, the list is sorted with the largest wall time at the top: this could be an easy way to spot methods that are “expensive” in terms of CPU (i.e. they are frequently running so they appear at the top of the stack). 
You simply need to skip the wait and sleep related methods like shown in the previous screenshot: String.Concat and Buffer._Memmove (used by string.Concat) were just in front of your eyes!\nWhen you select an element of the list, the flamegraph is updated accordingly: only the callstacks containing this element will be visible (it could speed up the filtering process)\nReferences Datadog Tracer \u0026amp; Continuous Profiler .msi Installer Datadog Continuous Profiler documentation Datadog Tracer documentation Datadog Runtime metrics documentation Tess Ferrandez repository for BuggyBits labs ","cover":"https://chrisnas.github.io/posts/2022-01-28_troubleshooting-net-performanc/1__B5VXVtYIxx32h8wm-HNLA.png","date":"2022-01-28","permalink":"https://chrisnas.github.io/posts/2022-01-28_troubleshooting-net-performanc/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eThe beta of Datadog .NET Continuous Profiler is \u003ca href=\"https://github.com/DataDog/dd-trace-dotnet/releases/tag/v2.1.1-profiler-beta1\"\u003eavailable\u003c/a\u003e!\u003c/p\u003e\n\u003cp\u003eThis is a great opportunity to show how to use the different tools provided by Datadog to troubleshoot .NET applications facing performance issues. \u003ca href=\"https://twitter.com/TessFerrandez\"\u003eTess Ferrandez\u003c/a\u003e updated her famous BuggyBits application to .NET Core. Among the \u003ca href=\"https://www.tessferrandez.com/blog/2008/02/04/debugging-demos-setup-instructions.html\"\u003edifferent available scenarios\u003c/a\u003e, let’s see how to investigate the \u003ca href=\"https://www.tessferrandez.com/blog/2008/02/27/net-debugging-demos-lab-4-walkthrough.html\"\u003eLab 4 — High CPU Hang\u003c/a\u003e with Datadog. 
It will be completely different from Tess way: no need to analyze memory dump anymore.\u003c/p\u003e","title":"Troubleshooting .NET performance issues with Datadog toolbox"},{"content":" Introduction After getting basic and strings parameters, it is time to look at arrays and reference types.\nAccessing managed arrays You check against null array parameter the same way as for string:\n1 2 3 4 5 6 7 8 9 10 case ELEMENT_TYPE_SZARRAY: { // look at the reference stored at the given address unsigned __int64* pAddress = (unsigned __int64*)address; byte* managedReference = (byte*)(*pAddress); if (managedReference == NULL) { strcpy_s(value, charCount, \u0026#34;null array\u0026#34;); break; } The ELEMENT_TYPE_SZARRAY applies to single dimension arrays including jagged arrays. ELEMENT_TYPE_ARRAY is used for matrice :\n1 2 3 4 5 6 7 8 9 10 case ELEMENT_TYPE_ARRAY: { // look at the reference stored at the given address unsigned __int64* pAddress = (unsigned __int64*)address; byte* managedReference = (byte*)(*pAddress); if (managedReference == NULL) { strcpy_s(value, charCount, \u0026#34;null matrix\u0026#34;); break; } Since arrays are reference types, we know that the managed reference points to the address of the Method Table but we need more insights to get the elements. Again, Sergey Tepliakov explains in great details how single dimension arrays are laid out in memory:\nThe length is stored in front of the elements as you can see in Visual Studio for the following 10 elements integer array:\nvar ints = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }; Note that “jagged” arrays (i.e. array of array such as int[][]) are stored the same way: each element of the first array contains a reference to another array:\nThe layout is a little bit different for matrices (i.e. multi-dimensional arrays) such as this 2 x 4 integers array:\nvar matrix = new int[,] { { 1, 1, 1, 1 }, { 2, 2, 2, 2 } }; In that case, the total element count appears before each dimension length. 
The elements are stored row after row.\nThe profiling API ICorProfilerInfo2::GetArrayObjectInfo gives us all the implementation details we need:\n// single dimension array so the following arrays only need 1 element to receive size and lower bound\nULONG32* pDimensionSizes = new ULONG32[1]; int* pDimensionLowerBounds = new int[1]; byte* pElements; // will point to the beginning of the array elements\nHRESULT hr = _pProfilerInfo-\u0026gt;GetArrayObjectInfo((ObjectID)managedReference, 1, pDimensionSizes, pDimensionLowerBounds, \u0026amp;pElements);\nHere is the description of each parameter:\nan ObjectID (i.e. a reference to an object in the managed heap) corresponding to an array.\nthe number of dimensions (a.k.a. rank), so 1 for an ELEMENT_TYPE_SZARRAY array. I will show in a moment how to get it for matrices.\nan allocated array to receive the size of each dimension\nan allocated array to receive the lower bound of each dimension; should be 0 for C#\nthe address of the beginning of the elements\nSo it is easy to detect an empty array: its length is 0:\nULONG32 arrayLength = pDimensionSizes[0]; delete [] pDimensionSizes; delete [] pDimensionLowerBounds; if (arrayLength == 0) { strcpy_s(value, charCount, \u0026#34;empty single dimension array\u0026#34;); break; }\nThe next step is to get the value of each array element. 
It is easy to get the ClassID of a given object by calling ICorProfilerInfo::GetClassFromObject and then ICorProfilerInfo::IsArrayClass will provide the array rank and its elements CorElementType and ClassID.\n1 2 3 4 5 6 7 // get element type and array rank // (could be used before calling GetArrayObjectInfo to allocate the size + bounds arrays) ClassID classId; ULONG rank = 0; CorElementType baseElementType; hr = _pProfilerInfo-\u0026gt;GetClassFromObject((ObjectID)managedReference, \u0026amp;classId); hr = _pProfilerInfo-\u0026gt;IsArrayClass(classId, \u0026amp;baseElementType, NULL, \u0026amp;rank); With these details, iterating over each element to get its value is not that complicated:\n1 2 3 4 5 6 7 8 9 10 11 char elementValue[128]; for (ULONG current = 0; current \u0026lt; arrayLength; current++) { hr = GetElementValue(pElements, baseElementType, elementValue, ARRAY_LEN(elementValue) - 1); strcat_s(value, charCount, elementValue); if (FAILED(hr)) break; if (current \u0026lt; arrayLength - 1) strcat_s(value, charCount, \u0026#34;, \u0026#34;); } strcat_s(value, charCount, \u0026#34;)\u0026#34;); The GetElementValue is where you need to use the element type to compute the value but also to know how many byte you need to move forward to look at the next element:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 HRESULT CorProfilerHelpers::GetElementValue(byte*\u0026amp; pElement, CorElementType elementType , mdToken elementToken, ModuleID moduleId, char* value, ULONG charCount) { GetObjectValue((UINT_PTR)pElement, sizeof(void*), elementType, elementToken, moduleId, value, charCount); switch (elementType) { case ELEMENT_TYPE_BOOLEAN: pElement += sizeof(bool); break; case ELEMENT_TYPE_CHAR: pElement += sizeof(WCHAR); break; case ELEMENT_TYPE_I1: pElement += sizeof(char); break; 
case ELEMENT_TYPE_U1: pElement += sizeof(unsigned char); break; case ELEMENT_TYPE_I2: pElement += sizeof(short); break; case ELEMENT_TYPE_U2: pElement += sizeof(unsigned short); break; case ELEMENT_TYPE_I4: pElement += sizeof(int); break; case ELEMENT_TYPE_U4: pElement += sizeof(unsigned int); break; case ELEMENT_TYPE_I8: pElement += sizeof(__int64); break; case ELEMENT_TYPE_U8: pElement += sizeof(unsigned __int64); break; case ELEMENT_TYPE_R4: pElement += sizeof(float); break; case ELEMENT_TYPE_R8: pElement += sizeof(double); break; case ELEMENT_TYPE_STRING: pElement += sizeof(void*); break; case ELEMENT_TYPE_CLASS:\n// NOTE: can\u0026#39;t call GetObjectValue recursively because won\u0026#39;t fit on one line\nsprintf_s(value, charCount, \u0026#34;0x%p\u0026#34;, *(UINT_PTR*)pElement); pElement += sizeof(void*); break; case ELEMENT_TYPE_SZARRAY:\n// arrays are reference types so skip the size of an address\npElement += sizeof(void*); break; case ELEMENT_TYPE_OBJECT: strcpy_s(value, charCount, \u0026#34;obj\u0026#34;); pElement += sizeof(void*); break; default: strcpy_s(value, charCount, \u0026#34;?\u0026#34;); return E_FAIL; } return S_OK; }\nFor matrices, you need to know the rank ahead of time to allocate the GetArrayObjectInfo out parameters:\ncase ELEMENT_TYPE_ARRAY: {\n// same code to check null matrix\n... 
// get element type and array rank\n// --\u0026gt; used before calling GetArrayObjectInfo to allocate the size + bounds arrays\nClassID classId; ULONG rank = 0; CorElementType baseElementType; HRESULT hr = _pProfilerInfo-\u0026gt;GetClassFromObject((ObjectID)managedReference, \u0026amp;classId); hr = _pProfilerInfo-\u0026gt;IsArrayClass(classId, \u0026amp;baseElementType, NULL, \u0026amp;rank); ULONG32* pDimensionSizes = new ULONG32[rank]; int* pDimensionLowerBounds = new int[rank]; byte* pElements; // will point to the beginning of the array elements\nhr = _pProfilerInfo-\u0026gt;GetArrayObjectInfo((ObjectID)managedReference, rank, pDimensionSizes, pDimensionLowerBounds, \u0026amp;pElements);\nThe following code shows how to display each dimension length:\n// show dimensions\nstrcpy_s(value, charCount, \u0026#34;[\u0026#34;); char buffer[16]; for (ULONG32 i = 0; i \u0026lt; rank; i++) { sprintf_s(buffer, ARRAY_LEN(buffer) - 1, \u0026#34;%u\u0026#34;, pDimensionSizes[i]); strcat_s(value, charCount, buffer); if (i \u0026lt; rank - 1) strcat_s(value, charCount, \u0026#34;, \u0026#34;); } strcat_s(value, charCount, \u0026#34;]\u0026#34;); delete [] pDimensionSizes; delete [] pDimensionLowerBounds;\nGetting fields of a reference type instance\nSince most “basic” types have been covered, it is now time to discuss the case of reference type parameters. Let’s take the following simple class as an example:\npublic class ClassType { public ClassType(int val) { intField = val; stringField = (val + 1).ToString(); IntProperty = val * 2; } public int IntProperty { get; set; } public int intField; public string stringField; }\nOne property and two fields are defined on purpose. 
The C# compiler translates the automatic property syntax into a backing field that stores the value\nand the corresponding get/set accessor pair:\nSo when an instance of this class is passed as a parameter to the ClassParamReturnClass(ClassType obj) method, you should be able to list these three fields and access their values to build the following output:\n--\u0026gt; ClassType ClassParamReturnClass (ClassType obj)\n| int32 \u0026lt;IntProperty\u0026gt;k__BackingField = 84\n| int32 intField = 42\n| String stringField = 43\nClassType obj = 0x00000276D0A98E78\nWhen you read Sergey Tepliakov in Managed object internals, Part 4. Fields layout, it sounds quite hard to achieve due to the complicated padding rules dictating where each field is stored in memory. Fortunately, the profiling API helps you a lot with a three-step process:\nGet the offset of each field value\nGet the name of each field\nGet the type of each field and then compute the value using the offset\nFirst, you need the ClassID corresponding to the type of the reference you want to dump, and we have seen that ICorProfilerInfo::GetClassFromObject is perfect for that. Then, pass it to ICorProfilerInfo2::GetClassLayout to get the number of fields and their offsets within an instance. 
This API expects you to call it once to get the number of fields and a second time to get the offset that are stored in a buffer you allocate after the first call.\n1 2 3 4 5 6 7 8 9 ULONG fieldCount = 0; hr = _pProfilerInfo-\u0026gt;GetClassLayout(classID, NULL, 0, \u0026amp;fieldCount, NULL); // no field to dump if (fieldCount == 0) break; COR_FIELD_OFFSET* pFieldOffsets = new COR_FIELD_OFFSET[fieldCount]; hr = _pProfilerInfo-\u0026gt;GetClassLayout(classID, pFieldOffsets, fieldCount, \u0026amp;fieldCount, NULL); The COR_FIELD_OFFSET structure has a confusing name because it contains more than just the offset:\n1 2 3 4 typedef struct COR_FIELD_OFFSET { mdFieldDef ridOfField; ULONG ulOffset; } COR_FIELD_OFFSET; The ridOfField part gives you the metadata token corresponding to a field as shown in ILSpy:\nIt will allow you to get its name and the usual binary signature for its type via IMetaDataImport::GetFieldProps.\nSo you first need to get the IMetaDataImport implementation for the class module:\n1 2 3 4 5 6 IMetaDataImport* pMetaDataImport = NULL; ModuleID moduleID = NULL; mdTypeDef typeDefToken = NULL; WCHAR name[256]; hr = _pProfilerInfo-\u0026gt;GetClassIDInfo(classID, \u0026amp;moduleID, \u0026amp;typeDefToken); hr = _pProfilerInfo-\u0026gt;GetModuleMetaData(moduleID, ofRead, IID_IMetaDataImport, (IUnknown**)\u0026amp;pMetaDataImport); It is now time to iterate on each field to get its name, type and value for the given object:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 char value[512]; char buffer[2 * 260]; for (ULONG fieldIndex = 0; fieldIndex \u0026lt; fieldCount; fieldIndex++) { PCCOR_SIGNATURE pSigBlob; ULONG sigBlobSize; name[0] = L\u0026#39;\\0\u0026#39;; hr = pMetaDataImport-\u0026gt;GetFieldProps(pFieldOffsets[fieldIndex].ridOfField, NULL, name, ARRAY_LEN(name), NULL, NULL, \u0026amp;pSigBlob, \u0026amp;sigBlobSize, NULL, NULL, NULL); if (SUCCEEDED(hr)) { // skip the \u0026#34;calling convention\u0026#34; that should 
correspond to a \u0026#39;field\u0026#39; ULONG callingConvention = *pSigBlob++; assert(callingConvention == IMAGE_CEE_CS_CALLCONV_FIELD); buffer[0] = \u0026#39;\\0\u0026#39;; pSigBlob = ParseElementType(pMetaDataImport, pSigBlob, NULL, NULL, \u0026amp;elementType, buffer, ARRAY_LEN(buffer) - 1); // get its value from pFieldOffsets[fieldIndex].ulOffset value[0] = \u0026#39;\\0\u0026#39;; GetObjectValue((UINT_PTR)(managedReference + pFieldOffsets[fieldIndex].ulOffset), length, elementType, elementToken, moduleId, value, ARRAY_LEN(value) - 1); } } delete [] pFieldOffsets; pMetaDataImport-\u0026gt;Release(); The only thing that differs from the blob signature parsing you have seen earlier is that it starts with a “calling convention” (yes I know that we are talking about field and not method!) equal to IMAGE_CEE_CS_CALLCONV_FIELD.\nThe field value is stored in memory at ulOffset bytes after the address pointed to by the given managed reference.\nThe next episode will describe how to dump value type instances, the return values and exceptions handling: a good way to start 2022!\nReferences Arrays and the CLR — a Very Special Relationship by Matt Warren Managed object internals, Part 3. The layout of a managed array by Sergey Tepliakov Managed object internals, Part 4. 
Fields layout by Sergey Tepliakov Episode 1: Start a journey into the .NET Profiling APIs Episode 2: Dealing with Modules, Assemblies and Types with CLR profiling API Episode 3: Decyphering methods signature with .NET Profiling APIs Episode 4: Reading parameters value with the .NET Profiling APIs ","cover":"https://chrisnas.github.io/posts/2021-12-18_accessing-arrays-and-class/1_gVi3qmL-iMZpQR2a89SkHQ.png","date":"2021-12-18","permalink":"https://chrisnas.github.io/posts/2021-12-18_accessing-arrays-and-class/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAfter getting basic and strings parameters, it is time to look at arrays and reference types.\u003c/p\u003e\n\u003ch2 id=\"accessing-managedarrays\"\u003eAccessing managed arrays\u003c/h2\u003e\n\u003cp\u003eYou check against null array parameter the same way as for string:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cdiv class=\"chroma\"\u003e\n\u003ctable class=\"lntable\"\u003e\u003ctr\u003e\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode\u003e\u003cspan class=\"lnt\"\u003e 1\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 2\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 3\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 4\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 5\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 6\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 7\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 8\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e 9\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e10\n\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\n\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-csharp\" data-lang=\"csharp\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003ecase\u003c/span\u003e \u003cspan 
class=\"n\"\u003eELEMENT_TYPE_SZARRAY\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"c1\"\u003e// look at the reference stored at the given address\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"n\"\u003eunsigned\u003c/span\u003e \u003cspan class=\"n\"\u003e__int64\u003c/span\u003e\u003cspan class=\"p\"\u003e*\u003c/span\u003e \u003cspan class=\"n\"\u003epAddress\u003c/span\u003e \u003cspan class=\"p\"\u003e=\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eunsigned\u003c/span\u003e \u003cspan class=\"n\"\u003e__int64\u003c/span\u003e\u003cspan class=\"p\"\u003e*)\u003c/span\u003e\u003cspan class=\"n\"\u003eaddress\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"kt\"\u003ebyte\u003c/span\u003e\u003cspan class=\"p\"\u003e*\u003c/span\u003e \u003cspan class=\"n\"\u003emanagedReference\u003c/span\u003e \u003cspan class=\"p\"\u003e=\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"kt\"\u003ebyte\u003c/span\u003e\u003cspan class=\"p\"\u003e*)(*\u003c/span\u003e\u003cspan class=\"n\"\u003epAddress\u003c/span\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"k\"\u003eif\u003c/span\u003e \u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003emanagedReference\u003c/span\u003e \u003cspan class=\"p\"\u003e==\u003c/span\u003e \u003cspan 
class=\"n\"\u003eNULL\u003c/span\u003e\u003cspan class=\"p\"\u003e)\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"p\"\u003e{\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e      \u003cspan class=\"n\"\u003estrcpy_s\u003c/span\u003e\u003cspan class=\"p\"\u003e(\u003c/span\u003e\u003cspan class=\"k\"\u003evalue\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003echarCount\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e \u003cspan class=\"s\"\u003e\u0026#34;null array\u0026#34;\u003c/span\u003e\u003cspan class=\"p\"\u003e);\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e      \u003cspan class=\"k\"\u003ebreak\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e   \u003cspan class=\"p\"\u003e}\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\u003cp\u003eThe \u003cstrong\u003eELEMENT_TYPE_SZARRAY\u003c/strong\u003e applies to single dimension arrays including jagged arrays. \u003cstrong\u003eELEMENT_TYPE_ARRAY\u003c/strong\u003e is used for matrice :\u003c/p\u003e","title":"Accessing arrays and class fields with .NET profiling APIs"},{"content":" Introduction From the list of arguments with their type, it becomes possible to figure out their value when a method gets called. The rest of this post describes how to access method call parameters and get the value of numbers and strings.\nWhere are my parameters? 
When you pass COR_PRF_ENABLE_FUNCTION_ARGS to ICorProfilerInfo::SetEventMask, the runtime prepares a COR_PRF_FUNCTION_ARGUMENT_INFO structure before your enter callback is called:\ntypedef struct _COR_PRF_FUNCTION_ARGUMENT_INFO { ULONG numRanges; ULONG totalArgumentSize; COR_PRF_FUNCTION_ARGUMENT_RANGE ranges[1]; } COR_PRF_FUNCTION_ARGUMENT_INFO;\nI have to admit that the Microsoft Docs did not really help me figure out the meaning of each field of this structure because the word “range” is very confusing here…\nBased on my experiments, numRanges gives you the number of parameters, including the implicit this parameter in the case of a non-static method. It is different from the signature that we have already parsed from the metadata, where this is not mentioned. The ranges field is an array of COR_PRF_FUNCTION_ARGUMENT_RANGE, one per parameter:\ntypedef struct _COR_PRF_FUNCTION_ARGUMENT_RANGE { UINT_PTR startAddress; ULONG length; } COR_PRF_FUNCTION_ARGUMENT_RANGE;\nThe startAddress points to where the parameter value is stored in memory.\nHowever, in addition to the FunctionID, you only receive a COR_PRF_ELT_INFO in your enter callback. You need to call ICorProfilerInfo3::GetFunctionEnter3Info to get the corresponding COR_PRF_FUNCTION_ARGUMENT_INFO you are interested in. 
As often with COM, you need to call a first time to get the size of the buffer to allocate and a second time to fill it up:\n1 2 3 4 5 6 ULONG argumentInfoSize = 0; COR_PRF_FRAME_INFO frameInfo; _pInfo-\u0026gt;GetFunctionEnter3Info(functionId, eltInfo, \u0026amp;frameInfo, \u0026amp;argumentInfoSize, NULL); byte* pBuffer = new byte[argumentInfoSize]; _pInfo-\u0026gt;GetFunctionEnter3Info(functionId, eltInfo, \u0026amp;frameInfo, \u0026amp;argumentInfoSize, (COR_PRF_FUNCTION_ARGUMENT_INFO*)pBuffer); COR_PRF_FUNCTION_ARGUMENT_INFO* pArgumentInfo = (COR_PRF_FUNCTION_ARGUMENT_INFO*)pBuffer; Before iterating on the parameters, you need to deal with non-static method and their implicit this parameter stored in pArgumentInfo-\u0026gt;ranges[0]:\n1 2 3 4 5 6 7 8 ULONG hiddenThisParameterIndexOffset = 0; if (!pSignature-\u0026gt;IsStatic) { hiddenThisParameterIndexOffset++; // deal with the \u0026#34;this\u0026#34; hidden parameter for non static method // ex: show its address (i.e. pArgumentInfo-\u0026gt;ranges[0].startAddress) } Next, write a loop to iterate on each parameter based on the parameter count obtained previously from the metadata:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 char value[128]; for (ULONG currentParameterInSignature = 0; currentParameterInSignature \u0026lt; parameterCount; currentParameterInSignature++) { // Note: pParameter contains detail extracted from the metadata signature UINT_PTR pStartValue = pArgumentInfo-\u0026gt;ranges[currentParameterInSignature + hiddenThisParameterIndexOffset].startAddress; ULONG length = pArgumentInfo-\u0026gt;ranges[currentParameterInSignature + hiddenThisParameterIndexOffset].length; if (IsPdOut(pParameter-\u0026gt;Attributes)) { // if [out] parameter, nothing to get from it } else { value[0] = \u0026#39;\\0\u0026#39;; // call a helper function to extract the value of the parameter // a string from its address and type pHelpers-\u0026gt;GetObjectValue(pStartValue, length, pParameter-\u0026gt;ElementType 
, pParameter-\u0026gt;TypeToken, pSignature-\u0026gt;ModuleId, value, ARRAY_LEN(value) - 1); } }\nSimple type parameters case\nThe GetObjectValue() helper function looks like the following:\nvoid CorProfilerHelpers::GetObjectValue(UINT_PTR address, ULONG length, ULONG elementType, mdToken elementTypeToken, ModuleID moduleId, char* value, ULONG charCount) { ULONG numberValue; strcpy_s(value, charCount, \u0026#34;???\u0026#34;); switch(elementType) { ... default: sprintf_s(value, charCount, \u0026#34;unknown type 0x%x\u0026#34;, elementType); break; }\nThe way to get the value of a parameter really depends on its type. I know that a length is provided for each range of the COR_PRF_FUNCTION_ARGUMENT_INFO structure but I did not use it except as a sanity check.\nThe values of simple types are easy to compute because they are mostly stored at the given address:\ncase ELEMENT_TYPE_BOOLEAN: { bool* pBool = (bool*)address; if (*pBool) strcpy_s(value, charCount, \u0026#34;true\u0026#34;); else strcpy_s(value, charCount, \u0026#34;false\u0026#34;); } break; case ELEMENT_TYPE_CHAR: { WCHAR* pChar = (WCHAR*)address; sprintf_s(value, charCount, \u0026#34;%C\u0026#34;, *pChar); } break; case ELEMENT_TYPE_I1: // int8\n{ char* pNumber = (char*)address; numberValue = *pNumber; sprintf_s(value, charCount, \u0026#34;%d\u0026#34;, numberValue); } break; case ELEMENT_TYPE_U1: // unsigned int8\n{ unsigned char* pNumber = (unsigned char*)address; numberValue = *pNumber; sprintf_s(value, charCount, \u0026#34;%d\u0026#34;, numberValue); } break; case ELEMENT_TYPE_I2: // int16\n{ short int* pNumber = (short int*)address; numberValue = *pNumber; sprintf_s(value, charCount, 
\u0026#34;%d\u0026#34;, numberValue); } break; case ELEMENT_TYPE_U2: // unsigned int16\n{ short unsigned int* pNumber = (short unsigned int*)address; numberValue = *pNumber; sprintf_s(value, charCount, \u0026#34;%d\u0026#34;, numberValue); } break; case ELEMENT_TYPE_I4: // int32\n{ __int32* pNumber = (__int32*)address; numberValue = *pNumber; sprintf_s(value, charCount, \u0026#34;%d\u0026#34;, numberValue); } break; case ELEMENT_TYPE_U4: // unsigned int32\n{ unsigned __int32* pNumber = (unsigned __int32*)address; numberValue = *pNumber; sprintf_s(value, charCount, \u0026#34;%u\u0026#34;, numberValue); } break;\n// NOTE: %lld and %llu might not work on linux\ncase ELEMENT_TYPE_I8: // int64\n{ __int64* pNumber = (__int64*)address; sprintf_s(value, charCount, \u0026#34;%lld\u0026#34;, *pNumber); } break; case ELEMENT_TYPE_U8: // unsigned int64\n{ unsigned __int64* pNumber = (unsigned __int64*)address; sprintf_s(value, charCount, \u0026#34;%llu\u0026#34;, *pNumber); } break;\nAnd guess what? It is the same for float and double because they are stored by the CLR the same way as in C++:\ncase ELEMENT_TYPE_R4: { float* pFloat = (float*)address; sprintf_s(value, charCount, \u0026#34;%f\u0026#34;, *pFloat); } break; case ELEMENT_TYPE_R8: { double* pDouble = (double*)address; sprintf_s(value, charCount, \u0026#34;%g\u0026#34;, *pDouble); } break;\nThe other types require more knowledge about how the CLR stores their value.\nThe string case\nThis is the first reference type we meet and, as for all reference types, the given address points to the memory where the reference (i.e. address of the object in the managed heap) is stored. 
It allows you to check for a null parameter before looking at the “real” managed reference:\ncase ELEMENT_TYPE_STRING: {\n// look at the reference stored at the given address\nunsigned __int64* pAddress = (unsigned __int64*)address; byte* managedReference = (byte*)(*pAddress);\n// easily check for null string\nif (managedReference == NULL) { strcpy_s(value, charCount, \u0026#34;null string\u0026#34;); break; }\nAt that point, you need to know how an instance of a reference type is stored by the CLR in the managed heap. Fortunately, Sergey Tepliakov, a software engineer at Microsoft, has provided a lot of details about that, especially where the address stored by a managed reference points to:\nIt means that you have to skip the Method Table pointer pointed to by the address you have. This applies to any reference type instance!\nBut for our current string case, you still need to know how a string is stored (i.e. its length followed by the buffer of UTF16 characters). I recommend that you read the post from Matt Warren about the subject because it also covers a lot of interesting details related to string implementation. 
However, you should simply rely on the implementation details provided by the CLR via ICorProfilerInfo3::GetStringLayout2:\nULONG _stringLengthOffset; ULONG _stringBufferOffset; hr = _pProfilerInfo-\u0026gt;GetStringLayout2(\u0026amp;_stringLengthOffset, \u0026amp;_stringBufferOffset);\nThese two variables give you the offsets to use to access both the string size and the beginning of the array of WCHAR storing each character.\n// |               | MethodTable address | string length | buffer               |\n// | 64-bit size   | 8 bytes             | 4 bytes       | (length+1) x WCHAR   |\n// | 64-bit offset | 0                   | 8             | 12                   |\n// | 32-bit size   | 4 bytes             | 4 bytes       | (length+1) x WCHAR   |\n// | 32-bit offset | 0                   | 4             | 8                    |\nAs shown in this table, you need to skip 8/4 bytes to read the length. It simply confirms that you need to jump over the address of the Method Table stored as a 64/32 bit value (i.e. an address in x64/x86). The length itself is stored as a 32 bit number (4 bytes) both in x64 and x86. So the array containing the consecutive UTF16 characters just follows (i.e. its offset from the reference address is 12/8 bytes). 
For example, here is what you get in the Visual Studio Memory panel with the reference to the 3-character “CLR” string for a 64 bit application:\nWith this information in hand, it is easy to detect empty strings or copy the UNICODE string into a simple char* buffer:\nbyte* pLength = GetPointerAfterNBytes(managedReference, _stringLengthOffset); ULONG stringLength = *(ULONG*)pLength; if (stringLength == 0) { strcpy_s(value, charCount, \u0026#34;empty string\u0026#34;); break; } byte* pBuffer = GetPointerAfterNBytes(managedReference, _stringBufferOffset);\n// pass stringLength + 1 to copy the trailing \\0\n::WideCharToMultiByte(CP_ACP, 0, (WCHAR*)pBuffer, stringLength + 1, value, charCount, NULL, NULL);\nThe GetPointerAfterNBytes function simply helps me deal with pointer arithmetic in C++:\nbyte* GetPointerAfterNBytes(void* pAddress, ULONG byteCount) { return ((byte*)pAddress) + byteCount; }\nThe next post will describe how to get the value of array parameters and the basics of extracting field values from a reference type instance.\nReferences\nSample: A Signature Blob Parser for your Profiler\nStrings and the CLR — a Special Relationship by Matt Warren\nEpisode 1: Start a journey into the .NET Profiling APIs\nEpisode 2: Dealing with Modules, Assemblies and Types with CLR profiling API\nEpisode 3: Decyphering methods signature with .NET Profiling APIs ","cover":"https://chrisnas.github.io/posts/2021-11-16_reading-parameters-value-with/1_H5QzJWaMyhx13cA8bFbV2Q.png","date":"2021-11-16","permalink":"https://chrisnas.github.io/posts/2021-11-16_reading-parameters-value-with/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eFrom \u003ca href=\"/posts/2021-10-12_decyphering-method-signature-w/\"\u003ethe list of arguments with their type\u003c/a\u003e, it becomes possible to figure out their value when a method gets called. 
The rest of this post describes how to access method call parameters and get the value of numbers and strings.\u003c/p\u003e\n\u003ch2 id=\"where-are-my-parameters\"\u003eWhere are my parameters?\u003c/h2\u003e\n\u003cp\u003eWhen you pass \u003cstrong\u003eCOR_PRF_ENABLE_FUNCTION_ARGS\u003c/strong\u003e to \u003cstrong\u003eICorProfilerInfo::SetEventMask\u003c/strong\u003e, the runtime prepares a \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/cor-prf-function-argument-info-structure?WT.mc_id=DT-MVP-5003325\"\u003e\u003cstrong\u003eCOR_PRF_FUNCTION_ARGUMENT_INFO\u003c/strong\u003e\u003c/a\u003e structure before your enter callback is called:\u003c/p\u003e\n\u003cdiv class=\"highlight\"\u003e\u003cdiv class=\"chroma\"\u003e\n\u003ctable class=\"lntable\"\u003e\u003ctr\u003e\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode\u003e\u003cspan class=\"lnt\"\u003e1\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e2\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e3\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e4\n\u003c/span\u003e\u003cspan class=\"lnt\"\u003e5\n\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\n\u003ctd class=\"lntd\"\u003e\n\u003cpre tabindex=\"0\" class=\"chroma\"\u003e\u003ccode class=\"language-c\" data-lang=\"c\"\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"k\"\u003etypedef\u003c/span\u003e \u003cspan class=\"k\"\u003estruct\u003c/span\u003e \u003cspan class=\"n\"\u003e_COR_PRF_FUNCTION_ARGUMENT_INFO\u003c/span\u003e \u003cspan class=\"p\"\u003e{\u003c/span\u003e  \n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003eULONG\u003c/span\u003e \u003cspan class=\"n\"\u003enumRanges\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e  \n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    
\u003cspan class=\"n\"\u003eULONG\u003c/span\u003e \u003cspan class=\"n\"\u003etotalArgumentSize\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e  \n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e    \u003cspan class=\"n\"\u003eCOR_PRF_FUNCTION_ARGUMENT_RANGE\u003c/span\u003e \u003cspan class=\"n\"\u003eranges\u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"p\"\u003e];\u003c/span\u003e  \n\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"line\"\u003e\u003cspan class=\"cl\"\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e \u003cspan class=\"n\"\u003eCOR_PRF_FUNCTION_ARGUMENT_INFO\u003c/span\u003e\u003cspan class=\"p\"\u003e;\u003c/span\u003e\n\u003c/span\u003e\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e\u003cp\u003eI have to admit that the Microsoft Docs did not really help me to figure out what is the meaning of each field of this structure because the word “range” is very confusing here…\u003c/p\u003e","title":"Reading parameters value with the .NET Profiling APIs"},{"content":" Introduction After introducing the CLR profiling API by tracing managed methods calls, then dealing with assemblies and types, it is time to look at methods signatures. Remember that the starting point is the FunctionID received by the Enter callback each time a method is executed.\nThe question answered by this post is how to build the signature of the method given a FunctionID.\nA method signature is built from its return value (or void), its name and a list of parameters. All these details are stored in the module metadata generated by the C# compiler. 
So the first step is to get the metadata token corresponding to a FunctionID thanks to ICorProfilerInfo::GetFunctionInfo:\n1 2 mdToken token; HRESULT hr = pInfo-\u0026gt;GetFunctionInfo(functionId, \u0026amp;classId, \u0026amp;moduleId, \u0026amp;token); Next, use the IMetaDataImport corresponding to the module to call GetMethodProps and pass the function metadata token:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 mdTypeDef type; WCHAR name[260]; ULONG size; ULONG attributes; PCCOR_SIGNATURE pSig; ULONG blobSize; ULONG codeRva; DWORD flags; hr = pMetaDataImport-\u0026gt;GetMethodProps( token, \u0026amp;type, name, ARRAY_LEN(name) - 1, \u0026amp;size, \u0026amp;attributes, \u0026amp;pSig, \u0026amp;blobSize, \u0026amp;codeRva, \u0026amp;flags); In addition to the function name, you will be able to check the attributes to figure out if the function is static or not: if ((attributes \u0026amp; mdStatic) == mdStatic) { oss \u0026lt;\u0026lt; \u0026#34; static \u0026#34;; } The return type and parameters type of the function are encoded in a binary format defined in the ECMA-335 specification. This binary blob is pointed to by the pSig parameter. Hopefully, you don’t have to implement a blob signature parser yourself. 
This has been done by Rico Mariani or Peter Sollich and it relies on low level helpers from cor.h such as CorSigUncompressData.\nHere is an example of a signature blob for a non-static method returning void and accepting a float and a double as parameters:\nThe well-known types are encoded and available as ELEMENT_TYPE_xxx constants from corhdr.h.\nELEMENT_TYPE_VOID, \u0026#34;Void\u0026#34;, ELEMENT_TYPE_BOOLEAN, \u0026#34;Boolean\u0026#34;, ELEMENT_TYPE_CHAR, \u0026#34;Char\u0026#34;, ELEMENT_TYPE_I1, \u0026#34;SByte\u0026#34;, ELEMENT_TYPE_U1, \u0026#34;Byte\u0026#34;, ELEMENT_TYPE_I2, \u0026#34;Int16\u0026#34;, ELEMENT_TYPE_U2, \u0026#34;UInt16\u0026#34;, ELEMENT_TYPE_I4, \u0026#34;Int32\u0026#34;, ELEMENT_TYPE_U4, \u0026#34;UInt32\u0026#34;, ELEMENT_TYPE_I8, \u0026#34;Int64\u0026#34;, ELEMENT_TYPE_U8, \u0026#34;UInt64\u0026#34;, ELEMENT_TYPE_R4, \u0026#34;Single\u0026#34;, ELEMENT_TYPE_R8, \u0026#34;Double\u0026#34;, ELEMENT_TYPE_STRING, \u0026#34;String\u0026#34;, ELEMENT_TYPE_I, \u0026#34;IntPtr\u0026#34;, ELEMENT_TYPE_U, \u0026#34;UIntPtr\u0026#34;, ELEMENT_TYPE_OBJECT, \u0026#34;Object\u0026#34;, For custom types identified as ELEMENT_TYPE_CLASS for reference types or ELEMENT_TYPE_VALUETYPE for value types, the metadata token of the type is “compressed” as part of the signature (see CorSigUncompressToken in cor.h for implementation details). If the type is defined in the same assembly as the method, you get a TypeDef token (starting with 02) used to call IMetaDataImport::GetTypeDefProps. 
If not, it will be a TypeRef token (starting with 01) used to call IMetaDataImport::GetTypeRefProps.\ncase ELEMENT_TYPE_CLASS: case ELEMENT_TYPE_VALUETYPE: { mdToken token; char classname[260]; classname[0] = \u0026#39;\\0\u0026#39;; signature += CorSigUncompressToken(signature, \u0026amp;token); if (typeToken != NULL) { *typeToken = token; } HRESULT hr; WCHAR zName[260]; if (TypeFromToken(token) == mdtTypeRef) { mdToken resScope; ULONG length; hr = pMDImport-\u0026gt;GetTypeRefProps(token, \u0026amp;resScope, zName, 260, \u0026amp;length); } else { hr = pMDImport-\u0026gt;GetTypeDefProps(token, zName, 260, NULL, NULL, NULL); } … More on typeRef and typeDef later.\nThis is nice but since the parameter name is not encoded in the signature blob, you have more work to do to get it. First, you have to call IMetaDataImport::EnumParams to get the metadata token mdParamDef for each parameter:\nHCORENUM hEnum = 0; ULONG paramCount; mdParamDef* paramDefs = new mdParamDef[argCount]; pMetaData-\u0026gt;EnumParams(\u0026amp;hEnum, token, paramDefs, argCount, \u0026amp;paramCount); pMetaData-\u0026gt;CloseEnum(hEnum); The next step is to iterate over the array of parameter definitions and call IMetaDataImport::GetParamProps to get each parameter name and whether it is a value type, while the ParseElementType helper extracts the type from the signature blob: ULONG pos; WCHAR name[260]; ULONG length; ULONG attributes; // values from CorParamAttr in CorHdr.h DWORD bIsValueType; for (ULONG i = 0; (SUCCEEDED(hr) \u0026amp;\u0026amp; (pSigBlob != NULL) \u0026amp;\u0026amp; (i \u0026lt; (argCount))); i++) { // get the parameter name hr = pMetaData-\u0026gt;GetParamProps(paramDefs[i], NULL, \u0026amp;pos, name, ARRAY_LEN(name)-1, \u0026amp;length, \u0026amp;attributes, \u0026amp;bIsValueType, NULL, NULL); // note that we need to convert from 
WCHAR* to char* for the name // get the parameter type buffer[0] = \u0026#39;\\0\u0026#39;; pSigBlob = ParseElementType(pMetaData, pSigBlob, classTypeArgs, methodTypeArgs, \u0026amp;elementType, buffer, ARRAY_LEN(buffer) - 1); } Generic methods have more complicated signatures to compute The following figure shows how to handle generic methods:\nThe main change for such a generic method is that, in the signature blob, the first “calling convention” data will have the IMAGE_CEE_CS_CALLCONV_GENERIC flag set and the number of generic arguments is stored right after it, before the total parameter count (this is the MethodDefSig layout defined in ECMA-335). The other difference is how to deal with generic parameters in the blob that will all share the same ELEMENT_TYPE_MVAR value followed by a position (starting from 0). This is the position in the array returned by ICorProfilerInfo2::GetFunctionInfo2 for the ClassID.\nThe final code should look like the following:\nDWORD bIsValueType; ULONG currentGenericParam = 0; for (ULONG i = 0; (SUCCEEDED(hr) \u0026amp;\u0026amp; (pSigBlob != NULL) \u0026amp;\u0026amp; (i \u0026lt; (argCount))); i++) { // get the parameter name hr = pMetaData-\u0026gt;GetParamProps(paramDefs[i], NULL, \u0026amp;pos, name, ARRAY_LEN(name)-1, \u0026amp;length, \u0026amp;attributes, \u0026amp;bIsValueType, NULL, NULL); // note that we need to convert from WCHAR* to char* for the name // get the parameter type buffer[0] = \u0026#39;\\0\u0026#39;; // in case of generic function, get the type details from the runtime and not from the metadata // !! 
we don\u0026#39;t know in advance which parameter is a generic parameter and this is given by elementType == ELEMENT_TYPE_MVAR pSigBlob = ParseElementType(pMetaData, pSigBlob, classTypeArgs, methodTypeArgs, \u0026amp;elementType, buffer, ARRAY_LEN(buffer) - 1); if ((methodTypeArgs != NULL) \u0026amp;\u0026amp; (elementType == ELEMENT_TYPE_MVAR)) { ModuleID moduleId; mdTypeDef mdType; hr = pInfo-\u0026gt;GetClassIDInfo2(methodTypeArgs[currentGenericParam], \u0026amp;moduleId, \u0026amp;mdType, NULL, 0, NULL, NULL); if (SUCCEEDED(hr)) { WCHAR paramTypeName[260]; IMetaDataImport2* pImport2; hr = pInfo-\u0026gt;GetModuleMetaData(moduleId, ofRead, IID_IMetaDataImport2, reinterpret_cast\u0026lt;IUnknown**\u0026gt;(\u0026amp;pImport2)); ULONG sigBlobLen = 0; // NOTE: get elementType from type name because the metadata can\u0026#39;t give us the instantiated generic in the blob signature hr = pImport2-\u0026gt;GetTypeDefProps(mdType, paramTypeName, ARRAY_LEN(paramTypeName)-1, 0, NULL, NULL); pImport2-\u0026gt;Release(); } currentGenericParam++; } } One last detail before looking for the parameter values in the next post: for ALL reference types, the same implementation is generated by the JIT compiler. If you think about it, there is no need to implement a List in a different way than a List of any other reference type: they all deal with references (i.e. addresses). The name picked by the CLR team to identify this “generic reference type” is System.__Canon that stands for “canonical”. 
So expect to receive that type name a lot!\nReferences Sample: A Signature Blob Parser for your Profiler Episode 1: Start a journey into the .NET Profiling APIs Episode 2: Dealing with Modules, Assemblies and Types with CLR profiling API ","cover":"https://chrisnas.github.io/posts/2021-10-12_decyphering-method-signature-w/1_ZjGy-P8JWGn9aw_5khX2Pw.png","date":"2021-10-12","permalink":"https://chrisnas.github.io/posts/2021-10-12_decyphering-method-signature-w/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAfter \u003ca href=\"/posts/2021-08-07_start-journey-into-the/\"\u003eintroducing\u003c/a\u003e the CLR profiling API by tracing managed methods calls, then \u003ca href=\"/posts/2021-09-06_dealing-with-modules-assemblie/\"\u003edealing with assemblies and types\u003c/a\u003e, it is time to look at methods signatures. Remember that the starting point is the \u003cstrong\u003eFunctionID\u003c/strong\u003e received by the \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/functionenter2-function?WT.mc_id=DT-MVP-5003325\"\u003e\u003cstrong\u003eEnter\u003c/strong\u003e\u003c/a\u003e callback each time a method is executed.\u003c/p\u003e\n\u003cp\u003eThe question answered by this post is how to build the signature of the method given a \u003cstrong\u003eFunctionID\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eA method signature is built from its return value (or void), its name and a list of parameters. All these details are stored in the module metadata generated by the C# compiler. 
So the first step is to get the metadata token corresponding to a \u003cstrong\u003eFunctionID\u003c/strong\u003e thanks to \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo-getfunctioninfo-method?WT.mc_id=DT-MVP-5003325\"\u003e\u003cstrong\u003eICorProfilerInfo::GetFunctionInfo\u003c/strong\u003e\u003c/a\u003e:\u003c/p\u003e","title":"Decyphering methods signature with .NET profiling APIs"},{"content":" Introduction In the first post of this series dedicated to CLR Profiling API, you have seen how to get a FunctionID each time a managed method is executed in a .NET application. As David Broman (source of most of the profiling implementation details at Microsoft) explains, a FunctionID is a pointer to an internal data structure of the CLR called a MethodDesc. For us, it is just an opaque value that is usable in different CLR APIs. So what if you would like to know the name of the method behind this FunctionID?\nUnlike what you might think, this first question is not an easy one, especially if you would like to get the complete signature of the method such as what you get in Visual Studio Call Stack panel:\nProfilingTest.dll!PublicClass.ClassParamReturnClass(ClassType obj)\nYou will have to get the module name (i.e. the assembly where the method type is defined), the type name, the method name and the list of its parameters type and name.\nThis post deals with the notions of module, assembly and type in addition to introducing the .NET Metadata API.\nIdentifying the module and assembly I’m sure that most of you know what an assembly is: this is what gets generated when you compile a Class Library in Visual Studio. Easy answer. However, .NET (unlike Visual Studio) supports the notion of multi-module assembly creation bound to several “modules”. 
Each module can contain types and resources and the assembly contains the manifest listing all the modules defining the assembly.\nThis is why the profiling API allows you to get both assembly and module. Let’s use ICorProfilerInfo::GetFunctionInfo to find out which module and assembly is implementing the type of a given FunctionID.\nClassID classId; ModuleID moduleId; mdToken mdtokenFunction; pInfo-\u0026gt;GetFunctionInfo(functionId, \u0026amp;classId, \u0026amp;moduleId, \u0026amp;mdtokenFunction); Now that you have a ModuleID, you can call ICorProfilerInfo::GetModuleInfo to get its name, load address and assembly. The usage pattern of this API is common in COM: first you call it to get the size of the buffer to copy the name and then you call it a second time with the newly allocated buffer:\nLPCBYTE loadAddress; ULONG nameLen = 0; AssemblyID assemblyId; hr = pInfo-\u0026gt;GetModuleInfo(moduleId, \u0026amp;loadAddress, nameLen, \u0026amp;nameLen, NULL, \u0026amp;assemblyId); if (SUCCEEDED(hr)) { WCHAR* pszName = new WCHAR[nameLen]; // count the trailing \\0 pInfo-\u0026gt;GetModuleInfo(moduleId, \u0026amp;loadAddress, nameLen, \u0026amp;nameLen, pszName, \u0026amp;assemblyId); oss \u0026lt;\u0026lt; L\u0026#34;(\u0026#34; \u0026lt;\u0026lt; pszName \u0026lt;\u0026lt; L\u0026#34;)\u0026#34;; delete [] pszName; } else oss \u0026lt;\u0026lt; L\u0026#34;(UNKNOWN)\u0026#34;; Note that the module name is the full path name of the module file.\nHere is the code that calls ICorProfilerInfo::GetAssemblyInfo to get the assembly name now that you have the AssemblyID:\nhr = pInfo-\u0026gt;GetAssemblyInfo(assemblyId, 0, \u0026amp;nameLen, NULL, NULL, NULL); if (SUCCEEDED(hr)) { WCHAR* pszName = new WCHAR[nameLen]; // count the trailing \\0 hr = pInfo-\u0026gt;GetAssemblyInfo(assemblyId, nameLen, \u0026amp;nameLen, pszName, NULL, NULL); oss \u0026lt;\u0026lt; pszName; delete [] pszName; } else oss 
\u0026lt;\u0026lt; L\u0026#34;\u0026lt;UNKNOWN\u0026gt;\u0026#34;; The assembly name does not contain the file extension such as .dll or .so.\nID or Token: it depends on which profiling API to use It is important to discuss what kind of information you get from the different profiling APIs. Like FunctionID, ClassID and ModuleID are opaque pointers to CLR internal data structures. They are used by the runtime to map into memory metadata generated by the compiler. The metadata identifiers are usually referred to as “tokens” and the mdToken type simply stands for “metadata token”. Unlike the different xxxID types, whose values are different each time the code runs, the metadata tokens stay the same because they come from the compiled assembly. While debugging, it is good to be able to compare the tokens you get against their corresponding values in an assembly. As an example, here is what you get with ILSpy while browsing the metadata:\nEach kind of metadata is encoded into the first 2 digits so it is easy to see what you are manipulating. The 06 prefix tells you that you are dealing with a method:\nInstead of ICorProfilerInfo, you need to use IMetaDataImport to access information behind the metadata tokens. Since the metadata is bound to a given module, you have to call ICorProfilerInfo::GetModuleMetaData to get the implementation corresponding to a given ModuleID.\nIMetaDataImport* pMetaDataImport; HRESULT hr = pInfo-\u0026gt;GetModuleMetaData( moduleId, ofRead, IID_IMetaDataImport, reinterpret_cast\u0026lt;IUnknown**\u0026gt;(\u0026amp;pMetaDataImport)); For the rest of the series, I will do my best to present which profiling/metadata API to use for what purpose. And in some cases, you will need both.\nIdentifying the type After the module details, let’s see what we can get for the type that implements a given FunctionID. 
For one of my tests, I defined the following C# generic type:\npublic class GenericPublicClass\u0026lt;K, V\u0026gt; { [MethodImpl(MethodImplOptions.NoInlining)] public string Store(K key, IEnumerable\u0026lt;V\u0026gt; val) { return $\u0026#34;{key} = {val.Count()} items\u0026#34;; } } It can be used like the following:\nvar g1 = new GenericPublicClass\u0026lt;string, int\u0026gt;(); Console.WriteLine(g1.Store(\u0026#34;secret\u0026#34;, new int[] { 1, 2, 3, 4, 5 })); Why am I starting with a generic type? That way, you will better understand that this feature was added after the initial profiling API shipped and is not that well integrated. Basically, the first iteration of ICorProfilerInfo did not deal with generics but the second one, ICorProfilerInfo2, does.\nBut first, let’s summarize a few basics about generics. When you define a generic type and generic methods such as for my GenericPublicClass, the C# compiler generates the metadata for the generic type definition that acts as a template. The generic type parameters (K and V in my case) are placeholders that will be instantiated by generic type arguments to get a final generic constructed type.\nThe important part to understand for our purpose is the fact that metadata will only contain generic type definitions.\nThe name stored in the metadata ends with the ` character followed by the number of generic type parameters. This is what you get when you call GetType().Name on a generic instance in C#.\nAs shown earlier, ICorProfilerInfo::GetFunctionInfo is used to get the ClassID of the type implementing the given FunctionID. Unfortunately, in case of a generic type, it returns S_OK but the ClassID you get is 0. 
In that case, you know you have to call ICorProfilerInfo2::GetFunctionInfo2:\nHRESULT GetFunctionInfo2( [in] FunctionID funcId, [in] COR_PRF_FRAME_INFO frameInfo, [out] ClassID *pClassId, [out] ModuleID *pModuleId, [out] mdToken *pToken, [in] ULONG32 cTypeArgs, [out] ULONG32 *pcTypeArgs, [out] ClassID typeArgs[] ); You have the FunctionID but not the COR_PRF_FRAME_INFO… You need to call ICorProfilerInfo3::GetFunctionEnter3Info to get it from the COR_PRF_ELT_INFO given by the enter stub. Here is the final code to get a ClassID for a generic type:\nif (classId == 0) { // Call GetFunctionEnter3Info to get the COR_PRF_FRAME_INFO* needed by GetFunctionInfo2 // as a second parameter and get the instantiated generic argument types. // Otherwise will get \u0026lt;K, V\u0026gt; instead of \u0026lt;int, string\u0026gt; for example COR_PRF_FRAME_INFO frameInfo = NULL; ULONG nbArgumentInfo = 0; // NOTE: it is needed to pass \u0026amp;nbArgumentInfo or the method will return INVALIDARGUMENT error hr = pInfo-\u0026gt;GetFunctionEnter3Info(functionId, eltInfo, \u0026amp;frameInfo, \u0026amp;nbArgumentInfo, NULL); // NOTE: hr will fail with an insufficient buffer size in case of generic but the frameInfo will be correct hr = pInfo-\u0026gt;GetFunctionInfo2(functionId, frameInfo, \u0026amp;classId, \u0026amp;moduleId, \u0026amp;mdtokenFunction, 0, NULL, NULL); } // from here, we are sure to have a valid ClassID Here is a summary of the relationships between the different IDs with the corresponding APIs to call:\nFrom a ClassID to a class name It is time to enter a complicated part of the story: how to get the “name” of the type that hides behind a ClassID. As you might guess, the first step is to figure out if it is a generic type and what the corresponding type arguments are. 
You have to call ICorProfilerInfo2::GetClassIDInfo2 with the ClassID to get the metadata token of the type, the number of type arguments and the ClassID of these types if any. As usual with this kind of API, a first call is needed to get the number of type arguments so you can allocate the right sized array of ClassID. The second call will fill up the newly allocated array:\nmdTypeDef mdType; ClassID parentClassId; // not needed in our scenario ULONG32 numGenericTypeArgs = 0; ClassID* genericTypeArgs = NULL; pInfo-\u0026gt;GetClassIDInfo2(classId, NULL, \u0026amp;mdType, \u0026amp;parentClassId, 0, \u0026amp;numGenericTypeArgs, NULL); if (numGenericTypeArgs \u0026gt; 0) { genericTypeArgs = new ClassID[numGenericTypeArgs]; pInfo-\u0026gt;GetClassIDInfo2(classId, NULL, \u0026amp;mdType, \u0026amp;parentClassId, numGenericTypeArgs, \u0026amp;numGenericTypeArgs, genericTypeArgs); } Since you obtained a metadata token, you will need the IMetaDataImport of the module where the type is defined to get details such as… its name. The IMetaDataImport2 is required to enumerate the parameter types:\nIMetaDataImport2* pMetaDataImport = NULL; pInfo-\u0026gt;GetModuleMetaData(moduleId, ofRead, IID_IMetaDataImport2, reinterpret_cast\u0026lt;IUnknown**\u0026gt;(\u0026amp;pMetaDataImport)); Getting the type “name” is done by a call to IMetaDataImport::GetTypeDefProps, passing the metadata token corresponding to the ClassID:\nULONG length = bufferLen; DWORD flags; mdTypeDef mdBaseType; std::wostringstream oss; pszName[0] = L\u0026#39;\\0\u0026#39;; hr = pMetaDataImport-\u0026gt;GetTypeDefProps(mdType, pszName, length, \u0026amp;length, \u0026amp;flags, \u0026amp;mdBaseType); But before jumping into the name, you need to take care of the case where you are dealing with a nested type (i.e. a type defined in another type). 
Checking the flags parameter is exactly what you need:\nif (IsTdNested(flags)) { mdToken mdEnclosingClass; pMetaDataImport-\u0026gt;GetNestedClassProps(mdType, \u0026amp;mdEnclosingClass); // create a new buffer to get the enclosing type name WCHAR* pszEnclosingTypeName = new WCHAR[bufferLen]; GetTypeName(pInfo, pMetaDataImport, mdEnclosingClass, numGenericTypeArgs, genericTypeArgs, pszEnclosingTypeName, bufferLen); oss \u0026lt;\u0026lt; pszEnclosingTypeName \u0026lt;\u0026lt; \u0026#34;+\u0026#34;; delete[] pszEnclosingTypeName; } A call to IMetaDataImport::GetNestedClassProps returns the metadata token of the enclosing type and you simply recursively call the GetTypeName method that we are implementing in case of multi-nested types.\nIf this is not a generic type, we are done. However, as already mentioned in case of a generic type, it will end with the ` character followed by the number of type parameters. The following helper function swiftly gets rid of it:\nvoid FixGenericSyntax(WCHAR* name) { ULONG currentCharPos = 0; while (name[currentCharPos] != L\u0026#39;\\0\u0026#39;) { if (name[currentCharPos] == L\u0026#39;`\u0026#39;) { // skip `xx name[currentCharPos] = L\u0026#39;\\0\u0026#39;; return; } currentCharPos++; } } The next step is to rebuild the list of generic argument types using the array of ClassID returned by ICorProfilerInfo2::GetClassIDInfo2. 
The most complicated part of the loop is to avoid adding a “,” after the last argument type:\nif (numGenericTypeArgs \u0026gt; 0) { // replace \u0026#34;`xx\u0026#34; by \u0026#34;\u0026lt;\u0026#34; FixGenericSyntax(pszName); oss \u0026lt;\u0026lt; pszName; oss \u0026lt;\u0026lt; L\u0026#34;\u0026lt;\u0026#34;; for (size_t currentGenericArg = 0; currentGenericArg \u0026lt; numGenericTypeArgs; currentGenericArg++) { ClassID argClassId = genericTypeArgs[currentGenericArg]; ModuleID argModuleId; pInfo-\u0026gt;GetClassIDInfo2(argClassId, \u0026amp;argModuleId, NULL, 0, NULL, NULL, NULL); WCHAR argTypeName[260]; GetTypeName(pInfo, argClassId, argModuleId, argTypeName, ARRAY_LEN(argTypeName) - 1); oss \u0026lt;\u0026lt; argTypeName; if (currentGenericArg \u0026lt; numGenericTypeArgs - 1) oss \u0026lt;\u0026lt; L\u0026#34;, \u0026#34;; } oss \u0026lt;\u0026lt; L\u0026#34;\u0026gt;\u0026#34;; } You call ICorProfilerInfo2::GetClassIDInfo2 on each parameter type ClassID to obtain the ModuleID where the type is defined and call our GetTypeName helper method.\nThe next episode will analyze method signatures.\nReferences Basics about generic types Episode 1: Start a journey into the .NET Profiling APIs ","cover":"https://chrisnas.github.io/posts/2021-09-06_dealing-with-modules-assemblie/1_U7o7D7K4u2OztCqK-xYrIQ.png","date":"2021-09-06","permalink":"https://chrisnas.github.io/posts/2021-09-06_dealing-with-modules-assemblie/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn\u003ca href=\"/posts/2021-08-07_start-journey-into-the/\"\u003e the first post\u003c/a\u003e of this series dedicated to CLR Profiling API, you have seen how to get a \u003cstrong\u003eFunctionID\u003c/strong\u003e each time a managed method is executed in a .NET application. 
As David Broman (source of most of the profiling implementation details at Microsoft) explains, a \u003cstrong\u003eFunctionID\u003c/strong\u003e is a pointer to an internal data structure of the CLR called a \u003cstrong\u003eMethodDesc\u003c/strong\u003e. For us, it is just an opaque value that is usable in different CLR APIs. So what if you would like to know the name of the method behind this \u003cstrong\u003eFunctionID\u003c/strong\u003e?\u003c/p\u003e","title":"Dealing with Modules, Assemblies and Types with CLR Profiling APIs"},{"content":" Introduction When I want to dig into a new API, I implement a real world scenario. This is exactly what I did for the .NET native profiling API. I want to know how to get parameters and return value of any method call during any .NET application life. The expected result would be something like :\nEnter PublicClass::ClassParamReturnClass this = 0x6f97e190 (8) ClassType obj = 0x6f97e488 (8) | int32 \u0026lt;IntProperty\u0026gt;k__BackingField = 84 | int32 intField = 42 | String stringField = 43 ClassType obj = 0x0000023A475BBAD8 Leave PublicClass::ClassParamReturnClass | int32 \u0026lt;IntProperty\u0026gt;k__BackingField = 170 | int32 intField = 85 | String stringField = 86 returns 0x0000023A475BBBB0 when the following method is executed:\n1 2 3 4 public ClassType ClassParamReturnClass(ClassType obj) { return new ClassType(obj.IntProperty + 1); } You will have to write native C/C++ code to leverage the .NET Profiling API as John Robbins explained in his 2003 Debugging Applications book. Even though Microsoft is providing a code sample for that, it does just show how to get notified when a function is called or exited but nothing about its type, its name, what are its parameters value and its return value. This series will detail both the .NET profiling API and the metadata API (i.e. the native reflection API).\nLet’s start with the basics of .NET Profiling. 
As shown in the following figure from the Microsoft Profiling documentation, with the right environment configuration, the CLR will load a COM-like object implementing ICorProfilerCallback interface to notify almost everything happening in a .NET application from startup to shutdown.\nSince .NET was launched, more and more notifications have been added by versioning the interface up to ICorProfilerCallback9. Welcome to the usual COM world! I recommend that you watch Pavel Yosifovich session about writing a CLR Profiler in an hour to get an overview. I will use his sample solution as a starting point.\nFor performance sake, you tell the runtime which events you are interested in; i.e. which ICorProfilerCallback functions will be called and you simply have to return S_OK from all other functions. This setup is done in your implementation of ICorProfilerCallback::Initialize. This first function called by the CLR provides a parameter from which you need to QueryInterface a version of ICorProfilerInfo interface (the current one is ICorProfilerInfo10). This interface provides functions to query information about parameters passed to your ICorProfilerCallback functions (such as AppDomain, assembly, type, function, thread and so on).\nThe first ICorProfilerInfo function you will use is SetEventMask to filter the ICorProfilerCallback functions that will be called by the runtime. It accepts a flag combination of values from the COR_PRF_MONITOR enumeration.\nAssembly code is needed for Enter/Leave/TailCall To get notified when a managed method is called or exits, you should pass:\nCOR_PRF_MONITOR_ENTERLEAVE | COR_PRF_ENABLE_FUNCTION_ARGS | COR_PRF_ENABLE_FUNCTION_RETVAL | COR_PRF_ENABLE_FRAME_INFO to ICorProfilerInfo::SetEventMask. The first flag tells the runtime to call static callbacks (i.e. not exposed as functions of ICorProfilerCallback) when a managed method gets executed or returns. 
The other three flags ensure that these callbacks will receive enough information to extract method arguments and return value.\nUnlike the other notifications that end up calling your ICorProfilerCallback functions, you need to register three special callbacks with the runtime via ICorProfilerInfo3::SetEnterLeaveFunctionHooks3WithInfo. For performance reasons, the .NET team asks you to write the prolog and epilog (i.e. saving/restoring CPU registers on/from the stack) yourself in assembly code instead of relying on well-defined calling conventions supported by the C/C++ compiler.\nThis is why you have to call ICorProfilerInfo3::SetEnterLeaveFunctionHooks3WithInfo and pass pointers to these “naked” functions. The Microsoft ELTProfiler sample implements the stubs both for x86 (as inlined assembly embedded in a C++ file) and x64 (defined in a .asm file). For x64, you need to update your project file to add the following:\n\u0026lt;ImportGroup Label=\u0026#34;ExtensionSettings\u0026#34;\u0026gt; \u0026lt;Import Project=\u0026#34;$(VCTargetsPath)\\BuildCustomizations\\masm.props\u0026#34; /\u0026gt; \u0026lt;/ImportGroup\u0026gt; \u0026lt;ItemGroup\u0026gt; \u0026lt;MASM Include=\u0026#34;../DotNext.Profiler.Shared/asm/windows/nakedcallbacks.asm\u0026#34; Condition=\u0026#34;\u0026#39;$(Platform)\u0026#39; == \u0026#39;x64\u0026#39;\u0026#34; /\u0026gt; \u0026lt;/ItemGroup\u0026gt; \u0026lt;ImportGroup Label=\u0026#34;ExtensionTargets\u0026#34;\u0026gt; \u0026lt;Import Project=\u0026#34;$(VCTargetsPath)\\BuildCustomizations\\masm.targets\u0026#34; /\u0026gt; \u0026lt;/ImportGroup\u0026gt; The nakedcallbacks.asm file contains the assembly code to call stub functions wrapped by the expected prolog and epilog written in assembly code. 
Here are the signatures of the functions from which you will be able to start working in C++:\nPROFILER_STUB EnterStub(FunctionIDOrClientID functionId, COR_PRF_ELT_INFO eltInfo) { ... } PROFILER_STUB LeaveStub(FunctionID functionId, COR_PRF_ELT_INFO eltInfo) { ... } PROFILER_STUB TailcallStub(FunctionID functionId, COR_PRF_ELT_INFO eltInfo) { ... } A function is identified by a FunctionID and this is where you start your adventure (I will come back later to the COR_PRF_ELT_INFO parameter). Note that for the EnterStub function, you need to get the FunctionID from the FunctionIDOrClientID.functionId field.\nOnce your hook callbacks have been registered via ICorProfilerInfo3::SetEnterLeaveFunctionHooks3WithInfo, it is still possible to decide whether or not a managed method call should trigger them. For that, you need to register a “mapper” function that is called once per managed method with one of these functions:\nHRESULT ICorProfilerInfo::SetFunctionIDMapper([in] FunctionIDMapper* pFunc); with UINT_PTR __stdcall Mapper(FunctionID functionId, BOOL* pHookFunction) HRESULT ICorProfilerInfo3::SetFunctionIDMapper2([in] FunctionIDMapper2* pFunc, [in] void* clientData); with UINT_PTR __stdcall Mapper2(FunctionID functionId, void* clientData, BOOL* pHookFunction) The latter allows you to pass some “client data” to the mapper function such as a helper class to manipulate the received FunctionID or your profiler state.\nIf pHookFunction is set to TRUE, your enter/leave functions will be called and the returned UINT_PTR will be passed as the FunctionID parameter. This allows you to handle function name or signature computation in one single place outside of the real profiling work done each time a method is called. If pHookFunction is set to FALSE, the enter/leave functions will never be called for that FunctionID. 
The mapper callback is called only once per FunctionID: this could be a good way to avoid performance impact if you just want to profile a small subset of methods.\nHow to debug your profiler Before going any further, you need to know how to debug your C++ profiler code with Visual Studio. The first step is to write a simple .NET application to execute the method calls you want to intercept. The second natural step would be to set up the environment variables needed to inject your profiler:\nand also check the Enable native code debugging option:\nIf you start a debug session, the Visual Studio debugger will use both managed and native debugging APIs. Unfortunately, the managed debugging API does not allow breakpoints to be set in ICorProfilerCallback functions.\nInstead, you need to start the C# test application with Debug | Start Without Debugging (CTRL+F5), not Debug | Start Debugging (F5), using the same environment variables, and then attach the debugger via Debug | Attach to Process. Select the process and click the Select button in the Attach to section:\nTo make attachment simple, add a Console.ReadLine in a console application before starting the method calls to test.\nThe next post will show you how to extract information from a FunctionID.\nReferences Writing a .NET Core cross platform profiler in an hour video and the corresponding source code from Pavel Yosifovich Microsoft Profiling documentation Microsoft Enter/Leave code sample ","cover":"https://chrisnas.github.io/posts/2021-08-07_start-journey-into-the/1_Qw5R9VvFCPePeH_g-6IgKg.png","date":"2021-08-07","permalink":"https://chrisnas.github.io/posts/2021-08-07_start-journey-into-the/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhen I want to dig into a new API, I implement a real world scenario. This is exactly what I did for the .NET native profiling API. I want to know how to get parameters and return value of any method call during any .NET application life. 
The expected result would be something like :\u003c/p\u003e\n\u003cpre tabindex=\"0\"\u003e\u003ccode\u003eEnter PublicClass::ClassParamReturnClass\nthis = 0x6f97e190 (8)\nClassType obj = 0x6f97e488 (8)\n| int32 \u0026lt;IntProperty\u0026gt;k__BackingField = 84\n| int32 intField = 42\n| String stringField = 43\nClassType obj = 0x0000023A475BBAD8\n\u003c/code\u003e\u003c/pre\u003e\u003cpre tabindex=\"0\"\u003e\u003ccode\u003eLeave PublicClass::ClassParamReturnClass\n| int32 \u0026lt;IntProperty\u0026gt;k__BackingField = 170\n| int32 intField = 85\n| String stringField = 86\nreturns 0x0000023A475BBBB0\n\u003c/code\u003e\u003c/pre\u003e\u003cp\u003ewhen the following method is executed:\u003c/p\u003e","title":"Start a journey into the .NET Profiling APIs"},{"content":" I have already explained how to write your own allocation monitoring tool. Each time 100 cumulated KB are allocated, the CLR emits an AllocationTick event with the name of the last allocated type before the 100 KB threshold and if it is in the LOH or not. This post shows you how to get these events with the corresponding callstacks thanks to the Microsoft Perfview free tool.\nOn Linux, things are a little bit more complicated because the Kernel provider does not exist to emit callstacks events. Microsoft provides the perfcollect script to get a zip file containing both the CLR events (collected by LTTng) and the callstacks (collected via perf). If you want, like dotnet-trace, to rely on EventPipe instead of LTTng, you could use Criteo fork of the perfcollect script and the corresponding updated version of Perfview to open the generated .trace.zip file. Note that our Pull Request to the Microsoft Perfview repository is still pending…\nIf you are on Windows, you could use Perfview (menu Collect | Collect or Alt+C shortcut)\nand click the Start Collection button. 
When the workflow you want to analyze is finished, click the same button (with a Stop Collection text this time) and the corresponding file should open up as a tree:\nNote that with a Linux collection, the nodes might be different:\nIn both cases, you are interested in the events visible when you double-click the Events node. You could use the Filter textbox to easily find the AllocationTick line in the left panel. Then, right-click and select Open Any Stacks:\nThis action opens up a new window that differs between a Windows and a Linux collection:\nThe reason is simple: on Windows, events from ALL processes have been collected while only one process is targeted on Linux. This is why you have to double-click the process you want on Windows while it is already selected for Linux. You could also keep only events from a process by entering its ID in the IncPats combo-box:\nInstead of staying on the CallTree tab, I recommend selecting the By Name tab and double-clicking the AllocationTick line:\nThis action moves to the Callers tab with AllocationTick selected:\nEach line under the AllocationTick node starts with EventData TypeName followed by the allocation type name. EventData is the name of the event payload used by Perfview and TypeName is the property name in the payload.\nThe Inc column gives a hint about the split of the different allocations. Remember that the AllocationTick events provide a sampling of the allocations, not an exact picture, but it should be enough. In the previous screenshot, the majority of allocations are byte arrays Byte[].\nClick the corresponding checkbox to open the Byte[] node:\nAs you can guess, each line represents a different value of the Size property in the EventData payload. You don’t really care about the different values so type EventData Size in the FoldPats combo-box to make them disappear:\nMost of the allocations are not in the LOH (i.e. 
less than 85,000 bytes with the default LOH threshold) and when you click the Small checkbox, the different callstacks leading to these allocations appear in the tree such as the Run method of the RandomAllocationAction class in the following screenshot:\nDon’t be scared by the raw state of the stack frame: read this previous post to see how to better understand async/await callstacks and make them more readable.\nHappy memory profiling!\n","cover":"https://chrisnas.github.io/posts/2021-07-20_profile-memory-allocations-wit/1_WDSmxes74rkDmJ-7qErfsg.jpeg","date":"2021-07-20","permalink":"https://chrisnas.github.io/posts/2021-07-20_profile-memory-allocations-wit/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2021-07-20_profile-memory-allocations-wit/1_WDSmxes74rkDmJ-7qErfsg.jpeg\"\u003e\u003c/p\u003e\n\u003cp\u003eI have already explained how to \u003ca href=\"/posts/2020-04-18_build-your-own-net/\"\u003ewrite your own allocation monitoring tool\u003c/a\u003e. Each time 100 cumulated KB are allocated, the CLR emits an \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events#gcallocationtick_v2-event?WT.mc_id=DT-MVP-5003325\"\u003eAllocationTick event\u003c/a\u003e with the name of the last allocated type before the 100 KB threshold and if it is in the LOH or not. This post shows you how to get these events with the corresponding callstacks thanks to the Microsoft \u003ca href=\"https://github.com/microsoft/perfview/releases/\"\u003ePerfview free tool\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eOn Linux, things are a little bit more complicated because the Kernel provider does not exist to emit callstacks events. 
Microsoft provides the \u003ca href=\"https://github.com/microsoft/perfview/tree/main/src/perfcollect\"\u003eperfcollect script\u003c/a\u003e to get a zip file containing both the CLR events (collected by LTTng) and the callstacks (collected via perf). If you want, like \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace?WT.mc_id=DT-MVP-5003325\"\u003edotnet-trace\u003c/a\u003e, to rely on EventPipe instead of LTTng, you could use \u003ca href=\"https://github.com/criteo-forks/perfview\"\u003eCriteo fork of the perfcollect script\u003c/a\u003e and the corresponding updated version of Perfview to open the generated .trace.zip file. Note that our \u003ca href=\"https://github.com/microsoft/perfview/pull/1291\"\u003ePull Request\u003c/a\u003e to the Microsoft Perfview repository is still pending…\u003c/p\u003e","title":"Profile memory allocations with Perfview"},{"content":" In the context of helping the teams at Criteo to clean up our code base, I gathered and documented a few C# anti-patterns similar to Kevin’s publication about performance code smell. Here is an extract related to good/bad memory patterns.\nEven though the garbage collector is doing its work out of the control of the developers, the fewer allocations are made, the less the GC will impact an application. So the main goal is to avoid writing code that allocates unnecessary objects or references them too long.\nFinalizer and IDisposable usage Let’s start with a hidden way of referencing an object: implementing a “finalizer”. In C#, you write a method whose name is the name of the class prefixed by ~. The compiler generates an override for the virtual Object.Finalize method. 
An instance of such a type is treated in a particular way by the Garbage Collector:\nafter it is allocated, a reference is kept in a Finalization internal queue; after a collection, if it is no longer referenced, this reference is moved into another fReachable internal queue and treated as a root until a dedicated thread calls its finalizer code.\nAs Konrad Kokosa details in one of his free GC Internals videos, instances of a type implementing a finalizer stay in memory much longer than needed, waiting for the next collection of the generation in which the previous collection left them (i.e. gen1 if it was in gen0 or even worse, gen2 if it was in gen1).\nSo the first question people usually ask is: do you really need to implement a finalizer? Most of the time, the answer should be no. The code of a finalizer is responsible for cleaning up ONLY resources that are NOT managed. It usually means “stuff” received from COM interop or P/Invoke calls to native functions such as handles, native memory or memory allocated via Marshal helpers. If your class has IntPtr fields, it is a good sign that their lifetime finishes in a finalizer via Marshal helpers or P/Invoke cleanup calls. If you need to manipulate kernel object handles, look for a SafeHandle-derived class instead of using raw IntPtr fields: this lets you avoid writing a finalizer. So in 99.9% of the cases, you don’t need a finalizer.\nThe second question is: how does implementing a finalizer relate to implementing IDisposable? Unlike a finalizer, implementing the unique Dispose() method of the IDisposable interface in a class means nothing for the Garbage Collector. So there is no side effect that extends the lifetime of its instances. 
This is only a way to allow the users of instances of this class to explicitly cleanup such an instance at a certain point in time instead of waiting for a garbage collection to be triggered.\nLet’s take an example: when you want to write to a file, behind the scene, .NET will call native APIs that operate on real file (via kernel object handles on Windows) with limited concurrent access (i.e. two processes can’t corrupt a file by writing different things at the same time — this is a very high level view of the situation but valid enough for this discussion). Another class would allow access to databases via a limited number of connections that should be released as soon as possible. In all these scenarios, as a user of these classes, you want to be able to “release” the resources used behind the scene as quickly as possible when you don’t need to access them anymore. This translates into the well known **using **pattern in C#:\n1 2 3 4 using (var disposableInstance = new MyDisposable()) { DoSomething(disposableInstance); }; // the instance will be cleanup and its resources released that is transformed by the C# compiler into:\n1 2 3 4 5 6 7 8 9 var disposableInstance = new MyDisposable(); try { DoSomething(disposableInstance); } finally { disposableInstance?.Dispose(); } So when should you implement IDisposable? My answer is simple: when the class owns fields of classes that implement IDisposable and if it implements a finalizer (for the good reasons already explained). Don’t use IDisposable.Dispose for other reasons such as logging (like what we used to do in C++ destructor): prefer to implement another explicit interface dedicated to that purpose.\nIn term of implementation, I have to say that I never understood why Microsoft decided to provide such a confusing implementation in its documentation. You have to implement the following method to “free” unmanaged and managed resources. 
It should be called by both the finalizer and IDisposable.Dispose():\nprotected virtual void Dispose(bool disposing) { // free unmanaged and managed resources }\nYou also need to have a _disposed field to allow IDisposable.Dispose() to be called more than once without problem. In all methods and properties of the class, don’t forget to throw an ObjectDisposedException if _disposed is true to catch usage of already disposed objects.\nAsk a group of developers when disposing should be true or false: half will say when called from the finalizer and the other half from Dispose (and I’m not counting those who are not sure). Why give the method the same name as the one that already exists in IDisposable? Why pick “disposing” as the parameter name? I don’t think it would have been possible to find a more confusing solution: too many “dispose” kills the pattern…\nHere is my own implementation that does exactly the same thing but with much less confusion:\nclass DisposableMe : IDisposable { private bool _disposed = false; // 1. field that implements IDisposable // 2. field that stores \u0026#34;native resource\u0026#34; (ex: IntPtr) ~DisposableMe() { Cleanup(\u0026#34;called from GC\u0026#34; != null); } // = true public void Dispose() { Cleanup(\u0026#34;not from GC\u0026#34; == null); } // = false ... }\nI also rename Dispose(bool disposing) into Cleanup(bool fromGC):\nprivate void Cleanup(bool fromGC) { if (_disposed) return; try { // always clean up the NATIVE resources if (fromGC) return; // clean up managed resources ONLY if not called from GC } finally { _disposed = true; if (!fromGC) GC.SuppressFinalize(this); } }\nThe rules you have to keep in mind are simple:\nnative resources (i.e. 
IDisposable fields) should be disposed when called from Dispose (not from GC).\nThe _disposed boolean field is used to clean up resources only once. In this implementation, it is set to true even if an exception happens because I’m assuming that if it just happened, it will also happen if called another time.\nLast but not least, the call to GC.SuppressFinalize(this) simply tells the GC to remove the disposed object from the Finalization internal queue:\nit is only meaningful when called from Dispose (not from GC) to avoid extending its lifetime; it means that the finalizer will never be called. If it were, it would have called Cleanup that would have returned immediately because _disposed is true.\nThe rest of the post describes typical anti-patterns. However, as usual with performance-related topics, remember that the impact might not be noticeable if the code does not run in a hot path. Always balance between readability/ease of maintenance/understanding and performance gain.\nProvide list capacity when possible It is recommended to provide a capacity when creating a List or a collection instance. The .NET implementation of such classes usually stores the values in an array that needs to be resized when new elements are added. It means that: a new array is allocated; the former values are copied to the new array; the former array is no longer referenced.\nIn the following example, the capacity of resultList should be set to otherList.Count:\nvar resultList = new List\u0026lt;...\u0026gt;(); foreach (var item in otherList) { resultList.Add(...); }\nPrefer StringBuilder to +/+= for string concatenation Creating temporary objects will increase the number of garbage collections and impact performance. Since the string class is immutable, each time you need to get an updated version of a string of characters, the .NET framework ends up creating a new string.\nFor string concatenation, avoid using Concat, + or +=. This is especially important in loops or methods called very often. 
For example in the following code, a StringBuilder should be used:\n1 2 3 4 5 6 var productIds = string.Empty; while (match.Success) { productIds += match.Groups[2].Value + \u0026#34;\\n\u0026#34;; match = match.NextMatch(); } Again in loops, avoid creating temporary string such as in the following code where SearchValue.ToUpper() do not change in the loop:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 if (SelectedColumn == Resources.Journaux.All \u0026amp;\u0026amp; !String.IsNullOrEmpty(SearchValue)) source = model.DataSource.Where(x =\u0026gt; x.ItemId.Contains(SearchValue) || x.ItemName.ToUpper().Contains(SearchValue.ToUpper()) || x.ItemGroupName.ToUpper().Contains(SearchValue.ToUpper()) || x.CountingGroupName.ToUpper().Contains(SearchValue.ToUpper())); if (SelectedColumn == Resources.Journaux.ItemNumber) source = model.DataSource.Where(x =\u0026gt; x.ItemId.ToUpper().Contains(SearchValue.ToUpper())); if (SelectedColumn == Resources.Journaux.ItemName) source = model.DataSource.Where(x =\u0026gt; x.ItemName.ToUpper().Contains(SearchValue.ToUpper())); if (SelectedColumn == Resources.Journaux.ItemGroup) source = model.DataSource.Where(x =\u0026gt; x.ItemGroupName.ToUpper().Contains(SearchValue.ToUpper())); if (SelectedColumn == Resources.Journaux.CountingGroup) source = model.DataSource.Where(x =\u0026gt; x.CountingGroupName.ToUpper().Contains(SearchValue.ToUpper())); The effect is even worse due to the Where() clause that create a new temporary upper string for each element of the sequence!\nThis recommendation also applies to types that provides string-based direct access to characters such as in the following code:\n1 if (!uriBuilder.ToString().EndsWith(\u0026#34;.\u0026#34;, true, invCulture)) where ToString() is not needed because it is possible to directly access the last character:\n1 if (uriBuilder[uriBuilder.Length - 1] != \u0026#39;.\u0026#39;) Caching strings and interning Prefer static cache of read-only objects to recreating them in each call 
such as in the following example:\n1 2 3 var allCampaignStatuses = ((CampaignActivityStatus[])Enum.GetValues(typeof(CampaignActivityStatus))) .ToList(); (Replace by a static list since the enumeration elements won’t change)\nLast but not least, when string keys (with only a few different values) are used, you could “intern” them (i.e. ask the CLR to cache a value and always return the same reference). Read the corresponding Microsoft Docs for more details.\nDon’t (re)create objects The first pattern to use is the static classes with static methods to avoid the creation of temporary objects just to call fields-less methods. It is also recommended to pre-compute read-only list instead of re-creating it each time a method gets called like in the following example :\n1 2 3 4 var allCampaignStatuses = ((CampaignActivityStatus[])Enum.GetValues(typeof(CampaignActivityStatus))) .ToList(); // use allCampaignStatuses in the rest of the method This list could have been computed once as a static field of the class because the enumeration will not change during the application lifetime.\nAvoid repeated calls and keep values in local variables when used in a loop; this is particularly easy to forget when dealing with string ToLower() and ToUpper().\n1 2 3 4 5 6 7 var found = elements.Any( // ToLower() is called in each test k =\u0026gt; string.Compare( k.ToLower(), key.ToLower(), StringComparison.OrdinalIgnoreCase ) == 0); (a new temporary string will be created by key.ToLower() by each test)\nPrefer String.Compare(…, StringComparison.OrdinalIgnoreCase) to avoid calling ToLower()/ToUpper() just for string comparison such as in the following example:\n1 if (transactionIdAsString != null \u0026amp;\u0026amp; transactionIdAsString.ToLowerInvariant() == \u0026#34;undefined\u0026#34;) becomes:\n1 if (transactionIdAsString != null \u0026amp;\u0026amp; string.Compare(transactionIdAsString, \u0026#34;undefined\u0026#34;, StringComparison.OrdinalIgnoreCase) == 0) Best practices with LINQ 
The LINQ syntax is used extensively all over the source code. However, several patterns are found very often and might impact overall performance.\nPrefer IEnumerable to IList Most of the methods are iterating on sequences represented by IEnumerable either via foreach() or thanks to System.Linq.Enumerable extension methods. IList should be used only when sequence modification is required:\nIt is also recommended to use IEnumerable instead of IList as method parameters if there is no need to add/remove elements to the sequence. That way, the client code don’t have to use ToList() before calling the method. The same comment applies to return types that should be IEnumerable rather than IList because most of the time, the sequence will simply be iterated via a foreach statement.\nFirstOrDefault and Any are your friends… but might not be needed First, there is no need to call Any (or even worse ToList().Count \u0026gt; 0) before foreach such as in the following code:\n1 2 3 4 5 if (sequence != null \u0026amp;\u0026amp; sequence.Any()) { foreach (var item in sequence) ... } Avoid unnecessary ToList()/ToArray() calls LINQ queries are supposed to defer their execution until the corresponding sequence is iterated such as with a **foreach **statement. This is also the case when ToList() or ToArray() are called on such a query:\n1 2 3 4 5 6 7 8 9 var resourceNames = resourceAssembly .GetManifestResourceNames() .Where(r =\u0026gt; r.StartsWith($\u0026#34;{resourcePath}.i18n\u0026#34;)) .ToArray(); foreach (var resourceName in resourceNames) { ... } The ToList() method builds a List\u0026lt;\u0026gt; instance that contains all elements of the given sequence. 
It should be used carefully because the cost of creating a list from a large sequence of objects could be high both in term of memory and performance due to the implementation of element addition in List\u0026lt;\u0026gt;.\nThe only recommended usages are:\noptimization sake to avoid executing the underlying query several times when it is expensive removing/adding elements from a sequence storing the result of a query execution in a class field However, most of the times, you don’t need to call ToList() to iterate on a IEnumerable. If you do so, you hurt the runtime execution both in term of memory consumption (because of the unneeded List that is just temporary) and in term of performance because the sequence gets iterated twice.\nThe base of LINQ to Object is the IEnumerable interface used to iterate on a sequence of objects. All LINQ extension methods are taking IEnumerable instances as parameter in addition to foreach constructs. It is also not needed to call ToList() when an IEnumerable is expected (this is a good reason to prefer IEnumerable to IList/List/[] in method signatures)\nSome methods are calling ToList() before Where clauses are applied to an IEnumerable sequence: it is more efficient to stack the Where clauses and call ToList() at the end.\nLast but not least, it is not needed to call ToList() to get the number of elements in a sequence such as in the following code sample:\n1 2 3 4 5 productInfos .Select(p =\u0026gt; p.Split(DisplayProductInfoSeparator)[0]) .Distinct() .ToList() .Count; becomes:\n1 2 3 4 productInfos .Select(p =\u0026gt; p.Split(DisplayProductInfoSeparator)[0]) .Distinct() .Count(); Prefer IEnumerable\u0026lt;\u0026gt;.Any to List\u0026lt;\u0026gt;.Exists When manipulating IEnumerable, it is recommended to use Any instead of ToList().Exists() such as in the following code:\n1 if (sequence.ToList().Exists(…)) becomes:\n1 if (sequence.Any(...)) Prefer Any to Count when checking for emptiness The **Any **extension methods should be 
preferred to count computation on IEnumerable because the iteration on the sequence stops as soon as the condition (if any) is fulfilled without allocating any temporary list:\n1 2 3 4 5 var nonArchivedCampaigns = campaigns .Where(c =\u0026gt; c.Status != CampaignActivityStatus.Archived) .ToList(); if (nonArchivedCampaigns.Count == 0) becomes:\n1 if (!campaigns.Where(c =\u0026gt; c.Status != CampaignActivityStatus.Archived).Any()) Note that it is also valid to use if (!campaigns.Any(filter))\nOrder in extension methods might matter When operators are applied to sequences (i.e. IEnumerable), their order might have an impact on the performance of the resulting code. One important rule is to always filter first so the resulting sequences get smaller and smaller to iterate. This is why it is recommended to start a LINQ query by Where filters.\nWith LINQ, the code you write to define a query might be misleading in term of execution. For example, what is the difference between:\n1 2 3 4 var filteredElements = sequence .Where(first filter) .Where(second filter) ; and:\n1 2 3 var filteredElements = sequence .Where(first filter \u0026amp;\u0026amp; second filter) ; It depends on the query executor. 
For LINQ for Objects, it seems that there is no difference in term of the filters execution: the first and second filters will be executed the same number of times as shown by the following code:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 var integers = Enumerable.Range(1, 6); var set1 = integers .Where(i =\u0026gt; IsEven(i)) .Where(i =\u0026gt; IsMultipleOf3(i)); foreach (var current in set1) { Console.WriteLine($\u0026#34;--\u0026gt; {current}\u0026#34;); } Console.WriteLine(\u0026#34;--------------------------------\u0026#34;); var set2 = integers .Where(i =\u0026gt; IsEven(i) \u0026amp;\u0026amp; IsMultipleOf3(i)) ; foreach (var current in set2) { Console.WriteLine($\u0026#34;--\u0026gt; {current}\u0026#34;); } When you run it, you get the exact same lines in the console:\nIsEven(1) IsEven(2) IsMultipleOf3(2) IsEven(3) IsEven(4) IsMultipleOf3(4) IsEven(5) IsEven(6) IsMultipleOf3(6) --\u0026gt; 6 -------------------------------- IsEven(1) IsEven(2) IsMultipleOf3(2) IsEven(3) IsEven(4) IsMultipleOf3(4) IsEven(5) IsEven(6) IsMultipleOf3(6) --\u0026gt; 6 However, when you run it under Benchmark.NET,\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 private int[] _myArray; [Params(10, 1000, 10000)] public int Size { get; set; } [GlobalSetup] public void Setup() { _myArray = new int[Size]; for (var i = 0; i \u0026lt; Size; i++) _myArray[i] = i; } [Benchmark(Baseline = true)] public void Original() { var set = _myArray .Where(i =\u0026gt; IsEven(i)) .Where(i =\u0026gt; IsMultipleOf3(i)) ; int i; foreach (var current in set) { i = current; } } [Benchmark] public void Merged() { var set = _myArray .Where(i =\u0026gt; IsEven(i) \u0026amp;\u0026amp; IsMultipleOf3(i)) ; int i; foreach (var current in set) { i = current; } } the results are significantly better for the single “merged” Where clause:\nAfter looking at the implementation in the .NET Framework with my colleague Jean-Philippe, the 
additional cost seems to be related to the underlying IEnumerator corresponding to the first Where.\nRemember to never assume and always measure.\n","cover":"https://chrisnas.github.io/posts/2021-07-01_memory-anti-patterns-in/1_IC8UKl9GdNwj-CzbTLaoXg.jpeg","date":"2021-07-01","permalink":"https://chrisnas.github.io/posts/2021-07-01_memory-anti-patterns-in/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2021-07-01_memory-anti-patterns-in/1_IC8UKl9GdNwj-CzbTLaoXg.jpeg\"\u003e\u003c/p\u003e\n\u003cp\u003eIn the context of helping the teams at Criteo to clean up our code base, I gathered and documented a few C# anti-patterns similar to \u003ca href=\"https://twitter.com/KooKiz\"\u003eKevin\u003c/a\u003e’s publication about \u003ca href=\"https://kevingosse.medium.com/performance-best-practices-in-c-b85a47bdd93a\"\u003eperformance code smell\u003c/a\u003e. Here is an extract related to good/bad memory patterns.\u003c/p\u003e\n\u003cp\u003eEven though the garbage collector is doing its work out of the control of the developers, the fewer allocations are made, the less the GC will impact an application. So the main goal is to avoid writing code that allocates unnecessary objects or references them too long.\u003c/p\u003e","title":"Memory Anti-Patterns in C#"},{"content":" Introduction In the previous post, I described why you might get weird reversed callstacks in Visual Studio when analyzing or debugging async/await code. 
And if you are using Perfview to profile the same application, you should also get the same reverse continuation flow:\nThe rest of the post describes how to easily profile with Perfview and more interestingly, how to leverage grouping/folding features to get much more readable asynchronous callstacks.\nPerfview 101 Here are the different steps to get the previous tree-like representation of a profiling session results.\nFirst, start a data collection by clicking the Start Collection button from the Collect | Collect dialog box and check Kernel Base, CPU Samples, and .NET boxes:\nStop the collection when the application ends and double-click the CPU Stacks node :\nAfter selecting the application in the Select Process Window\nclick the CallTree tab:\nBefore entering the dreaded yellow/white CPU Stacks window, let’s spend some time detailing its vast toolbar in the following figure:\nThe most powerful elements are the *Pats combo-boxes. Each of them supports a “simple” matching pattern syntax for different purposes (don’t worry, you will see how to use them in many more examples later):\nGroupPats: merge sibling matching frames into one. FoldPats: matching frames are folded into parent frame. IncPats: non matching frames are removed (used for process filtering for example). ExcPats: matching frames are excluded. Let’s see what we get with all combo-box set as empty for the CallTree tab:\nFor server applications, we are usually not interested in making any difference between threads so it would be nice to group all threads under a single AllThreads node. 
This is exactly what the first choice of the GroupPats combo-box provides:\nThe effect is simple: all lines at the same level containing “Thread” in the text are merged into a new line with “AllThreads” as new text\nYou can now get the same kind of tree-based representation as in Visual Studio: the difference is that you need to open each node by clicking a checkbox (or right-click + Expand All to see the whole tree)\nThe meaning of most columns is self-explanatory, except maybe the When column that provides the CPU usage graph over time in a “textual” way:\nThe CallTree representation obviously displays the frames sorted by Inc% column.\nGoing further with Perfview When expanding the calltree, you usually get lost in the async/await implementation details. The following screenshot shows the signal/noise ratio for my simple test application where we don’t really care about the blue lines!\nThis is where the different Perfview combo-boxes come to the rescue. Some frames and their children are clearly not interesting such as the last two coreclr!ThePreStub. In that case, select the frame and copy the text from the status bar (yes: this is possible and so handy!)\nand paste it into the ExcPats combo-box\nto make the corresponding frames disappear.\nUnfortunately, you can’t do the same for the other Task-related frames because frames matching ExcPats are completely removed along with their children, where your async method calls appear.\nFolding patterns are your friends This time, the FoldPats combo-box will be your friend: each frame that matches one of its ;-separated substrings will disappear and its occurrence count will be added to its parent frame. Since all these Task-related frames do not appear a lot, the impact in the Inc/Exc columns of the parent frames should be minimal. 
After I used the following substrings:\nTasks.Task+DelayPromise.CompleteTimedOut(;Tasks.Task.FinishContinuations(;Tasks.Task.RunOrQueueCompletionAction;Tasks.Task.RunContinuations(;Tasks.Task+DelayPromise+;Tasks.AwaitTaskContinuation.RunOrScheduleAction;Runtime.CompilerServices.AsyncTaskMethodBuilder`;CompilerServices.AsyncTaskMethodBuilder.Start(;CompilerServices.AsyncMethodBuilderCore.Start(;ExecutionContext.RunInternal(;CompilerServices.AsyncTaskMethodBuilder.SetResult(;Tasks.VoidTaskResult].TrySetResult;.TrySetResult(System.Threading.Tasks.;Tasks.Task.TrySetResult(\nthe callstack was much more readable:\nMorph the frames The final step is to transform the frames text into something more meaningful thanks to the GroupPats combo-box. At the beginning of this post, I picked the predefined [fold threads] Thread -\u0026gt; AllThreads grouping pattern. The starting text between [] is used as a title by Perfview to allow the user to more easily figure out what its role is. The rest of the string defines how parts of each frame should match and be grouped. The corresponding contextual help has already been shown earlier when the toolbar was detailed.\nHere, I don’t want to group all sibling frames into a group but rather morph the text into something more readable. The part before -\u0026gt; or =\u0026gt; is used as matching pattern to be replaced by the part after the sign. It is also possible to “extract” elements between {} from the matching pattern to be used to build the replacement string. 
Each matched element is identified as $1, $2,… based on its position in the pattern.\nIn my example, I would like to apply the following transformation:\nTo write the pattern, focus on the separators (!, +\u0026lt; and \u0026gt;d__ in this case):\n{%}!{%}+\u0026lt;{%}\u0026gt;d__*\nThe items to extract are what is left in between, identified as {%}.\nTo build the replacement string, simply refer to each extracted item by its position (starting at 1):\n($1) $2 ~~~async back to~~~ $3()\nAnd here is the corresponding final result:\nDon’t lose your xxxPats! Note that you can define your own pattern presets via the Preset menu item\nAs you can see here, I have defined my own Criteo Arbitrage preset. If you want to reuse the content of the GroupPats, FoldPats, and Fold% combo-boxes, click Save As Preset (or even Set As Startup Preset to get them when you start Perfview) and pick a name\nFeel free to use the Manage Presets dialog for easier preset manipulation:\nI hope that you now better understand the value of Perfview 
for analyzing complicated callstacks.\n","cover":"https://chrisnas.github.io/posts/2021-03-02_how-to-ease-async/1_ATK-AQ6S7ImVxHgw5iF_kQ.jpeg","date":"2021-03-02","permalink":"https://chrisnas.github.io/posts/2021-03-02_how-to-ease-async/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2021-03-02_how-to-ease-async/1_ATK-AQ6S7ImVxHgw5iF_kQ.jpeg\"\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn the \u003ca href=\"/posts/2021-01-19_understanding-reversed-callsta/\"\u003eprevious post\u003c/a\u003e, I described why you might get weird reversed callstacks in Visual Studio when analyzing or debugging async/await code. And if you are using Perfview to profile the same application, you should also get the same reverse continuation flow:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2021-03-02_how-to-ease-async/1_uI53gXuqEuy7tDlp7xw1mA.png\"\u003e\u003c/p\u003e\n\u003cp\u003eThe rest of the post describes how to easily profile with Perfview and more interestingly, how to leverage grouping/folding features to get much more readable asynchronous callstacks.\u003c/p\u003e","title":"How to ease async callstacks analysis in Perfview"},{"content":" Introduction My colleague Eugene and I spent a long time analyzing the performance of one of Criteo’s main applications with Perfview. The application processes thousands of requests in an asynchronous pipeline full of async/await calls. During our research, we ended up with weird callstacks that looked kind of “reversed”. 
The goal of this post is to describe why this could happen (even in Visual Studio).\nLet’s see the result of profiling in Visual Studio I wrote a simple .NET Core application that simulates a few async/await calls:\nstatic async Task Main(string[] args) { Console.WriteLine($\u0026#34;pid = {Process.GetCurrentProcess().Id}\u0026#34;); Console.WriteLine(\u0026#34;press ENTER to start...\u0026#34;); Console.ReadLine(); await ComputeAsync(); Console.WriteLine(\u0026#34;press ENTER to exit...\u0026#34;); Console.ReadLine(); } private static async Task ComputeAsync() { await Task.WhenAll( Compute1(), ... Compute1() ); } ComputeAsync starts a bunch of tasks that will await other async methods:\nprivate static async Task Compute1() { ConsumeCPU(); await Compute2(); ConsumeCPUAfterCompute2(); } private static async Task Compute2() { ConsumeCPU(); await Compute3(); ConsumeCPUAfterCompute3(); } private static async Task Compute3() { await Task.Delay(1000); ConsumeCPUinCompute3(); Console.WriteLine(\u0026#34;DONE\u0026#34;); } Unlike the Compute1 and Compute2 methods, the last Compute3 method waits 1 second before consuming some CPU with square root computations in the ConsumeCPUXXX helpers:\n[MethodImpl(MethodImplOptions.NoInlining)] private static void ConsumeCPUinCompute3() { ConsumeCPU(); } [MethodImpl(MethodImplOptions.NoInlining)] private static void ConsumeCPUAfterCompute3() { ConsumeCPU(); } [MethodImpl(MethodImplOptions.NoInlining)] private static void ConsumeCPUAfterCompute2() { ConsumeCPU(); } private static void ConsumeCPU() { for (int i = 0; i \u0026lt; 1000; i++) for (int j = 0; j \u0026lt; 1000000; j++) { Math.Sqrt((double)j); } } From Visual Studio, profile the CPU usage of this test program via Debug | Performance Profiler…\nIn the summary result panel, click the Open Details… link\nAnd 
pick the Call Tree view\nYou should see two paths of execution:\nIf you open the last one, you should see the expected chain of calls:\n… if the methods were synchronous, which is not the case. So Visual Studio did a great job of dealing with the implementation details of async/await to present a nice call stack.\nHowever, if you open the first node, you get something more disturbing:\n… if you don’t know how async/await is implemented. My Compute3 code is definitely not calling Compute2, which is not calling Compute1! This is where Visual Studio’s smart frame/callstack reconstruction brings more confusion than anything else. So what’s going on?\nUnderstanding async/await implementation Unlike Visual Studio, which hides the real calls, you can see which methods are really called when analyzing a memory dump with dotnet-dump and the pstacks command:\nIf you follow the arrows from the bottom to the top, you should see the following synchronous calls (synchronous because they appear as frames in a thread callstack):\na timer callback calls d__4.MoveNext(): this corresponds to the end of the Task.Delay in the Compute3 method.\nd__3.MoveNext() gets called to continue the code after await Compute3\nd__2.MoveNext() gets called to continue the code after await Compute2\nConsumeCPUAfterCompute2() gets called as expected\nConsumeCPU() or ConsumeCPUinCompute3() gets called as expected\nAll the fancy method names are due to “state machine” types that are generated by the C# compiler when you (1) define async methods that (2) await other async methods (or any “awaitable” object). Their role is to manage a “state machine” that executes code synchronously up to an await call, then up to the next await call, and so on until the method returns.\nAll these d__* types contain fields corresponding to the local variables and parameters, if any, of each async method. 
For example, here is what is generated for the ComputeAsync and Compute1/2/3 async methods, which have no local variables or parameters:\nThe integer \u0026lt;\u0026gt;1__state field keeps track of the “execution state” of the machine. For example, after the state machine is created in Compute1, this field is set to -1:\nI don’t want to dig into the builder details, so let’s just say that the MoveNext method of the state machine d__2 gets executed (by the same thread).\nBefore looking at the MoveNext implementation corresponding to the Compute1 method (without exception handling), keep in mind that it has to:\nrun all code up to an await call; change the “execution state” (more on this later); do some magic to execute that code in another thread (if needed; more on this later); come back to continue the execution of the code after the await call; and do that up to the next await call, again and again.\nSince \u0026lt;\u0026gt;1__state is -1, the first “synchronous” part of the code is executed (i.e. calling the ConsumeCPU method).\nThe Compute2 method is then called to get the corresponding awaitable object (here a Task). If the task completes immediately (i.e. no await call in the async method, just something like a simple Task.FromResult()), IsCompleted will return true and the code after the await call will be run by the same thread. Yes, this means that async/await calls can run synchronously on the same thread: why use another thread when it is not needed?\nIf the Task is passed to the ThreadPool to be executed by a worker thread, the \u0026lt;\u0026gt;1__state value is set to 0 (so the next time MoveNext is called, the next “synchronous” part, i.e. the code after the await call, will be executed). The code now calls AwaitUnsafeOnCompleted to do its magic: adding a continuation to the Compute2 task (the first awaiter parameter) so that MoveNext will be called on that same state machine (the second this parameter) when the task ends. 
The current thread then quietly returns.\nSo when the Compute2 task ends, its continuation runs and calls MoveNext, this time with \u0026lt;\u0026gt;1__state set to 0, so the last two lines are executed: awaiter.GetResult() returns immediately because the Task returned by Compute2 has already ended, and the final ConsumeCPUAfterCompute2 method is now called.\nHere is a summary of what is happening:\nEach time you see an async method, the C# compiler generates a dedicated state machine type with a MoveNext method that is responsible for executing your code synchronously between await calls.\nEach time you see an await call, it means that a continuation will be added to the Task wrapping the async method to be executed. That continuation code will call the MoveNext method of the state machine of the calling method to execute the next piece of code up to its next await call.\nThis is why Visual Studio, trying to smartly match each async method state machine MoveNext frame to the method itself, shows reversed callstacks: the frames shown are the ones corresponding to the continuations after the await calls (in green in the previous figure).\nNote that I described in more detail how async/await works and what AwaitUnsafeOnCompleted does during a DotNext conference session with Kevin, so feel free to watch the recording at that particular time if you want to go deeper.\nThe next post will describe what to do in Perfview to get more 
readable callstacks.\nStay tuned!\n","cover":"https://chrisnas.github.io/posts/2021-01-19_understanding-reversed-callsta/1_Zf0qb_2Y8oKs9KO1Fli1rg.png","date":"2021-01-19","permalink":"https://chrisnas.github.io/posts/2021-01-19_understanding-reversed-callsta/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eMy colleague \u003ca href=\"https://twitter.com/ezsilmar\"\u003eEugene\u003c/a\u003e and I spent a long time analyzing the performance of one of Criteo’s main applications with Perfview. The application processes thousands of requests in an asynchronous pipeline full of \u003cstrong\u003easync\u003c/strong\u003e/\u003cstrong\u003eawait\u003c/strong\u003e calls. During our research, we ended up with weird callstacks that looked kind of “reversed”. The goal of this post is to describe why this could happen (even in Visual Studio).\u003c/p\u003e\n\u003ch2 id=\"lets-see-the-result-of-profiling-in-visualstudio\"\u003eLet’s see the result of profiling in Visual Studio\u003c/h2\u003e\n\u003cp\u003eI wrote a simple .NET Core application that simulates a few \u003cstrong\u003easync\u003c/strong\u003e/\u003cstrong\u003eawait\u003c/strong\u003e calls:\u003c/p\u003e","title":"Understanding “reversed” callstacks in Visual Studio and Perfview with async/await code"},{"content":" The previous series described how to get details about your .NET application allocation patterns in C#.\nGet a sampling of .NET application allocations\nA simple way to get the call stack\nGetting the call stack by hand\nIt is now time to do the same for the CPU consumption of your .NET applications.\nThank you, Mr. Windows Kernel! Under Windows, the kernel ETW provider allows you to get notified every millisecond with the call stack of all threads running on a core. Unsurprisingly, it is easy with TraceEvent to listen to these events. 
As explained in an old posts, you simply need to create a session, enable providers and listen to the right event.\nFor sampled CPU profiling, I’m using the TraceLogEventSource to wrap the event source and automatically get the stack frames symbol resolution:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 string sessionName = \u0026#34;Cpu_Profiling_Session+\u0026#34; + Guid.NewGuid().ToString(); _session = new TraceEventSession(sessionName, TraceEventSessionOptions.Create); if (!EnableProviders(_session)) { _session.Dispose(); _session = null; return false; } _profilingTask = Task.Factory.StartNew(() =\u0026gt; { using (TraceLogEventSource source = TraceLog.CreateFromTraceEventSession(_session)) { // CPU sampling kernel events source.Kernel.PerfInfoSample += (SampledProfileTraceData data) =\u0026gt; { ... }; // this call exits when the session is stopped source.Process(); } }); You need to enable three providers:\nKernel: get the profiling event every milli-second and be notified when a dll gets loaded by a process to let TraceEvent manage the symbols Clr: get JIT events describing managed method details ClrRundown: get already JITted methods details 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 protected bool EnableProviders(TraceEventSession session) { session.BufferSizeMB = 256; // Note: it could fail if the user does not have the required privileges var success = session.EnableKernelProvider( KernelTraceEventParser.Keywords.ImageLoad | KernelTraceEventParser.Keywords.Process | KernelTraceEventParser.Keywords.Profile, stackCapture: KernelTraceEventParser.Keywords.Profile ); if (!success) return false; // this call always returns false :^( session.EnableProvider( ClrTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, (ulong)( // events related to JITed methods ClrTraceEventParser.Keywords.Jit | // Turning on JIT events is necessary to resolve JIT compiled code 
ClrTraceEventParser.Keywords.JittedMethodILToNativeMap | // This is needed if you want line number information in the stacks ClrTraceEventParser.Keywords.Loader // You must include loader events as well to resolve JIT compiled code. ) ); // this provider will send events of already JITed methods session.EnableProvider( ClrRundownTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, (ulong)( ClrTraceEventParser.Keywords.Jit | // We need JIT events to be rundown to resolve method names ClrTraceEventParser.Keywords.JittedMethodILToNativeMap | // This is needed if you want line number information in the stacks ClrTraceEventParser.Keywords.Loader | // As well as the module load events. ClrTraceEventParser.Keywords.StartEnumeration // This indicates to do the rundown now (at enable time) )); return true; } The code to handle the event is really simple:\n1 2 3 4 5 6 7 8 9 source.Kernel.PerfInfoSample += (SampledProfileTraceData data) =\u0026gt; { if (data.ProcessID != Pid) return; var callstack = data.CallStack(); if (callstack == null) return; MergeCallStack(callstack, Reader); }; I’m only interested in profiling a given process (hence the check on process id) and events with a call stack. The callstack is returned by the extension method CallStack() (see the previous post for more details). The main processing is done by the MergeCallStack() method. But before looking at the only complicated part, it is time to discuss a useful tip.\nTip: use ETLx Luke! Like the previous posts about memory profiling, my goal is to demonstrate how to monitor applications as they run. However when you monitor an application CPU consumption, you would like to avoid any noisy neighbor that could highjack some cores. So minimizing the work of your profiling code is always a good idea. In addition, it could also be valuable to record the events and analyze them later. Microsoft Perfview is the open source tool that I’m using the most to dig into CPU consumption. 
So the solution is to simply record the events and generate an .etlx file for Perfview.\nThe first code change is small: the session is created with a filename.\n1 2 string sessionName = \u0026#34;Cpu_Profiling_Session+\u0026#34; + Guid.NewGuid().ToString(); _session = new TraceEventSession(sessionName, _filename); I’m using a naming convention that contains the process ID I want to monitor so it will be easy to remember when I will analyze the recording in Perfview:\n1 profiler = new EtlCpuSampleProfiler($\u0026#34;trace-{parameters.pid}.etl\u0026#34;); The second step to generate the .etlx file is a one liner:\n1 var traceLog = TraceLog.OpenOrConvert(_filename, new TraceLogOptions() { ConversionLog = SymbolMessages }); The ConversionLog TraceLogOptions property is expecting a TextWriter to log all possible messages related to symbols resolution.\nThe parsing of kernel profiling samples is done on the TraceLog in a more manual way by selecting the events based on TaskGuid corresponding to the kernel profiling task and the OpCode:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 // parse profiling kernel events // from https://github.com/microsoft/perfview/blob/master/src/TraceEvent/Samples/41_TraceLogMonitor.cs#L150 // from https://docs.microsoft.com/en-us/windows/win32/etw/perfinfo // from https://github.com/microsoft/perfview/blob/master/src/TraceEvent/Parsers/KernelTraceEventParser.cs#L3128 // and https://github.com/microsoft/perfview/blob/master/src/TraceEvent/Parsers/KernelTraceEventParser.cs#L2298 // Guid perfInfoTaskGuid = new Guid(0xce1dbfb4, 0x137e, 0x4da6, 0x87, 0xb0, 0x3f, 0x59, 0xaa, 0x10, 0x2c, 0xbc); int profileOpcode = 46; foreach (var data in traceLog.Events) { if (data.ProcessID != Pid) continue; if (data.TaskGuid != perfInfoTaskGuid) continue; if ((uint)data.Opcode != profileOpcode) continue; var callstack = data.CallStack(); if (callstack == null) continue; MergeCallStack(callstack, Reader); } How to “merge” call stacks In both live and file 
based implementations, I end up merging call stacks by calling the MergeCallStack() method. Instead of jumping directly into the C# code, I prefer to describe what I’m expecting from “merging“ call stacks.\nIf you think about what frames (i.e. method call) would appear at the beginning all these threads call stacks, it seems obvious that they should start with the same code: either the main thread startup, timer/thread pool initialization or custom thread bootstrap. In case of server applications, the same request processing calls would lead to specific handlers or controllers code. Each time a common group of frames appears in different call stacks, it would be more readable to see them as different branches starting from the same trunk like in Visual Studio Parallel Stack panel.\nIn order to build a “visual” representation, I have to count the number of time each frame appears at the same place in the recorded call stacks. My data structure looks like a tree where each node contains the current frame, the sampling count (as node or as leaf) and a list of different child frames corresponding to the different execution branches:\n1 2 3 4 5 6 7 8 9 10 11 public class MergedSymbolicStacks { private int _countAsNode; private int _countAsLeaf; public ulong Frame { get; private set; } public string Symbol { get; private set; } public int CountAsNode =\u0026gt; _countAsNode; public int CountAsLeaf =\u0026gt; _countAsLeaf; public List\u0026lt;MergedSymbolicStacks\u0026gt; Stacks { get; set; } Each frame contains both the address and the method signature that have been extracted from the callstack retrieved from the events:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 protected void MergeCallStack(TraceCallStack callStack, SymbolReader reader) { var currentFrame = callStack.Depth; var frames = new SymbolicFrame[callStack.Depth]; // the first element of callstack is the last frame: we need to iterate on each 
frame // up to the first one before adding them into the MergedSymbolicStack while (callStack != null) { var codeAddress = callStack.CodeAddress; if (codeAddress.Method == null) { var moduleFile = codeAddress.ModuleFile; if (moduleFile != null) { // TODO: this seems to trigger extremely slow retrieval of symbols // through HTTP requests: see how to delay it AFTER the user // stops the profiling if (!_missingSymbols.TryGetValue(moduleFile, out var _)) { codeAddress.CodeAddresses.LookupSymbolsForModule(reader, moduleFile); if (codeAddress.Method == null) { _missingSymbols[moduleFile] = true; } } } } frames[--currentFrame] = new SymbolicFrame( codeAddress.Address, codeAddress.FullMethodName ); callStack = callStack.Caller; } _stackCount++; _stacks.AddStack(frames); } The MergedSymbolicStack.AddStack() method is doing the real merging. The idea of merging call stacks is to start from the bottom and if the frame has already been seen (at this position), increment its sampling count. If not, remember it before incrementing the count. Look at the next frame and do the same match/remember + increment up to the top of the stack.\nHere is an animation of what it would look like on a piece of paper (like the one I wrote down before starting to write the C# implementation :^)\nHere is the corresponding C# code to merge a stack (i.e. 
an array of frames)\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 public void AddStack(SymbolicFrame[] frames, int index = 0) { _countAsNode++; var firstFrame = frames[index]; // search if the frame to add has already been seen var callstack = Stacks.FirstOrDefault(s =\u0026gt; string.CompareOrdinal(s.Symbol, firstFrame.Symbol) == 0); // if not, we are starting a new branch if (callstack == null) { callstack = new MergedSymbolicStacks(frames[index].Address, frames[index].Symbol); Stacks.Add(callstack); } // it was the last frame of the stack if (index == frames.Length - 1) { callstack._countAsLeaf++; return; } callstack.AddStack(frames, index + 1); } Last but not least, the constructors of the class reflect how to (1) create the root instance and (2) each node in the tree:\n1 2 3 4 5 6 7 8 9 10 11 12 13 public MergedSymbolicStacks() : this(0, string.Empty) { // this will be the root of all stacks } private MergedSymbolicStacks(ulong frame, string symbol) { Frame = frame; Symbol = symbol; _countAsNode = 0; _countAsLeaf = 0; Stacks = new List\u0026lt;MergedSymbolicStacks\u0026gt;(); } The code to render the merged stack\nis not that complicated because everything is already in the tree of frames.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 private static void RenderStack(MergedSymbolicStacks stack, IRenderer visitor, bool isRoot, int increment) { var alignment = new string(\u0026#39; \u0026#39;, Padding * increment); var padding = new string(\u0026#39; \u0026#39;, Padding); var currentFrame = stack.Frame; // special root case if (isRoot) visitor.WriteCount($\u0026#34;{Environment.NewLine}{alignment}{stack.CountAsNode, Padding} \u0026#34;); else visitor.WriteCount($\u0026#34;{Environment.NewLine}{alignment}{stack.CountAsLeaf + stack.CountAsNode, Padding} \u0026#34;); visitor.WriteMethod(stack.Symbol); var childrenCount = stack.Stacks.Count; if (childrenCount == 0) { 
visitor.WriteFrameSeparator(\u0026#34;\u0026#34;); return; } foreach (var nextStackFrame in stack.Stacks.OrderByDescending(s =\u0026gt; s.CountAsNode + s.CountAsLeaf)) { // increment when more than 1 children var childIncrement = (childrenCount == 1) ? increment : increment + 1; RenderStack(nextStackFrame, visitor, false, childIncrement); if (increment != childIncrement) { visitor.WriteFrameSeparator($\u0026#34;{Environment.NewLine}{alignment}{padding}{nextStackFrame.CountAsNode + nextStackFrame.CountAsLeaf, Padding} \u0026#34;); visitor.WriteFrameSeparator($\u0026#34;~~~~ \u0026#34;); } } } The IRenderer interface implementations are simply changing foreground color depending on what kind of information to display:\nI have used the same “Visitor” pattern for the pstack tool/extension for WinDBG.\nNot for Admin only I always thought that I needed to be a member of the Administrator group and running elevated to be allowed to start a kernel profiling session. Well… This is in fact not the case! You have to dig into the documentation for configuring and starting a SystemTraceProvider session to read the following note:\nIf you want a non-administrators or a non-TCB process to be able to start a profiling trace session using the SystemTraceProvider on behalf of third party applications, then you need to grant the user profile privilege and then add this user to both the session GUID (created for the logger session) and the system trace provider GUID to enable the system trace provider. For more information, see the EventAccessControl function.\nLong story short, you need a user to be part of the Performance Log Users group (makes sense) or grant her the TRACELOG_ACCESS_REALTIME permission. 
Obviously, you need an administrator account to do both but this can be done once on a machine by your IT in a secure way.\nI wrapped a managed implementation of the corresponding code to add the permission in a ProfilingPermission class that hides all the P/Invoke and weird marshalling stuff to the native Windows API. Simply pass a user name to EnableProfileUser() and it should work just fine.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 public static class ProfilingPermission { private const uint TRACELOG_GUID_ENABLE = 0x0080; private const int NO_ERROR = 0; // ERROR_SUCCESS in C++ private const int ERROR_INSUFFICIENT_BUFFER = 122; // read https://docs.microsoft.com/en-us/windows/win32/etw/configuring-and-starting-a-systemtraceprovider-session // for more details public static void EnableProfilerUser(string accountName) { // Kernel provider from https://github.com/microsoft/perfview/blob/master/src/TraceEvent/Parsers/KernelTraceEventParser.cs#L43 Guid kernelProviderGuid = new Guid(\u0026#34;{9e814aad-3204-11d2-9a82-006008a86939}\u0026#34;); byte[] sid = LookupSidByName(accountName); // from https://docs.microsoft.com/en-us/windows/win32/etw/configuring-and-starting-a-systemtraceprovider-session uint operation = (uint)EventSecurityOperation.EventSecurityAddDACL; uint rights = TRACELOG_GUID_ENABLE; bool allowOrDeny = (\u0026#34;Allow\u0026#34; != null); uint result = EventAccessControl( ref kernelProviderGuid, operation, sid, rights, allowOrDeny ); if (result != NO_ERROR) { var lastErrorMessage = new Win32Exception((int)result).Message; throw new InvalidOperationException($\u0026#34;Failed to add ACL 
({result.ToString()}) : {lastErrorMessage}\u0026#34;); } } private static byte[] LookupSidByName(string accountName) { byte[] sid = null; uint cbSid = 0; StringBuilder referencedDomainName = new StringBuilder(); uint cchReferencedDomainName = (uint)referencedDomainName.Capacity; SID_NAME_USE sidUse; int err = NO_ERROR; if (!LookupAccountName(null, accountName, sid, ref cbSid, referencedDomainName, ref cchReferencedDomainName, out sidUse)) { err = Marshal.GetLastWin32Error(); if (err == ERROR_INSUFFICIENT_BUFFER) { sid = new byte[cbSid]; referencedDomainName.EnsureCapacity((int)cchReferencedDomainName); err = NO_ERROR; if (!LookupAccountName(null, accountName, sid, ref cbSid, referencedDomainName, ref cchReferencedDomainName, out sidUse)) err = Marshal.GetLastWin32Error(); } } if (err != NO_ERROR) { var lastErrorMessage = new Win32Exception(err).Message; throw new InvalidOperationException($\u0026#34;LookupAccountName fails ({err.ToString()}) : {lastErrorMessage}\u0026#34;); } // display the SID associated to the given user IntPtr ptrSid; if (!ConvertSidToStringSid(sid, out ptrSid)) { err = Marshal.GetLastWin32Error(); var lastErrorMessage = new Win32Exception(err).Message; Console.WriteLine($\u0026#34;No SID string associated to user {accountName} ({err.ToString()}) : {lastErrorMessage}\u0026#34;); } else { string sidString = Marshal.PtrToStringAuto(ptrSid); ProfilingPermission.LocalFree(ptrSid); Console.WriteLine($\u0026#34;Account ({referencedDomainName}){accountName} mapped to {sidString}\u0026#34;); } return sid; } [DllImport(\u0026#34;Sechost.dll\u0026#34;, SetLastError = true)] static extern uint EventAccessControl( ref Guid providerGuid, uint operation, [MarshalAs(UnmanagedType.LPArray)] byte[] Sid, uint right, bool allowOrDeny // true means ALLOW ); [DllImport(\u0026#34;kernel32.dll\u0026#34;)] static extern IntPtr LocalFree(IntPtr hMem); [DllImport(\u0026#34;advapi32.dll\u0026#34;, SetLastError = true)] static extern bool LookupAccountName( string 
systemName, string accountName, [MarshalAs(UnmanagedType.LPArray)] byte[] Sid, ref uint cbSid, StringBuilder referencedDomainName, ref uint cchReferencedDomainName, out SID_NAME_USE nameUse); [DllImport(\u0026#34;advapi32.dll\u0026#34;, CharSet = CharSet.Auto, SetLastError = true)] static extern bool ConvertSidToStringSid( [MarshalAs(UnmanagedType.LPArray)] byte[] pSID, out IntPtr ptrSid); // can\u0026#39;t be an out string because we need to explicitly call LocalFree on it; // the marshaller would call CoTaskMemFree in case of a string // from http://pinvoke.net/default.aspx/advapi32/LookupAccountName.html enum SID_NAME_USE { SidTypeUser = 1, SidTypeGroup, SidTypeDomain, SidTypeAlias, SidTypeWellKnownGroup, SidTypeDeletedAccount, SidTypeInvalid, SidTypeUnknown, SidTypeComputer } // from evntcons.h enum EventSecurityOperation { EventSecuritySetDACL = 0, EventSecuritySetSACL, EventSecurityAddDACL, EventSecurityAddSACL, EventSecurityMax } // EVENTSECURITYOPERATION } You are now ready to profile your application memory allocation patterns and CPU consumption!\nThanks for checking in with us again on our C# series. Like what you are reading? Head over to our latest blog posts on the topic:\nBuild your own .NET memory profiler in C# *This post explains how to collect allocation details by writing your own memory profiler in C#.*medium.comBuild your own .NET memory profiler in C# — call stacks (2/2–1) *This post explains how to get the call stack corresponding to the allocations with CLR events.*medium.com\nIf you are interested in joining our team, check out our open positions and apply today!\nCareers at Criteo | Criteo jobs *Find opportunities everywhere. ​Choose your next challenge. 
Find the job opportunities at Criteo in Product, research \u0026amp;…*careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-12-08_build-your-own-net/1_WIzDdkN_0nbUFiNnmKXe7A.png","date":"2020-12-08","permalink":"https://chrisnas.github.io/posts/2020-12-08_build-your-own-net/","summary":"\u003chr\u003e\n\u003cp\u003eThe last series was describing how to get details about your .NET application allocation patterns in C#.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"/posts/2020-04-18_build-your-own-net/\"\u003eGet a sampling of .NET application allocations\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"/posts/2020-05-18_build-your-own-net/\"\u003eA simple way to get the call stack\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href=\"/posts/2020-06-19_build-your-own-net/\"\u003eGetting the call stack by hand\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIt is now time to do the same but for the CPU consumption of your .NET applications.\u003c/p\u003e\n\u003ch2 id=\"thanks-you-mr-windowskernel\"\u003eThanks you Mr Windows Kernel!\u003c/h2\u003e\n\u003cp\u003eUnder Windows, the kernel ETW provider allows you to get notified every milli-second with the call stack of all threads running on a core. Without any surprise, it is easy with TraceEvent to listen to these events. As explained in an \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eold posts\u003c/a\u003e, you simply need to create a session, enable providers and listen to the right event.\u003c/p\u003e","title":"Build your own .NET CPU profiler in C#"},{"content":" In the previous post, I presented the new commands that were added to dotnet-dump and how to use them. It is now time to show how to implement such a command.\nBut before jumping into the code, you should first ensure that you have a valid use case that the Diagnostics team is not currently working on. 
I recommend to create an issue in the Diagnostics repository to explain what is missing for which scenario and propose to implement the corresponding command.\nWhat is a dotnet-dump command? Here is the directory structure related to the dotnet-dump tool in the Diagnostics repository:\nThe built binaries are generated under artifacts\\bin\\dotnet-dump\u0026lt;Release or Debug\u0026gt;\\netcoreapp2.1 folder if you need to test them outside of Visual Studio.\nThe eng folder contains the versions.props file that lists the versions for nuget dependencies. In my case, I had to reference the ParallelStacks.Runtime nuget so I added the following line:2.0.1\nAnd in the dotnet-dump.csproj, this nuget is referenced with the same variable: `\nMissed the first part of the story? Read it here:\nHow to extend dotnet-dump (1/2) — What are the new commands? *This first post describes the new commands, when to use them, and the git setup I used to implement them.*medium.com\nWant to work with Christophe or other teams? Check out our open positions:\nCareers at Criteo | Criteo jobs *Find opportunities everywhere. ​Choose your next challenge. Find the job opportunities at Criteo in Product, research \u0026amp;…*careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-11-09_how-to-write-commands/1_G3KU-d3fRouyh-jxCGdLNw.png","date":"2020-11-09","permalink":"https://chrisnas.github.io/posts/2020-11-09_how-to-write-commands/","summary":"\u003chr\u003e\n\u003cp\u003eIn the \u003ca href=\"/posts/2020-09-29_how-to-extend-dotnet/\"\u003eprevious post\u003c/a\u003e, I presented the new commands that were added to dotnet-dump and how to use them. It is now time to show how to implement such a command.\u003c/p\u003e\n\u003cp\u003eBut before jumping into the code, you should first ensure that you have a valid use case that the Diagnostics team is not currently working on. 
I recommend to create an issue in the Diagnostics repository to explain what is missing for which scenario and propose to implement the corresponding command.\u003c/p\u003e","title":"How to write your own commands in dotnet-dump (2/2)"},{"content":" Introduction To ease our troubleshooting sessions at Criteo, new high level commands for WinDBG have been written and grouped in the gsose extension. As we moved to Linux, Kevin implemented the plumbing to be able to load gsose into LLDB. In both cases, our extension commands are based on ClrMD to dig into a memory dump.\nAs Microsoft pushed for dotnet-dump as the new cross-platform way to deal with memory dump, it was obvious that we would have to be able to use our extension commands in dotnet-dump. Unfortunately dotnet-dump does not support any extension mechanism. In May this year, Kevin updated a minimum of code to load a list of extension assemblies at the startup of dotnet-dump. I followed another direction by adding a “load” command to dynamically add extensions commands.\nHowever, the Diagnostics team was focusing on supporting .NET Core 5 release and adding extensibility was not planned before next year. Due to our own time constraints at Criteo, gsose extension commands were really needed so we agreed with the Diagnostics team to implement these commands directly into dotnet-dump as pull requests.\nThis first post describes the new commands, when to use them, and the git setup I used to implement them.\nCommands details and challenges Here is the list of extension commands as shown by the help command:\nAs of today with the last 3.1.141901 official version, only timerinfo is available but feel free to clone the repository and rebuild it to get all commands.\npstacks: almost Parallel Stacks a la Visual Studio When you start an investigation, you are usually interested in getting a high level view of what the running threads are doing and this is what pstacks provides. 
When hundreds of threads are running, commands such as clrthreads are useless. Instead, pstacks merges the common parts of each call stack and present them in a tree:\nDon’t be scared by the layout; here is a description of each section:\nIf you look at the threads 9c6c and 4020 (after the last ~~~~), it means that 2 threads (only the first 4 threads id are shown except if you pass — all as a parameter) share the same callstack:\n~~~~ 9c6c,4020 2 System.Threading.Monitor.Wait(Object, Int32, Boolean) 4 System.Threading.Tasks.Task+c.cctor\u0026gt;b__278_1(Object) ... 4 System.Threading.Tasks.Task.ExecuteEntryUnsafe() 4 System.Threading.Tasks.Task.ExecuteWorkItem() 7 System.Threading.ThreadPoolWorkQueue.Dispatch() 7 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() Compared to the “list” representation, the “tree” representation is useful when you have a lot of threads to deal with and see the different branches in the groups of stack frames.\nti: see all your timers If you don’t find anything interesting in the running threads (i.e. not hundreds of threads with the same code stack blocked on a Wait call for example), you should look at what will run as timer callbacks with the ti command.\nHere is example of what you will get:\n\u0026gt; ti (S) 0x00007F7D84529CD0 @ 1000 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.FetchMessage\u0026gt;.Tick ... (L) 0x00007F7D844FA7A0 @ 1000 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.OffsetMessage\u0026gt;.Tick ... (L) 0x00007F7D842042F8 @ 999 ms every 1000 ms | 0000000000000000 () -\u0026gt; Criteo.DevKit.TimeStamp+\u0026lt;\u0026gt;c.cctor\u0026gt;b__35_0 ... (L) 0x00007F7D846314C8 @ 999 ms every 1000 ms | 00007F7D8462ED70 (Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.Heartbeat) -\u0026gt; ... 
190 timers ----------------------------------------------- 1 | (S) @ 999 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.FetchMessage\u0026gt;.Tick 1 | (L) @ 996 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.OffsetMessage\u0026gt;.Tick 1 | (L) @ 993 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.FetchMessage\u0026gt;.Tick 1 | (L) @ 999 ms every 1000 ms | 0000000000000000 () -\u0026gt; Criteo.DevKit.TimeStamp+\u0026lt;\u0026gt;c.cctor\u0026gt;b__35_0 ... 9 | (L) @ 1000 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.FetchMessage\u0026gt;.Tick 33 | (L) @ 999 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.FetchMessage\u0026gt;.Tick 34 | (L) @ 2000 ms every 2000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopicByPartition\u0026lt;Kafka.Cluster.ProduceMessage\u0026gt;.Tick 38 | (L) @ 999 ms every 1000 ms | 0000000000000000 () -\u0026gt; Kafka.Batching.AccumulatorByTopic\u0026lt;Kafka.Routing.OffsetMessage\u0026gt;.Tick The first part of the output lists each instance of Timer with start/repeat information followed by the callback parameter and callback if available. The second list groups timer so you could identify cases where the same timers have been created many times.\ntpq: what is in the ThreadPool queues If no culprit is identified in timers, it is time to look at what is in the ThreadPool with the tpq command.\n\u0026gt; tpq global work item queue________________________________ 0x000002AC3C1DDBB0 Work | (ASP.global_asax)System.Web.HttpApplication.ResumeStepsWaitCallback ... 
0x000002AABEC19148 Task | System.Threading.Tasks.Dataflow.Internal.TargetCore\u0026lt;System.Action\u0026gt;.\u0026lt;ProcessAsyncIfNecessary_Slow\u0026gt;b__3 local per thread work items_____________________________________ 0x000002AE79D80A00 System.Threading.Tasks.ContinuationTaskFromTask ... 0x000002AB7CBB84A0 Task | System.Net.Http.HttpClientHandler.StartRequest 7 Task System.Threading.Tasks.Dataflow.Internal.TargetCore\u0026lt;System.Action\u0026gt;.\u0026lt;ProcessAsyncIfNecessary_Slow\u0026gt;b__3 ... 84 Task System.Net.Http.HttpClientHandler.StartRequest ---- 6039 1810 Work (ASP.global_asax) System.Web.HttpApplication.ResumeStepsWaitCallback ---- 1810 Both work items from the global queue and the local queues are displayed with each identified callback. The final summary spits work items and tasks.\nMiscellaneous commands Two other helper commands are also available.\ntks: Task State Pass a Task object reference to tks to get its human readable state.\n\u0026gt; help tks ------------------------------------------------------------------------------- TaskState [hexa address] [-v \u0026lt;decimal state value\u0026gt;] TaskState translates a Task m_stateFlags field value into human readable format. It supports hexadecimal address corresponding to a task instance or -v \u0026lt;decimal state value\u0026gt;. \u0026gt; tks 000001db16cf98f0 Running \u0026gt; tks -v 73728 WaitingToRun dcq : Dump ConcurrentQueue Dump elements stored in a ConcurrentQueue. Due to implementation details, more manual steps are required in case of value types.\n\u0026gt; help dcq ------------------------------------------------------------------------------- DumpConcurrentQueue Lists all items in the given concurrent queue. For simple types such as numbers, boolean and string, values are shown. \u0026gt; dcq 00000202a79320e8 System.Collections.Concurrent.ConcurrentQueue\u0026lt;System.Int32\u0026gt; 1 - 0 2 - 1 3 - 2 In case of reference types, the command to dump each object is shown. 
\u0026gt; dcq 00000202a79337f8 System.Collections.Concurrent.ConcurrentQueue\u0026lt;ForDump.ReferenceType\u0026gt; 1 - dumpobj 0x202a7934e38 2 - dumpobj 0x202a7934fd0 3 - dumpobj 0x202a7935078 For value types, the command to dump each array segment is shown. The next step is to manually dump each element with dumpvc \u0026lt;the Element Methodtable\u0026gt; \u0026lt;[item] address\u0026gt;. \u0026gt; dcq 00000202a7933370 System.Collections.Concurrent.ConcurrentQueue\u0026lt;ForDump.ValueType\u0026gt; 1 - dumparray 202a79334e0 2 - dumparray 202a7938a88 Note that for reference type items, the dumpobj command is provided to get the value of the item fields: you just copy and paste them to get instance fields.\nGIT hell Before digging into the implementation details, I want to spend some time on how to properly create a pull request. Since the Diagnostics repository is hosted on Github, you have to create a fork and push your changes on a dedicated branch to then submit a pull request.\nDue to my previous Team Foundation Server experience, it seems that I have problems with Git commands: too many different ways to do simple things maybe. So I was faithfully relying on the documentation to configure/sync a fork and I ended up merging the Microsoft Master branch to our Criteo fork Master before creating my own branch dedicated to my pull request:\nAfter a while, in the next pull requests, I started to have unrelated commits in the pull requests:\nIt was due to the fact that I was merging the Criteo fork master with Diagnostics master to stay in sync.\nMy coworker Guillaume explained to me that I should follow a different path. 
This time, I’m just creating a branch on the Microsoft repository (so no need to merge) and I go through the Criteo fork only to push upstream:\ngit clone https://github.com/dotnet/diagnostics cd diagnostics git checkout -b PR_dotnet-dump_pstacks Now I can change my local source code before pushing upstream to the Criteo fork:\ngit add/gui git commit git remote add upstream https://github.com/criteo-forks/diagnostics git push upstream PR_dotnet-dump_pstacks That way, I can synchronize the Criteo fork without “impacting” the history of commits brought with my dedicated pull request branch.\nThe next episode will cover command implementation details.\nWhat’s next? Read the second part of this article:\nHow to write commands for dotnet-dump This post describes the different steps, tips and tricks to write your own commands for dotnet-dump medium.com\nLike what you are reading? Check out more articles from Christophe on our Medium account.\nThe .NET Core Journey at Criteo *This post shows the challenges we faced during the migration to .NET Core on containerized Linux for our main…*medium.com\nAre you interested in working with Christophe and other talented engineers at Criteo? Take a look at our open positions:\nProduct, Research \u0026amp; Development | Criteo Careers careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-09-29_how-to-extend-dotnet/1_L-MKjPDVEpW5g8nJ6aGvMg.png","date":"2020-09-29","permalink":"https://chrisnas.github.io/posts/2020-09-29_how-to-extend-dotnet/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eTo ease our troubleshooting sessions at Criteo, new high level commands for WinDBG have been written and grouped in the \u003ca href=\"https://github.com/chrisnas/DebuggingExtensions/blob/master/Documentation/gsose.md\"\u003egsose extension\u003c/a\u003e. 
As we moved to Linux, \u003ca href=\"https://twitter.com/KooKiz\"\u003eKevin\u003c/a\u003e implemented the plumbing to be able to load gsose into LLDB. In both cases, our extension commands are based on ClrMD to dig into a memory dump.\u003c/p\u003e\n\u003cp\u003eAs Microsoft pushed for dotnet-dump as the new cross-platform way to deal with memory dump, it was obvious that we would have to be able to use our extension commands in dotnet-dump. Unfortunately dotnet-dump does not support any extension mechanism. In May this year, Kevin updated \u003ca href=\"https://github.com/kevingosse/diagnostics/commit/4abf635b6313bf733b2450a2ffee8fa06befd7b6\"\u003ea minimum of code\u003c/a\u003e to load a list of extension assemblies at the startup of dotnet-dump. I followed another direction by \u003ca href=\"https://github.com/chrisnas/diagnostics/commit/235a3347fcf2369e408664137cd29a721879b42d\"\u003eadding a “load” command\u003c/a\u003e to dynamically add extensions commands.\u003c/p\u003e","title":"How to extend dotnet-dump (1/2) — What are the new commands?"},{"content":" Introduction When I arrived at Criteo in late 2016, I joined the .NET Core “guild” (i.e. group of people from different teams dedicated to a specific topic). The first meeting I attended included Microsoft folks led by Scott Hunter (head of .NET program management) and including David Fowler (SignalR and ASP.NET Core). The goal for Criteo was simple: Moving a set of C# applications from Windows/.NET Framework to Linux/.NET Core. I guess that for Microsoft we were a customer with workloads that could be interesting to support with .NET Core. At that time, I did not realize how strong their commitment to work with us was. Our Open Source mindset was the selling point.\nHow complicated could it be? 
Well… this post will show you the challenges that we had to face to run, monitor and debug our applications.\nTry it Once we got a build of all .NET Core assemblies (more on this in a forthcoming blog post), it was time to run a few applications. The first issues that we faced were related to missing features between .NET Framework and .NET Core. For example, we need cryptography support of 3DES and AES with cypher mode CFB but it is (still) not available in .NET Core for Linux. Thanks to the Open Source status of .NET Core, we were able to add it to CoreFx. However, since we did not implement it on MacOS/Windows as Microsoft requested for our change to be accepted as a Pull Request, we had to keep our Criteo-forked branch.\nThe second class of runtime problems we had to solve were due to differences between Windows and Linux but also with the “containerization” of the runtime environment. Let’s take two examples involving the .NET Garbage Collector. First, our containers were using Linux cgroups to manage quotas including memory and number of CPU cores usable by applications. However, at CLR startup, the GC was counting the total count of CPU cores to compute the number of heaps to allocate instead of the one defined at the cgroup level: We ended up with instant Out Of Memory automatic killing. This time our fix was done and merged in the CLR repository.\nThe second example is related to a GC optimization: During background generation 2 collections, the CLR threads working underneath are affinitized to each different CPU core to avoid locks. We were lucky enough to welcome Maoni Stephens (Lead Dev on the GC) in our Paris office early 2018 to share our weird allocation patterns that impacted the GC. During her stay, she was kind enough to help us investigate a behavior on our servers: When SysInternals ProcessExplorer was running, the garbage collections were taking more time than usual. 
Maoni found out that ProcessExplorer had an affinitized high-priority thread conflicting with GC threads. While investigating longer response times on Linux compared to Windows, we realized that GC threads were not affinitized as they were on Windows, and the issue was fixed by Jan Vorlicek.\nHere is our lesson: Sometimes fixes are merged into the official release and sometimes they are not. If your workloads are pushing .NET to its limits, you will probably have to build and manage your own Core fork and make it available to your deployments.\nMonitor it At Criteo, our Grafana dashboards measuring .NET Framework application health were based on metrics computed from Windows performance counters. Even without going to Linux, .NET Core no longer exposes performance counters so we had to entirely rebuild our metrics collection system!\nBased on Microsoft's feedback, we decided to listen to CLR events emitted via ETW on Windows and LTTng on Linux. In addition to working on both operating systems, these events also provide accurate details about thread contention, exceptions and garbage collections that are not available with performance counters. Please refer to our series of blog posts for more details and reusable code samples to integrate these events into your own systems.\nOur first Linux metrics collection implementation was based on LTTng and we presented our journey during the Tracing Summit in 2017. Microsoft had already built TraceEvent, an assembly allowing .NET code to parse CLR events for both Windows and Linux. Unfortunately for us, the Linux part was only able to load trace files but we needed live sessions like on Windows, where you can listen to events emitted by running applications. Since this code is Open Source, Gregory was able to add the live session feature to TraceEvent.\nWith .NET Core 3.0 Microsoft provided a way to exchange events common to Linux and Windows called EventPipe. 
So… we moved our collection implementation from LTTng to EventPipe (look at our blog series and DotNext conference session for more details and reusable code sample). With the new EventPipe implementation in the CLR came performance issues not seen by Microsoft. The reason is simple: Some of our applications are running hundreds of threads to process thousands of requests per second and allocate memory like crazy. In that kind of context, the CLR has a lot to do and so, has a lot of events to generate and emit via LTTng or EventPipes.\nThe initial implementation was lacking some filtering and too many events were generated or expensive event payload was created even though the events were not emitted. Based on our feedback, the Microsoft Diagnostic team was very responsive and quickly fixed the problem.\nMicrosoft did not “just” move to Open Source, the teams are working deeply integrated with the issue/pull request model of GitHub. So don’t be shy and if you find a problem, create an issue with a detailed reproduction and even better, provide a pull request with the fix. Everyone in the community will benefit!\nRun it With these metrics, we started to investigate some performance differences (mostly response time) between Windows and containerized Linux.\nWe saw a huge performance difference on Linux: Both response time (x2) and scalability (timeout increase with QPS). Our team spent a lot of time to improve the situation up to the point where it was possible to send the applications to production.\nIn the new containerized environment we faced the same kind of noisy neighbor symptoms that we had with Process Explorer. If the CPU cores are not dedicated to a container (as it was for us at the beginning), this scenario happens a lot. So we updated the scheduling system to dedicate CPU cores to containers.\nOn a totally different area, we found out that the way .NET Core handles network I/O continuation had an impact on our main application. 
To give a bit of context, this application has to handle a lot of requests and is response-time driven. During the processing of a request, the current thread might have to send an HTTP request before continuing its processing. Since this is done asynchronously, the thread is now available to process more incoming requests and this is good for throughput. However, it means that when the inner HTTP request comes back, all available threads might be processing new incoming requests and it will take time to complete the old one. The net effect is to increase the median response time and this is not something we want!\nThe .NET Core implementation is relying on the .NET ThreadPool that shares its threads with all the async/await magic and the incoming requests processing (The .NET Framework implementation is using a totally different implementation based on I/O completion ports on Windows). To solve the issue, Kevin implemented a custom thread pool to handle network I/O and we keep on optimizing it. When you work on this kind of deep area of code-shared by so many different workloads, you realize that it is impossible to find the silver bullet.\nDebug it What would you do if something would go wrong in an application? On Windows, with Visual Studio, we are able to remote debug a rogue application to set a breakpoint, look at fields and properties or even have a high-level view of what threads are doing with the ParallelStacks view. In the worst case, SysInternals procdump allows us to take a snapshot of the application and analyze it on our developer’s machine with WinDBG or Visual Studio.\nIn terms of remote debugging a Linux application, Microsoft provides an SSH-based solution to attach to a running application. However, for security reasons, it is not allowed to run an SSH server in our Criteo containers. The solution was to implement the communication protocol with VsDbg for Linux on top of WebSockets.\nWell… this was not enough. 
The hosting architecture (Marathon and Mesos in our case) ensures that applications in containers are running smoothly by sending requests to health check endpoints. If the application replies that everything is fine, then the container is safe. If the application does not answer as expected (including retries), then Marathon/Mesos kills the application and cleans up the container. Now think about what will happen if you set a breakpoint in the application and dig into the content of data structures in the Visual Studio Watch/Quick Watch panels for a few minutes. Behind the scenes, the debugger has to freeze all application threads, including the thread pool threads responsible for answering health checks. As you have probably guessed already, the debugging session will not end well.\nThis is why the previous figure shows an arrow between Marathon and the Remote Debugger, which acts as a proxy for the application health check. When a debugging session starts (i.e. when the WebSockets code executes the protocol), the Remote Debugger knows that it should answer OK instead of calling the application endpoint that might never answer.\nWhen remote debugging is not enough, how do you take a memory snapshot of the application? For example, if the health check does not answer after a series of retries, the Remote Debugger calls the createdump tool installed with the .NET Core runtime to generate a dump file. Again, since the memory dump creation of 40+ GB applications could take several minutes, the same health check proxy mechanism has been put in place.\nOnce the dump file is created, the Remote Debugger lets Marathon kill the application. But wait! This is not enough because in that case, the container will be cleaned up and the disk storage will disappear. Not a problem: after a dump has been generated by createdump, the file is sent to a “Dump Navigator” application (one per data center). 
This application provides a simple HTML user interface to get high-level details of the application state such as thread stacks or managed heap content.\nOn Windows, we have built our own set of extension commands that allow us to investigate memory, threadpool starvation, thread contention, or timer leak scenarios in a Windows memory dump with WinDBG as shown during this NDC Oslo conference session. Note that they are also usable with LLDB on Linux. These commands leverage the ClrMD Microsoft library that gives you access to a live process or a memory dump in C#. Thanks to the Linux support that has been added to this library by Microsoft developers, it was easy to reuse the code in our Dump Navigator application. I definitely recommend looking at the API provided by ClrMD to automate and build your own tools. The long Criteo blog series is a good start in addition to my DotNext conference session.\nConclusion Even though some of our main applications moved to .NET Core running on containerized Linux with a large set of monitoring/debugging tools, the journey is not over. We are now testing the preview of .NET Core 5.0 (like we did for 3.0) to check if it supports Criteo-specific needs. If this is not the case, we will figure out why and find solutions to integrate into the code. Same for the tools: I have started to add our extension commands to Microsoft's dotnet-dump CLI tool used to analyze both Windows and Linux dumps.\nAt least we can say that we helped not only ourselves but also Microsoft, and even the whole .NET Windows and Linux community, to understand how far .NET Core could go. This is where Open Source shines!\nStay tuned for the next article in our mini-series. 
Don’t forget to head over to our previous articles of this journey:\nMigrating Arbitrage to Apache Mesos *Lessons learned from migrating our largest application to our container platform.*medium.comMoving .NET to Linux at Scale *The story of a multi-year migration: How we changed Criteo’s whole foundation.*medium.com\nInterested in joining the challenge? Head over to our career site!\nProduct, Research \u0026amp; Development | Criteo Careers careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-07-31_the-net-core-journey/1_WFbx_DPjik2EQydCiAahUA.jpeg","date":"2020-07-31","permalink":"https://chrisnas.github.io/posts/2020-07-31_the-net-core-journey/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2020-07-31_the-net-core-journey/1_WFbx_DPjik2EQydCiAahUA.jpeg\"\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eWhen I arrived at Criteo in late 2016, I joined the .NET Core “guild” (i.e. group of people from different teams dedicated to a specific topic). The first meeting I attended included Microsoft folks led by Scott Hunter (head of .NET program management) and including David Fowler (SignalR and ASP.NET Core). The goal for Criteo was simple: Moving a set of C# applications from Windows/.NET Framework to Linux/.NET Core. I guess that for Microsoft we were a customer with workloads that could be interesting to support with .NET Core. At that time, I did not realize how strong their commitment to work with us was. Our Open Source mindset was the selling point.\u003c/p\u003e","title":"The .NET Core Journey at Criteo"},{"content":" In the past two episodes of this series I have explained how to get a sampling of .NET application allocations and one way to get the call stack corresponding to the allocations; all with CLR events. 
In this last episode, I will detail how to transform addresses from the stack into method names and possibly signatures.\nFrom managed address to method signature In order to transform an address on the stack into a managed method name, you need to know where in memory (i.e. at which address) the JITted assembly code of the method is stored and what its size is:\nFor each JITted method, the MethodLoadVerbose/MethodDCStartVerboseV2 events provide this information in addition to 3 properties to rebuild the full method name and signature (more on this later). I’m storing each method description as a MethodInfo in a MethodStore per process.\npublic class PerProcessProfilingState : IDisposable { ... private readonly Dictionary\u0026lt;int, MethodStore\u0026gt; _methods = new Dictionary\u0026lt;int, MethodStore\u0026gt;(); public class MethodStore : IDisposable { // JITed methods information (start address + size + signature) private readonly List\u0026lt;MethodInfo\u0026gt; _methods; ... The only interesting part of the MethodInfo class is the computation of the full method name stored in the _fullName field:\npublic class MethodInfo { private readonly ulong _startAddress; private readonly int _size; private readonly string _fullName; ... The ComputeFullName helper merges together the 3 properties given by the MethodxxxVerbose events including special processing for constructors:\nprivate string ComputeFullName(ulong startAddress, string namespaceAndTypeName, string name, string signature) { var fullName = signature; // constructor case: name = .ctor | namespaceAndTypeName = A.B.typeName | signature = ... 
(parameters) // --\u0026gt; A.B.typeName(parameters) if (name == \u0026#34;.ctor\u0026#34;) { return $\u0026#34;{namespaceAndTypeName}{ExtractParameters(signature)}\u0026#34;; } // general case: name = Foo | namespaceAndTypeName = A.B.typeName | signature = ... (parameters) // --\u0026gt; A.B.typeName.Foo(parameters) fullName = $\u0026#34;{namespaceAndTypeName}.{name}{ExtractParameters(signature)}\u0026#34;; return fullName; } private string ExtractTypeName(string namespaceAndTypeName) { var pos = namespaceAndTypeName.LastIndexOf(\u0026#34;.\u0026#34;, StringComparison.Ordinal); if (pos == -1) { return namespaceAndTypeName; } // skip the . pos++; return namespaceAndTypeName.Substring(pos); } Only the parameters (not the return type) are extracted from the “return type SPACE SPACE (parameters)” signature format:\nprivate string ExtractParameters(string signature) { var pos = signature.IndexOf(\u0026#34;  (\u0026#34;); if (pos == -1) { return \u0026#34;(???)\u0026#34;; } // skip double space pos += 2; var parameters = signature.Substring(pos); return parameters; } With the starting address and the size of each JITted method, it is easy to find the one corresponding to a given address on the stack: look for the MethodInfo whose code range contains this address, i.e. between the start address and the start address + the code size:\npublic string GetFullName(ulong address) { if (_cache.TryGetValue(address, out var fullName)) return fullName; // look for managed methods for (int i = 0; i \u0026lt; _methods.Count; i++) { var method = _methods[i]; if ((address \u0026gt;= method.StartAddress) \u0026amp;\u0026amp; (address \u0026lt; method.StartAddress + (ulong)method.Size)) { fullName = method.FullName; _cache[address] = fullName; return fullName; } } // look for native methods fullName = GetNativeMethodName(address); _cache[address] = fullName; return fullName; } For the sake of performance, the _cache 
dictionary speeds up lookups by keeping track of the address/full name mappings.\nIt is now time to look at the details of the GetNativeMethodName helper that takes care of the native functions scenario.\nThe native part of the symbols story Unlike for JITted methods, the CLR does not send events to describe native functions even for the CLR itself. Instead, you need to find a way to map a call stack address to a native function by yourself. Unlike Perfview, I will be using the dbghelp native API instead of DIA mostly because my scenario is to get the stacks while the applications are still running.\nAfter reading the March 2002 MSDN article about DBGHELP by Matt Pietrek, the updated symbol-related Microsoft Docs and the dbghelp.h include file from the Windows SDK, I wrote a C# wrapper around the dbghelp functions needed to get a method name from an address in a process address space:\ninternal static class NativeDbgHelp { // from C:\\Program Files (x86)\\Windows Kits\\10\\Debuggers\\inc\\dbghelp.h public const uint SYMOPT_UNDNAME = 0x00000002; public const uint SYMOPT_DEFERRED_LOADS = 0x00000004; [StructLayout(LayoutKind.Sequential)] public struct SYMBOL_INFO { public uint SizeOfStruct; public uint TypeIndex; // Type Index of symbol private ulong Reserved1; private ulong Reserved2; public uint Index; public uint Size; public ulong ModBase; // Base Address of module containing this symbol public uint Flags; public ulong Value; // Value of symbol, ValuePresent should be 1 public ulong Address; // Address of symbol including base address of module public uint Register; // register holding value or pointer to value public uint Scope; // scope of the symbol public uint Tag; // pdb classification public uint NameLen; // Actual length of name public uint MaxNameLen; [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 1024)] public 
string Name; } [DllImport(\u0026#34;dbghelp.dll\u0026#34;, SetLastError = true)] public static extern bool SymInitialize(IntPtr hProcess, string userSearchPath, bool invadeProcess); [DllImport(\u0026#34;dbghelp.dll\u0026#34;, SetLastError = true)] public static extern uint SymSetOptions(uint symOptions); [DllImport(\u0026#34;dbghelp.dll\u0026#34;, SetLastError = true, CharSet = CharSet.Ansi)] public static extern ulong SymLoadModule64(IntPtr hProcess, IntPtr hFile, string imageName, string moduleName, ulong baseOfDll, uint sizeOfDll); // use ANSI version to ensure the right size of the structure // read https://docs.microsoft.com/en-us/windows/win32/api/dbghelp/ns-dbghelp-symbol_info [DllImport(\u0026#34;dbghelp.dll\u0026#34;, SetLastError = true, CharSet = CharSet.Ansi)] public static extern bool SymFromAddr(IntPtr hProcess, ulong address, out ulong displacement, ref SYMBOL_INFO symbol); [DllImport(\u0026#34;dbghelp.dll\u0026#34;, SetLastError = true)] public static extern bool SymCleanup(IntPtr hProcess); } Note that you will need to get dbghelp.dll (and SymSrv.dll if needed) from the Windows SDK and copy them next to your memory profiler binaries.\nThe usage of the dbghelp API is straightforward. 
First, for each new process, call SymSetOptions/SymInitialize with a handle to the process:\nprivate bool SymInitialize(IntPtr hProcess) { // read https://docs.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-symsetoptions for more details // maybe SYMOPT_NO_PROMPTS and SYMOPT_FAIL_CRITICAL_ERRORS could be used NativeDbgHelp.SymSetOptions( NativeDbgHelp.SYMOPT_DEFERRED_LOADS | // performance optimization NativeDbgHelp.SYMOPT_UNDNAME // C++ names are not mangled ); // https://docs.microsoft.com/en-us/windows/win32/api/dbghelp/nf-dbghelp-syminitialize // search path for symbols: // - The current working directory of the application // - The _NT_SYMBOL_PATH environment variable // - The _NT_ALTERNATE_SYMBOL_PATH environment variable // // passing false as last parameter means that we will need to call SymLoadModule64 // each time a module is loaded in the process return NativeDbgHelp.SymInitialize(hProcess, null, false); } private IntPtr BindToProcess(int pid) { try { _process = Process.GetProcessById(pid); if (!SymInitialize(_process.Handle)) return IntPtr.Zero; return _process.Handle; } catch (Exception x) { Console.WriteLine($\u0026#34;Error while binding pid #{pid} to DbgHelp:\u0026#34;); Console.WriteLine(x.Message); return IntPtr.Zero; } } In the case of protected processes, Process.GetProcessById might throw an exception. The _hProcess field storing the process handle will be cleaned up in the IDisposable.Dispose implementation of the MethodStore:\npublic void Dispose() { if (_hProcess == IntPtr.Zero) return; _hProcess = IntPtr.Zero; _process.Dispose(); } After a process has been bound, each time one of its modules is loaded, SymLoadModule64 must be called. 
You can be notified of such a loaded module by enabling the kernel provider with the ImageLoad keyword.\nsession.EnableKernelProvider( KernelTraceEventParser.Keywords.ImageLoad | KernelTraceEventParser.Keywords.Process, KernelTraceEventParser.Keywords.None ); The handler attached to the ImageLoad event will be called each time a dll gets loaded.\nprivate void SetupListeners(ETWTraceEventSource source) { ... // get notified when a module is loaded to map the corresponding symbols source.Kernel.ImageLoad += OnImageLoad; } const int ERROR_SUCCESS = 0; private void OnImageLoad(ImageLoadTraceData data) { if (FilterOutEvent(data)) return; GetProcessMethods(data.ProcessID).AddModule(data.FileName, data.ImageBase, data.ImageSize); } public void AddModule(string filename, ulong baseOfDll, int sizeOfDll) { var baseAddress = NativeDbgHelp.SymLoadModule64(_hProcess, IntPtr.Zero, filename, null, baseOfDll, (uint)sizeOfDll); if (baseAddress == 0) { // should work if the same module is added more than once if (Marshal.GetLastWin32Error() == ERROR_SUCCESS) return; Console.WriteLine($\u0026#34;SymLoadModule64 failed for {filename}\u0026#34;); } } Now everything is in place to get a native function name from an address on the stack:\nprivate string GetNativeMethodName(ulong address) { var symbol = new NativeDbgHelp.SYMBOL_INFO(); symbol.MaxNameLen = 1024; symbol.SizeOfStruct = (uint)Marshal.SizeOf(symbol) - 1024; // char buffer is not counted // the ANSI version of SymFromAddr is called so each character is 1 byte long if (NativeDbgHelp.SymFromAddr(_hProcess, address, out var displacement, ref symbol)) { var buffer = new StringBuilder(symbol.Name.Length); // remove weird \u0026#34;$##\u0026#34; at the end of some symbols var pos = symbol.Name.LastIndexOf(\u0026#34;$##\u0026#34;); if (pos == -1) buffer.Append(symbol.Name); else 
buffer.Append(symbol.Name, 0, pos); // add offset if any (in hexadecimal to match the 0x prefix) if (displacement != 0) buffer.Append($\u0026#34;+0x{displacement:x}\u0026#34;); return buffer.ToString(); } // default value is just the address in HEX return $\u0026#34;0x{address:x}\u0026#34;; } Note that I needed to remove some unexpected $## strings at the end of some symbols.\nThis is the last episode of the series about building your own memory profiler in C#. In case you missed the previous episodes, check them out on Medium:\nBuild your own .NET memory profiler in C# — call stacks (2/2–1): how to get the call stack corresponding to the allocations with CLR events.\nBuild your own .NET memory profiler in C#: how to collect allocation details by writing your own memory profiler in C#.\nResources\nSource code available on GitHub.\nDownload Debugging Tools for Windows for dbghelp.dll and SymSrv.dll.\nMatt Pietrek’s article in MSDN Magazine about DBGHELP.\nDbghelp samples: http://www.debuginfo.com/examples/dbghelpexamples.html\n","cover":"https://chrisnas.github.io/posts/2020-06-19_build-your-own-net/1_v73Nx1IxWIEQ3NzDZF0rsQ.png","date":"2020-06-19","permalink":"https://chrisnas.github.io/posts/2020-06-19_build-your-own-net/","summary":"\u003chr\u003e\n\u003cp\u003eIn the past two episodes of this series I have explained how to \u003ca href=\"/posts/2020-04-18_build-your-own-net/\"\u003eget a sampling of .NET application allocations\u003c/a\u003e and \u003ca href=\"/posts/2020-05-18_build-your-own-net/\"\u003eone way to get the call stack\u003c/a\u003e corresponding to the allocations; all with CLR events. 
In this last episode, I will detail how to transform addresses from the stack into method names and possibly signatures.\u003c/p\u003e\n\u003ch2 id=\"from-managed-address-to-method-signature\"\u003eFrom managed address to method signature\u003c/h2\u003e\n\u003cp\u003eIn order to transform an address on the stack into a managed method name, you need to know where in memory (i.e. at which address) the JITted assembly code of the method is stored and what its size is:\u003c/p\u003e","title":"Build your own .NET memory profiler in C# — call stacks (2/2–2)"},{"content":" In the previous episode of this series, you have seen how to get a sampling of .NET application allocations thanks to the AllocationTick and GCSampledObjectAllocation(High/Low) CLR events. However, this is often not enough to investigate unexpected memory consumption: you would need to know which part of the code is triggering the allocations. This post explains how to get the call stack corresponding to the allocations, again with CLR events.\nIntroduction If you look carefully at the payload of the TraceEvent object mapped by the Microsoft TraceEvent library (not my fault if they have the same name) for each CLR event, you won’t see anything related to a call stack. However, in the TraceEvent sample 41, the following line looks promising:\nvar callStack = data.CallStack();\nwith data being a TraceEvent object received for each CLR event!\nThis CallStack method is an extension method provided by TraceLog, a special kind of event source. You might not have noticed but I have used it in the AllocationTick code sample from the previous post. This class (and many more helper classes) does a lot of work to:\n“attach” a call stack to each CLR event; i.e. a list of addresses of assembly code\ntranslate addresses into string symbols (method names or full signatures), listening to a bunch of JIT related events for managed methods (more on this later), and using COM-based Debug Interface Access (a.k.a. 
DIA) and MetadataReaderProvider for native functions\nNotice that since events from all managed processes on the machine are handled by TraceLog, the internal cache of JITted method descriptions could consume a lot of memory. During my tests with two Visual Studio instances running, my test profiler consumed more than 500 MB before even handling call stacks. If you are in such an environment with multiple .NET processes, I will show how to “manually” get the same stacks (+ symbols in the next episode) with CLR events and a few methods from dbghelp.dll in a cheaper way.\nA new provider (more on ClrRundown later), new keywords, and new events are needed to make all this work.\nTraceLog: the easy way As you have seen in the previous posts, the TraceEventSession class exposes a Source property of ETWTraceEventSource type. This source has event parser properties on which you register handler methods that will be called when CLR events are received. Instead of directly using this source, you should wrap it with a TraceLogEventSource object that provides the same event parsers.\nawait Task.Factory.StartNew(() =\u0026gt; { using (_session) { SetupProviders(_session); using (TraceLogEventSource source = TraceLog.CreateFromTraceEventSession(_session)) { SetupListeners(source); source.Process(); } } }); What’s new with providers? 
The code for my SetupProviders method is a little bit different from the previous post even though no new event listeners are needed:\nprivate void SetupProviders(TraceEventSession session) { // Note: the kernel provider MUST be the first provider to be enabled // If the kernel provider is not enabled, the callstacks for CLR events are still received // but the symbols are not found (except for the application itself) // TraceEvent implementation details triggered when a module (image) is loaded session.EnableKernelProvider( KernelTraceEventParser.Keywords.ImageLoad | KernelTraceEventParser.Keywords.Process, KernelTraceEventParser.Keywords.None ); session.EnableProvider( ClrTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, // this is needed in order to receive AllocationTick_V2 event (ulong)( // required to receive AllocationTick events ClrTraceEventParser.Keywords.GC | ClrTraceEventParser.Keywords.Jit | // Turning on JIT events is necessary to resolve JIT compiled code ClrTraceEventParser.Keywords.JittedMethodILToNativeMap | // This is needed if you want line number information in the stacks ClrTraceEventParser.Keywords.Loader | // You must include loader events as well to resolve JIT compiled code. ClrTraceEventParser.Keywords.Stack ) ); // this provider will send events of already JITed methods session.EnableProvider(ClrRundownTraceEventParser.ProviderGuid, TraceEventLevel.Informational, (ulong)( ClrTraceEventParser.Keywords.Jit | // We need JIT events to be rundown to resolve method names ClrTraceEventParser.Keywords.JittedMethodILToNativeMap | // This is needed if you want line number information in the stacks ClrTraceEventParser.Keywords.Loader | // As well as the module load events. 
ClrTraceEventParser.Keywords.StartEnumeration // This indicates to do the rundown now (at enable time) )); } The kernel provider needs to be enabled with the ImageLoad and Process keywords in order to let TraceEvent detect when a process loads “images” (i.e. dlls) and at which address (needed to convert Relative Virtual Addresses (RVA) to addresses in the address space). Note that this provider must be enabled before any other provider or your code will trigger an exception.\nThe CLR provider needs to be enabled with Jit, JittedMethodILToNativeMap, and Loader (in addition to the usual GC one). The Stack keyword has to be set on the same CLR provider to receive call stack events for “normal” CLR events (more on this later).\nThe CLR Rundown provider is enabled with the same Jit, JittedMethodILToNativeMap, and Loader keywords. That way, JIT events corresponding to already JITted methods will be received (not only the new ones). This is important because otherwise, for processes started before the profiler, you won’t be able to map these methods to the address in memory of their JITted native code. This is the case for my AllocationTickProfiler sample.\nCallstacks and symbols Now, when an AllocationTick event is received, calling the CallStack extension method on the GCAllocationTickTraceData argument returns a TraceCallStack object. This class is a linked list of TraceCodeAddress representing each stack frame (i.e. address in assembly code). These classes are at the heart of TraceEvent and PerfView callstack management. The method names and signatures are retrieved behind the scenes thanks to JIT events and the SymbolReader class that digs into .pdb files.\nYou first need to initialize a SymbolReader instance:\nSet the path to find the .pdb files, including the Microsoft HTTP endpoint for the symbols of public .NET versions,\nAllow a .pdb next to the executable to be loaded. 
// By default a symbol Reader uses whatever is in the _NT_SYMBOL_PATH variable. However you can override // if you wish by passing it to the SymbolReader constructor. Since we want this to work even if you // have not set an _NT_SYMBOL_PATH, we add the Microsoft default symbol server path to be sure. var symbolPath = new SymbolPath(SymbolPath.SymbolPathFromEnvironment).Add(SymbolPath.MicrosoftSymbolServerPath); _symbolReader = new SymbolReader(_symbolLookupMessages, symbolPath.ToString()); // By default the symbol reader will NOT read PDBs from \u0026#39;unsafe\u0026#39; locations (like next to the EXE) // because hackers might make malicious PDBs. If you wish to ignore this threat, you can override this // check to always return \u0026#39;true\u0026#39; for checking that a PDB is \u0026#39;safe\u0026#39;. _symbolReader.SecurityCheck = (path =\u0026gt; true); Then, displaying a TraceCallStack from a received CLR event in a human-readable format is simple:\nGet one frame after the other from the linked list,\nIf the method of the frame’s CodeAddress is not resolved yet, load the symbols for its module,\nDisplay the FullMethodName field of the frame (or the address if not found).\nprivate void DumpStack(TraceCallStack callStack) { while (callStack != null) { var codeAddress = callStack.CodeAddress; if (codeAddress.Method == null) { var moduleFile = codeAddress.ModuleFile; if (moduleFile == null) { Debug.WriteLine($\u0026#34;Could not find module for Address 0x{codeAddress.Address:x}\u0026#34;); } else { codeAddress.CodeAddresses.LookupSymbolsForModule(_symbolReader, moduleFile); } } if (!string.IsNullOrEmpty(codeAddress.FullMethodName)) Console.WriteLine($\u0026#34; {codeAddress.FullMethodName}\u0026#34;); else Console.WriteLine($\u0026#34; 0x{codeAddress.Address:x}\u0026#34;); callStack = callStack.Caller; } } Note that the first frame in the linked list is the last on the stack (i.e. 
last executed method).\nAs I mentioned at the beginning of the post, I have been facing OutOfMemory errors due to the large memory usage of the TraceEvent symbol management when a few other .NET applications were running. Let’s see how to get the call stacks in a less memory-consuming way.\nManually rebuilding the allocations call stack Instead of using the call stack and symbol management provided by TraceLog in TraceEvent, I would prefer to manually get them. If you remember the last post, thanks to GCSampledObjectAllocation CLR events, it is possible to have a sampling of the allocation size and count per process and per type. What I would like to add to the type picture is the list of call stacks leading to these allocations.\nHow to manually get CLR events call stack The first step is to understand how to get the CLR events call stacks. If you use the TraceLog-based code just presented, you should see the following kind of call stack:\nThe ETWCallout CLR helper function is in charge of sending a special event containing the call stack of other normal events from the four supported CLR providers. If you set the Stack keyword on the CLR provider, each time an event is sent by a thread, a ClrStackWalk event will be sent just after. It means that after each SampleObjectAllocation event, a ClrStackWalk event containing the call stack will be received immediately. Since an application will probably be using more than one thread, the mapping between the two events has to be done on a per-thread basis.\nEach allocation event received by the OnSampleObjectAllocation handler contains the ThreadID property so it is easy to keep track of the last received allocation event per thread. In my case, the ProcessAllocations class stores this information in its _perThreadLastAllocation field:\npublic class ProcessAllocations { ... 
private readonly Dictionary\u0026lt;string, AllocationInfo\u0026gt; _allocations; private readonly Dictionary\u0026lt;int, AllocationInfo\u0026gt; _perThreadLastAllocation; Now, each time a SampleObjectAllocation event is received, the id of the sending thread is passed to the updated ProcessAllocations.AddAllocation() method:\npublic AllocationInfo AddAllocation(int tid, ulong size, ulong count, string typeName) { if (!_allocations.TryGetValue(typeName, out var info)) { info = new AllocationInfo(typeName); _allocations[typeName] = info; } info.AddAllocation(size, count); // the last allocation is still here without the corresponding stack if (_perThreadLastAllocation.TryGetValue(tid, out var lastAlloc)) { Console.WriteLine(\u0026#34;no stack for the last allocation\u0026#34;); } // keep track of the allocation for the given thread // --\u0026gt; will be used when the corresponding call stack event will be received _perThreadLastAllocation[tid] = info; return info; } The _perThreadLastAllocation dictionary stores the last AllocationInfo per thread. When an allocation happens, it is added to the dictionary. When a ClrStackWalk event is received for a given thread, the stack is associated with the last AllocationInfo, which is then removed from the dictionary. If a stack event is missed (it never happened during my tests but who knows), the previous allocation is still in the dictionary when the next one arrives and an error message is logged.\nThe ClrStackWalkTraceData argument received by the ClrStackWalk listener has a FrameCount property that returns the number of frames in the call stack. 
In addition, its InstructionPointer() method takes a frame position in the stack (starting at 0) and returns the address (in assembly code) at this position on the call stack.\nprivate void OnClrStackWalk(ClrStackWalkTraceData data) { if (FilterOutEvent(data)) return; var callstack = BuildCallStack(data); GetProcessAllocations(data.ProcessID).AddStack(data.ThreadID, callstack); } private AddressStack BuildCallStack(ClrStackWalkTraceData data) { var length = data.FrameCount; AddressStack stack = new AddressStack(length); // frame 0 is the last frame of the stack (i.e. last called method) for (int i = 0; i \u0026lt; length; i++) { stack.AddFrame(data.InstructionPointer(i)); } return stack; } The AddressStack class returned by BuildCallStack stores the frames as a list of addresses so it can be stored in AllocationInfo.\npublic class AddressStack { // the first frame is the address of the last called method private readonly List\u0026lt;ulong\u0026gt; _stack; public AddressStack(int capacity) { _stack = new List\u0026lt;ulong\u0026gt;(capacity); } // No need to override GetHashCode because we don\u0026#39;t want to use it as a key in a dictionary public override bool Equals(object obj) { if (obj == null) return false; var stack = obj as AddressStack; if (stack == null) return false; var frameCount = _stack.Count; if (frameCount != stack._stack.Count) return false; for (int i = 0; i \u0026lt; frameCount; i++) { if (_stack[i] != stack._stack[i]) return false; } return true; } public IReadOnlyList\u0026lt;ulong\u0026gt; Stack =\u0026gt; _stack; public void AddFrame(ulong address) { _stack.Add(address); } } This class overrides the Equals method for a single reason: I want to be able to detect when the “same” stack (i.e. with the exact same frame addresses) is received for a given type allocation. 
That way, I just need to keep a counter for each different AddressStack and not all call stacks in AllocationInfo. Remember that AllocationInfo is used to keep track of allocations per type:\npublic class AllocationInfo { private readonly string _typeName; private ulong _size; private ulong _count; private List\u0026lt;StackInfo\u0026gt; _stacks; The StackInfo class contains an AddressStack and how many times it led to this type of allocation.\npublic class StackInfo { private readonly AddressStack _stack; public ulong Count; internal StackInfo(AddressStack stack) { Count = 0; _stack = stack; } public AddressStack Stack =\u0026gt; _stack; } So, when a stack event is received, AddStack is called on the last AllocationInfo for the same thread:\npublic void AddStack(int tid, AddressStack stack) { if (_perThreadLastAllocation.TryGetValue(tid, out var lastAlloc)) { lastAlloc.AddStack(stack); _perThreadLastAllocation.Remove(tid); return; } } The job of the AllocationInfo.AddStack() method is to check if a previous allocation was made with the same call stack (hence the Equals override). If this is the case, just increment the corresponding StackInfo count. Otherwise, create a new StackInfo for this call stack with a count set to 1.\ninternal void AddStack(AddressStack stack) { var info = GetInfo(stack); if (info == null) { info = new StackInfo(stack); _stacks.Add(info); } info.Count++; } private StackInfo GetInfo(AddressStack stack) { for (int i = 0; i \u0026lt; _stacks.Count; i++) { var info = _stacks[i]; if (stack.Equals(info.Stack)) return info; } return null; } Knowing the code address of each frame of every event call stack is nice but it would be much more useful to translate them into method names… You have to deal with two different cases: managed and native methods. 
I will cover these topics in the next episode.\nResources\nSource code available on GitHub.\nTraceEvent sample 41 source code.\nMissed the first part of this story? Check out “Build your own .NET memory profiler in C#” on Medium, which explains how to collect allocation details by writing your own memory profiler in C#.\n","cover":"https://chrisnas.github.io/posts/2020-05-18_build-your-own-net/1_lYXf1qgB1ctzgi5_RKSDEw.jpeg","date":"2020-05-18","permalink":"https://chrisnas.github.io/posts/2020-05-18_build-your-own-net/","summary":"\u003chr\u003e\n\u003cp\u003eIn the \u003ca href=\"/posts/2020-04-18_build-your-own-net/\"\u003eprevious episode\u003c/a\u003e of this series, you have seen how to get a sampling of .NET application allocations thanks to the \u003cstrong\u003eAllocationTick\u003c/strong\u003e and \u003cstrong\u003eGCSampledObjectAllocation\u003c/strong\u003e(\u003cstrong\u003eHigh\u003c/strong\u003e/\u003cstrong\u003eLow\u003c/strong\u003e) CLR events. However, this is often not enough to investigate unexpected memory consumption: you would need to know which part of the code is triggering the allocations. This post explains how to get the call stack corresponding to the allocations, again with CLR events.\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2020-05-18_build-your-own-net/1_lYXf1qgB1ctzgi5_RKSDEw.jpeg\"\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIf you look carefully at the payload of the \u003ccode\u003eTraceEvent\u003c/code\u003e object mapped by the Microsoft \u003cstrong\u003eTraceEvent\u003c/strong\u003e library (not my fault if they have the same name) for each CLR event, you won’t see anything related to a call stack. 
However, in the \u003cstrong\u003eTraceEvent\u003c/strong\u003e \u003ca href=\"https://github.com/microsoft/perfview/blob/master/src/TraceEvent/Samples/41_TraceLogMonitor.cs#L204\"\u003esample 41\u003c/a\u003e, the following line looks promising:\u003c/p\u003e","title":"Build your own .NET memory profiler in C# — call stacks (2/2–1)"},{"content":" In a previous post, I explained how to get statistics about the .NET Garbage Collector such as suspension time or generation sizes. But what if you need more details about your application allocations, such as how many times instances of a given type were allocated and for what cumulative size, or even the allocation rate? This post explains how to get access to such information by writing your own memory profiler. The next one will show how to collect the stack trace of each sampled allocation.\nIntroduction I have already used commercial tools to get detailed information about allocated type instances in an application: Visual Studio Profiler, dotTrace, ANTS memory profiler, or PerfView to name a few. With these tools in mind, I started to look at the .NET profiler API documentation and it reminded me of the first time I read about the .NET profiler API. It was in December 2001 in Matt Pietrek’s MSDN Magazine article (I still have the paper version). When your application is starting, based on an environment variable, the .NET Framework (and now .NET Core) runtime loads a profiler COM object that implements a specific ICorProfilerCallback interface (today, runtimes support the 9th version, the ICorProfilerCallback9 interface). The methods of this interface will be called by the runtime at specific moments during the application lifetime. For example, the ObjectAllocated method is called each time an instance of a class is allocated: perfect for the job but it requires going back to COM and writing native code. 
Don’t be scared: I won’t go that way :^)\nHowever, if you would like to get more details about writing your own .NET profiler in C or C++, I would recommend looking at the Microsoft ClrProfiler initial .NET Framework implementation and also Pavel Yosifovich’s DotNext session about Writing a .NET Core cross platform profiler in an hour with the corresponding (more recent and cross platform) source code.\nInstead, several events that are emitted by the CLR provide interesting details:\nThe payload of the GCSampledObjectAllocation events provides a type ID instead of a plain text type name. In order to retrieve the type name given its ID, we need to listen to the TypeBulkType event that contains the mapping, as I described in my post about finalizers. This is why the last two GCHeapAndTypeNames and Type keywords are needed.\nRemember that if both GCSampledObjectAllocationLow and GCSampledObjectAllocationHigh keywords are set, an event will be received for EACH allocation. This could be a performance issue both for the monitored application and the profiler. I would recommend starting with either low or high (more on this later).\nLast but not least, enabling at least one of these keywords also switches the CLR to use “slower” allocators. This is why you should check that it does not impact your application performance. These slower allocators are also used when your ICorProfilerCallback.Initialize method calls SetEventMask with the COR_PRF_ENABLE_OBJECT_ALLOCATED flag to receive allocation notifications.\nWhen you use PerfView for memory investigation, you are relying on these events without knowing it. In the Collect/Run dialog, three checkboxes define how to get the memory profiling details:\n.NET Alloc: use a custom native C++ ICorProfilerCallback implementation (noticeable impact on the profiled application performance).\n.NET SampAlloc: use the same custom native profiler but with sampled events. 
ETW .NET Alloc: use GCSampledObjectAllocationHigh events.\nIn all cases, the profiled application needs to be started after the collection begins.\nHow to listen to allocation events As I have already explained in previous posts, the Microsoft TraceEvent NuGet package helps you listen to CLR events. First, you create a TraceEventSession and set up the providers you want to receive events from:\nsession.EnableProvider( ClrTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, // this is needed in order to receive AllocationTick_V2 event (ulong)( // required to receive AllocationTick events ClrTraceEventParser.Keywords.GC | // the CLR source code indicates that the provider must be set before the monitored application starts ClrTraceEventParser.Keywords.GCSampledObjectAllocationLow | //ClrTraceEventParser.Keywords.GCSampledObjectAllocationHigh | // required to receive the BulkType events that allow // mapping the type ID received in the allocation events to its name ClrTraceEventParser.Keywords.GCHeapAndTypeNames | ClrTraceEventParser.Keywords.Type ) ); Second, you set up the handlers for the events you are interested in:\nprivate void SetupListeners(ETWTraceEventSource source) { source.Clr.GCAllocationTick += OnAllocationTick; source.Clr.GCSampledObjectAllocation += OnSampleObjectAllocation; // required to receive the mapping between type ID (received in GCSampledObjectAllocation) // and their name (received in TypeBulkType) source.Clr.TypeBulkType += OnTypeBulkType; } And lastly, the processing of received events is done in a dedicated thread until the session is disposed of:\nawait Task.Factory.StartNew(() =\u0026gt; { using (_session) { SetupProviders(_session); SetupListeners(_session.Source); _session.Source.Process(); } }); Now let’s see the difference between the two sets of events.\nThe AllocationTick way My first idea was to use the AllocationTick event because it seemed easy: one 
sampled event with a size, a type name, and LOH/ephemeral kind. However, how this sampling works makes it impossible to get an exact per type allocated size. Let’s have a look at this list of events received from a WPF test application:\nSmall | 105444 : FreezableContextPair[]\nSmall | 111908 : FreezableContextPair[]\nSmall | 106720 : System.String\nSmall | 102488 : System.String\nSmall | 107028 : System.TimeSpan[]\nSmall | 106100 : System.String\nAll allocations were small (i.e. not in the LOH: \u0026lt; 85,000 bytes) and the second column gives the cumulative size of all allocations needed to reach the 100 KB threshold, not the size allocated for this particular type! There is no easy way to make a valid guess of the size of the specific last allocation for which we get the type name.\nFor example, the first array of FreezableContextPair triggered the event for a cumulative size of 105,444 bytes. But how big was this array? We don’t know: it could have been 100,000 bytes because only 5444 bytes were allocated before, or only 10444 bytes because 95,000 were allocated before. It would have been so useful to get the size of the last allocated object in the event payload…\nIt is a little bit different (but not much better) for objects allocated in the LOH because they have to be at least 85,000 bytes long. For example, allocate 4 byte arrays, each one 85,000 bytes long, and let’s see the corresponding events:\nLarge | 170064 : System.Byte[]\nLarge | 170064 : System.Byte[]\nTwo AllocationTick events are received with 170064 as cumulative size. It is still hard to figure out the size of the last allocated array: the only thing we know is that it was larger than (or equal to) 85,000 bytes because it was allocated in the LOH.\nFor larger objects, it might seem a little bit more accurate. Let’s allocate 2 byte arrays, each one 110,000 bytes long:\nLarge | 195064 : System.Byte[]\nLarge | 110032 : System.Byte[]\nThere is a difference of ~85,000 bytes between the two events even though the same 110,000 bytes were allocated. 
You could remove 85.000 bytes from the value and have an approximation of the LOH allocated object: the larger the allocation the less the error. But still: could be 85.000 size error…\nSo we won’t be able to rely on the size provided by the AllocationTick event; only the type name. In addition, you get a view of objects allocated in LOH. Maybe the other events will provide better results.\nThe GCSampledObjectAllocation way When an object is allocated by the GC allocator, a GCSampledObjectAllocation event is emitted under certain conditions:\nBoth GCSampledObjectAllocationLow and GCSampledObjectAllocationHigh keywords are set on the CLR provider, The object size is larger than 10.000 bytes, At least 1000 instances of the type have been allocated, Just before the application exits, current statistics for all types are flushed, A complicated piece of code decides based on time since the last event and the type allocation rate. Picking one or the other keyword changes the maximum number of milliseconds between two events for a given type:\nHigh (10 ms) : 100 events / second Low (200 ms) : 5 events / second You should use low or high depending on the monitored application memory allocation workload to avoid impacting too much the profiler (and even the monitored application performance)\nThe interesting feature of these events is that, for a given type, the payload contains both the number of allocated instances since the last event and the cumulated size of these instances. Let’s take the same allocation of 4 arrays of byte, each 85000 long:\n226 | 103616 : System.Byte[] 1 | 85012 : System.Byte[] 1 | 85012 : System.Byte[] 1 | 85012 : System.Byte[] This time, we get the exact count in the first column (ObjectCountForTypeSample) and the exact cumulated size in the second column (TotalSizeForTypeSample). If the count is 1, we have the exact size of that allocation and if it is bigger than 85000 bytes, we know it has been allocated in the LOH. 
Same accuracy for the 2 byte arrays of 110.000 elements:\n198 | 123552 : System.Byte[] 1 | 110012 : System.Byte[] Sounds good. However, you have to remember that profiled applications need to be started after the session was created: it means that you can’t write a tool that will listen to a specific process ID like with AllocationTick. Three dictionaries are used by PerProcessProfilingState to keep track of per type allocations, type ID mappings, and process names:\npublic class PerProcessProfilingState { private readonly Dictionary\u0026lt;int, string\u0026gt; _processNames = new Dictionary\u0026lt;int, string\u0026gt;(); private readonly Dictionary\u0026lt;int, ProcessTypeMapping\u0026gt; _perProcessTypes = new Dictionary\u0026lt;int, ProcessTypeMapping\u0026gt;(); private readonly Dictionary\u0026lt;int, ProcessAllocationInfo\u0026gt; _perProcessAllocations = new Dictionary\u0026lt;int, ProcessAllocationInfo\u0026gt;(); public Dictionary\u0026lt;int, string\u0026gt; Names =\u0026gt; _processNames; public Dictionary\u0026lt;int, ProcessTypeMapping\u0026gt; Types =\u0026gt; _perProcessTypes; public Dictionary\u0026lt;int, ProcessAllocationInfo\u0026gt; Allocations =\u0026gt; _perProcessAllocations; } The SampledObjectAllocationMemoryProfiler class uses it for the events processing:\npublic class SampledObjectAllocationMemoryProfiler { private readonly TraceEventSession _session; private readonly PerProcessProfilingState _processes; // because we are not interested in self monitoring private readonly int _currentPid; private int _started = 0; public SampledObjectAllocationMemoryProfiler(TraceEventSession session, PerProcessProfilingState processes) { _session = session; _processes = processes; _currentPid = Process.GetCurrentProcess().Id; } The constructor of the profiler keeps track of its own process ID in _currentPid to skip its own events.\nGathering type mapping The processing of TypeBulkType events 
is quite straightforward: store the type ID/name association into a per-process dictionary:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 private void OnTypeBulkType(GCBulkTypeTraceData data) { if (FilterOutEvent(data)) return; ProcessTypeMapping mapping = GetProcessTypesMapping(data.ProcessID); for (int currentType = 0; currentType \u0026lt; data.Count; currentType++) { GCBulkTypeValues value = data.Values(currentType); mapping[value.TypeID] = value.TypeName; } } private ProcessTypeMapping GetProcessTypesMapping(int pid) { ProcessTypeMapping mapping; if (!_processes.Types.TryGetValue(pid, out mapping)) { AssociateProcess(pid); mapping = new ProcessTypeMapping(pid); _processes.Types[pid] = mapping; } return mapping; } Remember that I choose to skip events from the current process detected by FilterOutEvent().\nHow to get process names Even though each event contains the ID of the emitting process, it would be better to display its name instead. You could use Process.GetProcessById(pid).ProcessName when analyzing the details but the process might be long gone at that time.\nAnother solution would be to enable the Kernel ETW provider and listen to the ProcessStart event. The ImageFileName field of the payload contains the process filename with the extension. However, it is obviously not working on Linux.\nThe easiest solution is to use GetProcessById but just when you receive the first type mapping for a given process. This is the role of the AssociateProcess method called in GetProcessTypesMapping shown previously:\n1 2 3 4 5 6 7 8 9 10 11 12 private void AssociateProcess(int pid) { try { _processes.Names[pid] = Process.GetProcessById(pid).ProcessName; } catch (Exception) { Console.WriteLine($\u0026#34;? 
{pid}\u0026#34;); // we might not have access to the process } } It is now time to process allocation events.\nCollecting allocation details The GCSampledObjectAllocationTraceData payload contains the size and count of instances since the last event. We just need to store them for the corresponding process:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 private void OnSampleObjectAllocation(GCSampledObjectAllocationTraceData data) { if (FilterOutEvent(data)) return; GetProcessAllocations(data.ProcessID) .AddAllocation( (ulong)data.TotalSizeForTypeSample, (ulong)data.ObjectCountForTypeSample, GetProcessTypeName(data.ProcessID, data.TypeID) ); } private string GetProcessTypeName(int pid, ulong typeID) { if (!_processes.Types.TryGetValue(pid, out var mapping)) { return typeID.ToString(); } var name = mapping[typeID]; return string.IsNullOrEmpty(name) ? typeID.ToString() : name; } The AddAllocation() helper method is simply accumulating these numbers for a given type in the ProcessAllocationInfo associated to the related process.\nDisplaying the results When the profiling session ends, it is easy to show the allocated count and size per type:\nThe code is using a Linq syntax to get top allocations sorted either by count or by size:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 private static void ShowResults(string name, ProcessAllocationInfo allocations, bool sortBySize, int topTypesLimit) { Console.WriteLine($\u0026#34;Memory allocations for {name}\u0026#34;); Console.WriteLine(); Console.WriteLine(\u0026#34;---------------------------------------------------------\u0026#34;); Console.WriteLine(\u0026#34; Count Size Type\u0026#34;); Console.WriteLine(\u0026#34;---------------------------------------------------------\u0026#34;); IEnumerable\u0026lt;AllocationInfo\u0026gt; types = (sortBySize) ? 
allocations.GetAllocations().OrderByDescending(a =\u0026gt; a.Size) : allocations.GetAllocations().OrderByDescending(a =\u0026gt; a.Count) ; if (topTypesLimit != -1) types = types.Take(topTypesLimit); foreach (var allocation in types) { Console.WriteLine($\u0026#34;{allocation.Count,9} {allocation.Size,11} {allocation.TypeName}\u0026#34;); } } Another usage could be a long-running monitoring system that shows the allocation rate: a nice complement to the other GC metrics. However, compared to the other profilers, one important feature is missing: if an unexpected number of instances are created, how to know which part of the code is responsible for the spike?\nThe next post will explain how to enhance such a sampled memory profiler with call stacks per sampled allocation.\nResources Source code available on Github. Spying on .NET Garbage Collector with TraceEvent Pavel Yosifovich — Writing a .NET Core cross-platform profiler in an hour Original Microsoft ClrProfiler source code and documentation Like what you read? Don’t forget to check out part 2 on this topic:\nBuild your own .NET memory profiler in C# — call stacks (2/2–1) *This post explains how to get the call stack corresponding to the allocations with CLR events.*medium.com\nInterested in joining our journey? Check this out:\nProduct, Research \u0026amp; Development | Criteo Careers *Product, Research \u0026amp; Development at Criteo. At Criteo, come and meet our teams and join our R \u0026amp; D and also enjoy…*careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-04-18_build-your-own-net/1_8RzRelU9Rgux0TJRdFhzzw.png","date":"2020-04-18","permalink":"https://chrisnas.github.io/posts/2020-04-18_build-your-own-net/","summary":"\u003chr\u003e\n\u003cp\u003eIn a \u003ca href=\"/posts/2019-05-28_spying-on-net-garbage/\"\u003eprevious post\u003c/a\u003e, I explained how to get statistics about the .NET Garbage Collector such as suspension time or generation sizes. 
But what if you would need more details about your application allocations such as how many times instances of a given type were allocated and for what cumulated size or even the allocation rate? This post explains how to get access to such information by writing your own memory profiler. The next one will show how to collect each sampled allocation stack trace.\u003c/p\u003e","title":"Build your own .NET memory profiler in C# — Allocations (1/2)"},{"content":" Introduction Last Wednesday was a great day for Kevin and myself: We spent a lot of time investigating the reasons why a test was failing. Let’s share with you these frustrating but interesting minutes.\nOne of our colleagues came to us because an integration test would get stuck in some specific conditions. Here is the simplified code of the service that is supposed to do some background processing until it is stopped:\n1 2 3 4 5 6 7 8 public class Service { private CancellationTokenSource _cancellationSource; private Task _backgroundProcessing; private Task _cleanup; ... } In the test, the service is created: Two Task instances are created in the constructor (for background processing irrelevant for this discussion) that (1) are started when the service starts and (2) are canceled when the service is stopped. 
A CancellationTokenSource is used to cancel the tasks if needed.\n1 2 3 4 5 6 7 8 9 public Service() { _cancellationSource = new CancellationTokenSource(); var cancellationToken = _cancellationSource.Token; _backgroundProcessing = new Task(DoStuffInTheBackground, cancellationToken, TaskCreationOptions.LongRunning); _cleanup = new Task(DoStuffInTheBackground, cancellationToken, TaskCreationOptions.LongRunning); } However, in the specific conditions of this test, the Start method would never be called, skipping straight to the Stop method.\n1 2 3 4 5 6 7 8 9 10 11 12 public void Start() { _backgroundProcessing.Start(); _cleanup.Start(); } public async Task Stop() { _cancellationSource.Cancel(); await _backgroundProcessing; await _cleanup; } The task is never started because the test skips the Service.Start() method, but it should transition to “Cancelled” state as soon as the CancellationTokenSource associated with the CancellationToken passed to the Task gets canceled. In that particular situation, we expect the await _backgroundProcessing code to immediately return in the Stop() method.\nIn our colleague Visual Studio, the debugger never came back from await _backgroundProcessing, indicating that the task never completed…\nReproduce the problem I wanted to un-validate any side effect related to our test framework or custom Criteo libraries and confirm my understanding of task cancellation, so I wrote a small Console application with the same 4.7.2 version of .NET Framework with the following code:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 static async Task ReproduceWorking() { var source = new CancellationTokenSource(); var token = source.Token; var t = new Task( () =\u0026gt; Console.WriteLine(\u0026#34;Done\u0026#34;), token, TaskCreationOptions.LongRunning); source.Cancel(); try { await t; } catch (TaskCanceledException x) { Console.WriteLine(x.ToString()); } } And guess what? 
I got the expected TaskCanceledException:\nSystem.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.GetResult() at TaskCancellation.Program.d__2.MoveNext()\nThe next step was to double-check our understanding of how tasks behave with respect to cancellation. So I set a breakpoint after the task is instantiated\nAnd its status is Created.\nDoing the same after Cancel() is called on the CancellationTokenSource, gives the expected Canceled status.\nSo, let’s do the same when debugging the test code:\nIt gives the same Created status after the task creation but after canceling the source:\n…the status does not switch to Canceled like in my Console repro!\nLooking for cancellation The only thing that came to my mind was: maybe the cancellation token is not taken into account. However, a CancellationToken is just a struct that keeps a reference to its CancellationTokenSource. It should be easy in Visual Studio debugger to double-check that our task keeps track of the token somewhere and goes back to the cancellation source. Well… the token is not kept as a field of the task but stored inside the m_contingentProperties field deep during the task construction code path.\nLet’s look at the value in the Quick Watch after the cancellation source gets canceled:\nIt sounds like the cancellation token is not canceled… But if we look at the source Token property of our canceled CancellationTokenSource,\nwe don’t have the same value for IsCancellationRequested!\nIt’s like we don’t look at the same cancellation source… To find out, we just have to compare the reference to our CancellationTokenSource with the one we see in the m_contingentProperties token of the task. 
To achieve that, we could copy the expression from QuickWatch, paste it into the Debug | Windows | Memory… pane and press ENTER to get the address where the object is stored in memory\nAfter having done the same with _cancellationSource, I did not get the same address:\nIt means that we were dealing with two different instances of CancellationTokenSource.\nBut this might be too C++ish for you… Kevin prefers leveraging the Make Object ID feature of the C# debugger. You simply right-click the Data Tip of the cancellation source and select Make Object ID:\nOnce this is done, a numeric identifier is displayed for this instance in Data Tip and any Watch window (#1 in this screenshot):\nWhen we looked at the cancellation source of the token stored by the task,\nwe didn’t see any ID so it was not the same object.\nSo we decided to use Make Object ID on this m_source that became marked as #2.\nAnd when we looked at the m_source of the second cleanup Task, we realized that it was the same object but not the one we created!\nWe started to think that we were becoming crazy. So, let’s restart from the beginning and follow the CancellationTokenSource from its creation because we are sure that we passed a valid cancellation token (linked to this source) to the task constructor. Or… Did we? The QuickWatch gives a different answer just after the task gets created compared to what we’ve seen already: a token with an empty source property now!\nI closed the QuickWatch pane and reopened it for Kevin to confirm. And like in a nightmare, the source was not null anymore…\nVisual Studio must be responsible for that weird behavior!\nKevin remembered the ”Enable property evaluation” settings in the Options dialog. If it is checked (which is the default), it means that the Debugger would fetch the value of an instance field and then call the Getter of each property in order to display its value.\nHowever, if you uncheck it, only the fields are displayed. 
So in our case, we then always got a null m_contingentProperties field (and, as expected, no properties were displayed):\nThe m_contingentProperties is initialized in EnsureContingentPropertiesInitialized() when called by AssignCancellationToken() from the TaskConstructorCore() helper used by the Task constructor but it did not seem to be the case because it was definitely null…\nKevin decided to stop at the CancellationTokenSource constructor with a new breakpoint (more on how to set a breakpoint on a .NET Framework method soon) to see where the one shown in the Debugger was created but the breakpoint was never hit. So the CancellationTokenSource #2 must have been created even before our own was created by our code. In fact, a static CancellationTokenSource is created and is set to m_source when InitializeDefaultSource() gets called by one of the getters. This explains why we saw the same instance #2 in the tokens of both tasks.\nTo sum up, we were now sure that the passed token was not “received” by the Task.\nEureka! Maybe there is a magic trick done by the .NET Framework to lazily set the token source after the creation of the task. However, we did not find such code in the .NET Framework and this is not what we see in our repro.\nBack to the basics: are we sure that we are executing the code we think is executed? 
We looked for mscorlib in the Debug | Windows | Modules pane,\nand we opened it with a decompiler: the code of the methods called during the Task construction was the same as the one shown in https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs.\nNext, in order to better follow the execution and the passing of parameters (including our token), we decided to set breakpoints on the private Task method responsible for its initialization.\ninternal void TaskConstructorCore(object action, object state, CancellationToken cancellationToken, TaskCreationOptions creationOptions, InternalTaskOptions internalOptions, TaskScheduler scheduler) In the Debug | Windows | Breakpoints pane, click New | Function Breakpoint…\nand type the full name of the method. This works even for a method of a class defined in the .NET Framework assembly for which you do not have the source code:\nWe checked that the breakpoints were correctly set (i.e. no typo in the full name) by looking at the filled red circle:\nThe cancellationToken parameter should contain the token that we passed at Task creation. 
Unfortunately, the QuickWatch pane displayed a “cannot read memory” error that we never saw in Visual Studio before!\nAt that time, we thought we were doomed but we looked at the Call Stack pane and we realized that the code was calling the wrong Task constructor:\npublic Task(Action\u0026lt;object\u0026gt; action, object state, TaskCreationOptions creationOptions) Its signature is compatible with our code:\n_backgroundProcessing = new Task(DoStuffInTheBackground, cancellationToken, TaskCreationOptions.LongRunning); and this is why the compiler did not complain.\nOur CancellationToken was passed as the state parameter is given directly to our DoStuffInTheBackground Action: the created Task had no idea that it was supposed to be its CancellationToken.\nNote that if we had noticed the Auto Completion (Ctrl + Shift + Space) hint, we might have figured out the root cause much sooner…\nThe fix was straightforward; just using the right constructor:\npublic Task(Action\u0026lt;object\u0026gt; action, object state, CancellationToken cancellationToken, TaskCreationOptions creationOptions) that accepts both a state for the callback and a CancellationToken for the Task to create:\n_backgroundProcessing = new Task(DoStuffInTheBackground, cancellationToken, cancellationToken, TaskCreationOptions.LongRunning); Under the debugger, we validated that the source was now the expected one:\nand the test did not hang anymore.\nIf, from the beginning, we would have been able to step into .NET Framework compiled code as we do with Jetbrains Resharper integration in Visual Studio, we would have found the issue almost immediately. Thankfully, Microsoft has just announced decompilation of C# code made easy with Visual Studio.\nWe wish we had it last Wednesday…\nInterested in reading more about Christophe’s \u0026amp; Kevin’s work? 
Check out their latest articles:\nBuild your own .NET memory profiler in C# *This post explains how to collect allocation details by writing your own memory profiler in C#.*medium.comSwitching back to the UI thread in WPF/UWP, in modern C# Leveraging the async machinery to transparently switch to the UI thread when neededmedium.com\nIf you are looking for a change and would love to work with these two, head over to our careers page and let us know if there is something that sounds like you!\nProduct, Research \u0026amp; Development | Criteo Careers *Come and meet our teams …*careers.criteo.com\n","cover":"https://chrisnas.github.io/posts/2020-02-21_debugging-wednesday-cancel-thi/1_g3mc-sEc56gtAUP_rR4HHA.jpeg","date":"2020-02-21","permalink":"https://chrisnas.github.io/posts/2020-02-21_debugging-wednesday-cancel-thi/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2020-02-21_debugging-wednesday-cancel-thi/1_g3mc-sEc56gtAUP_rR4HHA.jpeg\"\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eLast Wednesday was a great day for \u003ca href=\"https://twitter.com/KooKiz\"\u003eKevin\u003c/a\u003e and myself: We spent a lot of time investigating the reasons why a test was failing. Let’s share with you these frustrating but interesting minutes.\u003c/p\u003e\n\u003cp\u003eOne of our colleagues came to us because an integration test would get stuck in some specific conditions. Here is the simplified code of the service that is supposed to do some background processing until it is stopped:\u003c/p\u003e","title":"Debugging Wednesday at Criteo — Cancel this task!"},{"content":" This post of the series details how to look into your threads stack with ClrMD.\nIntroduction It’s been a long time (see the resources at the end) since I’ve been discussing what ClrMD could bring to .NET developers/DevOps! 
My colleague Kevin just wrote an article about how to emulate SOS DumpStackObjects command both on Windows and Linux with ClrMD. This implementation lists the objects on the stack but without their values (like strings content for example) nor the stack frames corresponding to the method calls.\nThe rest of the post will show you, with ClrMD, how to get an higher view, closer to what the SOS ClrStack command could provide.\nLet’s take this simple application as an example:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 class Program { static void Main(string[] args) { Host h = new Host(); h.Base(); } } class Host { public void Base() { int iValue = Guid.NewGuid().GetHashCode(); bool bValue = (iValue % 2) == 0; string parameter = Guid.NewGuid().ToString().Substring(0,1) + \u0026#34;_1234567890\u0026#34;; Console.WriteLine(parameter + \u0026#34; = \u0026#34; + First(iValue, bValue, parameter)); } private int First(int iValue, bool bValue, string parameter) { Guid guid = Guid.NewGuid(); return Second(bValue, guid) / 2; } private int Second(bool bValue, Guid guid) { return Third(guid).Length; } private string Third(Guid guid) { Console.WriteLine($\u0026#34;call procdump -ma {Process.GetCurrentProcess().Id}\u0026#34;); Console.ReadLine(); return $\u0026#34;{guid.GetHashCode()}#{guid.GetHashCode()}#{guid.GetHashCode()}\u0026#34;; } } As you can see, I’ve mixed value and reference types as parameters and local variables up to the call to the Third method that displays the procdump command line to execute in order to generate a memory dump of the process.\nUse WinDBG + SOS Luke! 
When you open it with WinDBG and load SOS, here is the result of the dso command:\nThe clrstack command shows the stacked method calls:\nAnd if you use the -a parameter, you will get methods with their parameters and local variables (or -p for parameters only and -l for local variables only):\nIt is weird that SOS implementation does not give the type of both the parameters and locals. But wait! While researching for this post, I looked at the SOS implementation (now in the strike.cs file moved from the coreclr to the diagnostics repository) to find this nice comment:\nSo I tried clrstack with -i and I got the types for parameters (and locals unlike what the comments implies):\nEven though clrstack supports the -all flag to dump the call stack of all managed threads, you might need to do your own automatic analysis on hundreds of threads and this is where ClrMD shines.\nMerging methods and parameters/locals When I read Kevin’s post, I immediately thought about adding the method call on the stack based on the work I’ve done in March 2019 to implement the pstacks tool. At that time, my goal was to aggregate the call stacks of a large number of threads in order to find out pattern of blocked threads, sharing the “same” call stacks. Visual Studio provides a great “Parallel Stacks” pane but I needed it for both Windows and Linux.\nTo list all the call stack with ClrMD, you simply enumerate the managed threads and for each one, its StackTrace property contains the list of StackFrame objects corresponding to each method call.\nThe StackPointer property of each frame contains the address of the frame in the call stack, allowing a mapping of the method call with its parameters and locals:\nAs always with stacks, lower addresses correspond to the last things added to the stack (i.e. last called method). 
While checking between what is shown by SOS, the parameters/locals addresses and frame stack pointers, you realize that all objects at an address in the stack equal or below the StackPointer of a frame are either parameters or local variables of the frame method.\nEven better, for non-static methods, you can guess what is the this implicit parameter if the address is the same as the frame StackPointer; shown with the green = sign in the previous screenshot and prefixed by \u0026gt; in Kevin’s updated code that merges the method calls to the parameters and locals:\nfor (ulong ptr = stackTop; ptr \u0026lt;= stackLimit; ptr += (ulong)runtime.PointerSize) { // look for the frame corresponding to the current position in the stack if (currentFrame.StackPointer \u0026lt;= ptr) { Console.WriteLine(FormatFrame(currentFrame)); nFrame++; if (nFrame \u0026lt; frames.Count) { currentFrame = frames[nFrame]; } else { break; } } ulong obj; if (!runtime.ReadPointer(ptr, out obj)) { break; } if (!IsInHeap(heap, obj)) { continue; } var type = heap.GetObjectType(obj); if (type == null || type.IsFree) { continue; } // try to find implicit \u0026#34;this\u0026#34; parameter in case of non-static method var myFrame = (nFrame == 0) ? currentFrame : frames[nFrame-1]; separator = ((myFrame.StackPointer == ptr) \u0026amp;\u0026amp; (myFrame.Method != null) \u0026amp;\u0026amp; (!myFrame.Method.IsStatic)) ? \u0026#34;\u0026gt;\u0026#34; : \u0026#34; \u0026#34;; Console.Write($\u0026#34;{ptr:x16} {separator} \u0026#34;); DumpObject(heap, type, (ulong)obj); } The FormatFrame helper method simply prefixes static methods with # instead of | for instance methods:\nUnfortunately, I did not find any way with ClrMD to tell the difference between parameters and locals. 
Based on what you can see in SOS implementation of this part of the clrstack command, it relies on the EnumerateArguments and EnumerateLocalVariables methods of ICorDebugILFrame which is not exposed by ClrMD. There is another undocumented implementation based on private interfaces I could not leverage either. For a larger discussion around stack walking in .NET, read this great post by Matt Warren.\nAlso, without any explicit access to specific parameter or local, I did not find a way to get the value of primitive and value type instances stored on the stack. However, it is still possible to get them for boxed ones and reference type instances such as string for example.\nGetting instances from the stack In the last code excerpt, I did not describe the DumpObject helper method used to display an object on the stack. The implementation provided by Kevin was used to show the address and the type of the object:\nConsole.WriteLine($\u0026#34;{objAddress:x16} {type.Name}\u0026#34;); The next step would be to display value for primitive types such as numbers, boolean, string and even array size:\nprivate static void DumpObject(ClrHeap heap, ClrType type, ulong objAddress) { // get value for simple types string valueOrAddress = (type.Name == \u0026#34;System.Char\u0026#34;) ? $\u0026#34;{type.GetValue(objAddress),16}\u0026#34; : (type.Name == \u0026#34;System.String\u0026#34;) ? $\u0026#34;{FormatString(type.GetValue(objAddress).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Boolean\u0026#34;) ? $\u0026#34;{FormatString(((bool)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Byte\u0026#34;) ? $\u0026#34;{FormatString(((byte)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.SByte\u0026#34;) ? $\u0026#34;{FormatString(((sbyte)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Decimal\u0026#34;) ? 
$\u0026#34;{FormatString(((decimal)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Double\u0026#34;) ? $\u0026#34;{FormatString(((double)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Single\u0026#34;) ? $\u0026#34;{FormatString(((float)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Int32\u0026#34;) ? $\u0026#34;{FormatString(((int)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.UInt32\u0026#34;) ? $\u0026#34;{FormatString(((uint)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.Int64\u0026#34;) ? $\u0026#34;{FormatString(((long)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.Name == \u0026#34;System.UInt64\u0026#34;) ? $\u0026#34;{FormatString(((ulong)type.GetValue(objAddress)).ToString())}\u0026#34; : (type.IsArray) ? $\u0026#34;{FormatString(GetArrayAsString(type, objAddress))}\u0026#34; : $\u0026#34;{objAddress:x16}\u0026#34;; // work also for IntPtr Console.WriteLine($\u0026#34;{valueOrAddress} {type.Name}\u0026#34;); } Most of this code is based on the GetValue helper from ClrType: it returns the right “thing” for simple types. 
Look at ClrMD implementation details to get a better understanding of how the value is rebuilt.\nThe GetArrayAsString simply returns the number of elements in the array:\n1 2 3 4 5 private static string GetArrayAsString(ClrType type, ulong objAddress) { var elementCount = type.GetArrayLength(objAddress); return $\u0026#34;length = {elementCount}\u0026#34;; } And the call stack is now complete!\nNote that you may even get more locals or parameters than with WinDBG+SOS but don’t ask me why…\nFor more advanced object formatting cases such as dumping structs or enumerating fields and their value, I would highly recommend to look at the related ClrMD documentation page (just replace GCHeapType by ClrType and you’ll be safe).\nResources Dumping stack objects with ClrMD by Kevin Gosse Stack walking in the .NET Runtime by Matt Warren Part 1: Bootstrap ClrMD to load a dump. Part 2: Find duplicated strings with ClrMD heap traversing. Part 3: List timers by following static fields links. Part 4: Identify timers callback and other properties. Part 5: Use ClrMD to extend SOS in WinDBG. Part 6: Manipulate memory structures like real objects. Part 7: Manipulate nested structs using dynamic. Part 8: Spelunking inside the .NET Thread Pool Part 9: Deciphering Tasks and Thread Pool items ","cover":"https://chrisnas.github.io/posts/2019-12-31_getting-another-view-on/1__F2Z31l2DikZkOVEr6SSmQ.png","date":"2019-12-31","permalink":"https://chrisnas.github.io/posts/2019-12-31_getting-another-view-on/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series details how to look into your threads stack with ClrMD.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIt’s been a long time (see the resources at the end) since I’ve been discussing what ClrMD could bring to .NET developers/DevOps! 
My colleague \u003ca href=\"https://twitter.com/KooKiz\"\u003eKevin\u003c/a\u003e just wrote \u003ca href=\"https://medium.com/@kevingosse/dumping-stack-objects-with-clrmd-c002dab4651b\"\u003ean article about how to emulate the SOS \u003cstrong\u003eDumpStackObjects\u003c/strong\u003e\u003c/a\u003e command both on Windows and Linux with ClrMD. This implementation lists the objects on the stack but shows neither their values (such as the content of strings) nor the stack frames corresponding to the method calls.\u003c/p\u003e","title":"Getting another view on thread stacks with ClrMD"},{"content":" This post shows how to attach to a .NET Core process running on Linux with WSL and also how to start a Linux process with the Visual Studio debugger.\nComing from the Windows world, I don’t find it easy to develop .NET Core applications for Linux. I’m used to coding and debugging in Visual Studio. Now, I need to build on Windows (due to our Criteo continuous integration) and deploy an artifact to Marathon in order to get an application running inside a Mesos container. At Criteo, we had to build a whole set of services to allow remote debugging or memory dump analysis.\nBut what if I just wanted to test and debug a small scenario on my beloved Windows machine? Windows Subsystem for Linux (aka WSL) is perfect for running a Linux .NET Core application on Windows. However, how do you attach to it or even start a debugging session from Visual Studio? In the rest of this post, I’ll explain how to set up your Windows 10 machine to achieve these miracles.\nLinux on Windows: welcome WSL\nThis is obviously not the place to dig into Windows Subsystem for Linux. You just need to know that once installed, it allows you to open up a Linux shell on your Windows machine without any virtual machine technology. 
It is also possible to share folders between Windows (where you want to build your application) and Linux (where you want to execute the built assemblies).\nThe first step is to turn on the WSL feature in Windows:\nAfter a reboot, you are able to start a WSL prompt:\nYou can also install your favorite Linux distro if you want.\nThe next step is to install the .NET Core runtime or SDK so you are able to run your application on Linux.\nIt is now time to look at your hard drive from a Linux perspective:\nYour drives are mapped under /mnt without the “:”. In my case, I created a wsl folder under my d: drive to do my experiments. With Visual Studio, I generated a TestConsole application with the following code:\nusing System; namespace TestConsole { class Program { static void Main(string[] args) { Console.WriteLine(\u0026#34;Enter x to EXIT...\u0026#34;); while(true) { var cmd = Console.ReadLine(); if (cmd.ToLower() == \u0026#34;x\u0026#34;) return; Console.WriteLine($\u0026#34;\u0026gt; {cmd}\u0026#34;); } } } }\nI publish the application to get all the needed assemblies:\nIn a WSL prompt, type dotnet with the name of your application assembly et voila!\nA Linux .NET Core application is running on your Windows machine.\nHow to attach to a running Linux application\nLet’s say that I detect a problem and I want to debug the Linux application with Visual Studio. When attaching the Visual Studio debugger to a process, several connection types are available:\nThe SSH connection type will be used with WSL with the following kind of communications architecture:\nThe Visual Studio debugger sends commands to the remote Linux debugger vsdbg via an SSH channel. Here are the steps to follow to install the missing components:\nBy default, an SSH server is installed with WSL. 
However, I was not able to make the whole pipeline work with it so I had to uninstall and reinstall it:\nsudo apt-get remove openssh-server\nsudo apt-get install openssh-server\nThe SSH configuration also needs to be changed in order to allow the username/password kind of security needed by Visual Studio (if you prefer key-based security, look at the end of the post for available resources). If you don’t know how to use vi efficiently to simply edit a file, install nano (thanks @kookiz for the tip :^):\nsudo apt-get install nano\nIn /etc/ssh/sshd_config, change the PasswordAuthentication setting:\nsudo nano /etc/ssh/sshd_config\nPasswordAuthentication yes\nRestart the SSH server:\nsudo service ssh start\nYou need to install unzip in order to get vsdbg:\nsudo apt-get install unzip\ncurl -sSL https://aka.ms/getvsdbgsh | bash /dev/stdin -v latest -l ~/vsdbg\nYou are now ready to select SSH as the connection type and enter your machine name before clicking the Refresh button. At that point, a new dialog should pop up for you to enter your WSL credentials:\nAfter you click the Refresh button, the list at the bottom should contain the Linux processes running in WSL\nSelect your .NET Core application and click Attach to select the Managed debugger:\nNow, if you set a breakpoint in the code and trigger it with an appropriate action in your WSL prompt\nthen the debugger will break as expected:\nNote that when you stop your debugging session, the Linux application is not stopped: it is just detached from the debugger and keeps on running.\nShowtime for F5!\nAttaching to a running Linux application is nice but it would be even better to start a Linux process from the Visual Studio debugger. 
In order to achieve this goal, you need to add another piece to the architecture puzzle:\nAs detailed in this wiki page, it is possible to tell Visual Studio to execute debugging actions thanks to a launch.json file such as the following:\n{ \u0026#34;version\u0026#34;: \u0026#34;0.2.0\u0026#34;, \u0026#34;adapter\u0026#34;: \u0026#34;c:\\\\tools\\\\plink.exe\u0026#34;, \u0026#34;adapterArgs\u0026#34;: \u0026#34;-ssh -pw \u0026lt;password\u0026gt; chrisnas@DBGDQPZ1 -batch -T ~/vsdbg/vsdbg --interpreter=vscode\u0026#34;, \u0026#34;configurations\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;.NET Core Launch\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;coreclr\u0026#34;, \u0026#34;cwd\u0026#34;: \u0026#34;/mnt/d/wsl/TestConsole/TestConsole/bin/Debug/netcoreapp2.1/publish\u0026#34;, \u0026#34;program\u0026#34;: \u0026#34;TestConsole.dll\u0026#34;, \u0026#34;request\u0026#34;: \u0026#34;launch\u0026#34; } ] }\nThe plink tool from PuTTY will be used as an adapter for Visual Studio to communicate with vsdbg running in WSL. The adapterArgs property gives the same SSH/machine/user/password information that you provide via the Visual Studio UI in the Attach scenario. The configurations section defines which request will be sent to vsdbg (“launch” instead of “attach”) and which folder/assembly to start.\nOnce this file is created (in my case in the d:\\wsl\\vs folder), you just need to type the following command in the Immediate pane of Visual Studio:\nDebugAdapterHost.Launch /LaunchJson:d:\\wsl\\vs\\launch.json\nand if you have set a breakpoint on the first line of your application, the debugger should break there:\nIn a WSL prompt, you can see the two newly spawned processes:\nBut wait!\nI have a problem now: I don’t have any prompt in which to type input for my console application… However, as always in Linux, you simply need to write to a file to fix this. 
The stdin stream of your application is accessible under /proc/\u0026lt;pid\u0026gt;/fd/0.\nSo, when I type the following command:\necho \u0026#34;Launching a Linux app is not a problem!\u0026#34; \u0026gt; /proc/18341/fd/0\nmy breakpoint is hit in Visual Studio:\nAlso note that everything that is sent to the console appears in the Visual Studio Output pane:\nNote that, unlike the Attach scenario, if you stop the debugging session, the Linux application (and vsdbg) will be terminated.\nResources\nDuring my investigations I’ve found a few resources you might find useful (especially about configuring SSH with keys instead of passing clear user/password).\nBasic VS + WSL Good description about how to use VS 2017 to attach to .NET Core app running in WSL Debugging .NET Core from VS 2017 and WSL VS/C++ with WSL (describe how to install WSL and setup open ssh server) Setup SSH on WSL Wiki for VSDBG Great description about how to setup your linux dev environment with VS Code and WSL ","cover":"https://chrisnas.github.io/posts/2019-11-21_wsl-visual-studio-attaching/1_uMcLQA9pq4wtHLq61dV8Gw.png","date":"2019-11-21","permalink":"https://chrisnas.github.io/posts/2019-11-21_wsl-visual-studio-attaching/","summary":"\u003chr\u003e\n\u003cp\u003eThis post shows how to attach to a .NET Core process running on Linux with WSL and also how to start a Linux process with the Visual Studio debugger.\u003c/p\u003e\n\u003cp\u003eComing from the Windows world, I don’t find it easy to develop .NET Core applications for Linux. I’m used to coding and debugging in Visual Studio. Now, I need to build on Windows (due to our Criteo continuous integration) and deploy an artifact to Marathon in order to get an application running inside a Mesos container. 
At Criteo, we had to build a whole set of services to allow remote debugging or memory dump analysis.\u003c/p\u003e","title":"WSL + Visual Studio = attaching/launching a Linux .NET Core application on my Windows 10"},{"content":" This post of the series explains how to implement your own counters.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: CLR Threading events with TraceEvent.\nPart 4: Spying on .NET Garbage Collector with TraceEvent.\nPart 5: Building your own Java GC logs in .NET\nPart 6: Spying on .NET Core Garbage Collector with .NET Core EventPipes\nPart 7: .NET Core Counters internals: how to integrate counters in your monitoring pipeline\nIntroduction The EventPipe counters are the .NET Core replacement for Windows performance counters. In the previous post, I explained how to listen to CLR event pipes to get the counters’ values over time both on Windows and Linux. This post shows you how easy it is to provide your own counters via the same infrastructure.\nThe example I’m using is based on a real-world case we had to investigate at Criteo. 
We needed to correlate request duration with garbage collections, so we decided to add new metrics to our testing dashboard: the number and duration of requests, split between those processed without being interrupted by a GC and the others.\nFor the sake of the ASP.NET Core code example, a dedicated middleware is created: it simply measures the time spent processing a request and checks whether the count of garbage collections has changed between the start and the end of the processing:\npublic class RequestMetricsMiddleware { private readonly RequestDelegate _next; public RequestMetricsMiddleware(RequestDelegate next) { _next = next; } public async Task InvokeAsync(HttpContext context) { // get the count of GCs before processing the request var collectionCountBeforeProcessingTheRequest = GetCurrentCollectionCount(); var sw = Stopwatch.StartNew(); try { // Call the next delegate/middleware in the pipeline await _next(context); } finally { // compare the counter of GCs after processing the request // if the count changed, a garbage collection occurred during the processing // and might have slowed it down and maybe reaching SLA limit: this could // explain 9x-percentile in slow requests for example if (GetCurrentCollectionCount() - collectionCountBeforeProcessingTheRequest != 0) { // update with collection metric RequestCountersEventSource.Instance.AddRequestWithGcDuration(sw.ElapsedMilliseconds); } else { // update without collection metric RequestCountersEventSource.Instance.AddRequestWithoutGcDuration(sw.ElapsedMilliseconds); } } } private int GetCurrentCollectionCount() { int count = 0; for (int i = 0; i \u0026lt; GC.MaxGeneration; i++) { count += GC.CollectionCount(i); } return count; } }\nThe interesting part is in the RequestCountersEventSource implementation.\nUse an EventSource Luke! 
As explained in the previous post, an EventSource instance is used as the “server” part of the EventPipe communication channel. It exposes a name that identifies it and, more importantly, that is used as the provider name to listen to it with dotnet-trace, dotnet-counters, or your own listener.\n[EventSource(Name = RequestCountersEventSource.SourceName)] public class RequestCountersEventSource : EventSource { // this name will be used as \u0026#34;provider\u0026#34; name with dotnet-counters // ex: dotnet-counters monitor -p \u0026lt;pid\u0026gt; Sample.RequestCounters // const string SourceName = \u0026#34;Sample.RequestCounters\u0026#34;; public RequestCountersEventSource() : base(RequestCountersEventSource.SourceName, EventSourceSettings.EtwSelfDescribingEventFormat) { // create the counters: they\u0026#39;ll be bound to this event source + CounterGroup CreateCounters(); }\nThis name is exposed via an EventSourceAttribute that decorates your EventSource-derived class (you could also pass it to the constructor). The counters are created in the constructor through the CreateCounters helper.\nPick the right Counter class Before looking at the implementation of the CreateCounters method, you need to understand which kinds of counters are available to you. In the previous post, I mentioned that the CLR was using Mean (that provides mean, max, and min values over the update interval) and Sum (to increment a single value) kinds of counters. Note that dotnet-counters will only show the mean value for Mean counters. In addition, the counters could either automatically poll the value from a callback (the method used by the CLR today), or your code could change a counter value by calling the WriteMetric method. 
The EventCounter class provides this helper and it does its best to compute the min/max/mean in a lock-free way.\nThe next question to answer is which one you should use.\nIn the case of the request with(out) GC example, I want to expose different metrics:\nRequest count: a PollingCounter will be used in addition to an int field incremented when a request is received.\nRequest count delta: an IncrementingPollingCounter associated with the same int value will provide the delta (i.e., the number of requests processed during an interval).\nRequest with GC and without GC counts: two PollingCounter instances based on two int fields incremented when a request with (or without respectively) GC is processed.\nDuration of requests with and without GC: two EventCounter instances updated when requests are processed.\nHere is the implementation of CreateCounters:\nprivate void CreateCounters() { // the same request count can be used for two counters: // - raw request counter that will always increase // - increment counter that will automatically compute the delta // between the current value and the value when the counter // was previously sent _requestCount ??= new PollingCounter(\u0026#34;request-count\u0026#34;, this, () =\u0026gt; _requestCountValue) { DisplayName = \u0026#34;Requests count\u0026#34; }; _requestCountDelta ??= new IncrementingPollingCounter(\u0026#34;request-count-delta\u0026#34;, this, () =\u0026gt; _requestCountValue) { DisplayName = \u0026#34;New requests\u0026#34;, DisplayRateTimeScale = new TimeSpan(0, 0, 1) }; // split the request counts between those for which a GC occurred or not // during their processing _noGcRequestCount ??= new PollingCounter(\u0026#34;no-gc-request-count\u0026#34;, this, () =\u0026gt; _noGcRequestCountValue) { DisplayName = \u0026#34;Requests (processed without GC) count\u0026#34; }; _withGcRequestCount ??= new 
PollingCounter(\u0026#34;with-gc-request-count\u0026#34;, this, () =\u0026gt; _withGcRequestsCountValue) { DisplayName = \u0026#34;Requests (processed during a GC) count\u0026#34; }; // request duration counters (with or without GC happening during the processing) _noGcRequestDuration ??= new EventCounter(\u0026#34;no-gc-request-duration\u0026#34;, this) { DisplayName = \u0026#34;Requests (processed without GC) duration in milli-seconds\u0026#34; }; _withGcRequestDuration ??= new EventCounter(\u0026#34;with-gc-request-duration\u0026#34;, this) { DisplayName = \u0026#34;Requests (processed during a GC) duration in milli-seconds\u0026#34; }; }\nThe request processing code of the ASP.NET Core middleware relies on the following helper methods to update the counters:\ninternal void AddRequestWithoutGcDuration(long elapsedMilliseconds) { IncRequestCount(); Interlocked.Increment(ref _noGcRequestCountValue); // compute min/max/mean _noGcRequestDuration?.WriteMetric(elapsedMilliseconds); } internal void AddRequestWithGcDuration(long elapsedMilliseconds) { IncRequestCount(); Interlocked.Increment(ref _withGcRequestsCountValue); // compute min/max/mean _withGcRequestDuration?.WriteMetric(elapsedMilliseconds); } private void IncRequestCount() { Interlocked.Increment(ref _requestCountValue); }\nAnd that’s it!\nHow to get these custom counters? 
The controller of the ASP.NET Core sample application triggers (or not) garbage collections based on the parameters passed via the URL:\n[Route(\u0026#34;api/[controller]\u0026#34;)] [ApiController] public class RequestController : ControllerBase { // GET: api/Request/5 [HttpGet(\u0026#34;{id}\u0026#34;)] public string Get(int id) { if (id == -1) { return $\u0026#34;pid = {Process.GetCurrentProcess().Id}\u0026#34;; } else if ((id \u0026gt;= 0) \u0026amp;\u0026amp; (id \u0026lt;= 2)) { GC.Collect(id); return $\u0026#34;triggered GC {id}\u0026#34;; } else if (id \u0026lt;= 10) { // trigger a given number of GCs up to 10 TriggerGCs(id); return $\u0026#34;triggered {id} garbage collections\u0026#34;; } return $\u0026#34;value = {id}\u0026#34;; } private void TriggerGCs(int count) { for (int current = 0; current \u0026lt; count; current++) { GC.Collect(0); } } }\nAs explained earlier, it is possible to see the counter values with dotnet-counters by using the event source name as a provider with the following command line:\ndotnet counters monitor -p \u0026lt;pid\u0026gt; Sample.RequestCounters\nThen if you trigger a few requests with and without GC, you should see the numbers change:\nThe code available on GitHub has been updated to provide the middleware and the event source classes that demonstrate how to expose custom .NET Core counters.\n","cover":"https://chrisnas.github.io/posts/2019-10-17_how-to-expose-your/1_M-GjqpcH8BL4oG02sk3sEg.png","date":"2019-10-17","permalink":"https://chrisnas.github.io/posts/2019-10-17_how-to-expose-your/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series explains how to implement your own counters.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca 
href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003eCLR Threading events with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2018-12-15_spying-on-net-garbage/\"\u003eSpying on .NET Garbage Collector with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2019-02-12_building-your-own-java/\"\u003eBuilding your own Java GC logs in .NET\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003ePart 6: \u003ca href=\"/posts/2019-05-28_spying-on-net-garbage/\"\u003eSpying on .NET Core Garbage Collector with .NET Core EventPipes\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003ePart 7: \u003ca href=\"/posts/2019-07-23_net-core-counters-internals/\"\u003e.NET Core Counters internals: how to integrate counters in your monitoring pipeline\u003c/a\u003e\u003c/p\u003e","title":"How to expose your custom counters in .NET Core"},{"content":" This post of the series digs into the implementation details of the new .NET Core counters.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: CLR Threading events with TraceEvent.\nPart 4: Spying on .NET Garbage Collector with TraceEvent.\nPart 5: Building your own Java GC logs in .NET\nPart 6: Spying on .NET Core Garbage Collector with .NET Core EventPipes\nIntroduction As explained in a previous post, .NET Core 2.2 introduced the EventListener class to receive in-proc CLR events both on Windows and Linux. Starting with .NET Core 3.0 Preview 6, the EventPipe-based infrastructure now makes it possible to get these events from another process. 
The diagnostics repository contains the cross-platform tools leveraging this infrastructure:\ndotnet-dump: takes a memory snapshot and allows analysis based on most SOS commands\ndotnet-trace: collects events emitted by the Core CLR and generates a trace file to be analyzed with Perfview\ndotnet-counters: collects the metrics corresponding to some performance counters that used to be exposed by the .NET Framework\nAt Criteo, our metrics are exposed in Grafana dashboards and it is interesting to figure out how the new counters are implemented and see how to fetch them via the EventPipe infrastructure. With this knowledge in hand, I’ve implemented helpers to let you get counters in less than 10 lines of code:\n_counterMonitor = new CounterMonitor(_pid, GetProviders()); _counterMonitor.CounterUpdate += // receive the value of one counter after the other Task monitorTask = new Task(() =\u0026gt; { _counterMonitor.Start(); }); monitorTask.Start();\nAt the end of this post you will be able to very easily integrate any counter into your own monitoring pipeline!\n.NET Core replacement for .NET Framework Performance Counters With .NET Core being cross-platform, performance counters were gone and, as explained in the previous posts of the series, CLR events were the only way to get metrics about how your .NET Core applications were behaving. However, with .NET Core 3.0, it is now possible to view a few metrics thanks to the dotnet-counters tool.\nYou can download and install the tools automatically if you have installed .NET Core SDK 2.1+. 
Microsoft is currently working to provide other ways to directly download the tools binaries without having to install the SDK or recompile the diagnostics repository.\nUse the following command line to install dotnet-counters:\ndotnet tool install --global dotnet-counters --version 3.0.0-preview7.19365.2\nNote that you need to have the same version both for the Core CLR runtime and for the tools because, as you will soon see, the monitoring and the monitored applications communicate via a dedicated protocol (that has changed between previews) on top of a transport layer that differs between Windows and Linux.\nAfter the installation, use the following command line dotnet counters monitor -p \u0026lt;pid\u0026gt; and you get a view of the counters that is automatically refreshed every second.\nThese counters are exposed by the System.Runtime provider and are detailed with the list argument:\nThis list is currently hard-coded in the CreateKnownProviders method. However, you are free to create your own provider and expose your application metrics as shown in this tutorial (and in the next post). In addition, if you are using ASP.NET Core, starting from Preview 7, you can get a few counters from the “Microsoft.AspNetCore.Hosting” provider defined in HostingEventSource.cs.\nWhat are these “counters”? Even though it is nice to have a console-based cross-platform tool to see the values of counters change, what would be the cost to get them into your own monitoring pipeline? For example, at Criteo, we are pushing our metrics to Graphite in order to get nice Grafana dashboards. These dashboards give us a visual representation of the evolution of metrics over time. In addition, it is also possible to define alerts based on thresholds for some metric values (when CPU \u0026gt; 85% for more than 5 seconds for example).\nIn a nutshell, the dotnet-counters tool is listening to another application via EventPipe. 
Unlike .NET Framework performance counters that are polled by the monitoring application, the counters are pushed by the monitored .NET Core process.\nIn terms of implementation, these counters are values that you could get via .NET internal or public APIs if you were running in-proc as shown in RuntimeEventSource.cs:\nUnlike most of the events that previous posts of this series presented, counters are metrics that are computed by the CLR in the monitored application. They are supposed to provide a set of values changing over time in the monitored application without impacting performance or flooding the listener client. I highly recommend taking a look at this issue for a deeper discussion about EventCounters compared to regular events.\nAs of Preview 7, two types of counters are used:\nMean: supposed to contain a mean of all values during the polling interval with its min and max values. However, based on the current implementation, all contain only the current value.\nSum: contains an increment between the previous value and the current one.\nThe question is now to figure out how to get the values of the counters.\nHow to receive the counters? Like the Perfview tool that relies on the TraceEvent library, dotnet-counters uses an API exposed by the Microsoft.Diagnostics.Tools.RuntimeClient assembly. Note that it is currently not (yet) available from NuGet so you need to build it from the diagnostics Git repository.\nTo receive counters, you need to create an EventPipe session that communicates via IPC (named pipes on Windows and domain sockets on Linux) with the CLR of the monitored process. 
Here is an excerpt of the CounterMonitor.StartMonitoring implementation that connects and listens to counter events:\nvar configuration = new SessionConfiguration( circularBufferSizeMB: 1000, outputPath: \u0026#34;\u0026#34;, providers: Trace.Extensions.ToProviders(providerString)); var binaryReader = EventPipeClient.CollectTracing(_processId, configuration, out _sessionId); EventPipeEventSource source = new EventPipeEventSource(binaryReader); source.Dynamic.All += ProcessEvents; source.Process();\nThe important method call is EventPipeClient.CollectTracing() that returns a Stream from which an EventPipeEventSource instance gets created. This class has been added to TraceEvent so you can now leverage the event parsing infrastructure on top of EventPipe! As shown in a previous post, it is easy to attach a listener to the All .NET event of the source and get notified each time an event is received after the Process method is called.\nA few parameters are given to CollectTracing via the SessionConfiguration object: the size of the circular buffer used by the CLR and no file path because we want a live session. The last one is supposed to filter which providers and counters you would like to listen to: it expects a list of Provider instances. This struct is created with a few parameters:\npublic struct Provider { public Provider(string name, ulong keywords = ulong.MaxValue, EventLevel eventLevel = EventLevel.Verbose, string filterData = null) { ... }\nAs we have already mentioned, the name of the provider is “System.Runtime” for the Core CLR counters. The keywords and event level are expected to have these max values. The filter data string starts with “EventCounterIntervalSec=” followed by the refresh interval in seconds. 
Internally, the CLR in the monitored application creates a timer with that interval to push the counters via EventPipe (more on this later).\nHere is a helper class to easily create your providers:\npublic class CounterHelpers { public static Provider MakeProvider(string name, int refreshIntervalInSec) { var filterData = BuildFilterData(refreshIntervalInSec); return new Provider(name, 0xFFFFFFFF, EventLevel.Verbose, filterData); } private static string BuildFilterData(int refreshIntervalInSec) { if (refreshIntervalInSec \u0026lt; 1) throw new ArgumentOutOfRangeException(nameof(refreshIntervalInSec), $\u0026#34;must be at least 1 second\u0026#34;); return $\u0026#34;EventCounterIntervalSec={refreshIntervalInSec}\u0026#34;; } }\nNote that dotnet-counters allows you to pass a subset of the counters with the System.Runtime[counter1,counter2,counter3] syntax: events for all System.Runtime counters will be received but only these three will be displayed in the console.\nShow time for counter events! Next, the important part of the job takes place in the EventSource.All event listener. 
Each new counter value is received in the payload of an event named “EventCounters”.\nprivate void ProcessEvents(TraceEvent data) { if (data.EventName.Equals(\u0026#34;EventCounters\u0026#34;)) { IDictionary\u0026lt;string, object\u0026gt; countersPayload = (IDictionary\u0026lt;string, object\u0026gt;)(data.PayloadValue(0)); IDictionary\u0026lt;string, object\u0026gt; kvPairs = (IDictionary\u0026lt;string, object\u0026gt;)(countersPayload[\u0026#34;Payload\u0026#34;]); var name = string.Intern(kvPairs[\u0026#34;Name\u0026#34;].ToString()); var displayName = string.Intern(kvPairs[\u0026#34;DisplayName\u0026#34;].ToString()); var counterType = kvPairs[\u0026#34;CounterType\u0026#34;]; if (counterType.Equals(\u0026#34;Sum\u0026#34;)) { OnSumCounter(name, displayName, kvPairs); } else if (counterType.Equals(\u0026#34;Mean\u0026#34;)) { OnMeanCounter(name, displayName, kvPairs); } else { throw new InvalidOperationException($\u0026#34;Unsupported counter type \u0026#39;{counterType}\u0026#39;\u0026#34;); } } }\nThe Name and DisplayName values are self-explanatory. 
The Sum/Mean type is retrieved from CounterType.\nThe value for each counter type is retrieved from the payload with “Increment” (Sum type) or “Mean” (Mean type) keys.\nprivate void OnSumCounter(string name, string displayName, IDictionary\u0026lt;string, object\u0026gt; kvPairs) { double value = double.Parse(kvPairs[\u0026#34;Increment\u0026#34;].ToString()); // send the information to your metrics pipeline } private void OnMeanCounter(string name, string displayName, IDictionary\u0026lt;string, object\u0026gt; kvPairs) { double value = double.Parse(kvPairs[\u0026#34;Mean\u0026#34;].ToString()); // send the information to your metrics pipeline }\nThe CounterMonitor class has been added to my GitHub repository to expose a CounterUpdate C# event when a counter event is received:\npublic class CounterMonitor { ... public event Action\u0026lt;CounterEventArgs\u0026gt; CounterUpdate; private void OnSumCounter(string name, string displayName, IDictionary\u0026lt;string, object\u0026gt; kvPairs) { double value = double.Parse(kvPairs[\u0026#34;Increment\u0026#34;].ToString()); // send the information to your metrics pipeline CounterUpdate?.Invoke(new CounterEventArgs(name, displayName, CounterType.Sum, value)); } private void OnMeanCounter(string name, string displayName, IDictionary\u0026lt;string, object\u0026gt; kvPairs) { double value = double.Parse(kvPairs[\u0026#34;Mean\u0026#34;].ToString()); // send the information to your metrics pipeline CounterUpdate?.Invoke(new CounterEventArgs(name, displayName, CounterType.Mean, value)); } }\nThe event argument contains the expected properties but others, such as the timestamp, could be added if needed:\npublic class CounterEventArgs : EventArgs { internal CounterEventArgs(string name, string displayName, CounterType type, double value) { Counter = name; DisplayName = displayName; Type = type; Value = value; } 
public string Counter { get; set; } public string DisplayName { get; set; } public CounterType Type { get; set; } public double Value { get; set; } } public enum CounterType { Sum = 0, Mean = 1, } Let’s show some graphs! With these helpers in hand, it is easy to integrate any counter into your monitoring pipeline. As an example, let’s see how to generate a .csv file used to create visual representations in Excel.\nWith a refresh rate of 1 second, one line containing the values of the CLR counters should be added to the .csv file every second. Since we get one event per counter, we need to know which counter event is the “last” one sent by the CLR for a given 1-second push of counters.\nAs mentioned earlier, the RuntimeEventSource class defines the CLR counters. Each one is an instance of a type derived from the DiagnosticCounter class that associates its instances to a CounterGroup also bound to the RuntimeEventSource. The CounterGroup class sets up a repeating timer responsible for creating the payload for its DiagnosticCounter-derived instances and asks the event source to send each one to the monitoring application via EventPipe.\nSo we can rely on the order defined by the counter creation code in RuntimeEventSource: for a given push of counters, the name of the last one will be “assembly-count”. Beware that with other counters (such as the ASP.NET Core ones), you would need to check which one is the last of the series. Another workaround would be to rely on the timestamp of each event, but this could become flaky over time. 
It would have been great if a “CounterSeries” event containing the list of counter names had been sent before any “EventCounters” event of a series push (good idea for a pull request :^)\nThe CsvCounterListener class wraps the few lines of code needed to handle the events and add a line to the .csv file each time a series of counters is received:\npublic class CsvCounterListener { private readonly string _filename; private readonly int _pid; private CounterMonitor _counterMonitor; private List\u0026lt;(string name, double value)\u0026gt; _countersValue; public CsvCounterListener(string filename, int pid) { _filename = filename; _pid = pid; _countersValue = new List\u0026lt;(string name, double value)\u0026gt;(); } public void Start() { if (_counterMonitor != null) throw new InvalidOperationException(\u0026#34;Start can\u0026#39;t be called multiple times\u0026#34;); _counterMonitor = new CounterMonitor(_pid, GetProviders()); _counterMonitor.CounterUpdate += OnCounterUpdate; Task monitorTask = new Task(() =\u0026gt; { try { _counterMonitor.Start(); } catch (Exception x) { Environment.FailFast(\u0026#34;Error while listening to counters\u0026#34;, x); } }); monitorTask.Start(); } private void OnCounterUpdate(CounterEventArgs args) { _countersValue.Add((args.DisplayName, args.Value)); // we know that the last CLR counter is \u0026#34;assembly-count\u0026#34; if (args.Counter == \u0026#34;assembly-count\u0026#34;) { SaveLine(); _countersValue.Clear(); } } bool isHeaderSaved = false; private void SaveLine() { if (!isHeaderSaved) { File.AppendAllText(_filename, GetHeaderLine()); isHeaderSaved = true; } 
File.AppendAllText(_filename, GetCurrentLine()); } private string GetHeaderLine() { StringBuilder buffer = new StringBuilder(); foreach (var counter in _countersValue) { buffer.AppendFormat(\u0026#34;{0}\\t\u0026#34;, counter.name); } // remove last tab buffer.Remove(buffer.Length - 1, 1); // add Windows-like new line because the file will be used in Excel buffer.Append(\u0026#34;\\r\\n\u0026#34;); return buffer.ToString(); } private string GetCurrentLine() { StringBuilder buffer = new StringBuilder(); foreach (var counter in _countersValue) { buffer.AppendFormat(\u0026#34;{0}\\t\u0026#34;, counter.value.ToString()); } // remove last tab buffer.Remove(buffer.Length - 1, 1); // add Windows-like new line because the file will be used in Excel buffer.Append(\u0026#34;\\r\\n\u0026#34;); return buffer.ToString(); } public void Stop() { if (_counterMonitor == null) throw new InvalidOperationException(\u0026#34;Stop can\u0026#39;t be called before Start\u0026#34;); _counterMonitor.Stop(); _counterMonitor = null; _countersValue.Clear(); } private IReadOnlyCollection\u0026lt;Provider\u0026gt; GetProviders() { var providers = new List\u0026lt;Provider\u0026gt;(); // create default \u0026#34;System.Runtime\u0026#34; provider with a refresh every second var provider = CounterHelpers.MakeProvider(\u0026#34;System.Runtime\u0026#34;, 1); providers.Add(provider); return providers; } } What’s next? You have seen how easy it is to be notified of CLR counter updates. Integrating them into your own monitoring system should not be more complicated. However, you need to pay attention to the difference in meaning between the Mean and Sum counter types. For example, the value you get for a Sum counter such as gen-0-count is the difference between now and the previous computation. 
It means that you can’t get the “current” number of gen 0 collections at a given time.\nThis is not a problem in the Excel example because you can “rebuild” a column that contains the “current” count by adding the diff returned by the counter to the previous value.\nHere is the resulting graph:\nIn other cases, you might need to feed your monitoring system with real count values and benefit from advanced charting such as non derivative computation to show a rate based on a series of values. At the end of the day, it is just a question of the initial value from which to rebuild a count. And if you think about it, you are often more interested in unexpected variations (i.e. differences returned by counters) when monitoring your application.\nIn addition to your business metrics, .NET Core Counters are usually enough to monitor the health of your applications. However, in order to investigate situations where counter values show weird results, you often need more details. For example, spikes in the garbage collection count might not be a problem if the pause time is not too long. 
Listening to specific CLR events as shown in previous posts of this series is a great way to unveil important metrics such as GC pause time, contention duration or exception names without a performance hit.\nThe code available on Github has been updated to provide the CounterMonitor and CsvCounterListener classes that demonstrate how to get .NET Core counters and generate a .csv file usable in Excel.\n","cover":"https://chrisnas.github.io/posts/2019-07-23_net-core-counters-internals/1_U2SXMs1uV4x36fdjH7nKiA.png","date":"2019-07-23","permalink":"https://chrisnas.github.io/posts/2019-07-23_net-core-counters-internals/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series digs into the implementation details of the new .NET Core counters.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003eCLR Threading events with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2018-12-15_spying-on-net-garbage/\"\u003eSpying on .NET Garbage Collector with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2019-02-12_building-your-own-java/\"\u003eBuilding your own Java GC logs in .NET\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003ePart 6: \u003ca href=\"/posts/2019-05-28_spying-on-net-garbage/\"\u003eSpying on .NET Core Garbage Collector with .NET Core EventPipes\u003c/a\u003e\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAs explained in \u003ca href=\"/posts/2018-12-06_in-process-clr-event/\"\u003ea previous post\u003c/a\u003e, \u003ca 
href=\"/posts/2018-12-06_in-process-clr-event/\"\u003e.NET Core 2.2 introduced\u003c/a\u003e the \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventlistener?WT.mc_id=DT-MVP-5003325?view=netcore-2.2\"\u003eEventListener class\u003c/a\u003e to receive in-proc CLR events both on Windows and Linux. Starting with .NET Core 3.0 Preview 6, the \u003cstrong\u003eEventPipe\u003c/strong\u003e-based infrastructure makes it now possible to get these events from another process. The \u003ca href=\"https://github.com/dotnet/diagnostics\"\u003ediagnostics repository\u003c/a\u003e contains the cross-platform tools leveraging this infrastructure:\u003c/p\u003e","title":".NET Core Counters internals: how to integrate counters in your monitoring pipeline"},{"content":" This post of the series shows how to generate GC logs in .NET Core with the new event pipes architecture and details the events emitted by the CLR during a collection.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: CLR Threading events with TraceEvent.\nPart 4: Spying on .NET Garbage Collector with TraceEvent.\nPart 5: Building your own Java GC logs in .NET\nIntroduction The previous episode of the series introduced the notion of “GC log”, well known in the Java world and how to implement it in .NET thanks to ETW and TraceEvent on Windows. This solution is easy but requires to create an ETW session (and to remember to close it)… and is also not supported on Linux. However, .NET Core 2.2 introduced the EventListener class as the best way to receive CLR events both on Windows and Linux but only from inside the process itself. As of today, TraceEvent is not supporting live session with EventPipe/EventListener, only a file-based constructor is available. 
This is unfortunate because it means that you can’t rely on the huge work done by TraceEvent to parse the CLR events, especially those related to garbage collections. The rest of the post will explain how to decipher raw events.\nIn addition, there is a bigger problem: the current .NET Core 2.2 implementation is not working for all CLR events. Long story short, the EventPipe class relies on a specific Thread Local Storage slot that is not set by GC background worker threads: the events are not emitted in that case. In addition, there is no per-event timestamp information in 2.2. The implementation presented in this post relies on tests done with ETW traces and on the Pull Request that fixes the issue for .NET Core 3.0, available in Preview 5.\nBack to the basics: what events are emitted by the GC? The previous posts of the series were based on C# events raised by the TraceEvent parser with names different from the original CLR events and the corresponding Microsoft Docs. When you implement your EventListener-derived class, each event is received as an EventWrittenEventArgs object in the OnEventWritten override. The EventId and EventName properties allow you to figure out which event is received. If you have worked with TraceEvent before, you might be using the Opcode property; even though a property with the same name exists in EventWrittenEventArgs, its value is completely different and should not be used.\nThe CLR versions the emitted events to be able to add information over time. For example, the EventId of the “GCStart” event is 1 but the EventName could be GCStart, GCStart_V1 or GCStart_V2 even though the Microsoft Docs seem to be stuck on version 1. 
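Because the version suffix varies, a handler that matches on EventName may want to normalize the name first. The following is only a minimal sketch, not code from the listener described in this post: the GcEventNames class and its StripVersion method are hypothetical names introduced for illustration, and matching on EventId remains the simpler option when the identifiers are known.

```csharp
using System;

// Hypothetical helper (not part of the original listener): normalizes
// versioned CLR event names such as GCStart_V1 or GCStart_V2 to GCStart.
public static class GcEventNames
{
    public static string StripVersion(string eventName)
    {
        if (string.IsNullOrEmpty(eventName)) return eventName;

        // look for a trailing _V<digits> suffix
        int pos = eventName.LastIndexOf('_');
        if (pos <= 0 || pos + 2 >= eventName.Length) return eventName;
        if (eventName[pos + 1] != 'V') return eventName;

        // everything after _V must be a digit, otherwise leave the name untouched
        for (int i = pos + 2; i < eventName.Length; i++)
        {
            if (!char.IsDigit(eventName[i])) return eventName;
        }

        return eventName.Substring(0, pos);
    }
}
```

In an OnEventWritten override, calling GcEventNames.StripVersion(e.EventName) before comparing names would let a single branch cover every version of the same event.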
The following table lists the interesting GC events for .NET Core 2.2/3.0:\nLook at the documentation related to each event.\nIf you go back to this previous article of the series, you notice that all details provided by the TraceGC argument are available except for the objects size before and after the collection. These values are embedded in the payload of the GCPerHeapHistory event by the GC code. Unfortunately, these details are not marshalled by the current EventPipe implementation to your OnEventWritten override (read https://github.com/dotnet/coreclr/issues/24506 for more details and when it will be fixed).\nThere is no strongly typed EventArgs per event and you need to know the name of the field you are interested in to get its index. From this index, you get its corresponding value from the Payload property of the received EventWrittenEventArgs. The following helper method is doing the heavy lifting for you:\nprivate T GetFieldValue\u0026lt;T\u0026gt;(EventWrittenEventArgs e, string fieldName) { // this is not optimal in terms of performance but should not be a problem var index = e.PayloadNames.IndexOf(fieldName); if (index == -1) return default(T); return (T) e.Payload[index]; } Now that all interesting events are known, it is time to figure out the sequence of events emitted during a garbage collection: a new line with the details should be added to the GC log file when the last event is received.\nWhat is the exact sequence of GC events So let’s go back to the main phases of a garbage collection with the related CLR events as shown in the following figure (courtesy of Konrad Kokosa, from his book)\nThese are the expected events for the most complicated case: a background collection with possible foreground ephemeral (gen0 and gen1) collections while the GC threads are concurrently sweeping. 
However, it is not possible to rely on this specific order of events because the order changes, depending on workstation/background mode and generation 2/ephemeral. Each type of collection triggers events in a different order as shown below:\nGen0/Gen1 and Gen 2 (non concurrent) Gen 2 (background) Here is a more visual view of what could happen (dark blue is gen 2 and light blue are ephemeral gen0/1):\nWhen exactly does a GC start… The GCTriggered event notifies that a new collection will start except in the case of foreground ephemeral gen0/gen1 collections triggered during a background gen2. In that case, you could rely on the GCStart event and check if a background gen2 is running. This GCStart event provides the condemned generation in its Depth property. So I keep track of both the current background GC (if any) and the foreground GC (if any) in a GCInfo object:\ninternal class GCInfo { ... // When a background garbage collection (BGC) is started, // other foreground garbage collection (FGC) for gen 0 and 1 could happen // before the original BGC ends // public GCDetails CurrentBGC { get; set; } // this could contain a FGC after a BGC has started // or a non-concurrent gen0/gen1/gen2 collection public GCDetails GCInProgress { get; set; } } The GCDetails class keeps track of all the details gathered during a garbage collection:\ninternal class GCDetails { public DateTime TimeStamp { get; set; } public double PauseDuration { get; set; } public int Number { get; set; } public GCReason Reason { get; set; } public GCType Type { get; set; } public int Generation { get; set; } public bool IsCompacting { get; set; } public HeapDetails Heaps; } The HeapDetails stores the size of each generation after a collection:\npublic struct HeapDetails { public long Gen0Size { get; set; } public long Gen1Size { get; set; } public long Gen2Size { get; set; } public long LOHSize { get; set; } } The GCDetails 
instance is created when the GCStart event is received:\nprivate void OnGcStart(EventWrittenEventArgs e) { // This event is received after a collection is started var newGC = BuildGCDetails(e); // If a BGC is already started, FGC (0/1) are possible and will finish before the BGC // if ( (GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Depth\u0026#34;) == 2) \u0026amp;\u0026amp; ((GCType)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Type\u0026#34;) == GCType.BackgroundGC) ) { _gcInfo.CurrentBGC = newGC; } else { _gcInfo.GCInProgress = newGC; } // forthcoming expected events for gen 0/1 collections are GCGlobalHeapHistory then GCHeapStats } private GCDetails BuildGCDetails(EventWrittenEventArgs e) { return new GCDetails() { Number = (int)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Count\u0026#34;), Generation = (int)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Depth\u0026#34;), Type = (GCType)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Type\u0026#34;), Reason = (GCReason)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;Reason\u0026#34;) }; } This is where it is important to remember whether a background or a foreground GC is starting. In the former case, the CurrentBGC field is set; otherwise, the GCInProgress field is set with a new GCDetails instance.\nThat way, when either GCGlobalHeapHistory or GCHeapStats is received, it is easy to know which GC is in progress; i.e. if a foreground GC is in progress, an event happens in its context (until the last one, GCHeapStats, that will clear the GCInProgress field):\nprivate GCDetails GetCurrentGC(GCInfo info) { if (info.GCInProgress != null) { return info.GCInProgress; } return info.CurrentBGC; } … suspend, pause application threads and end of ephemeral collections The suspension and pause times are not that complicated to compute. 
The garbage collector code relies on the SuspendEE and RestartEE methods provided by the .NET Execution Engine to suspend and restart the application threads respectively. Each of these methods emits a pair of GCxxxBegin and GCxxxEnd events. After GCSuspendEEBegin is emitted, the Execution Engine waits for the application threads to suspend their execution. When all threads are suspended, GCSuspendEEEnd gets emitted.\nThe GCRestartEEBegin event is emitted when the application threads begin to resume their execution. When all application threads are resumed, GCRestartEEEnd gets emitted. The elapsed time between the GCSuspendEEEnd and GCRestartEEBegin events is counted as suspension time. However, for simplicity’s sake, my current implementation sums both the time spent by the Execution Engine to suspend the threads and the pause time due to the GC work.\nThe suspension start time is kept in GCInfo:\n// time when SuspendEEBegin is received for this process // --\u0026gt; from here, all app threads will be suspended until RestartEEEnd is received // Note that we don\u0026#39;t know yet what will be the triggered GC public DateTime? 
SuspensionStart { get; set; } It will be set when the GCSuspendEEBegin event is received:\nprivate void OnGcSuspendEEBegin(EventWrittenEventArgs e) { // we don\u0026#39;t know yet what will be the next GC corresponding to this suspension // so it is kept until next GCStart _gcInfo.SuspensionStart = e.TimeStamp; } This implementation decision does not provide the same level of suspension details (no fine-grained suspension time for inner foreground collections) as the one provided by the TraceEvent parsing.\nThe sibling GCRestartEEEnd event is used to (1) compute the total pause time and (2) detect when gen0/gen1/non concurrent gen2 collections end:\nprivate void OnGcRestartEEEnd(EventWrittenEventArgs e) { var currentGC = GetCurrentGC(_gcInfo); if (currentGC == null) { // this should never happen, except if we are unlucky to have missed a GCStart event return; } // compute suspension time double suspensionDuration = 0; if (_gcInfo.SuspensionStart.HasValue) { suspensionDuration = (e.TimeStamp - _gcInfo.SuspensionStart.Value).TotalMilliseconds; _gcInfo.SuspensionStart = null; } else { // bad luck: a xxxBegin event has been missed } currentGC.PauseDuration += suspensionDuration; // could be the end of a gen0/gen1 or of a non concurrent gen2 GC if ( (currentGC.Generation \u0026lt; 2) || (currentGC.Type == GCType.NonConcurrentGC) ) { GcEvents?.Invoke(this, BuildGcArgs(currentGC)); _gcInfo.GCInProgress = null; return; } // in case of background gen2, just need to sum the suspension time // --\u0026gt; its end will be detected during GcGlobalHistory event } Detect other collections end (and more details) As shown in the events workflow figure, the GCRestartEEBegin/GCRestartEEEnd duo of events is used to detect the end of non-concurrent gen0/1/2 collections. 
It is more complicated to detect the end of a gen2 background or inner ephemeral gen0/1 collections: GCGlobalHeapHistory for the former and GCHeapStats for the latter. However, the payloads of these two events do not contain the piece of information needed to know if we are in the middle of a background gen 2 or not. With these details in mind, the code of the different event handlers is quite straightforward.\nThe generation sizes are retrieved from the GCHeapStats event:\n// This event provides the size of each generation after the collection // Note: last event for non background GC (will be GCGlobalHeapHistory for background gen 2) private void OnGcHeapStats(EventWrittenEventArgs e) { var currentGC = GetCurrentGC(_gcInfo); if (currentGC == null) return; currentGC.Heaps.Gen0Size = (long)GetFieldValue\u0026lt;ulong\u0026gt;(e, \u0026#34;GenerationSize0\u0026#34;); currentGC.Heaps.Gen1Size = (long)GetFieldValue\u0026lt;ulong\u0026gt;(e, \u0026#34;GenerationSize1\u0026#34;); currentGC.Heaps.Gen2Size = (long)GetFieldValue\u0026lt;ulong\u0026gt;(e, \u0026#34;GenerationSize2\u0026#34;); currentGC.Heaps.LOHSize = (long)GetFieldValue\u0026lt;ulong\u0026gt;(e, \u0026#34;GenerationSize3\u0026#34;); // this is the last event for non background collections during a background gen2 collection if ( (_gcInfo.CurrentBGC != null) \u0026amp;\u0026amp; (currentGC.Generation \u0026lt; 2) ) { GcEvents?.Invoke(this, BuildGcArgs(currentGC)); _gcInfo.GCInProgress = null; } } Remember this is the last event received for a gen0/gen1/foreground gen2 collection so I’m using it to clear the GCInProgress field: the next event will be for the current background gen2 if any (CurrentBGC field is not null) or a new collection.\nAs of today with Preview 5, the before/after generation sizes are not marshalled through event pipes (see the corresponding bug for more details) so the GCPerHeapHistory event does not bring any value.\nThe last 
GCGlobalHeapHistory event of a background gen 2 collection is also used to detect compaction:\n// This event is used to figure out if a collection is compacting or not // Note: last event for background GC (will be GCHeapStats for ephemeral (0/1) and non concurrent gen 2 collections) private void OnGcGlobalHeapHistory(EventWrittenEventArgs e) { var currentGC = GetCurrentGC(_gcInfo); // check unexpected event (we should have received a GCStart first) if (currentGC == null) return; var globalMask = (GCGlobalMechanisms)GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;GlobalMechanisms\u0026#34;); currentGC.IsCompacting = (globalMask \u0026amp; GCGlobalMechanisms.Compaction) == GCGlobalMechanisms.Compaction; // this is the last event for gen 2 background collections if ((GetFieldValue\u0026lt;uint\u0026gt;(e, \u0026#34;CondemnedGeneration\u0026#34;) == 2) \u0026amp;\u0026amp; (currentGC.Type == GCType.BackgroundGC)) { GcEvents?.Invoke(this, BuildGcArgs(currentGC)); ClearCollections(_gcInfo); } } In case of a background gen 2, this is the last event so there should not be any collection in progress:\nprivate void ClearCollections(GCInfo info) { info.CurrentBGC = null; info.GCInProgress = null; } The next received event will start a new garbage collection cycle of events.\nThis post concludes the series about CLR events and how to use them to better understand how the runtime is behaving under the workloads of your 
applications. The code available on Github has been updated to provide the EventListenerGcLog class that uses the code demonstrated in this post to generate GC logs with event pipes.\n","cover":"https://chrisnas.github.io/posts/2019-05-28_spying-on-net-garbage/1_CJb-0oh4Z1vntA2JQpZsog.png","date":"2019-05-28","permalink":"https://chrisnas.github.io/posts/2019-05-28_spying-on-net-garbage/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series shows how to generate GC logs in .NET Core with the new event pipes architecture and details the events emitted by the CLR during a collection.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: CLR Threading events with TraceEvent.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2018-12-15_spying-on-net-garbage/\"\u003eSpying on .NET Garbage Collector with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2019-02-12_building-your-own-java/\"\u003eBuilding your own Java GC logs in .NET\u003c/a\u003e\u003c/p\u003e","title":"Spying on .NET Garbage Collector with .NET Core EventPipes"},{"content":" This post of the series shows how we debugged the Core CLR to figure out insane contention duration.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: CLR Threading events with TraceEvent.\nPart 4: Spying on .NET Garbage Collector with TraceEvent.\nPart 5: Building your own Java GC logs in .NET.\nIntroduction Long before migrating our .NET applications to Linux, our first step was to build a monitoring pipeline based on LTTng instead of ETW on Windows. 
To achieve this goal, the open source TraceEvent Nuget package needed to be updated in order to listen to LTTng live sessions (only a file-based implementation was provided by Microsoft, mostly to allow Perfview to open traces taken on Linux machines). This was a huge development task that sometimes led to weird results. Among the metrics we wanted to monitor, the contention duration gave insane values such as thousands of minutes… per minute:\nAs shown in a previous episode, this duration is computed by comparing the time between the two events ContentionStart and ContentionStop. What could be the possible reasons to get such insane values?\nA lot of small contentions are happening\nA few very long contentions are happening\nAs a first step, it would be great to be able to debug the Core CLR and figure out what call stacks end up triggering these contention events. Unfortunately for us, the .NET debugging ecosystem on Linux is far from being as rich as on Windows. So this episode details the steps to compile and debug the Core CLR on Windows with WinDBG.\nFrom the source to debugging the runtime To better understand the implementation details in the CLR, we needed to find where the two events are emitted. In fact, during the CLR compilation, a lot of helpers are created based on the name of the event. In our case, FireEtwContentionStart_V1 and FireEtwContentionStop are the two helpers in charge. Both are called in the AwareLock::EnterEpilogHelper function.\nAs a Windows developer, I would like to debug the CLR code and set a breakpoint in the EnterEpilogHelper with Visual Studio to see what are the call stacks that end up in contention. However, I did not find a way to do it with Visual Studio. 
I turned to WinDBG and things get “easier”… in a certain way.\nHere are the different steps you need to follow before setting a breakpoint on any Core CLR function in WinDBG:\nClone the Core CLR repository from https://github.com/dotnet/coreclr\nBuild it:\nGet the Visual Studio, .NET Core SDK, CMake, Python, Powershell prerequisites from the documentation\nGo to the root folder and type .\\build -skiptests to build a DEBUG version of the Core CLR\nLeave your desk and go to lunch (ok… maybe just take a coffee break)\nWhen you come back, the result of the compilation should be available in the following folder: …\\coreclr\\bin\\Product\\Windows_NT.x64.debug.\nThe next step is to use your custom Core CLR build in the application:\nthe application must be self-contained by adding win-x64 (or linux-x64 for Linux) as the RuntimeIdentifier in a PropertyGroup section of the .csproj.\npublish the application by running dotnet publish or from within Visual Studio\nClick the Configure link and select Debug configuration\nafter clicking Save and Publish, you should now have the result under the \\bin\\Debug\\netcoreapp2.2\\publish folder.\nIt is now time to copy the following files from the Core CLR output to your application publication folder:\ncoreclr.dll (for the native part of the CLR) and System.Private.CoreLib.dll (if the CLR C# code has been modified)\nin the PDB subfolder, coreclr.pdb and System.Private.CoreLib.pdb\nnote that you might also need the sos.dll and mscordaccore.dll files for any investigation in WinDBG.\nIf you wonder why the CoreFx repo is not rebuilt, the answer is simple: the contention related code is in the CoreCLR. Also, most of the managed “mscorlib” is defined in System.Private.CoreLib.dll that gets built with CoreCLR. 
The rest of the BCL is covered by CoreFX and not needed in this investigation.\nFrom running to debugging in WinDBG You should use corerun.exe instead of dotnet.exe to run an application with the debug version of the Core CLR you’ve just built.\nOpen up a command prompt in the coreclr\\bin\\Product\\Windows_NT.x64.debug folder and type corerun c:\u0026lt;your path to the bin\\Debug\\netcoreapp2.2\\publish folder of your application\u0026gt;\u0026lt;yourApp.dll\u0026gt;\nYou have to tell corerun where to find the CoreFx assemblies via the CORE_LIBRARIES environment variable:\nCORE_LIBRARIES=C:\\Program Files\\dotnet\\shared\\Microsoft.NETCore.App\\2.2.0\nIf you forget about it, don’t be surprised if the application stops with a FileNotFoundException for a missing assembly (usually System.Runtime)…\nIf, like me, your applications are running with server mode GC, you know that it is set in the application .csproj file to end up in the runtimeconfig.json file. Unfortunately, this is not taken into account by corerun (yet?) and you need to set it (and the concurrent mode too if you need it) explicitly through the following environment variables:\nCOMPlus_gcServer=1 COMPlus_gcConcurrent=1\nFrom there, (install WinDBG if not already done and) start the debugger: click the File menu and select Launch Executable (advanced) to set up a debugging session:\nThe Executable text field points to the corerun.exe file generated during the compilation of the Core CLR. The same folder is used as Start Directory and the Arguments text field contains the full path of the application to debug. 
You could also attach to a running process but sometimes you need to access Core CLR data structures before any C#-compiled managed code of your application starts executing (to see how the garbage collector initializes for example).\nAs soon as you click the Ok button, the application starts but is almost immediately stopped by WinDBG\nDon’t be scared by the last lines of the output: even though you read the word exception, this int 3 assembly instruction tells you that a breakpoint has been set for you by WinDBG, has been hit when the application reached it and the application is now paused just before calling its entry point.\nAs you can see from the list of loaded modules, even though CoreRun.exe is there, no managed assembly (especially your application) has been loaded yet; not even the Core CLR itself! This means that you have to tell WinDBG to keep on executing the application until a point you would be interested in. To achieve that task, you will first need a quick tour of the WinDBG user interface even though this post is not meant to replace the WinDBG online help or provide a detailed walkthrough.\nThe debugging section of the Home tab is not too different from what you get in Visual Studio:\nThe icons are even easier to understand because their action is also displayed. If you want to see the current call stack, select the View tab and click the Stack icon:\nLike in Visual Studio, you are able to pin each panel wherever you want in the IDE\nThe next step would be to set a breakpoint on the line of code you are interested in. But let’s be clear here: I’m talking about a line of code in a function exported by a native dll; not a line of C# code in a managed assembly. Remember that WinDBG is a native debugger and it debugs only native code. If you want to debug managed code with WinDBG, you need to use commands from the sos extension; but this is another story.\nSo let’s go back to the native world. 
Even though WinDBG does not have the notion of “solution” like Visual Studio provides, it is still possible to open a C++ file and set a breakpoint in it. Click the File | Open Source File menu and go to your Core CLR github repo to select syncblk.cpp under the \\src\\vm folder. Look for AwareLock::EnterEpilogHelper with CTRL+F (yes: search is working in WinDBG) and go down to the call to the FireEtwContentionStart_V1 helper. Setting a breakpoint on this line is as simple as pressing F9 like in Visual Studio. Select the View tab and click the Breakpoints button to see the result:\nSince the dll in which the breakpoint is set is not loaded yet, you can’t see the details of the breakpoint.\nThere is a way to tell WinDBG to continue the execution of the application until a dll gets loaded. For coreclr.dll, type the following command:\nsxe ld:coreclr\nand press F5 (or type g as a command or click the green triangle in the Home toolbar) to resume the execution of the application. The Breakpoints panel shows more details now:\nPress F5 to resume the execution and the breakpoint should be triggered when the first contention happens.\nFrom symbols to call stacks in WinDBG Before digging into call stacks, I would like to show you one of the differences between native dlls and managed assemblies. As a .NET developer, you are used to Intellisense and the strongly typed environment provided by the metadata stored in an assembly itself. For native dlls, the story is different. Exported functions are visible with tools such as Dependency Walker or dumpbin /exports from the SDK. If the dll exports symbols built by the C++ compiler, their names get mangled to describe their signature. To get human readable symbols, you need the associated .pdb file. It will also be required to map a function address to its name in call stacks.\nWinDBG allows you to browse these symbols with the dt command. 
For example, if you want to know all members defined by the AwareLock class, use the following command:\ndt CoreClr!AwareLock::*\nLike what was shown in the previous Breakpoints screenshot, the prefix of a name is the dll in which the symbol is defined. Next, use ! as separator before the class name. Since Visual Studio is really slow to navigate the source code of the Core CLR or search in the thousands of include and C/C++ files, this is a very convenient way to navigate and learn its different parts. Don’t forget that the compilation could also inline functions (that won’t be visible in the symbols) and expand macros.\nIf you want to set a breakpoint on a function, use the bp command with the same syntax as dt. For example, the following command:\nbp coreClr!AwareLock::EnterEpilogHelper\nsets a breakpoint at the beginning of the function in which I already set a breakpoint.\nThese are the very basics of breakpoints in WinDBG. You are also able to define which actions to run when a breakpoint is hit. This is extremely powerful! For example, in the case of thread contention, you typically don’t want to stop the execution of the application because it will pause all threads and disturb the normal flow of execution that could lead to thread contention. Instead, you could ask WinDBG to dump the call stack leading to the function we are interested in and let the execution resume with the following syntax:\nbp coreClr!AwareLock::EnterEpilogHelper \u0026quot;!clrstack; g\u0026quot;\nThe commands to execute after the breakpoint is hit are defined between quotes. In this example, I’m using the clrstack command exported by the sos.dll extension (that must be previously loaded via .loadby sos coreclr) and once it is done, g resumes the execution.\nWhat’s next? Due to the automatic suspension of all threads when the clrstack command gets executed (before g resumes), the interactions between threads are not the same as normal execution outside of a debugger. 
I have even used some code available in DEBUG to dump the callstacks outside of a debugger if the contention last more than a threshold. However, it was not possible to reproduce the problem on Windows.\nIn parallel on Linux, another colleague investigated another lead: some events may also be skipped by our LTTng implementation. Due to complicated event management, if a ContentionStop and ContentionStart are missed, a possible previous ContentionStart could be used by the next ContentionStop and the duration would be unrelated to the real contention that happened.\nSo there could be a simpler way to narrow down the issue: instead of relying on two events, why not simply compute the duration of the contention in the AwareLock::EnterEpilogHelper function and emit only one new event with the duration as payload? Well… this will be the topic of the next episode of this series.\nReferences Series of videos from the Defrag Tools show where Maoni Stephens explains how to debug the Garbage Collector for a better understanding of its arcana\nhttps://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-33-CLR-GC-Part-1 https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-34-CLR-GC-Part-2 https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-35-CLR-GC-Part-3 https://channel9.msdn.com/Shows/Defrag-Tools/Defrag-Tools-36-CLR-GC-Part-4 ","cover":"https://chrisnas.github.io/posts/2019-04-04_let-debug-the-core/1_kqvC1tJsfN_lpaD6rclGjA.png","date":"2019-04-04","permalink":"https://chrisnas.github.io/posts/2019-04-04_let-debug-the-core/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series shows how we debugged the Core CLR to figure out insane contention duration.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, 
Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: CLR Threading events with TraceEvent.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2018-12-15_spying-on-net-garbage/\"\u003eSpying on .NET Garbage Collector with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2019-02-12_building-your-own-java/\"\u003eBuilding your own Java GC logs in .NET\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eLong before migrating our .NET applications to Linux, our first step was to build a monitoring pipeline based on LTTng instead of ETW on Windows. To achieve this goal, the open source TraceEvent Nuget package needed to be updated in order to listen to LTTng live session (only a file based implementation was provided by Microsoft; mostly to allow Perfview to be able to open traces taken on Linux machines). This was a \u003ca href=\"https://github.com/criteo-forks/perfview/pull/1\"\u003ehuge development task\u003c/a\u003e that led sometimes to weird results. Among the metrics we wanted to monitor, the contention duration gave insane value such as thousands of minutes… per minute:\u003c/p\u003e","title":"Let’s debug the Core CLR with WinDBG!"},{"content":" Introduction At Criteo, CLR metrics are collected by a service that listens to ETW events (see the related series). On a few servers, the metrics stopped being collected and we had to fix the problem by manually polling new and dead processes. After deploying the new version, the same scenario started to happen: on some servers, the metrics were no more collected.\nIn an investigation, the first step is always trying to check the environment. In our case, on a server where the metrics collector is up and running, a dedicated ETW session should be created to listen to the CLR events.\nThe name given to the session allows us to easily detect if the session is present or not. 
In the case of a faulted server, the session was not present.\nIf you look at the code described in a previous post, it is not easy to guess why the session would be stopped by the metrics collector:\nprivate void ListenToEtw(TraceEventSession etwSession) { try { /* this call is blocking... until etwSession.Stop is called (done in Dispose) */ etwSession.Source.Process(); } finally { etwSession.Dispose(); } } A TraceEventSession is created and passed to a dedicated thread to process the events until Stop is called at the end of the application.\nThe second step of an investigation is trying to reproduce the issue in a controlled environment such as… my developer machine. I set up the Exception Settings of the debugger to stop on any managed exception:\nThat way, if something bad happens while I’m debugging the application, Visual Studio tells me exactly where the exception was thrown.\nAfter starting and stopping applications monitored by the metrics collector a few times, an exception was thrown in the code responsible for mapping the application id and the component in charge of storing the metrics of this process. When looking at the code, it seemed that there was a “timing” conflict between the code in charge of detecting new and dead processes (in a timer) and the code receiving the events from ETW (in the dedicated thread described earlier). A CLR event was received after the corresponding process was detected as being dead. The net effect of the uncaught exception was immediate: the TraceEvent session stopped its execution and the Process method returned. Nothing special was visible outside of a debugger with the right exception settings. This is a great scenario to understand why swallowing all exceptions is not a good pattern…\nStill not working The next step is to fix the code to handle the dead process case, build it and test it. Unfortunately, the new metrics collector, from time to time, does not seem to receive any CLR event. 
Even worse, this time the ETW session is still here as shown by logman -ets. Going back inside the Visual Studio debugger, everything is working fine: the TraceEvent session is created, its Process method called and blocked in a dedicated thread and… the events are received! It means that I’m not able to reproduce the problem while running under debugger control. Maybe the problem could come from the code responsible for filtering out events sent by unmonitored applications.\nSo, I’m adding a breakpoint in the method responsible for checking monitored processes to ensure that there is no bug there (events could be received but skipped due to invalid process ID mapping for example):\nprivate bool IsMonitoredEvent(TraceEvent traceEvent) { var isMonitored = traceEvent.ProcessID == _monitoredPid; if (isMonitored) { NotifyProcessedEvent(traceEvent); } return isMonitored; } The breakpoint is hit and I’m able to validate that there is no problem in the mapping code.\nI can even check that the events I’m expecting are all received by asking Visual Studio to trace the event name with a tracepoint when isMonitored is true:\nAnd I get all expected events in the Output Window. If some events were missing, it could have explained why metrics based on series of events (such as contention duration) were not computed.\nI’m now running the application outside of the debugger… and no event. Just to confirm that I’m not crazy, I decide to add traces in the source code:\nBut how to get the output without an attached debugger? The trick is to start SysInternals Debug View and wait for the event names to appear: nothing. Even by moving the Debug.WriteLine call outside of the if block, no event is ever received, even from unmonitored processes.\nNavigating memory by stack frame Let’s summarize the investigation status:\nThe metrics collector is working only when under the control of a debugger. If the debugger is attached after it is started, the events are not received. 
I don’t know why but this kind of weird behavior always happens at Criteo on a Friday. So let’s start a joint debugging session with Kevin, Jean-Philippe and Gregory!\nTo better understand the state of the metrics collector when the events are not received, the application is launched and the debugger is then attached to it. I open the Parallel Stacks panel and double-click the stack frame with a valuable context:\nIn our case, it would be interesting to get a view on the state of the ETWTraceEventSource object used by TraceEvent to process the events. Even if you don’t have the source code, it is still possible for the debugger to get a view of the object used as implicit “this” pointer by the ProcessOneFile method. Summon the Quick Watch dialog (Shift-F9 with my old VC 6 keyboard shortcut) and type “this”\nBased on our understanding of how the ETWTraceEventSource is working, we know that registered event handlers are associated with entries in its templates field.\nInstead of the expected GC, thread pool, exception and contention events, only kernel related events are defined:\nBut breakpoints have been set and hit on the code that registers our own event handlers! Well… it was the case when we debugged the application from the start. What if… the event source we are looking at now is not the one our code has registered its handlers to?\nWithout the address of the object available like in C++, it is complicated to easily check if two references actually point to the same object in memory. 
However, it is still possible to associate a numeric ID to an object with Make Object ID:\nAnd its ID {$1} is now visible after the type name:\nTo compare with the ETWTraceEventSource object manipulated by our own source code, double-click the right frame in Parallel Stacks:\nIn the ListenToEtw method, this refers to our SessionManager in which the ClrTraceEventParser property references the ETWTraceEventSource object we believe we initialized with the right event handlers:\nAnd… this object does not have any ID: it would show {$1} if it were the same object.\nTo confirm that we are not looking at the source object {$1} that TraceEvent uses to receive events, it is just a question of checking its templates field:\nAnd here are the expected CLR events we are interested in!\nRace condition… again So now the question is: how is it possible to have two instances of a TraceEvent internal class?\nHere is the sequence of execution in our code:\nThread 1 creates a TraceEventSession object\nThread 1 starts Thread 2\nThread 1 accesses the ClrTraceEventParser via _currentSession?.Source.Clr while Thread 2 calls etwSession.Source.Process()\nSo the Source property getter could be called from two different threads. Unfortunately, in the getter, the source is lazily created in a non thread-safe way.\npublic ETWTraceEventSource Source { get { if (m_source == null) { ... /* long code */ m_source = new ETWTraceEventSource(SessionName, TraceEventSourceType.Session); } return m_source; } } When both threads enter the getter, m_source could be null and in that case, two ETWTraceEventSource objects are created and returned. 
One is used by TraceEvent to listen to events and the other by our code to register handlers to events that will never be received.\nThe fix is simply to force the initialization of the Source object in the first thread.\nIt is now a good time to go back home… to take a well deserved vacation!\n","cover":"https://chrisnas.github.io/posts/2019-02-22_debugging-friday-hunting-down/1_AvVFp0WGFB-NC2mKMyhlEQ.png","date":"2019-02-22","permalink":"https://chrisnas.github.io/posts/2019-02-22_debugging-friday-hunting-down/","summary":"\u003chr\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAt Criteo, CLR metrics are collected by a service that listens to ETW events (\u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003esee the related series\u003c/a\u003e). On a few servers, the metrics stopped being collected and we had to fix the problem \u003ca href=\"/posts/2018-11-13_get-process-name-challenge/\"\u003eby manually polling new and dead processes\u003c/a\u003e. After deploying the new version, the same scenario started to happen: on some servers, the metrics were no more collected.\u003c/p\u003e\n\u003cp\u003eIn an investigation, the first step is always trying to check the environment. In our case, on a server where the metrics collector is up and running, a dedicated ETW session should be created to listen to the CLR events.\u003c/p\u003e","title":"Debugging Friday — Hunting down race condition"},{"content":" This post of the series focuses on logging each GC details in a file and how to leverage it during investigations.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: Monitor Finalizers, contention and threads in your application.\nPart 4: Spying on .NET Garbage Collector with TraceEvent.\nIntroduction I’m working in a team where we investigate issues in production: both for Java and .NET applications. 
This is a good opportunity to learn what are the features provided by Java that are missing in .NET. One of the features heavily discussed with my colleague Jean-Philippe is called the GC Log. It is possible to start an application with parameters that tell the GC to save tons of details about each garbage collection in a file : the GC Log. Based on this file, it is possible to extract the reason of a collection, the times of the different phases including the suspension time. This is a great source of information during investigations… when you know how GC is working or by leveraging automatic report generation.\nIn addition, you can also build your own UI to more easily understand what is going on and get a more visual representation of the situation.\nIn the short video above you can see the heap evolution during several days. Then, as this is an interactive HTML page you can zoom in an interesting period to have a more detailed view of the evolution between GCs.\nAlso for the pause time graph, you can follow the behavior of the GC with different kinds of pauses and associated phases. In this example, we have minor GCs happening and then an “initial mark” is triggered, followed by “final remark” and “cleanup” pauses. After an extra minor GC, we have a series of mixed GCs that is the result of what was planned by the GC after the marking phase.\nIn the .NET world, there is no such thing as a GC Log. However, as shown in the previous post, it is possible to use Perfview to analyze traces corresponding to collected CLR events. The GCStats view shows high level details in the “All GC Events” section. In addition to this HTML rendering, you can get access to the data itself in different formats\nThe more complete one is the Raw Data XML file that you could parse to extract the details you need. 
This is very close to a .NET GC Log but it is complicated to build an automated process to get it from a production machine.\nIt would be great if you could tell a .NET application to generate such a GC Log like in Java instead of relying on manual steps with Perfview (and more scripts on Linux). This post will show you how to achieve this goal!\nDefining the goal: basic GcLog implementation In Java, you have to turn the GC log on or off before the application starts and you can’t change it while it runs. Since I’m working with server applications, I would prefer to enable/disable the generation of a GC log file without having to stop and restart the application.\nSo I’ve defined the simple IGcLog interface:\npublic interface IGcLog { void Start(string filename); void Stop(); } In a dedicated administration handler (e.g. an HTTP endpoint of the application), the code could just use a class that implements this interface and call Start when the log is enabled and Stop when it is no longer needed.\nTo make the implementation easier, I’ve written the following GcLogBase abstract class:\npublic abstract class GcLogBase : IGcLog { protected string Filename; private StreamWriter _fileWriter; public void Start(string filename) { if (string.IsNullOrEmpty(filename)) throw new ArgumentNullException(nameof(filename)); if (_fileWriter != null) throw new InvalidOperationException(\u0026#34;Start can\u0026#39;t be called twice: Stop must be called first.\u0026#34;); _fileWriter = new StreamWriter(filename); Filename = filename; OnStart(); } public void Stop() { if (string.IsNullOrEmpty(Filename)) return; OnStop(); Filename = null; _fileWriter.Flush(); _fileWriter.Dispose(); _fileWriter = null; } protected bool WriteLine(string line) { if (_fileWriter == null) return false; /* just in case the method is called AFTER Stop */ _fileWriter.WriteLine(line); 
return true; } protected abstract void OnStart(); protected abstract void OnStop(); } Its main goal is to hide the file management by providing the WriteLine method that child classes would call to save the details of a garbage collection into a single line of text. The write operations are flushed when Stop is called. This combination allows buffered writes with low performance impact: don’t be scared if you don’t see the file size change because the StreamWriter class is caching write operations.\nThe next step is to implement OnStart and OnStop in a derived class to enable/disable GC details retrieval.\nHow to get the GC details: the easy way? As already discussed in the previous posts of the series, the CLR is emitting traces (via ETW on Windows and LTTng on Linux) that can be collected in C#. You have already seen how TraceEvent could help collecting and parsing GC traces from any application like what Perfview is doing. With TraceEvent, the TraceGC instance received when a garbage collection ends contains tons of information: it’s mapped to the GarbageCollectionArgs structure that you get while listening to the GarbageCollection event of my ClrEventsManager helper class. The only information to provide is the ID of the .NET process I’m interested in: that way, it is easy to filter the events for this process only.\nEtwGcLog gcLog = EtwGcLog.GetProcessGcLog(pid); var filename = GetUniqueFilename(pid); gcLog.Start(filename); /* in a simple Console application, wait for the user to press ENTER; in a more realistic case, keep track of the EtwGcLog instance and call Stop to end the processing of events when needed. */ 
gcLog.Stop(); The GetUniqueFilename method builds a filename based on the process ID and the current date and time:\nprivate static string GetUniqueFilename(int pid) { var now = DateTime.Now; string filename = Path.Combine(Environment.CurrentDirectory, $\u0026#34;{pid.ToString()}_{now.Year}{now.Month}{now.Day}_{now.Hour}{now.Minute}{now.Second}.csv\u0026#34; ); return filename; } The GetProcessGcLog method is a factory-like helper to build an instance bound to the given process ID.\npublic static EtwGcLog GetProcessGcLog(int pid) { EtwGcLog gcLog = null; try { var process = Process.GetProcessById(pid); process.Dispose(); gcLog = new EtwGcLog(pid); } catch (System.ArgumentException) { /* there is no running process with the given pid */ } return gcLog; } The implementation of OnStart and OnStop overrides is straightforward based on the previous post:\nprotected override void OnStart() { string sessionName = $\u0026#34;GcLogEtwSession_{_pid.ToString()}_{Guid.NewGuid().ToString()}\u0026#34;; Console.WriteLine($\u0026#34;Starting {sessionName}...\\r\\n\u0026#34;); _userSession = new TraceEventSession(sessionName, TraceEventSessionOptions.Create); Task.Run(() =\u0026gt; { /* only want to receive GC event */ ClrEventsManager manager = new ClrEventsManager(_userSession, _pid, EventFilter.GC); manager.GarbageCollection += OnGarbageCollection; /* this is a blocking call until the session is disposed */ manager.ProcessEvents(); Console.WriteLine(\u0026#34;End of CLR event processing\u0026#34;); }); /* add a header to the .csv file */ WriteLine(Header); } protected override void OnStop() { /* when the session is disposed, the call to ProcessEvents() returns */ _userSession.Dispose(); } The created TraceEventSession is passed to the ClrEventsManager along with the process ID and a filter to receive only GarbageCollection event notifications. 
The OnGarbageCollection handler is super simple:\nprivate void OnGarbageCollection(object sender, GarbageCollectionArgs e) { _line.Clear(); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.StartRelativeMSec.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Number.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Generation.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Type); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Reason); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.IsCompacting.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.SuspensionDuration.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.PauseDuration.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.BGCFinalPauseDuration.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Gen0Size.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Gen1Size.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.Gen2Size.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.LOHSize.ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeBefore[0].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeBefore[1].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeBefore[2].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeBefore[3].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeAfter[0].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeAfter[1].ToString()); _line.AppendFormat(\u0026#34;{0},\u0026#34;, e.ObjSizeAfter[2].ToString()); _line.AppendFormat(\u0026#34;{0}\u0026#34;, e.ObjSizeAfter[3].ToString()); WriteLine(_line.ToString()); } Each garbage collection appears as a textual line with the following columns separated by a comma:\nThe last twelve pieces of information require some explanation:\n· xxxBefore: size of a generation before the collection; without 
free list\n· xxxAfter: size of a generation after the collection; without free list\n· xxxSize: size of a generation (including LOH) after the collection; including free list (i.e. fragmentation)\nThe computation of these sizes relies on inner fields of the TraceGC argument received from TraceEvent. The xxxSize values are grouped in the GenerationSize0/1/2/3 fields of the HeapStat property. It is a little bit more complicated for the Before/After sizes. The Garbage Collector keeps track of these numbers in the PerHeapHistories field: an array of GCPerHeapHistory elements; one per heap (i.e. one per core for server GC). The next level is provided by the GenData field storing an array of GCPerHeapHistoryGenData elements; one per generation with LOH as the last index 3. So, to compute the size of each generation, you need to iterate over each heap:\nprivate long[] GetGenerationSizes(TraceGC gc, bool before) { var sizes = new long[4]; if (gc.PerHeapHistories == null) { return sizes; } for (int heap = 0; heap \u0026lt; gc.PerHeapHistories.Count; heap++) { /* LOH = 3 */ for (int gen = 0; gen \u0026lt;= 3; gen++) { sizes[gen] += before ? gc.PerHeapHistories[heap].GenData[gen].ObjSpaceBefore : gc.PerHeapHistories[heap].GenData[gen].ObjSizeAfter; } } return sizes; } The GetGenerationSizes helper method computes, for each generation, the sum over all heaps of either ObjSpaceBefore or ObjSizeAfter.\nAs you have probably noticed from the implementation, it is possible that the PerHeapHistories field is not filled up and all Before/After values are zero. This happens for a background gen2 collection. 
Also note that for gen0 and gen1 collection the value for gen2 and LOH is also 0 (make sense that gen2 and LOH do not change during such a collection).\nA little bit of UI Now that a .csv file containing all garbage collections details is available, it is time to provide some UI on top of it such as the following for GC pauses:\nLet’s start what you can get for Excel champions:\nGeneration ratio Sizes of generations including Large Object Heap Top 10 pauses (including suspension time comparison) But you can get better interaction thanks to Jean-Philippe. My colleague adapted his script for JVM to my .NET GC log .csv format: it generates some nice zoomable HTML UI.\nThis short video above shows the heap evolution during ~20 minutes. Then, as this is an interactive HTML page you can focus on gen2 and LOH impact on memory consumption.\nFor the pause time graph on the same period, it is very easy to detect long pauses (even for gen0 collection) and zoom into smaller period to figure out the impact of different collections.\nThe code available on Github has been updated to make the EtwGcLog class available to you.\n","cover":"https://chrisnas.github.io/posts/2019-02-12_building-your-own-java/1_nd462AscBlWil52op5ObsA.png","date":"2019-02-12","permalink":"https://chrisnas.github.io/posts/2019-02-12_building-your-own-java/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series focuses on logging each GC details in a file and how to leverage it during investigations.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003eMonitor Finalizers, contention and threads in your 
application\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2018-12-15_spying-on-net-garbage/\"\u003eSpying on .NET Garbage Collector with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eI’m working in a team where we investigate issues in production: both for Java and .NET applications. This is a good opportunity to learn what are the features provided by Java that are missing in .NET. One of the features heavily discussed with my colleague \u003ca href=\"https://twitter.com/jpbempel\"\u003eJean-Philippe\u003c/a\u003e is called the \u003cem\u003eGC Log\u003c/em\u003e. It is possible to start an application with parameters that tell the GC to save tons of details about each garbage collection in a file : the GC Log. Based on this file, it is possible to extract the reason of a collection, the times of the different phases including the suspension time. This is a great source of information during investigations… when you know how GC is working or by leveraging \u003ca href=\"https://gceasy.io/\"\u003eautomatic report generation\u003c/a\u003e.\u003c/p\u003e","title":"Building your own Java-like GC logs in .NET"},{"content":" My colleague Kevin has just described how to implement Java ReferenceQueue in C# as a follow-up to Konrad Kokosa’s article on this Java class. Among the different discussed features, one is still missing. This post will discuss how to deal with the “middle age crisis” scenario and control finalizer threading issues. I’m sure that my former Microsoft colleague Sebastien won’t be surprised by my interest in the subject.\nWhen a class references both IDisposable instances and native resources, the usual C# pattern is to implement both IDisposable for explicit cleanup and a Finalizer to deal with developers who would have forgotten the explicit cleanup. 
This pattern might have a side effect when these classes are also referencing a large object graph.\nLet’s take a minute to describe how finalizers are managed by the CLR\nThis animation shows what happens at the end of a collection. The darkened objects are no longer referenced and should be collected. B, G and H do not implement finalizers so they can be discarded. It is different for E, I and J because their classes implement a finalizer. First, a Finalization list was holding a “weak” reference to them since they were created. Then, at the end of a collection, these references are moved to the FReachable queue and the collection ends. Later on, after the collection ends, the finalizer thread wakes up and calls the finalizer of all objects referenced by the FReachable queue. This is the important part of the issue: it means that even though those objects weren’t referenced anymore, they couldn’t be collected nor their memory be reclaimed because the finalizer thread has not run yet. As they could not be reclaimed, they are promoted to the next generation just like other survivors. So if those objects were in generation 0, they now end up in generation 1, extending their lifetime. It’s even worse if they get promoted from generation 1 to generation 2, as the next gen 2 collection might happen only very far in the future. This artificially increases the memory consumption of the application.\nTo summarize, in case of business objects that hold a large reference tree along with native resources, it would be great to be able to:\n1. Allow explicit cleanup of resources with the IDisposable pattern\n2. Discard the managed memory when the objects are collected\n3. Automatically clean up native resources AFTER they are collected\n4. Have control over the thread that is cleaning up native resources\nMix a Phantom with IDisposable Requirement #3 seems impossible to fulfill: how can you access a field of an object if its memory has been reclaimed? 
Maybe it is possible to cheat: what if these native resources usually held as IntPtr field would be copied when the object is still alive? That way, the cleanup code could be moved outside of the object itself. This is basically the PhantomReference Java idea implemented in C# by Kevin with his PhantomObjectFinalizer:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 public class PhantomObjectFinalizer : PhantomReference\u0026lt;LargeObject\u0026gt; { private int _handle; public PhantomObjectFinalizer(ReferenceQueue\u0026lt;LargeObject\u0026gt; queue, LargeObject value) : base(queue, value) { _handle = value.Handle; } public void FinalizeResources() { Console.WriteLine(\u0026#34;I\u0026#39;m cleaning handle \u0026#34; + _handle); } } Let’s make it generic in term of native payload:\n1 2 3 4 5 6 7 8 9 10 public class PhantomObjectFinalizer\u0026lt;T, S\u0026gt; : PhantomReference\u0026lt;T\u0026gt; where T : class { public S State; public PhantomObjectFinalizer(ReferenceQueue\u0026lt;T\u0026gt; queue, T value, S state) : base(queue, value) { State = state; } } Also note that the cleaning method has been removed due to the requirement #1: the LargeObjectshould be responsible for cleaning the resources because it will also implements IDisposable. The cleaning native part will obviously be shared with the Dispose method.\nThe LargeObject could be rewritten to use it and the first step is group native resources in a state:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 public class LargeObject : IDisposable { ... 
class NativeState { public bool _disposed; public IntPtr _handle1; public IntPtr _handle2; } private NativeState _state; // Plus some heavy stuff that we don\u0026#39;t want to keep around once the object // is no more referenced public LargeObject() { _state = new NativeState() { _handle1 = (IntPtr) (DateTime.Now.Ticks), _handle2 = (IntPtr) (DateTime.Now.Ticks + 1), }; // this is where we imagine using a PhantomObjectFinalizer } public void Dispose() { if (_state._disposed) return; Cleanup(_state); CleanupIDisposableFields(); } private void CleanupIDisposableFields() { // call Dispose on all IDisposable fields } private static void Cleanup(NativeState state) { if (state._disposed) return; state._disposed = true; Console.WriteLine($\u0026#34;cleanup native resource {state._handle1.ToString()}\u0026#34;); Console.WriteLine($\u0026#34;cleanup native resource {state._handle2.ToString()}\u0026#34;); throw new InvalidOperationException(\u0026#34;I messed up the cleaning...\u0026#34;); } } The native payload is stored in a NativeState object that also contains the _disposed IDisposablestatus. This is required to be able to know if the object has been disposed explicitly when the static Cleanup method is called. This implementation fulfills the requirement #1 even though the cleanup code is throwing an exception: we will have to see how to control it.\nIntroducing the Cleaner a la Java The next step is to focus on requirements #2 and #3: how to ensure that our LargeObject memory gets reclaimed by the garbage collector but still automatically cleanup the native resources? This scenario is handled by the Cleaner class in Java mentioned reading Konrad’s article and that I have learnt to know better by discussing at length with Jean-Philippe, our team Java internals expert.\nYou can register an object, a state and a callback that will be called when the object is no more referenced. 
It is a kind of secondary finalization mechanism.\nLet’s see how I would like to use it in C#:\npublic class LargeObject : IDisposable { private static readonly Cleaner\u0026lt;LargeObject, NativeState\u0026gt; _cleaner; static LargeObject() { _cleaner = new Cleaner\u0026lt;LargeObject, NativeState\u0026gt;(Cleanup, OnError); } private NativeState _state; // Plus some heavy stuff that we don\u0026#39;t want to keep around once the object is no more referenced public LargeObject() { _state = new NativeState() { _handle1 = (IntPtr) (DateTime.Now.Ticks), _handle2 = (IntPtr) (DateTime.Now.Ticks + 1), }; _cleaner.Track(this, _state); } There will be a unique Cleaner object for all LargeObject instances. Each one will register itself and its native state in its constructor by calling the Cleaner.Track method.\nThe Cleaner instance receives two static callbacks:\nCleanup: this method will be called by the cleaner after a tracked LargeObject instance has been collected. As you can see, there is no need to change its initial implementation. It is static and receives the NativeState that stores the native state of a LargeObject. Since the NativeState type is a private inner class, the implementation details do not leak from LargeObject as was the case with Kevin’s PhantomObjectFinalizer implementation.\nOnError: this method gets called when an exception occurs during the cleanup (as my naïve implementation does by throwing an InvalidOperationException). This is a new feature compared to a .NET finalizer: you are notified if something goes wrong and you are able to log it. However, I would still recommend exiting the application, as the CLR does by default when a finalizer throws an exception. 
The LargeObject code is therefore responsible for cleaning both IDisposable and native resources: no need for its users to know the gory details.\nThe high-level API of the Cleaner class has been defined; it is now time to see how to implement it. If you have read Kevin’s post, the first step should be obvious: a ReferenceQueue will keep track of the PhantomObjectFinalizer bound to each “business object” like LargeObject. When the latter is collected, the phantom finalizer gets called to enqueue itself to the ReferenceQueue.\n/// T is the type to finalize /// S contains the native state of T to be cleaned up /// This separation enforces the pattern where the \u0026#34;native\u0026#34; part stored by a managed type is kept in a State /// The state could even be used as is in T (kind of a SafeHandle holding several native resources) public class Cleaner\u0026lt;T, S\u0026gt; where T : class { ... private readonly ReferenceQueue\u0026lt;T\u0026gt; _queue; public Cleaner(Action\u0026lt;S\u0026gt; onCleanup, Action\u0026lt;Exception\u0026gt; onError) { _onCleanup = onCleanup; _onError = onError; _queue = new ReferenceQueue\u0026lt;T\u0026gt;(); ... } private Action\u0026lt;S\u0026gt; _onCleanup; private Action\u0026lt;Exception\u0026gt; _onError; ... public void Track(T value, S state) { var phantomReference = new PhantomObjectFinalizer\u0026lt;T, S\u0026gt;(_queue, value, state); } ... } There is one big missing step: who will call the queue Poll method to get the finalized PhantomObjectFinalizer that contains the native state to clean up?\nStay in control of the cleaner job The simple implementation I’ve chosen is to create a dedicated thread that polls the queue at a period of your choice and calls the cleanup callback. I did not want to add pressure on the ThreadPool that is shared with the application. 
If an exception is raised, the error callback will be called.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 public class Cleaner\u0026lt;T, S\u0026gt; where T : class { private const int DefaultCleanupPeriod = 1000; private Action\u0026lt;S\u0026gt; _onCleanup; private Action\u0026lt;Exception\u0026gt; _onError; private readonly ReferenceQueue\u0026lt;T\u0026gt; _queue; private readonly Thread _cleanerThread; private readonly ManualResetEvent _exitEvent; private readonly int _cleanupPeriod; public Cleaner(Action\u0026lt;S\u0026gt; onCleanup, Action\u0026lt;Exception\u0026gt; onError, int cleanupPeriod = DefaultCleanupPeriod) { _onCleanup = onCleanup; _onError = onError; _cleanupPeriod = cleanupPeriod; _queue = new ReferenceQueue\u0026lt;T\u0026gt;(); _exitEvent = new ManualResetEvent(false); _cleanerThread = new Thread(PeriodicCleanup); _cleanerThread.IsBackground = true; // allow process exit even though the thread is still running _cleanerThread.Start(this); } private void PeriodicCleanup(object parameter) { Cleaner\u0026lt;T, S\u0026gt; cleaner = (Cleaner\u0026lt;T, S\u0026gt;) parameter; while (true) { var exit = _exitEvent.WaitOne(cleaner._cleanupPeriod); if (exit) return; try { PhantomReference\u0026lt;T\u0026gt; reference; while ((reference = _queue.Poll()) != null) { var finalizedReference = (PhantomObjectFinalizer\u0026lt;T, S\u0026gt;) reference; // cleanup native fields _onCleanup(finalizedReference.State); } } catch (Exception x) { _onError(x); } } } public void Track(T value, S state) { var phantomReference = new PhantomObjectFinalizer\u0026lt;T, S\u0026gt;(_queue, value, state); } public void Untrack(T value) { _queue.Untrack(value); } public void Dispose() { _exitEvent.Set(); } } Since I’ve created the thread as a background thread, it won’t block .NET to exit the process when the last foreground thread 
returns. However, you are free to follow the IDisposable pattern, and call Dispose to explicitly stop the cleaning thread at the right time of your application lifecycle.\nIn the IDisposable/finalizer pattern, the GC class provides the SuppressFinalize static method to remove an object when it has been explicitly disposed: that way, the object won’t go to the FReacheable queue nor be promoted into the next generation after it is collected. The Cleaner class provides the Untrack method to achieve the same effect: the object native payload won’t be cleaned up. I just had to update the ReferenceQueue to remove the object from the ConditionalWeakTable and remove the PhantomReference from the FinalizationList:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public class ReferenceQueue\u0026lt;T\u0026gt; where T : class { ... internal void Track(T value, PhantomReference\u0026lt;T\u0026gt; reference) { _table.Add(value, reference); } internal void Untrack(T value) { if (_table.TryGetValue(value, out var reference)) { _table.Remove(value); GC.SuppressFinalize(reference); } } } The requirement #4 is now fulfilled. You are obviously free to pick another implementation more suitable to your needs than a thread-based periodic cleanup. 
I would like to mention that if the cleanup callback never returns, the effect is almost the same as in the case of a stuck finalizer: the native resources won’t be cleaned up anymore.\nThe following code shows how all this “complicated” code does not leak in a C# application:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 class Program { static void Main(string[] args) { InnerScope(); GC.Collect(); GC.WaitForPendingFinalizers(); Console.WriteLine(\u0026#34;Done\u0026#34;); Console.ReadLine(); // use the following code to explicitly stop the cleaner thread //LargeObject.DisposeCleaner(); // the code is working without this call because the ticking thread // is a background thread and won\u0026#39;t stop the CLR to exit the process } static void InnerScope() { var largeObject = new LargeObject(); // it is also possible to support explicit disposing // var largeObject2 = new LargeObject(); // largeObject2.Dispose(); GC.Collect(); GC.WaitForPendingFinalizers(); GC.KeepAlive(largeObject); } } And you get the expected output:\nMaybe Konrad will integrate a smarter Java Cleaner-like feature within the CLR itself or Alexandre in his new .NET ;^)\n","cover":"https://chrisnas.github.io/posts/2019-01-16_fixing-net-middle-age/1_Asj0272CtE0YBUC81pX_FA.png","date":"2019-01-16","permalink":"https://chrisnas.github.io/posts/2019-01-16_fixing-net-middle-age/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2019-01-16_fixing-net-middle-age/1_Asj0272CtE0YBUC81pX_FA.png\"\u003e\u003c/p\u003e\n\u003cp\u003eMy colleague Kevin has just described \u003ca href=\"https://medium.com/@kevingosse/implementing-java-referencequeue-and-phantomreference-in-c-827d7141b6e4\"\u003ehow to implement Java ReferenceQueue in C#\u003c/a\u003e as a follow-up to \u003ca href=\"http://tooslowexception.com/do-we-need-jvms-phantomreference-in-net/\"\u003eKonrad Kokosa’s article\u003c/a\u003e on this Java class. 
Among the different discussed features, one is still missing. This post will discuss how to deal with the “middle age crisis” scenario and control finalizer threading issues. I’m sure that my former Microsoft colleague \u003ca href=\"https://twitter.com/sbovo\"\u003eSebastien\u003c/a\u003e won’t be surprised by my interest in the subject.\u003c/p\u003e\n\u003cp\u003eWhen a class references both \u003ccode\u003eIDisposable\u003c/code\u003e instances and native resources, the usual C# pattern is to implement both \u003ccode\u003eIDisposable\u003c/code\u003e for explicit cleanup and a Finalizer to deal with developers who would have forgotten the explicit cleanup. This pattern might have a side effect when these classes are also referencing a large objects graph.\u003c/p\u003e","title":"Fixing .NET middle-age crisis with Java ReferenceQueue and Cleaner"},{"content":" This post of the series focuses on CLR events related to garbage collection in .NET.\nPart 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nPart 3: CLR Threading events with TraceEvent.\nIntroduction The allocator and garbage collector components of the CLR may have a real impact on the performances of your application. The Book of the Runtime describes the allocator/collector design goals in the must read Garbage Collection Design page written by Maoni Stephens, lead developer of the GC. In addition, Microsoft provides large garbage collection documentation. And if you want more details about .NET garbage collector, take a look at Pro .NET Memory Management by Konrad Kokosa. 
In this post, I will focus on the events emitted by the CLR and how you could use them to better understand how your application behaves with regard to its memory consumption.\nThe impact on how your application behaves is mostly related to a couple of topics:\nHow many times and how long your threads get suspended during a collection: Desktop applications and games provide fluid user interfaces where glitches are less and less acceptable. At the opposite end of the spectrum, low latency server applications have short SLAs to answer each request. In both cases, applications cannot afford freezing for too long while the high priority GC threads are cleaning up the .NET heaps for background GCs or blocking non-concurrent GCs.\nHow much memory is dedicated to your process: With the rise of containers and their quotas, your application needs to trim down its memory consumption. For example, with server GC enabled, the amount of memory used by your application could grow large (depending on the number of cores) before a gen 0 collection kicks in (read this discussion about real world cases including the StackOverflow web site and what the possible solutions are). The memory pressure on the system is also taken into account by the GC and could lead to more collections being triggered (read Maoni Stephens’ blog post about how Windows jobs are taken into account by the GC and how to leverage them if needed). It becomes more and more important to detect leaks and memory consumption spikes. In the previous post, you saw how to get the type name of instances being finalized. The CLR provides many more events related to memory management. They definitely help understand the interactions between this crucial part of .NET and your own code. In this article, you will see how to replace the not always consistent performance counters such as generation sizes or collection counts. 
More importantly, you will get very useful metrics such as the type of GC (foreground or background) and the suspension time of your application threads.\nSequences of events during Garbage Collection phases Ephemeral collections (of generation 0 and 1) are called “stop-the-world”: your application threads will be frozen during the whole collection. For generation 2 background collections, it is a little bit more complicated, as shown in the following figure (courtesy of Konrad Kokosa, from his book):\nThe application threads will be frozen during different phases:\nInitial internal step at the beginning of the collection, At the end of the marking phase to reconcile the changes (allocations, reference updates) done while background collection threads are running (also if compaction is needed). Look for documentation about card table usage to get more details, If a compaction occurs. Please read Understanding different GC modes with Concurrency Visualizer to go deeper and blog posts from Matt Warren and Maoni Stephens about GC pauses.\nWhat are the available garbage collection metrics? The Perfview tool could help you analyze how many garbage collections occurred and for which reason. Select Run in the Collect menu and click the Run Command button.\nYou could also start collecting after the application is started with Collect | Collect. When you want to stop collecting information, click the Stop Collection button. When the .etl file gets generated, go to the GCStats node.\nLook for your application to get statistics related to garbage collections. 
The first GC Rollup By Generation table gives you high level metrics such as the number of collections per generation and the mean pause time:\nThe next two sections list the collections with a pause time longer than 200ms before the section that lists all generation 2 collections:\nThe Suspend Msec columns gives you the time it took to suspend your application threads while Pause MSec counts the time during which your threads were actually suspended.\nIn addition to this, memory details such as the size of all generations after each collection are available:\nHowever, my goal is to get these details to feed monitoring dashboards as the application runs. I can’t use Perfview but I can still rely on the same CLR events.\nA solution for runtime please! Since version 2 of TraceEvent, there is an easy way to get already computed metrics about GC as described by Maoni Stephens. It relies on the same code as Perfview for its GCStats window.\nYou only need to subscribe to two events; one when a GC starts and one when it ends:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 var source = userSession.Source; source.NeedLoadedDotNetRuntimes(); source.AddCallbackOnProcessStart((TraceProcess proc) =\u0026gt; { proc.AddCallbackOnDotNetRuntimeLoad((TraceLoadedDotNetRuntime runtime) =\u0026gt; { runtime.GCStart += (TraceProcess p, TraceGC gc) =\u0026gt; { // a GC is starting }; runtime.GCEnd += (TraceProcess p, TraceGC gc) =\u0026gt; { // a GC ends }; }); }); The TraceGC class provides too many details beyond the scope of this post but here are the main fields that should be used in GCEnd event handler to monitor your applications:\nNote that the IsNotCompacting method currently returns invalid value.\nFinal words I would like to mention one last event related to memory management. The GCAllocationTick CLR event (mapped by the ClrTraceEventParser.GCAllocationTick event) is emitted after ~100 KB has been allocated by your application. 
As you can infer from the Microsoft documentation, the GCAllocationTickTraceData argument received by your handler provides the following properties:\nAs you can see, listening to this GCAllocationTick event gives you a sampling of the allocations made in your application. This is not as precise as what a .NET profiler (relying on expensive ObjectAllocated and ObjectAllocatedByClass ICorProfilerCallback hooks) would provide but it is much less intrusive. However, I would not recommend systematically listening to this event in production, especially if your application is allocating GBs of memory per minute. Unlike what the documentation states, you need to set the verbosity to TraceEventLevel.Verbose (and not Informational) when you enable the CLR provider and this could impact your application performance due to the high number of emitted CLR events.\nThis event could be very helpful in case of unusual LOH allocations because you would get the type of the objects in the LOH almost each time (the 85,000 bytes threshold is close to the 100 KB trigger limit) or simply to get a hint about the most allocated types over time. Note that you won’t get the callstack leading to the allocations triggering the event. Instead, for memory leak or memory usage analysis, I would definitely recommend using Perfview. Vance Morrison has published a series of videos that detail .NET memory investigations, collecting the data and analyzing the data with Perfview. You will also find a lot of detailed guidelines for memory-related investigations in Konrad Kokosa’s book.\nYou now have a complete view of the CLR events that help you understand the different phases of a garbage collection and a few interactions (suspension) with the Execution Engine. You have everything in hand to replace the performance counters by CLR events: the metrics are more accurate and you get access to more information such as suspension time or contention time. 
The code presented during all episodes is available on Github with an easy to reuse ClrEventManager class that you could plug into your own applications or monitoring service!\n","cover":"https://chrisnas.github.io/posts/2018-12-15_spying-on-net-garbage/1_INTuAJqcsWDbp8XM1ZtbMA.png","date":"2018-12-15","permalink":"https://chrisnas.github.io/posts/2018-12-15_spying-on-net-garbage/","summary":"\u003chr\u003e\n\u003cp\u003eThis post of the series focuses on CLR events related to garbage collection in .NET.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003eCLR Threading events with TraceEvent\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eThe allocator and garbage collector components of the CLR may have a real impact on the performances of your application. The Book of the Runtime describes the allocator/collector design goals in the must read \u003ca href=\"https://github.com/dotnet/coreclr/blob/master/Documentation/botr/garbage-collection.md\"\u003eGarbage Collection Design page\u003c/a\u003e written by Maoni Stephens, lead developer of the GC. In addition, Microsoft provides large \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/?WT.mc_id=DT-MVP-5003325\"\u003egarbage collection documentation\u003c/a\u003e. 
And if you want more details about .NET garbage collector, take a look at \u003ca href=\"https://www.amazon.com/Pro-NET-Memory-Management-Performance/dp/148424026X\"\u003ePro .NET Memory Management\u003c/a\u003e by \u003ca href=\"https://twitter.com/konradkokosa\"\u003eKonrad Kokosa\u003c/a\u003e. In this post, I will focus on the events emitted by the CLR and how you could use them to better understand how your application is behaving, related to its memory consumption.\u003c/p\u003e","title":"Spying on .NET Garbage Collector with TraceEvent"},{"content":" As the .NET Core 2.2 blog post introduced, it is now possible for a .NET Core application to listen to the events generated by the CLR that power it up. If you remember the Grab ETW Session, Providers and Events post, the CLR is emitting a lot of valuable events through ETW on Windows and LTTng on Linux. Thanks to TraceEvent nuget package, it is not that difficult to fetch these events at runtime on Windows, either in-process or out of process. However, it is much more complicated to achieve the same goal on Linux… With .NET Core 2.2, it is now super easy to listen to the events emitted by the CLR while your application is running: you simply need to implement a class that derives from System.Diagnostics.Tracing.EventListener and create an instance of it. Nothing more.\nThis class exists since .NET Framework 4.5 and .NET Core 1.0 but it could only be used to listen events pushed by managed code. Since .NET Core 2.2, it can also be used to listen to native events pushed by the CLR. 
The usage is simple, even a little bit magical.\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 sealed class GcFinalizersEventListener : EventListener { // from https://docs.microsoft.com/en-us/dotnet/framework/performance/garbage-collection-etw-events private const int GC_KEYWORD = 0x0000001; private const int TYPE_KEYWORD = 0x0080000; private const int GCHEAPANDTYPENAMES_KEYWORD = 0x1000000; protected override void OnEventSourceCreated(EventSource eventSource) { Console.WriteLine($\u0026#34;{eventSource.Guid} | {eventSource.Name}\u0026#34;); // look for .NET Garbage Collection events if (eventSource.Name.Equals(\u0026#34;Microsoft-Windows-DotNETRuntime\u0026#34;)) { EnableEvents( eventSource, EventLevel.Verbose, (EventKeywords) (GC_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD | TYPE_KEYWORD) ); } } // from https://blogs.msdn.microsoft.com/dotnet/2018/12/04/announcing-net-core-2-2/ // Called whenever an event is written. protected override void OnEventWritten(EventWrittenEventArgs eventData) { ... } } class Program { static void Main(string[] args) { GcFinalizersEventListener listener = new GcFinalizersEventListener(); Console.WriteLine(\u0026#34;\\nPress ENTER to trigger a few finalizers...\u0026#34;); Console.ReadLine(); for (int i = 0; i \u0026lt; 4; i++) { Thread t = new Thread(()=\u0026gt; {}); } GC.Collect(2, GCCollectionMode.Forced, true, true); Console.WriteLine(\u0026#34;\\nPress ENTER to exit...\u0026#34;); Console.ReadLine(); } } First, you implement a class that derives from EventListener and override the following two methods:\n1 2 void OnEventSourceCreated(EventSource eventSource) void OnEventWritten(EventWrittenEventArgs eventData) As soon as you new up an instance of your class, the OnEventSourceCreatedoverride is called for each event source defined in the application. An event source, as its name implies, produces events. 
You can define your own in managed code if you wish. For the sake of this post, I will focus on listening to the Microsoft-Windows-DotNETRuntime event source. The EventSource instance passed to the OnEventSourceCreated method provides two interesting properties to let us identify the available sources. The following code:\nprotected override void OnEventSourceCreated(EventSource eventSource) { Console.WriteLine($\u0026#34;{eventSource.Guid} | {eventSource.Name}\u0026#34;); } generates the following output:\n5e5bb766-bbfc-5662-0548-1d44fad9bb56 | Microsoft-Windows-DotNETRuntime 2e5dba47-a3d2-4d16-8ee0-6671ffdcd7b5 | System.Threading.Tasks.TplEventSource 8e9f5090-2d75-4d03-8a81-e5afbf85daf1 | System.Diagnostics.Eventing.FrameworkEventSource You can imagine that the first one is the source whose events we are interested in!\nBy default, there is no connection between the sources and your listeners: you need to enable the source by calling the EnableEvents method in your OnEventSourceCreated override. This EventListener method takes the following arguments:\nEventSource eventSource: the event source you want to listen to EventLevel level: minimum verbosity level for the received events EventKeywords matchAnyKeyword: a bitmask of keywords used to filter specific events The Microsoft Docs provide the level and keywords for each event documented in the CLR. For the complete list, take a look at ClrETWAll.man in the CoreCLR source code or at the ClrTraceEventParser class of TraceEvent. In the sample code at the beginning of this post, I selected a group of keywords GC_KEYWORD | GCHEAPANDTYPENAMES_KEYWORD | TYPE_KEYWORD to receive only events related to the garbage collector and type information (read this previous post for more details).\nOnce the source has been paired to the listener, each time an event is emitted by the source with the right level and for the given keywords, the OnEventWritten override will get called. 
The EventWrittenEventArgsinstance received as a parameter describes each event.\nThe Payload contains the value of the different properties stored in a ReadOnlyCollection and the corresponding property names are provided via the ReadOnlyCollection PayLoadNames. The following code shows how to extract all properties values:\n1 2 3 4 5 6 7 8 9 10 11 12 13 // from https://blogs.msdn.microsoft.com/dotnet/2018/12/04/announcing-net-core-2-2/ // Called whenever an event is written. protected override void OnEventWritten(EventWrittenEventArgs eventData) { // Write the contents of the event to the console. Console.WriteLine($\u0026#34;ThreadID = {eventData.OSThreadId} ID = {eventData.EventId} Name = {eventData.EventName}\u0026#34;); for (int i = 0; i \u0026lt; eventData.Payload.Count; i++) { string payloadString = eventData.Payload[i] != null ? eventData.Payload[i].ToString() : string.Empty; Console.WriteLine($\u0026#34; Name = \\\u0026#34;{eventData.PayloadNames[i]}\\\u0026#34; Value = \\\u0026#34;{payloadString}\\\u0026#34;\u0026#34;); } Console.WriteLine(\u0026#34;\\n\u0026#34;); } Here is the kind of output you get for common garbage collector and finalizer events:\nThreadID = 17456 ID = 200 Name = IncreaseMemoryPressure Name = \u0026#34;BytesAllocated\u0026#34; Value = \u0026#34;1672\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; ThreadID = 17456 ID = 9 Name = GCSuspendEEBegin_V1 Name = \u0026#34;Reason\u0026#34; Value = \u0026#34;1\u0026#34; Name = \u0026#34;Count\u0026#34; Value = \u0026#34;0\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; ThreadID = 17456 ID = 8 Name = GCSuspendEEEnd_V1 Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; ThreadID = 17456 ID = 35 Name = GCTriggered Name = \u0026#34;Reason\u0026#34; Value = \u0026#34;10\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; ThreadID = 17456 ID = 1 Name = GCStart_V2 Name = 
\u0026#34;Count\u0026#34; Value = \u0026#34;1\u0026#34; Name = \u0026#34;Depth\u0026#34; Value = \u0026#34;2\u0026#34; Name = \u0026#34;Reason\u0026#34; Value = \u0026#34;10\u0026#34; Name = \u0026#34;Type\u0026#34; Value = \u0026#34;0\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; Name = \u0026#34;ClientSequenceNumber\u0026#34; Value = \u0026#34;0\u0026#34; ThreadID = 18860 ID = 29 Name = FinalizeObject Name = \u0026#34;TypeID\u0026#34; Value = \u0026#34;1210056592\u0026#34; Name = \u0026#34;ObjectID\u0026#34; Value = \u0026#34;1371069040\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; ThreadID = 18860 ID = 15 Name = BulkType Name = \u0026#34;Count\u0026#34; Value = \u0026#34;1\u0026#34; Name = \u0026#34;ClrInstanceID\u0026#34; Value = \u0026#34;8\u0026#34; The lack of a strongly typed event argument per event makes this less convenient than what TraceEvent provides. In addition, after a few tests, it seems that the .NET Core 2.2 implementation is not complete:\nGC events are not all received when in Server Mode Properties necessary to figure out finalizer type names are missing from the BulkType event However, with EventListener, Microsoft is giving us a very simple way to get valuable information, in-process, from the CLR while the application is running. A forthcoming blog post will show how to leverage this infrastructure to provide insights on how the garbage collection impacts an application.\nBefore leaving you to build your own event listeners, you should know a couple of last details. Under the hood, the framework is creating a dedicated thread for you that will execute the two OnXXX methods of your EventListener-derived class. It means that your code should not block or spend too much time processing the events if you want to keep on receiving events at a regular pace.\nThis thread will last as long as one of your listeners still exists. When I say “exist”, I mean until you decide to dispose of them. 
This is the way for you to tell the sources that you are no more interested in receiving events. When all your listeners are disposed, then the processing thread will exit.\n","cover":"https://chrisnas.github.io/posts/2018-12-06_in-process-clr-event/1_zc1BKfAHkpvrZlHPbUvuYA.png","date":"2018-12-06","permalink":"https://chrisnas.github.io/posts/2018-12-06_in-process-clr-event/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2018-12-06_in-process-clr-event/1_zc1BKfAHkpvrZlHPbUvuYA.png\"\u003e\u003c/p\u003e\n\u003cp\u003eAs the \u003ca href=\"https://devblogs.microsoft.com/dotnet/announcing-net-core-2-2/?WT.mc_id=DT-MVP-5003325\"\u003e.NET Core 2.2 blog post\u003c/a\u003e introduced, it is now possible for a .NET Core application to listen to the events generated by the CLR that power it up. If you remember the \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e post, the CLR is emitting a lot of valuable events through ETW on Windows and LTTng on Linux. Thanks to \u003ca href=\"https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent/\"\u003eTraceEvent nuget package\u003c/a\u003e, it is not that difficult to fetch these events at runtime on Windows, either in-process or out of process. However, it is much more complicated to achieve the same goal on Linux… With .NET Core 2.2, it is now super easy to listen to the events emitted by the CLR while your application is running: you simply need to implement a class that derives from \u003ca href=\"https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventlistener?WT.mc_id=DT-MVP-5003325?view=netcore-2.2\"\u003eSystem.Diagnostics.Tracing.EventListener\u003c/a\u003e and create an instance of it. 
Nothing more.\u003c/p\u003e","title":"In-process CLR event listeners with .NET Core 2.2"},{"content":" Unexpected CPU consumption At Criteo, CLR metrics are collected by a service that listens to ETW events (see the related series). This metrics collector is given the process name of applications to monitor. Since applications could crash, be stopped or restarted, the metrics collector must be able to detect such an event. The previous implementation was using ETW kernel events (TraceEvent ProcessStart and ProcessStop events from ETWTraceEventSource.Kernel). However, in rare cases, it seems that a new application start was not detected and therefore the metrics were not collected for it.\nAn easy fix for this situation is to poll the list of running processes every second and detect which one is new or has left since the last time the list was polled. The implementation is straightforward: just call Process.GetProcesses() and get the process name from the Process MainModule.FileName property. 
After a few seconds of testing this implementation on my laptop, the fan started spinning: a quick look at Task Manager showed that the metrics collector was using ~10% CPU time!\nI used these P/Invoked PSAPI functions 20 years ago but I don’t remember such an impact: for our monitoring service, we would like to keep the CPU impact below 1%.\nMeasure, measure… and profile This was a good opportunity to start profiling the metrics collector with dotTrace on a Friday afternoon!\nNtProcessManager.GetModuleInfos and NtProcessManager.GetProcessIds are at the top of the list of CPU-consuming methods, the worst offenders by far:\nThe callstack to reach GetModuleInfos() shows the following:\nThe GetProcessPath() method of the metrics collector is asking for the value of the Process.MainModule.FileName property, which ends up calling GetModuleInfos.\nAnd the callstack for the MainModule getter execution shows the following:\nThis call stack looks weird for two reasons:\nSince the Process object exists, why is it necessary to call OpenProcess again to get the main module?\nWhy would OpenProcess need to call GetProcessIds (i.e. get the list of running processes) since its id is already known?!\nJust take a look at the decompiled source code to get the answer:\npublic static SafeProcessHandle OpenProcess(int processId, int access, bool throwIfExited) { SafeProcessHandle safeProcessHandle = NativeMethods.OpenProcess(access, false, processId); int lastWin32Error = Marshal.GetLastWin32Error(); if (!safeProcessHandle.IsInvalid) return safeProcessHandle; /* error handling */ if (processId == 0) throw new Win32Exception(5); if (ProcessManager.IsProcessRunning(processId)) throw new Win32Exception(lastWin32Error); ... 
And IsProcessRunning calls GetProcessIds:\npublic static bool IsProcessRunning(int processId) { return IsProcessRunning(processId, GetProcessIds()); } It does not appear in the callstack most probably because it was inlined by the JIT.\nSo, the code of ProcessManager.OpenProcess calls the Win32 OpenProcess API to get… the handle of the process corresponding to the given id and desired access rights. From there, it is spending most of its CPU time dealing with error cases (i.e. when process information cannot be accessed, maybe due to access right limitations). We definitely don’t need all that in our case!\nWhat next? At that point of the investigation, my colleague Kevin and I went in different directions. A few decades ago, I spent a lot of time digging into Windows internals using Win32 APIs and I remember that calling PSAPI.GetModuleFileNameEx with a process handle and 0 as module handle should return the path name of the process (BTW, this is also what GetModuleInfos ends up calling but more on that later). So it should not be too complicated to P/Invoke this function from PSAPI.dll.\nAt the beginning of .NET programming, the https://pinvoke.net/ web site was very useful to figure out the right syntax for a lot of APIs if you did not want to read the 1579 pages of the Complete Interoperability Guide! The description of GetModuleFileNameEx is available, and even comes with a code sample.\n[DllImport(\u0026#34;psapi.dll\u0026#34;, BestFitMapping = false, CharSet = CharSet.Auto, SetLastError = true)] private static extern int GetModuleFileNameEx(SafeProcessHandle processHandle, IntPtr moduleHandle, StringBuilder baseName, int size); Note that marshaling strings should always be done with care: the Win32 API is not always consistent when a pointer to a C-like string is supposed to be filled up by a function. An additional parameter is given to state the size of the buffer in which the characters of the string will be copied. 
In some cases, this parameter counts the number of characters and in some others, it counts the number of bytes available in the buffer. I let you imagine what a nightmare it was when you had to deal with ANSI/UNICODE strings. In the GetModuleFileNameEx case, the size parameter takes the number of characters.\nIf you take a look at the NtProcessManager.GetModuleInfos implementation, you find the following code:\nStringBuilder stringBuilder2 = new StringBuilder(1024); if (Microsoft.Win32.NativeMethods.GetModuleFileNameEx( safeProcessHandle, new HandleRef(null, handle), stringBuilder2, stringBuilder2.Capacity * 2 ) == 0) Since the capacity of the StringBuilder is set to 1024, this code tells GetModuleFileNameEx that it is allowed to write up to 2048 characters. This looks like a bug… but hard to trigger with the usual 260-character limit for filenames. However, if, one day, you decide to use the extended-length path syntax with the “\\\\?\\” prefix to create a looooong folder for your application, the bug will trigger an AccessViolationException beyond 1025 characters.\nHere is my safer implementation:\nprivate static readonly StringBuilder _baseNameBuilder = new StringBuilder(1024); public static string GetProcessNameNative(Process p) { _baseNameBuilder.Clear(); if (GetModuleFileNameEx(p.SafeHandle, IntPtr.Zero, _baseNameBuilder, _baseNameBuilder.Capacity) == 0) { _baseNameBuilder.Append(\u0026#34;???\u0026#34;); } return _baseNameBuilder.ToString(); } When I presented my oldies but goodies solution to Kevin, he told me that he had found a smarter solution. 
While I was digging into my memories, he kept decompiling the implementation of the Process class and realized that it contains a processInfo private field:\nAnd its internal class exposes a public field called… processName: exactly what we needed!\nSo I was ready to implement a reflection-based solution like:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 private static Type _processInfoType = null; private static FieldInfo _processNameField = null; public static string GetProcessNameByReflection(Process p) { var processInfoField = typeof(System.Diagnostics.Process) .GetField(\u0026#34;processInfo\u0026#34;, BindingFlags.Instance | BindingFlags.NonPublic); var processInfo = processInfoField.GetValue(p); if (_processInfoType == null) { _processInfoType = processInfo.GetType(); _processNameField = _processInfoType.GetField(\u0026#34;processName\u0026#34;); } return _processNameField.GetValue(processInfo).ToString(); } And Kevin was able to give me a definitively smarter solution based on compiled expressions:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 private static Func\u0026lt;Process, string\u0026gt; GetProcessNameAccessor() { var param = Expression.Parameter(typeof(Process), \u0026#34;arg\u0026#34;); var processInfoMember = Expression.Field(param, \u0026#34;processInfo\u0026#34;); var processNameMember = Expression.Field(processInfoMember, \u0026#34;processName\u0026#34;); var lambda = Expression.Lambda(typeof(Func\u0026lt;Process, string\u0026gt;), processNameMember, param); return (Func\u0026lt;Process, string\u0026gt;)lambda.Compile(); } private static readonly Func\u0026lt;Process, string\u0026gt; _getProcessNameFunc = GetProcessNameAccessor (); public string GetProcessNameByExpression() { return _getProcessNameFunc(_process); } However, when I tested them on my laptop, I got null reference exceptions while it was working fine on Kevin’s machine… There was a tiny difference between us: I was calling Process.GetProcessById while Kevin was using Process.GetProcesses to 
get the Process instance. It looks like the two methods do not perform the same initialization. This kind of thing happens when you are trying to use undocumented implementation details… Also note that the .NET Core implementation is different (and does not contain the string length bug).\nComparing the different solutions So which solution should I pick for our metrics collector?\nIt’s time to do some benchmarking thanks to BenchmarkDotNet and the results differ by orders of magnitude!\nThe winner, without contest, is the solution based on compiled expressions and the worst one is… the initial implementation that does not even fall into the error case during our tests!\nAfter updating the implementation with the compiled expression-based solution, the CPU usage of our metrics collector seems more reasonable:\nIt is now a good time to go back home… to write this article :^)\n","cover":"https://chrisnas.github.io/posts/2018-11-13_get-process-name-challenge/1_CDn3N44B8tI1cCzL3G-Qbw.png","date":"2018-11-13","permalink":"https://chrisnas.github.io/posts/2018-11-13_get-process-name-challenge/","summary":"\u003chr\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2018-11-13_get-process-name-challenge/1_CDn3N44B8tI1cCzL3G-Qbw.png\"\u003e\u003c/p\u003e\n\u003ch2 id=\"unexpected-cpu-consumption\"\u003eUnexpected CPU consumption\u003c/h2\u003e\n\u003cp\u003eAt Criteo, CLR metrics are collected by a service that listens to ETW events (\u003ca href=\"/posts/2018-09-28_monitor-finalizers-contention-threads/\"\u003esee the related series\u003c/a\u003e). This metrics collector is given the process name of applications to monitor. Since applications could crash, be stopped or restarted, the metrics collector must be able to detect such an event. 
The previous implementation was using ETW kernel events (TraceEvent \u003ccode\u003eProcessStart \u003c/code\u003eand \u003ccode\u003eProcessStop \u003c/code\u003eevents from \u003ccode\u003eETWTraceEventSource.Kernel\u003c/code\u003e). However, in rare cases, it seems that a new application start was not detected and therefore the metrics were not collected for it.\u003c/p\u003e","title":"[C#] Get-process-name challenge on a Friday afternoon"},{"content":"Part 1: Replace .NET performance counters by CLR event tracing.\nPart 2: Grab ETW Session, Providers and Events.\nIntroduction In the previous post, you saw how the TraceEvent nuget helps you decipher simple ETW events such as the one emitted when a first chance exception happens. Most situations trigger more than one event, which can make their processing more complicated.\nWho said Finalizer? In the early days of .NET, you might have had to deal with native resources that you were responsible for cleaning up with the related unmanaged API or legacy COM component. It was a best practice to implement a ~finalizer method to ensure that everything was deleted the right way. These times are over for most of us now. If you don’t have an IntPtr field in your class, chances are that you don’t need a ~finalizer method.\nThe Microsoft documentation about IDisposable/Finalizer often leads people to implement both even though only IDisposable is needed (i.e. some fields of the class implement IDisposable). Having a large number of finalizers could impact memory consumption by keeping objects alive for a longer time and maybe even increase the total garbage collection duration. Last but not least, some finalizer code outside of your code base could “block” on locks during cleanup and… drastically slow down everything else.\nGetting the name of these types with TraceEvent is a two-step process. 
First, a TypeBulkType event is received: it contains a list of GCBulkTypeValues which binds a TypeID integer to a string type name:\nOnTypeBulkType.cs\n1 2 3 4 5 6 7 8 9 10 11 12 private void OnTypeBulkType(GCBulkTypeTraceData data) { if (data.ProcessID != _processId) return; // keep track of the id/name type associations for (int currentType = 0; currentType \u0026lt; data.Count; currentType++) { GCBulkTypeValues value = data.Values(currentType); _types[value.TypeID] = value.TypeName; } } This association is needed because when a finalizer is notified via the GCFinalizeObject event, the received data only contains the type ID:\nOnGCFinalizeObject.cs\n1 2 3 4 5 6 7 8 private void OnGCFinalizeObject(FinalizeObjectTraceData data) { if (data.ProcessID != _processId) return; // the type id should have been associated to a name via a previous TypeBulkType event NotifyFinalize(data.TimeStamp, data.ProcessID, data.TypeID, _types[data.TypeID]); } Note that in the two code snippets, there is an explicit check to keep the events only from the process we are interested in: as explained in part 1, this is needed for older versions of Windows.\nThread contention duration With .NET CLR LocksAndThreads “Contention Rate / sec” and “Total # of Contentions” performance counters, you can monitor how many times threads have been blocked while waiting for a lock owned by another thread. However, you don’t know for how long. The two TraceEvent ContentionStart and ContentionStop events allow you to get this crucial piece of information.\nAs their names imply and the corresponding documentation explains, these two events let you know respectively when a thread starts to wait on a lock and when the lock has been acquired. 
In addition to the process and thread identifiers, the ContentionTraceData event argument gives you the type of contention with its ContentionFlags property: either managed or native.\nContentionTraceData.cs\npublic sealed class ContentionTraceData : TraceEvent { public ContentionFlags ContentionFlags { get; } Since contention is a per-thread waiting operation, you need to keep track of the starting time on a per-thread basis when ContentionStart happens.\nOnContentionStart.cs\nprivate void OnContentionStart(ContentionTraceData data) { ContentionInfo info = _contentionStore.GetContentionInfo(data.ProcessID, data.ThreadID); if (info == null) return; info.TimeStamp = data.TimeStamp; info.ContentionStartRelativeMSec = data.TimeStampRelativeMSec; } The ContentionStore class keeps track of the monitored processes and assigns them a ContentionInfo instance where the contention details are stored.\nYou retrieve it when the matching ContentionStop event occurs. The rest is just a matter of computing the time difference between the two events based on their TimeStampRelativeMSec property.\nOnContentionStop.cs\nprivate void OnContentionStop(ContentionTraceData data) { ContentionInfo info = _contentionStore.GetContentionInfo(data.ProcessID, data.ThreadID); if (info == null) return; /* unlucky case when we start to listen just after the ContentionStart event */ if (info.ContentionStartRelativeMSec == 0) return; var contentionDurationMSec = data.TimeStampRelativeMSec - info.ContentionStartRelativeMSec; var isManaged = (data.ContentionFlags == ContentionFlags.Managed); } You are now able to detect not only when thread contention occurs but also whether the contention duration increases over time.\nIf you want to test contention, here is the kind of code you could use:\nTestContention.cs\n_workers = new Task[workerCount]; for (int i = 0; i \u0026lt; workerCount; i++) { _workers[i] = Task.Run(() 
=\u0026gt; { while (true) { lock (_lock) { Thread.Sleep(5); } } }); } A few tasks are created that acquire the same lock over and over, sleeping 5 milliseconds before releasing it.\nHow to count threads or monitor the ThreadPool usage? In a previous post, it was mentioned that CLR performance counters related to threads were not able to provide an accurate count of the running threads. In fact, you could use the Process/Thread Count Windows kernel performance counter to get the accurate value. If you build .NET Core applications to run on Linux, you have to find other ways such as those described on Stack Overflow. However, there is an easy programmatic way to get the number of running threads in an application that works both on Windows and Linux: call Process.GetProcessById().Threads.Count with its process ID.\nSince this is a series dedicated to ETW, you would expect to simply listen to a few events to get the thread count. Well… It is almost that simple. Each time a thread gets started, the AppDomainResourceManagement/ThreadCreated event is emitted with basically the ID of the created thread as payload. In order to receive the sibling AppDomainResourceManagement/ThreadTerminated event, you need to set AppDomain.MonitoringIsEnabled to true in the monitored application. The other ways described by the documentation did not work for me.\nIf you want to figure out whether your applications are hammering the .NET thread pool too much, the CLR provides many ETW events for you to listen to, which map to the following TraceEvent events:\nThe ThreadPoolWorkerThreadAdjustmentAdjustment event (there is no typo in this name) provides a Reason property. If its value is 0x06, then it means Starvation: if this event frequency is ~1 per second, it could be a good indication that the ThreadPool is receiving a burst of work items or tasks to process. 
In addition, the ThreadPoolWorkerThreadAdjustmentTraceData argument received by the handler also gives the count of threads via the NewWorkerThreadCount property.\nWith all these events, you should be able to monitor how the .NET ThreadPool is used in your application.\nThe next post will be entirely dedicated to garbage collection analysis.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2018-09-28_monitor-finalizers-contention-threads/0_SEfZiyEmSjrA-UGR.png","date":"2018-09-28","permalink":"https://chrisnas.github.io/posts/2018-09-28_monitor-finalizers-contention-threads/","summary":"\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2018-07-26_grab-etw-session-providers/\"\u003eGrab ETW Session, Providers and Events\u003c/a\u003e.\u003c/p\u003e\n\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eIn the previous post, you saw how the TraceEvent nuget helps you deciphering simple ETW events such as the one emitted when a first chance exception happens. Most situations trigger more than one event and could make their processing more complicated.\u003c/p\u003e\n\u003ch2 id=\"who-said-finalizer\"\u003eWho said Finalizer?\u003c/h2\u003e\n\u003cp\u003eIn the early days of .NET, you might had to deal with native resources that you were responsible for cleaning up with the related unmanaged API or legacy COM component. It was a best practice to implement a ~finalizer method to ensure that everything was deleted the right way. These times are over for most of us now. 
If you don’t have an \u003cstrong\u003eIntPtr\u003c/strong\u003e field in your class, chances are that you don’t need a ~finalizer method.\u003c/p\u003e","title":"Monitor Finalizers, contention and threads in your application"},{"content":"Part 1: Replace .NET performance counters by CLR event tracing.\nIn the previous post, you saw that the CLR is emitting traces that could (should?) replace the performance counters you are using to monitor your application and investigate when something goes wrong. The perfview tool that was demonstrated is built on top of the Microsoft.Diagnostics.Tracing.TraceEvent Nuget package and you should leverage it to build your own monitoring system. In addition, the Microsoft.Diagnostics.Tracing.TraceEvent.Samples Nuget package contains sample code to help you ramping up.\nManage an ETW session Create a console application and add the TraceEvent Nuget package. Your project now contains a TraceEvent.ReadMe.txt and a more detailed _TraceEventProgrammersGuide.docx Word document. You should really take the time to read the latter: it describes the architecture in great details and helps understanding what is going on under the scene.\nIn the Main entry point, add the following code to list existing ETW sessions:\nShowETWSessions.cs\n1 2 3 4 5 6 Console.WriteLine(\u0026#34;Current ETW sessions:\u0026#34;); foreach(var session in TraceEventSession.GetActiveSessionNames()) { Console.WriteLine(session); } Console.WriteLine(\u0026#34;--------------------------------------------\u0026#34;); Like logman -ets command, this piece of code might be handy during debugging session. Why? Just because when you debug your code that creates an ETW session and if you stop the debugger before disposing it, the session becomes orphan and after a while, Windows simply refuses to create new ones. In addition to easily find your orphans sessions, another good reason to give a meaningful name to your session is to be able to stop it. 
Type the following command line: logman -ets stop to close a running session and clean up the mess your debugging sessions might have created. This is definitively better than rebooting the machine.\nThe next step is to create a session. You get a TraceEventSession object either by attaching to an existing session or by creating a new one as shown in this code:\nMainETW.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 string sessionName = \u0026#34;EtwSessionForCLR_\u0026#34; + Guid.NewGuid().ToString(); Console.WriteLine($\u0026#34;Starting {sessionName}...\\r\\n\u0026#34;); using (TraceEventSession userSession = new TraceEventSession(sessionName, TraceEventSessionOptions.Create)) { Task.Run(() =\u0026gt; { // register handlers for events on the session source // more on this later... // decide which provider to listen to with filters if needed // process the events in a blocking call }); // wait for the user to dismiss the session Console.WriteLine(\u0026#34;Presse ENTER to exit...\u0026#34;); Console.ReadLine(); } Why is it necessary to run the code that manipulates the session in another thread with Task.Run? The call to the Process method is synchronous so it would not be possible to get the user input. When the user exits, the session is disposed and the Process method returns. If you close the session with logman, the Process method will also return. This behavior applies because the TraceEventSession.StopOnDispose property is set to true by default.\nNote that if you want to use TraceEvent to parse an .etl file, you simply need to pass the filename as an additional parameter to the TraceEventSession constructor; the rest of the code will be the same.\nThe code running in the task first registers handlers to the events as you will soon see. Next, you need to enable the providers you are interested in receiving events from. 
In our case, only the ClrTraceEventParser.ProviderGuid CLR provider is enabled (read https://docs.microsoft.com/en-us/dotnet/framework/performance/clr-etw-providers for more details about the two available CLR providers). In addition to the verbosity level, you should set the keywords with the ClrTraceEventParser.Keywords enumeration values corresponding to the categories of events you want to receive\nProviderAndSource.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 // register handlers for events on the session source // more on this later... // decide which provider to listen to with filters if needed userSession.EnableProvider( ClrTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, (ulong)( ClrTraceEventParser.Keywords.Contention | // thread contention timing ClrTraceEventParser.Keywords.Threading | // threadpool events ClrTraceEventParser.Keywords.Exception | // get the first chance exceptions ClrTraceEventParser.Keywords.GCHeapAndTypeNames | ClrTraceEventParser.Keywords.Type | // for finalizer and exceptions type names ClrTraceEventParser.Keywords.GC // garbage collector details ); // this is a blocking call until the session is disposed userSession.Source.Process(); Console.WriteLine(\u0026#34;End of session\u0026#34;); Additional filters such as which process to monitor can be set by passing TraceEventProviderOptions to the EnableProvider() method. But wait! On a Windows 7 machine this kind of filtering is not working and you get events from all processes… No specific documentation to look at… This is where knowing what Win32 APIs are called behind the scene could help. Instead of decompiling the TraceEvent assembly, you should instead take a look at its implementation… because it is open sourced with Perfview! 
The Microsoft documentation for the called EnableTraceEx2 function states that there are several types of scope filters that allow filtering based on the event ID, the process ID (PID), executable filename, the app ID, and the app package name. This feature is supported on Windows 8.1,Windows Server 2012 R2, and later. If you need to filter out events based on process id, don’t worry: each event will provide it.\nListen to CLR events This class derives from TraceEventDispatcher that provides the Process method. The TraceEventSource ancestor class is where the event handlers can be registered on the following properties:\nBehind the scene, each provider emits strongly typed traces that could be difficult to parse manually: don’t worry, the TraceEvent library does the job for you through dedicated parsers exposed by TraceEventSource.\nIn case of .NET events, you usually rely on the ClrTraceEventParser that exposes via .NET event the 100+ different traces emitted by the CLR ETW provider… plus one called All just in case you want to see all of them.\nHere is the first naïve implementation to display all CLR traces as shown in the previous screenshot:\nNaiveListener.cs\n1 2 3 4 5 6 7 8 9 10 // listen to all CLR events userSession.Source.Clr.All += delegate (TraceEvent data) { // skip verbose and unneeded events if (SkipEvent(data)) return; // raw dump of the events Console.WriteLine($\u0026#34;{data.ProcessID,7} \u0026lt;{data.ProviderName}:{data.ID}\u0026gt;__[{data.OpcodeName}] {data.EventName} \u0026lt;| {data.GetType().Name}\u0026#34;); }; The SkipEvent method is here just for you to filter out traces… and this is very helpful to remove meaningless noise when you are beginning to work with CLR traces and you need to see which events are generated.\nEach trace payload is received as a generic TraceEvent object that exposes common properties:\nProcessID/ProcessName: information related to the process in which the trace has been emitted by the CLR ThreadID: numeric 
identifier of the thread from which the trace was sent ID: numeric identifier of the trace that helps you find the corresponding event in the Microsoft documentation (i.e. Event ID) OpcodeName: human readable name of the trace EventName: concatenation of task (= group such as “Contention” corresponding to the Keyword mentioned earlier) and OpcodeName separated by ‘/’\nBut… what are the events to listen to? So, the next big step is to learn which events are interesting for you to monitor. I would suggest you read the rest of this post before going to the Microsoft documentation that describes all CLR events in detail.\nLet’s start with the simplest case: one trace is emitted when an exception is thrown. Microsoft documents the ExceptionThrown_V1 event with ExceptionKeyword and Warning verbosity level:\nUnfortunately, there is no ExceptionThrown_V1 event at the ClrTraceEventParser level:\nIn fact, the ID property of the received traces maps to the “Event ID” of the documentation so the ExceptionStart event happens to bring the same level of information as ExceptionThrown_V1 via the ExceptionTraceData parameter passed to the handler:\nThe corresponding handler implementation is straightforward:\nExceptionStartHandler.cs\nprivate static void OnExceptionStart(ExceptionTraceData data) { Console.WriteLine($\u0026#34;{data.EventName} --\u0026gt; {data.ExceptionType} : {data.ExceptionMessage}\u0026#34;); } The interesting properties are ExceptionType for the name of the thrown exception and ExceptionMessage for its message. Note that if the exception contains inner exceptions, you know it by taking a look at the ExceptionFlags property but you can’t access them. 
Remember that you should have received the trace corresponding to the inner exception earlier, when it was thrown, so you are just missing the relationship between the two.\nIf for an unknown reason the number of exceptions rises for one of your applications, listening to this event is a very cheap way to know which exceptions are thrown and search for them in the source code!\nThe next episode will detail other important CLR traces you need to monitor your applications and even start an investigation.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2018-07-26_grab-etw-session-providers/0_Kp9Q3s1tAIzeLjin.png","date":"2018-07-26","permalink":"https://chrisnas.github.io/posts/2018-07-26_grab-etw-session-providers/","summary":"\u003cp\u003ePart 1: \u003ca href=\"/posts/2018-06-19_replace-net-performance-counters/\"\u003eReplace .NET performance counters by CLR event tracing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003eIn the previous post, you saw that the CLR is emitting traces that could (should?) replace the performance counters you are using to monitor your application and investigate when something goes wrong. The perfview tool that was demonstrated is built on top of the \u003ca href=\"https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent\"\u003eMicrosoft.Diagnostics.Tracing.TraceEvent Nuget package\u003c/a\u003e and you should leverage it to build your own monitoring system. In addition, the \u003ca href=\"https://www.nuget.org/packages/Microsoft.Diagnostics.Tracing.TraceEvent.Samples/\"\u003eMicrosoft.Diagnostics.Tracing.TraceEvent.Samples Nuget package\u003c/a\u003e contains sample code to help you ramping up.\u003c/p\u003e","title":"Grab ETW Session, Providers and Events"},{"content":"Introduction At Criteo, each .NET application provides custom metrics to monitor deviation and trigger alerts. This is the first line of defense against misbehaviors. 
The next step is to figure out what could be the cause of these deviations. After source code changes analysis, it is often needed to dig deeper into performance counters exposed by the CLR such as the following:\nAgain, these counters are used to detect possible deviations in usual patterns. For example, some applications are supposed to answer under a 50 ms threshold. When the corresponding “number of timeouts” or “request time” metrics start to increase, several reasons linked to the CLR might be partially responsible but it is hard to tell:\n# of Exceps Thrown / sec: if the number increases, it is possible that performance is impacted but how to get the list of these unusual first chance exceptions caught by the applications? Contention rate / sec: an increase might suggest that threads are spending more time waiting for locks to be released but for how many milliseconds? This information is not available among the performance counters. #Gen 2 Collections: even if this counter does vary a lot, how could we be sure that blocking gen2 collections are responsible for the lack of responsiveness: no counter exposes how many milliseconds the applications threads were frozen in case of a compacting collection. As you can see, this second line of defense is not enough to start an investigation with a clear assumption in mind.\nIn addition, some counters are not showing what you might think:\n# Gen Collections: gen0 counter is also incremented after gen1 and gen2 collection, gen1 counter is also incremented after gen2 collection. There is no counter for the exact count of per generation collection trigger even though you could compute their value. # of current logical/physical threads: it is not possible to make any link with the number of threads used by the thread pool or the TPL/Tasks. As you can see in the early versions of the Core CLR (i.e. 
where the performance counters code was not removed yet), the increment and decrement of counts do not seem thread safe: that could explain some issues (always fewer threads than expected in our monitoring boards) we faced in the past. You need to realize that performance counters are sampling-based and could keep showing the same value even when it no longer represents the current reality. For example, most of the GC counters will not change until the next garbage collection occurs; i.e. the % Time in GC could stay misleading (until the next GC) and this is very far from the % CPU time you are used to.\nIf you are moving to .NET Core, you will discover a worse situation: There are no more performance counters on Windows to monitor your applications. And if you are targeting Linux… well…\nHowever, as the rest of the articles will demonstrate, the CLR provides even more details via strongly typed tracing through Event Tracing for Windows (ETW) and LTTng on Linux.\nCLR and Event Tracing for Windows The ETW framework has existed for a long time and allows consumers to listen to events emitted by producers as explained in a 2007 MSDN Magazine article.\nA tracing session wraps the providers and consumers together either for real-time processing or .etl file generation. If you already know the Perfview tool or the Windows Performance Toolkit/xperf, you should be familiar with the generated .etl files.\nFor debugging purposes, listening to events during a live session with a minimal impact on production machines is even more practical. The list of documented CLR events is huge but no fear: this series of posts will focus on exceptions, finalizers, thread contention and garbage collection.\nIn addition to the documentation, I would recommend that you take a look at how the tracing is implemented in .NET Core source code for two reasons. First, you get access to the exact payload schema of all generated events (even those not documented). 
Second, by searching the Core CLR source code for the FireEtw-prefixed methods generated at build time, you will get a better understanding of when things are happening. Don’t forget the higher level ETW::-prefixed methods and enums that are also called by the runtime to emit traces.\nPerfview has already been mentioned earlier to help analyze traces but it is also useful for deciphering events produced by the CLR. On a trace, double-click the Events node:\nIn the new window that pops up, select an event on the left side to get the list of occurrences on the right side:\nRight-click an event occurrence and select Dump Event to get its payload details:\nsuch as the type name of the new instance that led the GC to trigger the AllocationTick event after ~100KB was allocated.\nIn addition to the tooling available to get the traces, Microsoft provides the Microsoft.Diagnostics.Tracing.TraceEvent Nuget package. With this library, you will be able to build your own tool or listen to the CLR events from within your running applications to replace the performance counters. The next episode of the series will ramp you up with the implementation of a basic listener.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2018-06-19_replace-net-performance-counters/0_TxC5sfAh5Mfguhxn.png","date":"2018-06-19","permalink":"https://chrisnas.github.io/posts/2018-06-19_replace-net-performance-counters/","summary":"\u003ch2 id=\"introduction\"\u003eIntroduction\u003c/h2\u003e\n\u003cp\u003eAt Criteo, each .NET application provides custom metrics to monitor deviation and trigger alerts. This is the first line of defense against misbehaviors. The next step is to figure out what could be the cause of these deviations. 
After source code changes analysis, it is often needed to dig deeper into performance counters exposed by the CLR such as the following:\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2018-06-19_replace-net-performance-counters/0_nufe9ma4xSKydxmH.png\"\u003e\u003c/p\u003e\n\u003cp\u003eAgain, these counters are used to detect possible deviations in usual patterns. For example, some applications are supposed to answer under a 50 ms threshold. When the corresponding “number of timeouts” or “request time” metrics start to increase, several reasons linked to the CLR might be partially responsible but it is hard to tell:\u003c/p\u003e","title":"Replace .NET performance counters by CLR event tracing"},{"content":"This post of the series shows how to easily list pending tasks and work items managed by the .NET thread pool using DynaMD proxies.\nPart 1: Bootstrap ClrMD to load a dump.\nPart 2: Find duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nPart 4: Identify timers callback and other properties.\nPart 5: Use ClrMD to extend SOS in WinDBG.\nPart 6: Manipulate memory structures like real objects.\nPart 7: Manipulate nested structs using dynamic.\nPart 8: Spelunking inside the .NET Thread Pool\nIntroduction The previous post showed you how to list the pending tasks and work items from the .NET thread pool. It is now time to find out which method will be called. Last but not least, the running work items will be listed.\nUnderstanding .NET “tasks” The Task case requires more work to extract meaningful information. The m_taskScheduler field of the class keeps track of the custom scheduler, if any (we need this at Criteo). 
More importantly, the m_action field stores an Action instance wrapping the callback of the Task as a delegate:\nThe _target field is the implicit this parameter of instance methods that is pointed to by the _methodPtr/_methodPtrAux field.\nGetTask.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 private ThreadPoolItem GetTask(dynamic task) { ThreadPoolItem tpi = new ThreadPoolItem() { Address = (ulong)task, Type = ThreadRoot.Task }; // look for the context in m_action._target var action = task.m_action; if (action == null) { tpi.MethodName = \u0026#34; [no action]\u0026#34;; return tpi; } var target = action._target; if (target == null) { tpi.MethodName = \u0026#34; [no target]\u0026#34;; return tpi; } tpi.MethodName = BuildDelegateMethodName(target.GetClrType(), action); // get the task scheduler if any var taskScheduler = task.m_taskScheduler; if (taskScheduler != null) { var schedulerType = taskScheduler.GetClrType().ToString(); if (\u0026#34;System.Threading.Tasks.ThreadPoolTaskScheduler\u0026#34; != schedulerType) tpi.MethodName = $\u0026#34;{tpi.MethodName} [{schedulerType}]\u0026#34;; } return tpi; } The code to extract the name of the method behind the action relies on the ClrRuntime.GetMethodByAddress helper from ClrMD:\nBuildDelegateMethodName-1.cs\n1 2 3 4 5 6 internal string BuildDelegateMethodName(ClrType targetType, dynamic action) { var methodPtr = action._methodPtr; if (methodPtr != null) { ClrMethod method = _clr.GetMethodByAddress((ulong)methodPtr); In case of anonymous methods used for tasks and work items, the call to GetMethodByAddress will succeed. 
However, if you code old-style by passing a named method (static or not), then the other _methodPtrAux field should be used:\nBuildDelegateMethodName-2.cs\nif (method == null) { // could happen in case of static method methodPtr = action._methodPtrAux; method = _clr.GetMethodByAddress((ulong)methodPtr); } The next step is to figure out the name of the method and the name of the defining type. Again, anonymous methods are treated differently:\nBuildDelegateMethodName-3.cs\n// anonymous method if (method.Type.Name == targetType.Name) { return $\u0026#34;{targetType.Name}.{method.Name}\u0026#34;; } else // method is implemented by a class inherited from targetType // ... or a simple delegate indirection to a static/instance method { if ( (targetType.Name == \u0026#34;System.Threading.WaitCallback\u0026#34;) || targetType.Name.StartsWith(\u0026#34;System.Action\u0026lt;\u0026#34;) ) { return $\u0026#34;{method.Type.Name}.{method.Name}\u0026#34;; } else { return $\u0026#34;({targetType.Name}){method.Type.Name}.{method.Name}\u0026#34;; } } } The last trick is that tasks and work items use different types to store the method details so the code branches based on the target type name.\nHow to decode basic thread pool items When you use ThreadPool.QueueUserWorkItem to ask the .NET thread pool to execute a callback asynchronously, the expected parameter is a WaitCallback that is stored as a QueueUserWorkItemCallback by the thread pool:\nOnce the WaitCallback is extracted from the callback field of QueueUserWorkItemCallback, the same call to BuildDelegateMethodName does the rest:\nGetQueueUserWorkItemCallback.cs\nprivate ThreadPoolItem GetQueueUserWorkItemCallback(dynamic element) { ThreadPoolItem tpi = new ThreadPoolItem() { Address = (ulong)element, Type = ThreadRoot.WorkItem }; // look for the callback given to 
ThreadPool.QueueUserWorkItem() var callback = element.callback; if (callback == null) { tpi.MethodName = \u0026#34;[no callback]\u0026#34;; return tpi; } var target = callback._target; if (target == null) { tpi.MethodName = \u0026#34;[no callback target]\u0026#34;; return tpi; } ClrType targetType = target.GetClrType(); if (targetType == null) { tpi.MethodName = $\u0026#34; [target=0x{(ulong)target:x}]\u0026#34;; } else { // look for method name tpi.MethodName = BuildDelegateMethodName(targetType, callback); } return tpi; } And what about running ThreadPool threads? The previous code allows you to list the pending asynchronous actions that will be run by the thread pool. Getting the list of the running work items and tasks is as easy as iterating the Threads property from ClrRuntime and checking if their IsThreadpoolWorker property is true!\nThe following code lists the threads waiting on a lock first, then the dead ones and finally the real running ones:\nForEachOrderedThread.cs\nforeach (var thread in _host.Session.Clr.Threads .Where(t =\u0026gt; t.IsThreadpoolWorker) .OrderBy(t =\u0026gt; (t.LockCount \u0026gt; 0) ? -1 : (!t.IsAlive ? t.ManagedThreadId + 10000 : t.ManagedThreadId))) For each worker ClrThread, you know if there was an exception (via the CurrentException property) but more importantly, you have access to the stack trace. The ClrThread.EnumerateStackTrace method iterates over each stack frame represented by a ClrStackFrame object. 
The SetThreadWaiters method in the lockingInspection.cs shows how to detect on what locking patterns threads are blocked such as Monitor.Enter, lock, Join, wait on native handles or reader/writer locks.\nLast but not least, the following code allows you to know if a thread currently processes a task or a simple work item:\nTaskOrWorkitem.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 if (frame.Method.Type.Name == \u0026#34;System.Threading.Tasks.Task\u0026#34;) { if (frame.Method.Name == \u0026#34;Execute\u0026#34;) { // the previous frame should contain the name of the method called by the task if (lastFrame != null) { // this is a task executing the method // given by lastFrame.DisplayString } break; } } else if (frame.Method.Type.Name == \u0026#34;System.Threading.ExecutionContext\u0026#34;) { if (frame.Method.Name == \u0026#34;RunInternal\u0026#34;) { // the previous frame should contain the name of the method called by QueueUserWorkItem if (lastFrame != null) { // this is a work item executing the method // given by lastFrame.DisplayString } break; } } Next step… This is the last episode of the ClrMD series but it is not the end! We have started a short series about the new version of WinDBG. 
The next long series will show how to take advantage of the ETW (on Windows) and LTTng (on Linux) events emitted by the CLR to monitor your live applications at close range.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-12-22_clrmd-part-9-tasks-thread-pool/QueueUserWorkItemCallback.png","date":"2017-12-22","permalink":"https://chrisnas.github.io/posts/2017-12-22_clrmd-part-9-tasks-thread-pool/","summary":"\u003cp\u003eThis post of the series shows how to easily list pending tasks and work items managed by the .NET thread pool using DynaMD proxies.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrap ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFind duplicated strings with ClrMD heap traversing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2017-05-03_clrmd-part-3-static-instance-fields/\"\u003eList timers by following static fields links\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2017-05-31_clrmd-part-4-timer-callbacks/\"\u003eIdentify timers callback and other properties\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2017-06-29_clrmd-part-5-extend-sos-windbg/\"\u003eUse ClrMD to extend SOS in WinDBG\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 6: \u003ca href=\"/posts/2017-08-01_clrmd-part-6-memory-structures/\"\u003eManipulate memory structures like real objects\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 7: \u003ca href=\"/posts/2017-08-28_clrmd-part-7-nested-structs-dynamic/\"\u003eManipulate nested structs using dynamic\u003c/a\u003e.\u003c/p\u003e","title":"ClrMD Part 9 – Deciphering Tasks and Thread Pool items"},{"content":"This post of the series shows how to easily list pending tasks and work items managed by the .NET thread pool using DynaMD proxies.\nPart 1: Bootstrap ClrMD to 
load a dump.\nPart 2: Find duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nPart 4: Identify timers callback and other properties.\nPart 5: Use ClrMD to extend SOS in WinDBG.\nPart 6: Manipulate memory structures like real objects.\nPart 7: Manipulate nested structs using dynamic.\nIntroduction The previous posts introduced the DynaMD nuget that helps navigate among type instances using a C#-like syntax “instance.field”. Let’s see how to use it to enumerate the asynchronous items queued in the .NET thread pool. As a bonus, the running tasks and work items won’t be forgotten.\nThreadPool internals The .NET ThreadPool keeps track of the pending work items in two different data structures:\nA global queue: stored as a ThreadPoolWorkQueue instance referenced by the workQueue static field Several per-thread (TLS) local queues: stored in SparseArray\u0026lt;ThreadPoolWorkQueue+WorkStealingQueue\u0026gt; linked from the allThreadQueues static field As you can see, the algorithm to list the pending tasks and work items starts from a static field and iterates over a linked list of QueueSegment for the global queue and an array of WorkStealingQueue for the per-thread queues. 
Both are storing arrays of IThreadPoolWorkItem that Task and QueueUserWorkItemCallback are implementing:\nToo much theory… Let’s write some code!\nGlobal ThreadPool queue You have seen in a previous post how to access the value of a static field per application domain:\nEnumerateGlobalThreadPoolItems-1.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 public IEnumerable\u0026lt;ThreadPoolItem\u0026gt; EnumerateGlobalThreadPoolItems() { // look for the ThreadPoolGlobals.workQueue static field ClrModule mscorlib = GetMscorlib(); if (mscorlib == null) throw new InvalidOperationException(\u0026#34;Impossible to find mscorlib.dll\u0026#34;); ClrType queueType = mscorlib.GetTypeByName(\u0026#34;System.Threading.ThreadPoolGlobals\u0026#34;); if (queueType == null) yield break; ClrStaticField workQueueField = queueType.GetStaticFieldByName(\u0026#34;workQueue\u0026#34;); if (workQueueField == null) yield break; // the CLR keeps one static instance per application domain foreach (var appDomain in _clr.AppDomains) { For an application domain in which the threadpool is not used, we need to check against null for the expected ThreadPoolWorkQueue:\nEnumerateGlobalThreadPoolItems-2.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 object workQueueValue = workQueueField.GetValue(appDomain); ulong workQueueRef = (workQueueValue == null) ? 
0L : (ulong)workQueueValue; if (workQueueRef == 0) continue; // should be System.Threading.ThreadPoolWorkQueue ClrType workQueueType = _heap.GetObjectType(workQueueRef); if (workQueueType == null) continue; if (workQueueType.Name != \u0026#34;System.Threading.ThreadPoolWorkQueue\u0026#34;) continue; foreach (var item in EnumerateThreadPoolWorkQueue(workQueueRef)) { yield return item; } } } The role of the EnumerateThreadPoolWorkQueue helper method is to iterate on each QueueSegment of the linked list pointed to by the queueTail field of the per appdomain ThreadPoolWorkQueue object.\nAt the beginning of the following code, note that dynamic allows writing C# code even though the queueTail and nodes fields are not known at compile time. Even more convenient, a foreach statement is possible when the instance behind the DynaMD proxy is an array:\nEnumerateThreadPoolWorkQueue.cs\nprivate IEnumerable\u0026lt;ThreadPoolItem\u0026gt; EnumerateThreadPoolWorkQueue(ulong workQueueRef) { // start from the tail and follow the Next var proxy = _heap.GetProxy(workQueueRef); var currentQueueSegment = proxy.queueTail; while (currentQueueSegment != null) { // get the System.Threading.ThreadPoolWorkQueue+QueueSegment nodes array var nodes = currentQueueSegment.nodes; if (nodes == null) { currentQueueSegment = currentQueueSegment.Next; continue; } foreach (var item in nodes) { if (item == null) continue; yield return GetThreadPoolItem(item); } currentQueueSegment = currentQueueSegment.Next; } } The GetThreadPoolItem helper method will be described soon but first, let’s see how to get the items from the thread local queues.\nLocal ThreadPool queues The same static field driven operations are needed to access the sparse array containing… more arrays:\nEnumerateLocalThreadPoolItems.cs\npublic IEnumerable\u0026lt;ThreadPoolItem\u0026gt; EnumerateLocalThreadPoolItems() { var 
queueType = GetMscorlib().GetTypeByName(\u0026#34;System.Threading.ThreadPoolWorkQueue\u0026#34;); if (queueType == null) yield break; ClrStaticField threadQueuesField = queueType.GetStaticFieldByName(\u0026#34;allThreadQueues\u0026#34;); if (threadQueuesField == null) yield break; foreach (ClrAppDomain domain in _clr.AppDomains) { ulong? threadQueueRef = (ulong?)threadQueuesField.GetValue(domain); if (!threadQueueRef.HasValue || threadQueueRef.Value == 0) continue; var threadQueue = _heap.GetProxy((ulong)threadQueueRef); if (threadQueue == null) continue; var sparseArray = threadQueue.m_array; if (sparseArray == null) continue; foreach (var stealingQueue in sparseArray) { if (stealingQueue == null) continue; foreach (var item in EnumerateThreadPoolStealingQueue(stealingQueue)) { yield return item; } } } } The sparse array contains either null or a stealing queue that itself contains… an array:\nEnumerateThreadPoolStealingQueue.cs\nprivate IEnumerable\u0026lt;ThreadPoolItem\u0026gt; EnumerateThreadPoolStealingQueue(dynamic stealingQueue) { var array = stealingQueue.m_array; if (array == null) yield break; foreach (var item in array) { if (item == null) continue; yield return GetThreadPoolItem(item); } } Now that we managed to retrieve the thread pool items, we can try to decipher them.\nDeciphering thread pool items A thread pool item stored in the global or in the local queues could be a Task, a QueueUserWorkItemCallback or a simple method:\nGetThreadPoolItem.cs\nprivate ThreadPoolItem GetThreadPoolItem(dynamic item) { // get the ClrType directly from the dynamic proxy ClrType itemType = item.GetClrType(); if (itemType.Name == \u0026#34;System.Threading.Tasks.Task\u0026#34;) { return GetTask(item); } else if (itemType.Name == \u0026#34;System.Threading.QueueUserWorkItemCallback\u0026#34;) { return GetQueueUserWorkItemCallback(item); } else { // create a raw 
information ThreadPoolItem tpi = new ThreadPoolItem() { Type = ThreadRoot.Raw, Address = (ulong)item, MethodName = itemType.Name }; return tpi; } } The kind of item is computed from the ClrType of the object given by the ClrHeap.GetObjectType method. An ulong address is expected by this ClrMD method and it would be easy to get from the dynamic returned by DynaMD by just casting it to ulong. However, it is easier to simply call the GetClrType method on the dynamic proxy!\nNext step… The next and last episode of the ClrMD series will show you how to decipher tasks and thread pool items to know which of your methods will be called. As a bonus, the running tasks and work items won’t be forgotten.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-11-03_clrmd-part-8-net-thread-pool/GlobalThreadPoolQueue.png","date":"2017-11-03","permalink":"https://chrisnas.github.io/posts/2017-11-03_clrmd-part-8-net-thread-pool/","summary":"\u003cp\u003eThis post of the series shows how to easily list pending tasks and work items managed by the .NET thread pool using DynaMD proxies.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrap ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFind duplicated strings with ClrMD heap traversing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2017-05-03_clrmd-part-3-static-instance-fields/\"\u003eList timers by following static fields links\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2017-05-31_clrmd-part-4-timer-callbacks/\"\u003eIdentify timers callback and other properties\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2017-06-29_clrmd-part-5-extend-sos-windbg/\"\u003eUse ClrMD to extend SOS in WinDBG\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 6: \u003ca 
href=\"/posts/2017-08-01_clrmd-part-6-memory-structures/\"\u003eManipulate memory structures like real objects\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 7: \u003ca href=\"/posts/2017-08-28_clrmd-part-7-nested-structs-dynamic/\"\u003eManipulate nested structs using dynamic\u003c/a\u003e.\u003c/p\u003e","title":"ClrMD Part 8 – Spelunking inside the .NET Thread Pool"},{"content":"In the previous post of the ClrMD series, we’ve seen how to use dynamic to manipulate objects from a memory dump the same way as you would with actual objects. However, the code we wrote was limited to class instances. This time, we’re going to see how to extend it to structs. The associated code is part of the DynaMD library and is available on GitHub and nuget.\nPart 1: Bootstrap ClrMD to load a dump.\nPart 2: Find duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nPart 4: Identify timers callback and other properties.\nPart 5: Use ClrMD to extend SOS in WinDBG.\nPart 6: Manipulate memory structures like real objects.\nLet’s start with a reminder of the object we’re manipulating via our proxy:\nValues.cs\npublic struct Size { public int Width; public int Height; } public struct Description { public int Id; public Size Size; } public class Sample { public int Value; public string Name { get; } public Description Description; public Sample Child; } Accessing the value of proxy.Description.Id is a special case. Why? Because Description is a struct, and is therefore embedded directly inside of Sample, outside of the responsibility of the managed heap. This scenario isn’t directly supported by ClrMD, and calling GetValue on the Description field will return null instead of the address of the struct. We need to compute this address ourselves.\nWhat is the layout of those objects in memory? 
To find out, we can create a Sample object and then check how the memory is structured within WinDBG\nSample.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 new Sample { Name = \u0026#34;Test\u0026#34;, Value = 1, Child = new Sample(), Description = new Description { Id = 2, Size = new Size { Width = 3, Height = 4 } } } The first address (4 bytes because the dump was taken on a 32-bit process) points to the “method table”; some metadata used internally by the CLR to identify the strong type corresponding to this instance in the managed heap. Next, we get the “1” stored by the Value field, followed by a pointer to the “Test” string stored by the Name field. The field values of the Description structure (in dark blue) are then embedded directly inside of the Sample object. And inside of that structure, we find the values of the fields of the Size structure (in light blue). Finally, the reference to the Child field is stored last in the memory of the Sample object.\nTo sum it up, to get the address of the Description field, we need to take the address of the Sample instance, add 4 bytes for the method table (8 bytes for 64-bit memory dumps), then add the offset of the field (which counts the size of the previous fields and the padding if any):\nDynamicProxy.cs\n1 2 3 4 5 6 private DynamicProxy LinkToStruct(ClrField field) { var childAddress = _address + (ulong)field.Offset + (ulong)_heap.PointerSize; return new DynamicProxy(_heap, childAddress); } We call this method from TryGetMember when we have a struct that isn’t a primitive type. ClrMD gives that information thanks to the HasSimpleValue property of ClrField:\nDynamicProxy.cs\n1 2 3 4 5 6 if (!field.HasSimpleValue) { result = LinkToStruct(field); return true; } There’s another subtlety though. As we’ve seen in the layout, the Description struct, embedded inside of the Sample class, doesn’t have a method table of its own. The consequence is that we can’t find out its type by using ClrHeap.GetObjectType. 
As a workaround, we add a constructor allowing us to manually set the underlying ClrMD type of the object impersonated by the proxy:\nDynamicProxy.cs\n1 2 3 4 5 private DynamicProxy(ClrHeap heap, ulong address, ClrType overrideType) : this(heap, address) { _type = overrideType; } This constructor is called when we still have the type information (taken from the ClrField):\nDynamicProxy.cs\n1 2 3 4 5 6 private DynamicProxy LinkToStruct(ClrField field) { var childAddress = _address + (ulong)field.Offset + (ulong)_heap.PointerSize; return new DynamicProxy(_heap, childAddress, field.Type); } Now we’re able to read the value of a field of a nested struct:\nProgram.cs\n1 Console.WriteLine(sample.Description.Id); Are we done now? Almost. There is one last case to handle, the Size struct that is nested inside of Description, itself a struct nested inside of a Sample instance. How is that an issue? If we use the same code to compute the address of the Size struct, then we will be adding the 4/8 bytes for the method table. Except that, as we’ve seen, the nested Size struct doesn’t have a method table! 
We need to add a condition to the code to know whenever we are inside of a nested struct and handle this corner case:\nDynamicProxy.cs\n1 2 3 4 5 6 7 8 9 10 11 private DynamicProxy LinkToStruct(ClrField field) { var childAddress = _address + (ulong)field.Offset; if (!_interior) { childAddress += (ulong)_heap.PointerSize; } return new DynamicProxy(_heap, childAddress, field.Type); } The Boolean flag _interior is set in the second constructor that accepts a ClrType:\nDynamicProxy.cs\n1 2 3 4 5 6 private DynamicProxy(ClrHeap heap, ulong address, ClrType overrideType) : this(heap, address) { _type = overrideType; _interior = true; } We’re finally able to read the value of the nested-nested struct:\nProgram.cs\n1 Console.WriteLine(sample.Description.Size.Width * sample.Description.Size.Height); Next time, we’ll see how to use the same mechanisms to manipulate arrays in a convenient way.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-08-28_clrmd-part-7-nested-structs-dynamic/windbg.png","date":"2017-08-28","permalink":"https://chrisnas.github.io/posts/2017-08-28_clrmd-part-7-nested-structs-dynamic/","summary":"\u003cp\u003eIn the previous post of the ClrMD series, we’ve seen how to use \u003cstrong\u003edynamic\u003c/strong\u003e to manipulate objects from a memory dump the same way as you would with actual objects. However, the code we wrote was limited to class instances. This time, we’re going to see how to extend it to structs. 
The associated code is part of the DynaMD library and is \u003ca href=\"https://github.com/kevingosse/DynaMD\"\u003eavailable on GitHub\u003c/a\u003e and \u003ca href=\"https://www.nuget.org/packages/DynaMD/\"\u003enuget\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrap ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e","title":"ClrMD Part 7 – Manipulate nested structs using dynamic"},{"content":"This sixth post of the ClrMD series details how to make object fields navigation simple with C# like syntax thanks to the dynamic infrastructure. The associated code is part of the DynaMD library and is available on GitHub and nuget.\nPart 1: Bootstrap ClrMD to load a dump.\nPart 2: Find duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nPart 4: Identify timers callback and other properties.\nPart 5: Use ClrMD to extend SOS in WinDBG.\nAs we’ve seen in the previous articles of the series, exploring a complex data structure using ClrMD can quickly become tedious.\nLet’s take a concrete example. Imagine we have those types declared:\nValues.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 public struct Size { public int Width; public int Height; } public struct Description { public int Id; public Size Size; } public class Sample { public int Value; public string Name { get; } public Description Description; public Sample Child; } Given the address of the Sample object in the memory dump, even with the GetFieldValue helper method to make it simpler, the code to navigate these recursive data types is still… verbose:\nProgram.cs\n1 Console.WriteLine((uint)GetFieldValue(heap, currentSampleRef, \u0026#34;Value\u0026#34;)); And now, how to get the value of the Name property?\nSame question for the inner Child fields or deeper Size field of its Description?\nWouldn’t that be great if we could navigate just like through real strongly typed instances? 
In short, we would like to be able to write:\nProgram.cs\nvar sample = heap.GetProxy(0x00001000); // Some address obtained by other means Console.WriteLine(sample.Value); Console.WriteLine(sample.Name); Console.WriteLine(sample.Child.Name); Console.WriteLine(sample.Description.Id); Console.WriteLine(sample.Description.Size.Width * sample.Description.Size.Height); Instead of:\nProgram.cs\nConsole.WriteLine(GetFieldValue(heap, sampleRef, \u0026#34;Value\u0026#34;)); Console.WriteLine(GetFieldValue(heap, sampleRef, \u0026#34;\u0026lt;Name\u0026gt;k__BackingField\u0026#34;)); Console.WriteLine(GetFieldValue(heap, (ulong)GetFieldValue(heap, sampleRef, \u0026#34;Child\u0026#34;), \u0026#34;\u0026lt;Name\u0026gt;k__BackingField\u0026#34;)); // and so on... The first issue is: what is the GetProxy method going to return? Since we don’t know at compile time the properties of the object the code is going to manipulate, we need a way to support some kind of late binding. Fortunately, this scenario is supported in C# through the dynamic keyword. 
As you will see in the rest of this post, this is not only a keyword but also an extensible mechanism that perfectly fits our need to define fields at runtime instead of compile time.\nWe start by creating a class inheriting from System.Dynamic.DynamicObject.\nDynamicProxy.cs\ninternal class DynamicProxy : DynamicObject This base class provides all the facilities needed for late binding through a set of virtual methods.\nAs you will see, only a few of these virtual methods need to be overridden to support our scenario.\nTo construct our proxy, we need two parameters: the ClrMD ClrHeap object, which allows us to browse the objects in memory, and the address of the object we want to impersonate.\nDynamicProxy.cs\npublic DynamicProxy(ClrHeap heap, ulong address) { _heap = heap; _address = address; } We also provide an extension method for convenience:\nExtensions.cs\nnamespace Microsoft.Diagnostics.Runtime { public static class Extensions { public static dynamic GetProxy(this ClrHeap heap, ulong address) { if (address == 0) { return null; } return new DynamicProxy(heap, address); } } } The next step is to override the virtual TryGetMember method, inherited from DynamicObject. It is automatically invoked whenever somebody tries to access any member of the dynamic object, including its fields.\nDynamicProxy.cs\npublic override bool TryGetMember(GetMemberBinder binder, out object result) The Name property of the binder parameter provides the name of the accessed member and we are supposed to return the corresponding proxy object as the out result parameter.\nWe’re going to need the type of the object. 
For convenience, we store it in a property:\nDynamicProxy.cs\nprotected ClrType Type { get { if (_type == null) { _type = _heap.GetObjectType(_address); } return _type; } } Using the binder.Name property containing the name of the field we’re trying to access, we retrieve the ClrMD field description:\nDynamicProxy.cs\nvar field = Type.GetFieldByName(binder.Name); From there, we get the value marshalled by ClrMD and assign it to the result out parameter:\nDynamicProxy.cs\nresult = field.GetValue(_address); Finally, we signal that we managed to bind the invoked member:\nDynamicProxy.cs\nreturn true; This is just a handful of lines of code, but it’s enough for the simple cases where field values are primitive types. This covers the “Value” field for our Sample type. For the auto-property “Name”, that’s trickier, because the name of the underlying field has characters that are forbidden in C# identifiers: “\u0026lt;Name\u0026gt;k__BackingField”. If we write this, it won’t compile:\nProgram.cs\nConsole.WriteLine(proxy.\u0026lt;Name\u0026gt;k__BackingField); We can handle this case by guessing the name of the compiler-generated field, then accessing it:\nDynamicProxy.cs\nvar field = Type.GetFieldByName(binder.Name); if (field == null) { // The field wasn\u0026#39;t found, it could be an autoproperty field = Type.GetFieldByName($\u0026#34;\u0026lt;{binder.Name}\u0026gt;k__BackingField\u0026#34;); if (field == null) { // Still not found throw new InvalidOperationException(\u0026#34;Field not found: \u0026#34; + binder.Name); } } Thanks to this trick, we can write:\nProgram.cs\nConsole.WriteLine(proxy.Name); Great! The next challenge is to transparently manipulate the “Child” field as a reference to another “Sample” object. 
To achieve this goal, the field could simply return another DynamicProxy object that we can manipulate the same way as its parent.\nFirst, we need a helper to find out whether a value is a reference or not:\nDynamicProxy.cs\n1 2 3 4 private static bool IsReference(object result, ClrType type) { return !(result is string) \u0026amp;\u0026amp; type.IsObjectReference; } We treat string as a special case, because ClrMD gives us the marshaled string rather than a reference like for all other types. That’s how we were able to retrieve the value of the Name field previously.\nNow we call the helper and return a new proxy whenever we’re dealing with a reference:\nDynamicProxy.cs\n1 2 3 4 if (IsReference(result, field.Type)) { result = new DynamicProxy(_heap, (ulong)result); } We can now write:\nProgram.cs\n1 Console.WriteLine(proxy.Child.Name); That’s it for accessing a referenced object allocated on the heap. However, this won’t work for accessing an embedded struct such as proxy.Description.Id. We’ll see in the next part how to handle this specific case.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-08-01_clrmd-part-6-memory-structures/clrmd.png","date":"2017-08-01","permalink":"https://chrisnas.github.io/posts/2017-08-01_clrmd-part-6-memory-structures/","summary":"\u003cp\u003eThis sixth post of the ClrMD series details how to make object fields navigation simple with C# like syntax thanks to the \u003cstrong\u003edynamic\u003c/strong\u003e infrastructure. 
The associated code is part of the DynaMD library and is \u003ca href=\"https://github.com/kevingosse/DynaMD\"\u003eavailable on GitHub\u003c/a\u003e and \u003ca href=\"https://www.nuget.org/packages/DynaMD/\"\u003enuget\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrap ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFind duplicated strings with ClrMD heap traversing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2017-05-03_clrmd-part-3-static-instance-fields/\"\u003eList timers by following static fields links\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2017-05-31_clrmd-part-4-timer-callbacks/\"\u003eIdentify timers callback and other properties\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 5: \u003ca href=\"/posts/2017-06-29_clrmd-part-5-extend-sos-windbg/\"\u003eUse ClrMD to extend SOS in WinDBG\u003c/a\u003e.\u003c/p\u003e","title":"ClrMD Part 6 - Manipulate memory structures like real objects"},{"content":"This fifth post of the ClrMD series shows how to leverage this API inside a WinDBG extension. The associated code allows you to translate a task state into a human readable value.\nPart 1: Bootstrap ClrMD to load a dump.\nPart 2: Find duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nPart 4: Identify timers callback and other properties.\nIntroduction Since the beginning of this series, you have seen how to use ClrMD to write your own tool to extract meaningful information from a dump file (or a live process). However, most of the time, you are also using WinDBG and SOS to navigate inside the .NET data structures.\nIt would be convenient if you could leverage the new .NET exploration features based on ClrMD the same way you are using SOS. 
This post will explain how to achieve this goal by implementing an extension that exports commands callable from within WinDBG.\nDeciphering a Task status During one of our debugging investigations, we needed to get the value of the Status property for a few Task instances. If you take a look at the implementation of the property getter in a decompiler (or from source code), you will see that it is computed based on the value of the internal m_stateFlags field.\nIn WinDBG, the !DumpHeap -stat command lists all types with their instance count. If the .prefer_dml 1 command has been set, you even get hyperlinks on some values such as the address or MT (for MethodTable). If you click the MT value for System.Threading.Tasks.Task, you get all instances of type Task:\nClick any address and look at the value of the m_stateFlags field:\nIt is easy to automate the retrieval of the m_stateFlags instance field value with ClrMD as explained earlier:\nGetTaskStateFromAddress.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 private static ulong GetTaskStateFromAddress(ulong address) { var type = Runtime.GetHeap().GetObjectType(address); if ((type != null) \u0026amp;\u0026amp; (type.Name.StartsWith(\u0026#34;System.Threading.Task\u0026#34;))) { // try to get the m_stateFlags field value ClrInstanceField field = type.GetFieldByName(\u0026#34;m_stateFlags\u0026#34;); if (field != null) { var val = field.GetValue(address); if (val != null) { try { return (ulong)(int)val; } catch (InvalidCastException) { } } } } return 0; } The ClrType corresponding to the address is first checked to ensure that it represents a Task instance. 
Next, its GetFieldByName helper method returns a ClrInstanceField that provides the status via its GetValue function.\nThe next step is to transform this number into a TaskStatus enumeration value by simply using a decompiler and copying the logic from the Task getter code:\nGetTaskState.cs\nprivate static string GetTaskState(ulong flag) { TaskStatus rval; if ((flag \u0026amp; TASK_STATE_FAULTED) != 0) rval = TaskStatus.Faulted; else if ((flag \u0026amp; TASK_STATE_CANCELED) != 0) rval = TaskStatus.Canceled; else if ((flag \u0026amp; TASK_STATE_RAN_TO_COMPLETION) != 0) rval = TaskStatus.RanToCompletion; else if ((flag \u0026amp; TASK_STATE_WAITING_ON_CHILDREN) != 0) rval = TaskStatus.WaitingForChildrenToComplete; else if ((flag \u0026amp; TASK_STATE_DELEGATE_INVOKED) != 0) rval = TaskStatus.Running; else if ((flag \u0026amp; TASK_STATE_STARTED) != 0) rval = TaskStatus.WaitingToRun; else if ((flag \u0026amp; TASK_STATE_WAITINGFORACTIVATION) != 0) rval = TaskStatus.WaitingForActivation; else if (flag == 0) rval = TaskStatus.Created; else return null; return rval.ToString(); } It would be a time saver if this translation could be done by a command right inside WinDBG instead of relying on another tool based on ClrMD in which addresses are pasted.\nWinDBG extension 101 In addition to being a native Windows debugger, WinDBG supports extensions: .dll files that you load with the .load command. They export commands that are callable from within WinDBG with the “!” prefix. These commands are ordinary native exports that can be seen with tools such as Dependency Walker (http://www.dependencywalker.com/), as shown by the next screenshot:\nAs you can see, all SOS commands are functions exported by the sos.dll native binary. Before digging into the implementation of the extension functions, notice that a few other functions can also be exported. Among them, the DebugExtensionInitialize function provides version information (i.e. 
which version of the debugging API is expected) and must be exported to be called by WinDBG when the dll is loaded.\nRead this post for more details about how to develop a native WinDBG extension.\nAll extension command functions take two parameters:\nAn IDebugClient instance to interact with WinDBG\nAn ANSI string for the arguments (such as “-stat” for !dumpheap)\nThe bridge between your extension commands and WinDBG is provided by the IDebugClient COM interface. But don’t be scared: with ClrMD, there is no need to deal with native COM interfaces manually! The DataTarget.CreateFromDebuggerInterface method takes an IDebugClient interface and returns an instance of DataTarget. As you might remember from the initial post of this series, DataTarget is the gateway to the dump (or live-debugged attached process): we are now back to the known ClrMD world.\nReuse ClrMD Samples Fortunately, most of the glue to bind the native world to ClrMD is already available! You simply reuse the partial DebuggerExtensions class given in the samples.\nYou extend the class with your extension methods that have the following signature:\nMyCommandSignature.cs\npublic static void MyCommand(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) The first parameter is a pointer to the IDebugClient interface provided by WinDBG. The first thing to do in your extension command method is to call the InitApi static method with the interface pointer and let the magic happen.\nMyCommand-2.cs\n// Must be the first thing in our extension. 
if (!InitApi(client)) return; After that call, the output of the Console will be redirected to WinDBG and your code is free to use the following properties to access the dump via ClrMD:\nDebuggerExtensionPartial.cs\npublic partial class DebuggerExtensions { public static IDebugClient DebugClient { get; private set; } public static DataTarget DataTarget { get; private set; } public static ClrRuntime Runtime { get; private set; } The second parameter args received by your method is a string that contains the parameters added by the user after the name of your command. For example, if the user types “!MyCommand param1 param2”, the args parameter will be “param1 param2”.\nExposing native functions The last part of magic glue is how to export a native function from a .NET assembly. This is made possible by the UnmanagedExports nuget package by Robert Giesecke.\nOnce added to your project, decorate the functions to export with the DllExport attribute and the native name of the function that will be visible in WinDBG as a command.\nThere is a little trick here: the names of exported functions are case sensitive for WinDBG. If you take a look again at sos.dll in Dependency Walker and sort exports by the Function column, you will notice a few duplicates such as CLRStack/ClrStack/clrstack as shown in the following screenshot:\nFor usability’s sake, it is good practice to provide several spellings for the same command, including short versions such as !dso for !DumpStackObjects in SOS. Unfortunately, the DllExport attribute cannot be applied several times to the same method with different exported names. 
You need to define a different method per exported name and all of them will call the same internal helper method.\nMultipleDllExport.cs\n[DllExport(\u0026#34;tks\u0026#34;)] public static void tks(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { OnTkState(client, args); } [DllExport(\u0026#34;tkstate\u0026#34;)] public static void tkstate(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { OnTkState(client, args); } [DllExport(\u0026#34;tkState\u0026#34;)] public static void tkState(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { OnTkState(client, args); } public static void OnTkState(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { // Must be the first thing in our extension. if (!InitApi(client)) return; ... } Thanks to the GetTaskStateFromAddress and GetTaskState helper methods described earlier, the implementation of the OnTkState method is straightforward once the address or the value has been extracted from the args parameter.\nDon’t forget your user: implement help A good extension always provides a help command that (1) lists the available commands with their shortcuts and (2) gives additional details on each command. 
Simply add a new file that defines the exports for help/Help and parses the string argument if needed.\nDebuggerExtensionImpl.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 public partial class DebuggerExtensions { [DllExport(\u0026#34;Help\u0026#34;)] public static void Help(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { OnHelp(client, args); } [DllExport(\u0026#34;help\u0026#34;)] public static void help(IntPtr client, [MarshalAs(UnmanagedType.LPStr)] string args) { OnHelp(client, args); } const string _help = \u0026#34;...\u0026#34;; const string _tksHelp = \u0026#34;...\u0026#34;; private static void OnHelp(IntPtr client, string args) { // Must be the first thing in our extension. if (!InitApi(client)) return; string command = args; if (args != null) command = args.ToLower(); switch (command) { case \u0026#34;tks\u0026#34;: case \u0026#34;tkstate\u0026#34;: Console.WriteLine(_tksHelp); break; default: Console.WriteLine(_help); break; } } } Tips to use the extension Don’t forget that you might need two versions of your assembly: one for the x86 version of WinDBG if your applications are 32 bit and one for the x64 version of WinDBG in the 64 bit case. If you want to be able to easily load your extension with the .load command, copy it with Microsoft.Diagnostics.Runtime.dll (i.e. ClrMD assembly) to the winext subfolder of x64/x86 WinDBG folders:\nBefore being able to use any of its commands, you must load SOS with the well-known .loadby sos clr mantra. But this is not enough: you also have to run at least one SOS command. 
You are now ready to call any of your extension commands!\nNext step… The next episodes will bring you into the mysteries under the dynamic keyword and how to simplify the syntax to leverage ClrMD.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-06-29_clrmd-part-5-extend-sos-windbg/DWwithSOS.png","date":"2017-06-29","permalink":"https://chrisnas.github.io/posts/2017-06-29_clrmd-part-5-extend-sos-windbg/","summary":"\u003cp\u003eThis fifth post of the ClrMD series shows how to leverage this API inside a WinDBG extension. The \u003ca href=\"https://github.com/criteo/criteo-dotnet-blog/tree/master/ClrMD-Part5_WinDBG-Extension\"\u003eassociated code\u003c/a\u003e allows you to translate a task state into a human readable value.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrap ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFind duplicated strings with ClrMD heap traversing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2017-05-03_clrmd-part-3-static-instance-fields/\"\u003eList timers by following static fields links\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 4: \u003ca href=\"/posts/2017-05-31_clrmd-part-4-timer-callbacks/\"\u003eIdentify timers callback and other properties\u003c/a\u003e.\u003c/p\u003e\n\u003ch3 id=\"introduction\"\u003eIntroduction\u003c/h3\u003e\n\u003cp\u003eSince the beginning of this series, you have seen how to use ClrMD to write your own tool to extract meaningful information from a dump file (or a live process). 
However, most of the time, you are also using WinDBG and SOS to navigate inside the .NET data structures.\u003c/p\u003e","title":"ClrMD Part 5 – How to use ClrMD to extend SOS in WinDBG"},{"content":"This fourth post of the ClrMD series digs into the details of figuring out which method gets called when a timer triggers. The associated code lists all timers in a dump.\nPart 1: Bootstrapping ClrMD to load a dump.\nPart 2: Finding duplicated strings with ClrMD heap traversing.\nPart 3: List timers by following static fields links.\nLooking at my timer In the previous post, we explained how to access a static field of TimerQueue to start iterating the list of TimerQueueTimer wrapping the created timers. Now that the currentPointer variable contains the address of each TimerQueueTimer, it is time to extract the details of the timer we have created.\nThe following code extracts the value of the TimerQueueTimer fields corresponding to each Timer thanks to the GetFieldValue helper presented in the previous post:\nTimerQueueTimerFields.cs\n1 2 3 4 5 6 7 8 var val = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_dueTime\u0026#34;); ti.DueTime = (uint)val; val = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_period\u0026#34;); ti.Period = (uint)val; val = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_canceled\u0026#34;); ti.Cancelled = (bool)val; Note that the value for m_dueTime is always the same as the value of m_period. 
This is not a bug: it seems that .NET only keeps the due time during construction and then reuses the corresponding field for another purpose.\nThe m_state field case is a little bit more complicated to decipher because the type of the object passed to the timer needs to be figured out in addition to its address, if the latter is not null:\nTimerQueueTimerState.cs\nval = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_state\u0026#34;); ti.StateTypeName = \u0026#34;\u0026#34;; if (val == null) { ti.StateAddress = 0; } else { ti.StateAddress = (ulong)val; var stateType = heap.GetObjectType(ti.StateAddress); if (stateType != null) { ti.StateTypeName = stateType.Name; } } As usual with ClrMD, you need to get the ClrType corresponding to the object referenced by an address before being able to access its fields or to get its name. However, instead of looking into a module as was done for TimerQueue, it is easier and more efficient to call the GetObjectType method of ClrHeap. The mandatory test against a null value for the ClrType might seem overkill, but the ClrMD implementation notes that the internal CLR state could be corrupted.\nWhat is the timer callback? The last piece of information to retrieve is the callback the timer will call when it triggers. 
The m_timerCallback field references a TimerCallback instance that stores these details.\nGetTimerCallBackDetails.cs\n// decipher the callback details val = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_timerCallback\u0026#34;); if (val != null) { ulong elementAddress = (ulong)val; if (elementAddress == 0) continue; var elementType = heap.GetObjectType(elementAddress); if (elementType != null) { if (elementType.Name == \u0026#34;System.Threading.TimerCallback\u0026#34;) { ti.MethodName = BuildTimerCallbackMethodName(runtime, elementAddress); } else { ti.MethodName = \u0026#34;\u0026lt;\u0026#34; + elementType.Name + \u0026#34;\u0026gt;\u0026#34;; } } else { ti.MethodName = \u0026#34;{no callback type?}\u0026#34;; } } else { ti.MethodName = \u0026#34;???\u0026#34;; } yield return ti; currentPointer = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_next\u0026#34;); But how to get the name of the method just with the address of a TimerCallback object? Again, open up your favorite decompiler and look at the type hierarchy:\nHere are the two fields of the Delegate type that are interesting:\nThe _methodPtr field stores the pointer to the method. Luckily, the ClrRuntime GetMethodByAddress method takes this address and returns the corresponding ClrMethod, which exposes the name of the method!\nIf this method is static, the _target field is null. Otherwise, it stores the value of this, the hidden parameter received by all instance methods. In case of type inheritance, it is interesting to know which override will be called. 
All these steps are wrapped in the following helper function:\nBuildTimerCallbackMethodName.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 private string BuildTimerCallbackMethodName(ClrRuntime runtime, ulong timerCallbackRef) { var heap = runtime.GetHeap(); var methodPtr = GetFieldValue(heap, timerCallbackRef, \u0026#34;_methodPtr\u0026#34;); if (methodPtr != null) { ClrMethod method = runtime.GetMethodByAddress((ulong)(long)methodPtr); if (method != null) { // figure out the real callback implementor type thanks to _target string thisTypeName = \u0026#34;?\u0026#34;; var thisPtr = GetFieldValue(heap, timerCallbackRef, \u0026#34;_target\u0026#34;); if ((thisPtr != null) \u0026amp;\u0026amp; ((ulong) thisPtr) != 0) { ulong thisRef = (ulong) thisPtr; var thisType = heap.GetObjectType(thisRef); if (thisType != null) { thisTypeName = thisType.Name; } } else { thisTypeName = (method.Type != null) ? method.Type.Name : \u0026#34;?\u0026#34;; } return string.Format(\u0026#34;{0}.{1}\u0026#34;, thisTypeName, method.Name); } } return string.Empty; } Building a usable summary Even though the EnumerateTimers helper provides a way to list all timers, you often don’t want to show them all; especially when thousands exist and most of them are duplicates. 
The sample code associated with this post lists the different timers, counts the duplicates and sorts the result by duplicate count as shown in the following screenshot:\nNext step… After timers, the next post will show how to integrate your ClrMD-based code into an extension for WinDBG to help decipher a Task state.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-05-31_clrmd-part-4-timer-callbacks/DelegateClass.png","date":"2017-05-31","permalink":"https://chrisnas.github.io/posts/2017-05-31_clrmd-part-4-timer-callbacks/","summary":"\u003cp\u003eThis fourth post of the ClrMD series digs into the details of figuring out which method gets called when a timer triggers. The \u003ca href=\"https://github.com/criteo/criteo-dotnet-blog/tree/master/ClrMD-Parts3%2B4_Timers\"\u003eassociated code\u003c/a\u003e lists all timers in a dump.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrapping ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFinding duplicated strings with ClrMD heap traversing\u003c/a\u003e.\u003c/p\u003e\n\u003cp\u003ePart 3: \u003ca href=\"/posts/2017-05-03_clrmd-part-3-static-instance-fields/\"\u003eList timers by following static fields links\u003c/a\u003e.\u003c/p\u003e\n\u003ch3 id=\"looking-at-my-timer\"\u003eLooking at my timer\u003c/h3\u003e\n\u003cp\u003eIn the previous post, we explained how to access a static field of \u003cstrong\u003eTimerQueue\u003c/strong\u003e to start iterating the list of \u003cstrong\u003eTimerQueueTimer\u003c/strong\u003e wrapping the created timers. 
Now that the \u003cstrong\u003ecurrentPointer\u003c/strong\u003e variable contains the address of each \u003cstrong\u003eTimerQueueTimer\u003c/strong\u003e, it is time to extract the details of the timer we have created.\u003c/p\u003e","title":"ClrMD Part 4 – What callbacks are called by my timers?"},{"content":"This third post of the ClrMD series focuses on how to retrieve the value of static and instance fields by taking timers as an example. The next post will dig into the details of figuring out which method gets called when a timer triggers. As an example, the associated code lists all timers in a dump and covers both articles.\nPart 1: Bootstrapping ClrMD\nPart 2: Finding duplicated strings with ClrMD\nMarshaling data from a dump Beyond heap navigation shown in the previous post, the big thing to understand about ClrMD is that the retrieved information is often an address. It is an address from another address space, because the dump is seen as another process, just as if you were debugging it live. Your code will need to access the other process memory corresponding to this address; not directly with a pointer/reference indirection or with the raw Win32 ReadProcessMemory API function but via APIs like GetObjectType or GetValue.\nTo illustrate how to navigate into the dump address space with ClrMD, we will show how to list the timers that have been started. This can be useful to investigate various issues, such as leaks or timers being stuck.\nKnow your framework A naive implementation, like the string example of the previous post, would try to list all object instances in the CLR heap and look at Timer instances only. However, as already mentioned, this is very inefficient in terms of performance; especially for 10+ GB dumps…\nIt is time to figure out what happens in the .NET runtime when your code creates a new timer. 
If the source code of the version of the CLR you are using is not available, start your favorite IL decompiler and look at the System.Threading.Timer implementation details. The parameters given to the constructors (such as the due time, period, and callback method, in addition to its optional parameter if any) are not stored in the class itself but in the TimerQueueTimer helper class.\nThe Timer constructor code, after a few sanity checks, calls the TimerSetup method to wrap a TimerQueueTimer in a TimerHolder that is stored in the Timer m_timer field.\nThis is where things start to become interesting: this TimerQueueTimer class adds each new instance into a linked list kept by a singleton object stored in the static s_queue field of the TimerQueue class. The following figure shows the relation between instances after three timers are created:\nSo… a fast way to list the timers would be to get the unique static instance of TimerQueue, look at its m_timers field and iterate on each TimerQueueTimer by following their m_next field until it contains null. 
The rest of the post details the following operations with ClrMD:\nquickly getting a ClrType\nreading a static field\nreading an instance field\nto fill up a collection of our own TimerInfo, used to easily create a summary:\nTimerInfo.cs\npublic class TimerInfo { public ulong TimerQueueTimerAddress { get; set; } public uint DueTime { get; set; } public uint Period { get; set; } public bool Cancelled { get; set; } public ulong StateAddress { get; set; } public string StateTypeName { get; set; } public ulong ThisAddress { get; set; } public string MethodName { get; set; } } This is wrapped inside a helper method described in the next few sections:\nEnumerateTimers-1.cs\npublic IEnumerable\u0026lt;TimerInfo\u0026gt; EnumerateTimers(ClrRuntime runtime) { ClrHeap heap = runtime.GetHeap(); if (!heap.CanWalkHeap) yield break; As explained in the previous post, you need to ensure that the process was not in the middle of a garbage collection when the dump was taken by checking the value of the ClrHeap.CanWalkHeap property.\nStanding on the shoulders of giants I have found the different steps to get access to the static fields of classes in the ClrMD implementation from GitHub. In addition, I highly recommend that you take a look at the samples.\nLet’s go back to our first goal: getting the value of the static s_queue field of the TimerQueue class. One of the very efficient optimizations found in the ClrMD implementation is to get a ClrModule and call its GetTypeByName method to obtain the ClrType directly, instead of iterating the heap until an instance of the type is found. In our case, we need to access TimerQueue, which is a type from mscorlib. 
Here is the code of the helper function from Desktop\\threadpool.cs to get a ClrModule for mscorlib:\nGetMscorlib.cs\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 private ClrModule GetMscorlib(ClrRuntime runtime) { foreach (ClrModule module in runtime.Modules) if (module.AssemblyName.Contains(\u0026#34;mscorlib.dll\u0026#34;)) return module; // Uh oh, this shouldn\u0026#39;t have happened. Let\u0026#39;s look more carefully (slowly). foreach (ClrModule module in runtime.Modules) if (module.AssemblyName.ToLower().Contains(\u0026#34;mscorlib\u0026#34;)) return module; // Ok...not sure why we couldn\u0026#39;t find it. return null; } The following line sets timerQueueType with the ClrType corresponding to TimerQueue:\nEnumerateTimers-2.cs\n1 var timerQueueType = GetMscorlib(runtime).GetTypeByName(\u0026#34;System.Threading.TimerQueue\u0026#34;); Next, get the ClrStaticField corresponding to the static field s_queue:\nEnumerateTimers-3.cs\n1 ClrStaticField staticField = timerQueueType.GetStaticFieldByName(\u0026#34;s_queue\u0026#34;); The staticField variable is not the static instance but rather a way to access it… or them.\nBut where are my statics! Let’s take some time to explain a “detail” of the .NET Framework to help you understand how to get the static TimerQueue instance. Unlike previous Windows frameworks, .NET allows a process to contain several running environments called application domains (a.k.a. AppDomains). For a better isolation, each AppDomain has its own set of static variables: this is why you need to iterate on each AppDomain with ClrMD to access the static instances:\nEnumerateTimers-4.cs\n1 2 3 4 5 foreach (ClrAppDomain domain in runtime.AppDomains) { ulong? 
timerQueue = (ulong?)staticField.GetValue(domain); if (!timerQueue.HasValue || timerQueue.Value == 0) continue; The address returned by ClrStaticField.GetValue is nullable because, in an AppDomain where no TimerQueue has ever been used, its fields won’t be initialized.\nWe don’t really need to map this address from the dump address space into something usable in the tool. Instead, only the value of the m_timers field is needed to start iterating on the list of timers.\nHow to get the values of instance fields? Now that we have an address in the dump and the ClrType describing the type of the corresponding object (TimerQueue here), it is easy to retrieve the value of one of its instance fields. Since this action is needed again and again to move from one TimerQueueTimer object to the next, it is valuable to create a helper method:\nGetFieldValue.cs\nprivate object GetFieldValue(ClrHeap heap, ulong address, string fieldName) { var type = heap.GetObjectType(address); ClrInstanceField field = type.GetFieldByName(fieldName); if (field == null) return null; return field.GetValue(address); } The address of the object in the dump is used to get its ClrType. The ClrInstanceField (instead of a ClrStaticField as for the s_queue case) describing the field exposes the expected GetValue method. Note that the return value of GetValue is defined as System.Object but you should understand it as the numeric value stored in the dump (or the other process address space) at the given address. 
For simple value types such as booleans, numbers and even ulong addresses, a cast is enough to transparently marshal the value into the tool with ClrMD.\nLet’s go back to writing the code to access the head of the TimerQueueTimer list from the TimerQueue static instance:\nEnumerateTimers-5.cs\n// m_timers is the start of the list of TimerQueueTimer var currentPointer = GetFieldValue(heap, timerQueue.Value, \u0026#34;m_timers\u0026#34;); while ((currentPointer != null) \u0026amp;\u0026amp; (((ulong)currentPointer) != 0)) { // currentPointer points to a TimerQueueTimer instance ulong currentTimerQueueTimerRef = (ulong)currentPointer; TimerInfo ti = new TimerInfo() { TimerQueueTimerAddress = currentTimerQueueTimerRef }; ... currentPointer = GetFieldValue(heap, currentTimerQueueTimerRef, \u0026#34;m_next\u0026#34;); } currentPointer holds the address of each TimerQueueTimer in the list kept by the static TimerQueue.\nNote the ((ulong)currentPointer) != 0) test in the while loop to detect the end of the list when the m_next field is null.\nNext step… After enumerating each timer, the next post will show how to extract details such as the due time, the period, and even which method is called when it ticks.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-05-03_clrmd-part-3-static-instance-fields/TimerClassDependencies.png","date":"2017-05-03","permalink":"https://chrisnas.github.io/posts/2017-05-03_clrmd-part-3-static-instance-fields/","summary":"\u003cp\u003eThis third post of the ClrMD series focuses on how to retrieve the values of static and instance fields by taking timers as an example. The next post will dig into the details of figuring out which method gets called when a timer triggers. 
As an example, the \u003ca href=\"https://github.com/criteo/criteo-dotnet-blog/tree/master/ClrMD-Parts3%2B4_Timers\"\u003eassociated code\u003c/a\u003e lists all timers in a dump and covers both articles.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrapping ClrMD\u003c/a\u003e\u003c/p\u003e\n\u003cp\u003ePart 2: \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003eFinding duplicated strings with ClrMD\u003c/a\u003e\u003c/p\u003e\n\u003ch3 id=\"marshaling-data-from-a-dump\"\u003eMarshaling data from a dump\u003c/h3\u003e\n\u003cp\u003eBeyond heap navigation shown in \u003ca href=\"/posts/2017-03-24_clrmd-part-2-from-clrruntime/\"\u003ethe previous post\u003c/a\u003e, the big thing to understand about ClrMD is that the retrieved information is often an \u003cstrong\u003eaddress\u003c/strong\u003e. It is an address from another address space because the dump is seen as another process, just as if you were debugging it live. Your code will need to access the other process memory corresponding to this address; not directly with a pointer/reference indirection or with the raw Win32 \u003ca href=\"https://msdn.microsoft.com/en-us/library/windows/desktop/ms680553.aspx\"\u003eReadProcessMemory\u003c/a\u003e API function but via APIs like \u003cstrong\u003eGetObjectType\u003c/strong\u003e or \u003cstrong\u003eGetValue\u003c/strong\u003e.\u003c/p\u003e","title":"ClrMD Part 3 - Dealing with static and instance fields to list timers"},{"content":"When you see this, you know for sure that something is wrong with a server:\nThis chart counts the number of first-chance exceptions thrown by the server. We have here an average of 840K exceptions thrown per minute, or 14K exceptions per second. That’s a lot, especially considering that this server only processes about 400 requests per second. 
It is impossible to find anything meaningful or even related in the logs, and the server seems to respond properly to requests: a head-scratching situation for sure.\nThe ratio between processed requests and exceptions thrown does not make any sense. It would be great if we could see what these unbelievable exceptions are with our own eyes. Thankfully, the procdump tool from Sysinternals lists the first-chance exceptions with the following magical command line:\nprocdump -ma -e 1 -f E0434352.CLR \u0026lt;pid\u0026gt; Instead of capturing a memory dump, the tool will display all CLR-related exceptions in the console as they are thrown. We expected a never-ending flow of exception details but… For some reason, procdump seemed to completely slow down the process, generating an output polluted by timeout exceptions that supposedly aren’t thrown when procdump isn’t attached.\nIt was time for drastic measures: remote debugging the faulty process with Visual Studio could give us hints about what was going on. This time, we were able to catch a ThreadAbortException we had never seen before. Retrying a few times helped us to confirm that it was indeed the culprit.\nThe likely culprit As good .NET citizens, we know that calling Thread.Abort is not recommended, but to be sure, we searched our code, only to discover… that we never abort any thread. We started bisecting the code changes made since the last known good run of the application. We found a fix for some code that was using Thread.Abort: maybe the server was still running with the old code! We checked the version of the assembly and double-checked by decompiling it (just in case) but it was the fixed code without any Thread.Abort.\nBack to scratching our heads, we kept Visual Studio remotely attached to the process and thankfully, we managed to catch a CannotUnloadAppDomainException from time to time. 
As the name indicates, this exception is thrown by the AppDomain when it doesn’t manage to unload in a timely fashion. It turns out that unloading an AppDomain is a known cause of ThreadAbortException, so we’ve got a serious lead.\nThat ASP.NET would try to unload an appdomain isn’t surprising. It can occur, for instance, when a configuration file is modified. But it’s certainly not enough to explain the 14K exceptions per second, unless we’re continuously creating and unloading new appdomains.\nDepressingly, we also managed to explain why we weren’t getting anything in the log files. For performance and scalability reasons, our logging API is asynchronous: the logger thread responsible for saving the traces was actually one of the first threads being aborted, thus killing our main source of information. Talk about bad luck.\nBack to the problem and digging further into the callstacks, we found out that the exception was always thrown in the same method:\nProgram.cs\npublic void SomeMethod() { while (someCondition) { try { // Do some stuff // Sleep 15 minutes Thread.Sleep(15 * 60 * 1000); } catch (Exception ex) { // Log the exception } } } Did we find the culprit? This loop was a good candidate to explain the repeated exceptions: if an exception is thrown then we will go to the catch block, to immediately try again. However, ThreadAbortException is a weird beast: whenever caught, it will automatically be rethrown at the end of the catch block. So in our case, it means that we would leave the “while” block when the exception is rethrown behind our back, and exit the thread. 
There was no way the code could loop to generate the observed 14K exceptions per second!\nThe only other option was that appdomains would be created and unloaded at a crazy pace, but why?\nAfter going around in circles for a while, we noticed something peculiar: the ThreadAbortException was always thrown in the same thread, just as if the code were really looping despite what we mentioned earlier… What’s going on?\nThe plot twist When you’ve excluded every other possible cause, you start doubting your own knowledge. And so we put together a small program to test the behavior of the ThreadAbortException:\nProgram.cs\nstatic void Main(string[] args) { var mutex = new ManualResetEventSlim(); var t = new Thread(() =\u0026gt; { while (true) { try { if (!mutex.IsSet) { mutex.Set(); } // Do some stuff Thread.Sleep(100); } catch (Exception ex) { Console.WriteLine(\u0026#34;Exception: \u0026#34; + ex.Message); } } }); t.Start(); // Wait for the thread to start mutex.Wait(); t.Abort(); Console.ReadLine(); } We start a background thread, doing some fake work in a loop, catching any thrown exception. The main thread then waits for the background thread to start before asking the CLR to stop it by calling Thread.Abort. Following the abort request, a ThreadAbortException will be magically thrown in the context of the background thread. The exception will be caught there, logged in the console, then automatically be rethrown because of its special nature. Then the thread will exit because the exception is rethrown outside of the try/catch block.\nCompile, run, and lo and behold…\nIt seems everything we knew about ThreadAbortException is actually wrong!\nThe behavior can only be reproduced in a release build, and running in 64 bit. That last part tipped us towards suspecting the JIT. 
And indeed, disabling RyuJIT in the configuration seems to fix the issue:\nApp.config\n\u0026lt;?xml version=\u0026#34;1.0\u0026#34; encoding=\u0026#34;utf-8\u0026#34;?\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;runtime\u0026gt; \u0026lt;useLegacyJit enabled=\u0026#34;1\u0026#34; /\u0026gt; \u0026lt;/runtime\u0026gt; \u0026lt;/configuration\u0026gt; How could the JIT be involved? The call to Thread.Abort is in fact asynchronous and sets the AbortRequested flag on the Thread object. The CLR will throw a ThreadAbortException as soon as it reaches a safe place, as Jeffrey Richter explains on page 580 of his “CLR via C#” book. The code responsible for throwing the exception at the right time is generated by the JIT compiler.\nA JIT bug! After having notified Microsoft, we still had to fix our code. We just needed to catch the ThreadAbortException and… rethrow it ourselves, because the JIT-generated code does not do it for us.\nIn the end… Sometimes, the odds are really against you when investigating an issue. Starting with Procdump changing the behavior of the process and the logger crashing during the appdomain unloading, to end up with a bug in the least visible part of the .NET Framework. 
During those times, it is important to keep a clear head, have a few coworkers around to bounce ideas off, and methodically test every single assertion you make, no matter how confident you feel about it.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-04-06_ryujit-never-ending-threadabort/Abort.png","date":"2017-04-06","permalink":"https://chrisnas.github.io/posts/2017-04-06_ryujit-never-ending-threadabort/","summary":"\u003cp\u003e\u003cstrong\u003eWhen you see this, you know for sure that something is wrong with a server:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cimg loading=\"lazy\" src=\"/posts/2017-04-06_ryujit-never-ending-threadabort/Untitled.png\"\u003e\u003c/p\u003e\n\u003cp\u003eThis chart counts the number of first-chance exceptions thrown by the server. We have here an average of 840K exceptions thrown per minute, or 14K exceptions per second. That’s a lot, especially considering that this server only processes about 400 requests per second. It is impossible to find anything meaningful or even related in the logs, and the server seems to respond properly to requests: a head-scratching situation for sure.\u003c/p\u003e","title":"RyuJIT and the never-ending ThreadAbortException"},{"content":"This second post in the ClrMD series details the basics of parsing the CLR heaps. The associated code looks for duplicated strings as a sample.\nPart 1: Bootstrapping ClrMD to load a dump.\nFrom ClrRuntime to ClrHeap or how to traverse the managed heap In the previous post, we bootstrapped the code needed to load a memory dump and get an instance of ClrRuntime. 
This type is the starting point for accessing the content of a managed process with ClrMD:\nMost of the memory and heap management is well described in the ClrMD documentation. You are able to:\nlist the application domains with AppDomains, dig into the memory regions with EnumerateMemoryRegions, access the managed heap with GetHeap. As shown in the ClrMD samples, the ClrHeap type helps you traverse the managed memory:\nBefore doing any heap exploration with ClrHeap, you need to ensure that the process was not in the middle of a garbage collection when the dump was taken. This can be done by checking CanWalkHeap:\nclrmd.cs\nvar heap = clr.GetHeap(); if (!heap.CanWalkHeap) return; Then, you can start browsing the objects in memory with the following code:\nclrmd.cs\nforeach (ClrObject obj in heap.EnumerateObjects()) { try { var objType = obj.Type; if (objType == null) continue; ... } catch (Exception x) { WriteLine(x); // some InvalidOperationException might occur sometimes } } As you can see, each reference to an object is enumerated as a ClrObject whose Type property gives the type of the corresponding instance. Note that the Type property might return null in some memory corruption scenarios (as commented in the ClrMD samples) so don’t forget to check it in your own code. This kind of simple loop is not really efficient in terms of performance when you are dealing with multi-GB dump files. Unfortunately, as the rest of the post explains, sometimes, you don’t have a choice.\nHow duplicated are your strings? Internally at Criteo, we are always trying to improve the performance of the code that runs in production. Lately, one of the leads to limit the memory consumption was to leverage the interning feature of strings. Since string instances are immutable (i.e. once created, you can’t change their value), the idea is to ask the CLR to keep a single instance of each repeated string in an internal cache. 
Then, this instance can be shared whenever a string would be duplicated, thus saving memory. This would be especially efficient if an object model is stored as a dictionary where the keys and most of the string fields of the value data share the same values: even if millions of items are stored, their fields point to only a few hundred distinct strings.\nBut before starting any major refactoring, it is mandatory to have metrics about the current status and to be able to measure the possible gains. In that context, it would be interesting to get a summary of which strings are the most repeated with their corresponding size in memory: something close to !sos.dumpheap -stat.\nThe code to achieve this goal is simple and straightforward from the loop previously listed: you just have to check if the type is System.String and count every different value in a dictionary.\nClrmd.cs\nDictionary\u0026lt;string, int\u0026gt; strings = new Dictionary\u0026lt;string, int\u0026gt;(); foreach (var obj in heap.EnumerateObjects()) { ... if (obj.Type.Name != \u0026#34;System.String\u0026#34;) continue; string s = obj as string; if (!strings.ContainsKey(s)) { strings[s] = 0; } strings[s] = strings[s] + 1; ... } The formatting of the results is also simple. 
The strings are sorted by the size in bytes of all duplicated instances (hence the x2 multiplier factor because a character is UTF-16 encoded on 2 bytes):\nClrmd.cs\nint totalSize = 0; // Sort by size taken by the instances of string var query = strings .Where(s =\u0026gt; s.Value \u0026gt; minCountThreshold) .Select(e =\u0026gt; new { Count = e.Value, Size = 2 * e.Value * e.Key.Length, Key = e.Key }) .OrderBy(ai =\u0026gt; ai.Size); foreach (var aggregatedInfo in query) { WriteLine(string.Format( \u0026#34;{0,8} {1,12} {2}\u0026#34;, aggregatedInfo.Count, aggregatedInfo.Size, aggregatedInfo.Key.Replace(\u0026#34;\\n\u0026#34;, \u0026#34;## \u0026#34;).Replace(\u0026#34;\\r\u0026#34;, \u0026#34; ##\u0026#34;) )); totalSize += aggregatedInfo.Size; } WriteLine(\u0026#34;-------------------------------------------------------------------------\u0026#34;); WriteLine(string.Format(\u0026#34; {0,12} MB\u0026#34;, totalSize/(1024*1024))); Note that a minCountThreshold parameter is used here as a minimum number of instances of the same string to avoid listing strings that are rarely duplicated in memory. For better formatting, the \\r\\n are transformed into “##” so each string stays on one line. Here is the result for a simple sample app:\nThe gain would be less than a MB here…\nNext step… In this example, we let ClrMD transparently marshal the string instances from the dump address space into our tool. However, sometimes, you need to directly and explicitly access the value of type fields. 
This will be the subject of the next post where we describe how to list the timers running in a process.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-03-24_clrmd-part-2-from-clrruntime/ClrMD1.png","date":"2017-03-24","permalink":"https://chrisnas.github.io/posts/2017-03-24_clrmd-part-2-from-clrruntime/","summary":"\u003cp\u003eThis second post in the ClrMD series details the basics of parsing the CLR heaps. The \u003ca href=\"https://github.com/criteo/criteo-dotnet-blog/tree/master/ClrMD-Part2\"\u003eassociated code\u003c/a\u003e looks for duplicated strings as a sample.\u003c/p\u003e\n\u003cp\u003ePart 1: \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003eBootstrapping ClrMD to load a dump\u003c/a\u003e.\u003c/p\u003e\n\u003ch3 id=\"from-clrruntime-to-clrheap-or-how-to-traverse-the-managed-heap\"\u003eFrom ClrRuntime to ClrHeap or how to traverse the managed heap\u003c/h3\u003e\n\u003cp\u003eIn \u003ca href=\"/posts/2017-02-21_clrmd-part-1-going-beyond/\"\u003ethe previous post\u003c/a\u003e, we bootstrapped the code needed to load a memory dump and get an instance of \u003cstrong\u003eClrRuntime\u003c/strong\u003e. This type is the starting point for accessing the content of a managed process with \u003ca href=\"https://github.com/microsoft/clrmd\"\u003eClrMD\u003c/a\u003e:\u003c/p\u003e","title":"ClrMD Part 2 - From ClrRuntime to ClrHeap or how to traverse the managed heap"},{"content":"A little bit of context Thousands of servers are closely monitored at Criteo and when inconsistent behaviors are detected, an investigation is started based on these deviant machines. The level of details provided by the monitoring is close to what is provided by performance counters. Our team is using them to guess where the problem could come from. 
The next step is to have a closer look at one of the faulting machines in order to figure out whether our guess is valid… or not.\nLately, we got a few situations where the number of running threads was growing up to hundreds, even thousands. To limit the impact on production machines, we are taking dumps with procdump from SysInternals and we are loading them into WinDBG to dig into the CLR data structures with SOS commands. However, when you are dealing with 20+ GB dumps and thousands of threads, the investigation starts to become complicated simply due to the mass of data, the performance hit and the complexity of the highly multi-threaded code.\nSizing and tooling limits The magic of using sos with WinDBG on a .NET dump is the ability to navigate among your data, whether your own types, BCL types, or types from other libraries. With !do, you are able to see the values of the fields in the instances of objects manipulated by your application. However, it is clearly less efficient than the navigation provided by the Watch or QuickWatch panes in Visual Studio. In addition, you first need to find the instance(s) you are interested in and it usually means calling !dumpheap -stat and !dumpheap -mt to narrow down the search. On multi-GB dumps, it will cost you minutes just before being able to !do one of the possibly interesting instances.\nOne solution is to build tools that leverage the textual output of sos commands to automate well-known scenarios. The LeakShell tool has been built as a WinDBG companion application to ease memory leak hunting. Even if LeakShell has been enhanced to directly consume dumps, parsing text output from !dumpheap -stat is fragile if the format changes and does not scale well to large dumps.\nAs developers, we would definitely prefer to write our own tool based on clean and easy-to-use APIs instead of parsing cryptic textual output. 
This is exactly the purpose of ClrMD as stated by its first line of documentation: CLR MD is a C# API used to build diagnostics tools. This Microsoft project (thank you so much Lee Culver!) is available from GitHub and provides a managed wrapper that brings the power of sos and the symbol engine to your C# code. Take the time to follow the tutorials and open the sample code: you have everything you need!\nThis series of posts will detail how to write your own tool with ClrMD and describe some of the code we had to write to help our investigations for real-world production machines servicing millions of requests per second worldwide.\nBootstrapping a tool based on ClrMD The basic scenario that our tool needs to support is opening a dump file and automating the analysis of CLR data structures to provide high-level summaries.\nWhen you start a project that will use ClrMD, you should tell NuGet to get the official version for you. Right-click the References node of your project and select Manage NuGet Packages. Look for microsoft.diagnostics.runtime (don’t forget the “Include prerelease” check) and install it.\nAs stated by the last line of the description: no other dependency is required.\nNote: this version 0.8.31 is still in beta but you could get the latest level of code from GitHub (more on this in a later post).\nThe root class you’re starting with in ClrMD is DataTarget:\nThis class wraps a debugging session, either a live one by calling AttachToProcess or a post-mortem analysis on a dump file with LoadCrashDump. For the rest of the series, we will focus on the dump analysis.\nLoading a dump The first step is to tell ClrMD to open a .dmp file and use it to create the DataTarget:\ntarget = DataTarget.LoadCrashDump(dumpFilename); Since .NET 4.0, it is possible to load “several” versions of the CLR in the same process at the same time. 
The list of the loaded CLR runtimes is provided by the DataTarget.ClrVersions property and for simplicity’s sake, only the first one will be taken into account in the rest of the series. This property returns a list of ClrInfo:\nSeveral members of this class relate to “Dac”: this refers to the data access layer provided by mscordacwks.dll. As most of you should know (if this is not the case, read this detailed post), this library is used by sos.dll and ClrMD to access the internal CLR data structures. This sos/mscordacwks pair is unique per version of the CLR and therefore unique to the dumps you download from a server.\nLoading the right Dac The symbol engine APIs used behind the scenes by WinDBG and ClrMD differ slightly in how they figure out the right version; i.e. the version corresponding to your dump (which might be different from the one on your machine). An easy way to load the right dac is to copy the sos/mscordacwks dlls from the server (from C:\\Windows\\Microsoft.NET\\Framework64\u0026lt;v4.0.30319\u0026gt;) where the dump was taken and paste them in the dump folder.\nYou tell WinDBG where to find mscordacwks.dll with the following command:\n.cordll -lp \u0026lt;path\u0026gt; Next, here is the command to explicitly load sos from a folder:\n.load \u0026lt;path\u0026gt;\\sos The story is a little bit different for ClrMD: you have to manually simulate the work WinDBG is doing. 
The ClrInfo instance will help getting the right version of the Runtime instance corresponding to the right version of mscordacwks.dll that you copied from the dump machine:\nClr = target.ClrVersions[0].CreateRuntime( Path.Combine(Path.GetDirectoryName(dumpFilename), \u0026#34;mscordacwks.dll\u0026#34;) ); If you were not able to copy the dll with the dump file, you could leverage the symbol engine to automatically retrieve it (with the risk of not finding the exact same version if your machines have been patched and the symbols/mscordacwks.dll are not available from the Microsoft servers):\nClr = target.ClrVersions[0].CreateRuntime(); As explained in the getting started page, this call will try to use the “local” mscordacwks.dll from the current machine’s Windows subfolder and if there is no matching version, it will use the DataTarget.SymbolLocator.Find method to download this dll from the Microsoft public servers.\nWinDBG and Visual Studio take two environment variables into account when the time comes to load .pdb symbol files (_NT_SYMBOL_PATH) and sos.dll/mscordacwks.dll (_NT_EXECUTABLE_IMAGE_PATH).\nThe syntax is simple:\nsrv*c:\\symbols*http://msdl.microsoft.com/download/symbols After the srv prefix, the different locations where the files could be found are listed in order, prefixed by the * character; from a local one to the Microsoft remote web site.\nThe ClrMD SymbolLocator also leverages the _NT_SYMBOL_PATH environment variable (unlike what the documentation states). 
Note that the secure https syntax for the web site is not supported by the SymbolLocator implementation, so be careful to check the value stored in your environment variables…\nIf this environment variable is not set, the following default values will be used by the symbol locator (exposed by its SymbolPath property) to download the .pdb symbol files:\nSRV*http://msdl.microsoft.com/download/symbols;SRV*http://referencesource.microsoft.com/symbols and they are stored locally into:\nC:\\Users\\\u0026lt;user\u0026gt;\\AppData\\Local\\Temp\\symbols (exposed by its SymbolCache property). Unlike WinDBG, to find binary files such as mscordacwks.dll, ClrMD does not take into account the _NT_EXECUTABLE_IMAGE_PATH environment variable. Even worse, if the dll has not been downloaded into the local cache, the CreateRuntime() call throws a FileNotFoundException with the name of the searched dac as its Message property value (ex: mscordacwks_Amd64_Amd64_4.6.1076.00.dll). Note that the deprecated TryDownloadDac method does not throw an exception but returns null instead.\nNext step… Once a runtime has been created from the DataTarget, it is now possible to dig into the dump… to detect string duplicates as the next post will present.\nCo-authored with Kevin Gosse\n","cover":"https://chrisnas.github.io/posts/2017-02-21_clrmd-part-1-going-beyond/clrInfo.png","date":"2017-02-21","permalink":"https://chrisnas.github.io/posts/2017-02-21_clrmd-part-1-going-beyond/","summary":"\u003ch3 id=\"a-little-bit-of-context\"\u003eA little bit of context\u003c/h3\u003e\n\u003cp\u003eThousands of servers are closely monitored at Criteo and when inconsistent behaviors are detected, an investigation is started based on these deviant machines. The level of details provided by the monitoring is close to what is provided by performance counters. Our team is using them to guess where the problem could come from. 
The next step is to have a closer look at one of the faulting machines in order to figure out whether our guess is valid… or not.\u003c/p\u003e","title":"ClrMD Part 1 - Going beyond SOS"}]