Introduction
During the implementation of our .NET allocation profiler, we realized that the current sampling mechanism based on a fixed threshold did not provide a good enough statistical distribution. With the help of Noah Falk from the CLR Diagnostics team, I started to implement a randomized sampling based on a Bernoulli distribution model for .NET.

With this kind of change, you need to ensure that you don't break any existing code, that the impact on performance is limited, and that the results match the expected mathematical distribution.
The rest of this blog series details the different tests I wrote and the corresponding tips and tricks that could be reused when you write C# code.
Testing the basics
From a high-level view, the code change does something simple: each time an allocation context is needed to fulfill an allocation, the code checks if it should be sampled. In that case, a new AllocationSampled event is emitted with the same information as the existing AllocationTick event plus an additional field. So, the first level of testing is to validate that the events are emitted when the keyword and verbosity are enabled for the .NET runtime provider.
The runtime already has some tests in place, under the **\src\tests\tracing\eventpipe** folder, to validate that events are emitted. Here is the code of my XUnit test, which mimics existing ones such as simpleruntimeeventvalidation:
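Here is a sketch of what such a test looks like, modeled on the existing eventpipe tests. The keyword value (0x80000000000), the Object128 type, and the exact helper signatures are assumptions, not the actual runtime code:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using Microsoft.Diagnostics.NETCore.Client;
using Tracing.Tests.Common;

public class AllocationSampledValidation
{
    public static int Main(string[] args)
    {
        var providers = new List<EventPipeProvider>
        {
            // 0x80000000000 is a placeholder for the keyword bit that enables
            // AllocationSampled; check ClrEtwAll.man for the real value
            new EventPipeProvider("Microsoft-Windows-DotNETRuntime",
                EventLevel.Informational, 0x80000000000)
        };

        // -1 means "don't check the count": the number of randomly
        // sampled events cannot be predicted
        var expectedEventCount = new Dictionary<string, ExpectedEventCount>
        {
            { "Microsoft-Windows-DotNETRuntime", -1 }
        };

        return IpcTraceTest.RunAndValidateEventCounts(
            expectedEventCount,
            _eventGeneratingAction,    // allocates many instances of a custom type
            providers,
            1024,                      // circular buffer size (MB)
            _DoesTraceContainEvents);  // the TraceEvent-based validation callback
    }

    private static Action _eventGeneratingAction = () =>
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            var instance = new Object128(); // hypothetical custom type
        }
    };
}
```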
The IpcTraceTest.RunAndValidateEventCounts helper method accepts:
- The list of providers to enable, with which keyword and verbosity level.
- How many events are expected (-1 in my case because I can't predict how many random events will be generated).
- A callback with the code that will generate events (allocating a lot of instances of a custom type in my case).
- A callback that looks at the emitted events.
The last callback code relies on TraceEvent to listen to emitted events:
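A sketch of such a callback is shown below. Hooking unknown events through source.Dynamic.All and the validator contract (a Func returning the pass/fail exit code, where 100 is the conventional "pass" value in these tests) are assumptions based on the existing eventpipe tests:

```csharp
// Sketch: count AllocationSampled events; the event is unknown to
// TraceEvent, so it is matched by its raw ID
private static Func<EventPipeEventSource, Func<int>> _DoesTraceContainEvents = (source) =>
{
    int allocationSampledEvents = 0;
    source.Dynamic.All += (TraceEvent traceEvent) =>
    {
        if ((int)traceEvent.ID == 303) // AllocationSampled: not known by TraceEvent yet
        {
            // manually decode the raw payload (see below)
            var payload = new AllocationSampledData(traceEvent, source.PointerSize);
            allocationSampledEvents++;
        }
    };

    // 100 = pass, -1 = fail in the runtime eventpipe tests
    return () => (allocationSampledEvents > 0) ? 100 : -1;
};
```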
In my case, I'm adding a new event that is emitted when a new keyword is enabled. It means that TraceEvent does not yet know its ID (hence the hardcoded 303 value) nor how to unpack the new event payload. This is why I created the AllocationSampledData type to expose the payload as public fields:
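A sketch of this type follows; the field list (AllocationTick's fields plus an ObjectSize field) and the exact names are assumptions derived from the payload description in ClrEtwAll.man:

```csharp
// Sketch: expose the raw AllocationSampled payload as public fields
class AllocationSampledData
{
    public GCAllocationKind AllocationKind;  // UInt32 in the payload
    public int ClrInstanceID;                // UInt16
    public UInt64 TypeID;                    // pointer-sized
    public string TypeName;                  // UTF-16, \0-terminated
    public int HeapIndex;                    // UInt32
    public UInt64 Address;                   // pointer-sized
    public long ObjectSize;                  // the additional field

    private TraceEvent _data;
    private int _pointerSize;

    public AllocationSampledData(TraceEvent data, int pointerSize)
    {
        _data = data;
        _pointerSize = pointerSize;
        ComputeFields(); // parses the raw payload bytes, as detailed below
    }

    private void ComputeFields()
    {
        // BitConverter-based parsing described in the rest of the post
    }
}
```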
This offsetBeforeString value is computed based on the size of the UInt32 (= 4 bytes), UInt16 (= 2 bytes), and pointer (4 bytes on 32-bit, 8 bytes on 64-bit) fields stored before the string.
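With the field layout assumed above, the computation boils down to:

```csharp
// offset of the UTF-16 string within the raw payload:
// AllocationKind (UInt32) + ClrInstanceID (UInt16) + TypeID (pointer-sized)
int offsetBeforeString = sizeof(UInt32) + sizeof(UInt16) + _pointerSize;
```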
Since I know the size of each field from the payload definition in ClrEtwAll.man, the numeric fields are extracted thanks to the BitConverter methods:
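For example, the two leading fields can be read like this (a sketch; the payload layout is assumed from the description above):

```csharp
// fixed-size numeric fields at the start of the payload
Span<byte> data = _data.EventData().AsSpan();
AllocationKind = (GCAllocationKind)BitConverter.ToUInt32(data.Slice(0, 4));
ClrInstanceID = BitConverter.ToUInt16(data.Slice(4, 2));
```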
Things get more complicated when you need to read the value of an address, because its size is 4 bytes on 32-bit and 8 bytes on 64-bit:
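In other words, you pick the BitConverter overload based on the pointer size of the monitored process (sketch, continuing the same payload-parsing code):

```csharp
// TypeID is pointer-sized: read 4 or 8 bytes after the first 6 bytes
// (UInt32 + UInt16), depending on the bitness of the monitored application
if (_pointerSize == 4)
{
    TypeID = BitConverter.ToUInt32(data.Slice(6, _pointerSize));
}
else
{
    TypeID = BitConverter.ToUInt64(data.Slice(6, _pointerSize));
}
```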
The bitness of the monitored application is given by the EventPipeSource’s PointerSize property that is passed to the AllocationSampledData constructor.
For the string case, you need to know that it is stored as UTF-16 (so each character requires 2 bytes) with a trailing \0, and that its length is the total size of the payload minus the size of the other fields. That way, you can slice the Span to properly read the characters:
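The slicing could look like this; the set of fields following the string (HeapIndex, Address, ObjectSize) is assumed from the payload description:

```csharp
// bytes occupied by the fixed-size fields after the string
int fixedFieldsAfterString = sizeof(UInt32)   // HeapIndex
                           + _pointerSize     // Address
                           + sizeof(UInt64);  // ObjectSize

// string length = payload size - fields before - fields after - trailing \0
int stringBytes = data.Length - offsetBeforeString - fixedFieldsAfterString - 2;
TypeName = Encoding.Unicode.GetString(data.Slice(offsetBeforeString, stringBytes));
```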
The rest of the fields are extracted with BitConverter helpers taking into account the size of the string:
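Concretely, each remaining offset is shifted by the string size plus its trailing \0 (sketch, reusing offsetBeforeString and stringBytes from the steps above):

```csharp
// fields stored after the variable-length string
int offsetAfterString = offsetBeforeString + stringBytes + 2; // skip trailing \0
HeapIndex = BitConverter.ToInt32(data.Slice(offsetAfterString, 4));
if (_pointerSize == 4)
{
    Address = BitConverter.ToUInt32(data.Slice(offsetAfterString + 4, _pointerSize));
}
else
{
    Address = BitConverter.ToUInt64(data.Slice(offsetAfterString + 4, _pointerSize));
}
ObjectSize = BitConverter.ToInt64(data.Slice(offsetAfterString + 4 + _pointerSize, 8));
```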
The name of the sampled allocated type from the parsed payload is used to ensure that the expected allocations are indeed emitted when the keyword/verbosity are enabled for the .NET provider.
Testing the performance impact
The next step was to validate the impact of the changes on GC performance. The baseline was the .NET 9 branch before the changes, built in Release. The GCPerfSim library from the performance repository was used to allocate 500 GB of mixed-size objects on 4 threads with a 50 MB live object size. In the output, the seconds_taken line provides the duration needed to allocate these objects.
To ensure that you run with the rebuilt branch, you need to use the following commands:
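One possible sequence in the runtime repository looks like the following (illustrative; paths depend on your OS and architecture, Windows x64 shown here):

```shell
:: rebuild the runtime and libraries in Release from the repo root
build.cmd clr+libs -c Release

:: generate the Core_Root layout that corerun uses to load the rebuilt bits
src\tests\build.cmd Release generatelayoutonly
```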
The next step is to use GCPerfSim with parameters matching this allocation scenario:
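An illustrative invocation is shown below; the argument names are taken from the GCPerfSim readme in the performance repository and mapped onto the scenario described above (an assumption, not the exact command used):

```shell
:: -tc 4      four allocating threads
:: -tagb 500  allocate 500 GB in total
:: -tlgb 0.05 keep about 50 MB of objects alive
artifacts\tests\coreclr\windows.x64.Release\Tests\Core_Root\corerun.exe GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05
```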
I ran this scenario 10 times to compute the median and the average, and did the same for the PR branch. So far so good. Now, how to do the same while measuring the impact of the random sampling? Remember that the code only triggers if the .NET provider is enabled with a certain keyword and verbosity. It means that you have to use a tool such as dotnet-trace to start an EventPipe session, but you would need the process ID. I could have changed the code of GCPerfSim to show the process ID, but I would still need to wait for the session to be created before the seconds_taken computation starts. Not really easy to script 10 runs that way…
Don’t worry! dotnet-trace supports the --show-child-io true argument, which makes it start the session as soon as the child process starts, and --providers lets you enable a provider exactly the way you want. Here is an example of the command line used for the performance runs:
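A sketch of such a command line follows; the keyword value (0x80000000000) enabling the sampled-allocation events is an assumption, check ClrEtwAll.man for the actual bit (the :4 suffix is the Informational verbosity level):

```shell
dotnet-trace collect --show-child-io true --providers Microsoft-Windows-DotNETRuntime:0x80000000000:4 -- corerun.exe GCPerfSim.dll -tc 4 -tagb 500 -tlgb 0.05
```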
These dotnet-trace features are very handy for any scripting scenario, even unrelated to testing the CLR. For example, you could later analyze how an application behaves with PerfView, thanks to the emitted events stored in the generated .nettrace file!
The next episode will describe an unexpected usage of EventSource and a NativeAOT debugging scenario.
