

https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics

CPU mental model and simplest PMU counter

In a really simplified view our processor looks like this:

There is a clock generator that sends pulses to every piece of the system to make everything move to the next stage. Each pulse is called a cycle. If we add just a little bit of silicon and connect it to the pulse generator, we can count the number of cycles, yay!

This is the simplest possible counter. It is called a counter for a reason, of course: its purpose is to count certain events. Every time a new pulse comes out, our counter is incremented by 1. In reality, a counter is just yet another HW register. You can sample it from time to time to know how many clock ticks have passed.
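To make the "sample it from time to time" idea concrete, here is a toy software model of such a counter. It is only a sketch: the `CycleCounter` class, the `elapsed` helper and the 48-bit width are assumptions made for illustration, not a description of any particular CPU.

```python
class CycleCounter:
    """Toy model of a cycle counter: one register incremented on every clock pulse."""

    def __init__(self, width_bits=48):
        # The width is an assumption; real counter widths differ between CPU models.
        self.mask = (1 << width_bits) - 1
        self.value = 0

    def pulse(self):
        # Each clock pulse adds 1; the register wraps around at its width.
        self.value = (self.value + 1) & self.mask

    def read(self):
        return self.value


def elapsed(start, end, width_bits=48):
    """Ticks between two samples, tolerating a single wraparound."""
    return (end - start) & ((1 << width_bits) - 1)


counter = CycleCounter()
start = counter.read()
for _ in range(1000):  # stand-in for the code being measured
    counter.pulse()
end = counter.read()
print(elapsed(start, end))  # 1000
```

Taking the difference modulo the register width keeps the result correct even if the counter wrapped once between the two samples.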

Counting cycles is great, but it’s not super helpful if we want to collect statistics about, say, the L1 cache or our execution units.

So, how about counting more?

We can connect our counter to other units just by laying out wires from every element we are interested in to our counter.

Notice that I added one more element to the figure: a configuration register. We now need a way to say “now I want to sneak into L1” and “now I want to go back to counting cycles”.
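The role of the configuration register can be sketched in the same toy style. Everything here (the `Event` enum, `ConfigurableCounter`, and the made-up workload in which every third cycle touches L1) is hypothetical and only meant to show the selection mechanism: every unit's wire reaches the counter, but only the event named in the config register increments it.

```python
from enum import Enum


class Event(Enum):
    CYCLES = 0
    L1_ACCESS = 1


class ConfigurableCounter:
    """Toy counter plus configuration register."""

    def __init__(self):
        self.config = Event.CYCLES  # configuration register: which event to count
        self.value = 0              # the counter register itself

    def select(self, event):
        # Writing the config register switches the observed event and resets the count.
        self.config = event
        self.value = 0

    def signal(self, event):
        # All units signal the counter; only the selected event is counted.
        if event is self.config:
            self.value += 1


def run_workload(counter, cycles=10):
    # Hypothetical workload: every cycle ticks, every third cycle accesses L1.
    for cycle in range(cycles):
        counter.signal(Event.CYCLES)
        if cycle % 3 == 0:
            counter.signal(Event.L1_ACCESS)
    return counter.value


pmc = ConfigurableCounter()
pmc.select(Event.CYCLES)
cycles_seen = run_workload(pmc)
pmc.select(Event.L1_ACCESS)
l1_seen = run_workload(pmc)
print(cycles_seen, l1_seen)  # 10 4
```

Note that the workload had to be run twice, once per event, which is exactly the limitation of having a single counter.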

I should also point out that this is not everything that is needed for our beautiful counters to work. We also need special assembly instructions to read the value from a counter and to write to the config register. For those instructions to work, we need physical paths from the execution units to all the counters so that values can be pulled from them.

With only one counter it’s possible to count only one thing at a time. Ouch! You may have already guessed where I’m going with this. Each additional counter increases the complexity and the amount of wiring quite significantly. And of course we have a limited number of physical paths on a die.

In practice, architects don’t try to connect every component to every counter, because that increases the amount of wiring. Instead they put counters in different places on the die, as close as possible to the components they are intended to observe. They also connect each component to at least 2 different counters, so that it is guaranteed to be possible to count two different events at the same time. For our example, it will look something like this:
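One consequence of such partial wiring is that whether a set of events can be counted simultaneously depends on which counters each component reaches. The sketch below illustrates this with a small backtracking search; the `WIRING` table, the component names and `assign_counters` are all hypothetical and do not reflect a real die layout.

```python
def assign_counters(events, wiring, taken=frozenset()):
    """Give each event its own counter; return {event: counter} or None if impossible."""
    if not events:
        return {}
    first, rest = events[0], events[1:]
    for counter in sorted(wiring[first]):
        if counter in taken:
            continue  # this counter is already observing another event
        tail = assign_counters(rest, wiring, taken | {counter})
        if tail is not None:
            return {first: counter, **tail}
    return None  # no free counter is wired to this component


# Hypothetical layout: three counters, each component wired to exactly two of them.
WIRING = {
    "L1": {0, 1},
    "EXEC": {1, 2},
    "FPU": {0, 2},
    "BRANCH": {0, 1},
}

print(assign_counters(["L1", "EXEC"], WIRING))
# {'L1': 0, 'EXEC': 1}
print(assign_counters(["L1", "EXEC", "FPU", "BRANCH"], WIRING))
# None
```

Any two components can be observed together here, but four events cannot fit into three counters no matter how they are arranged.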

MSRs - model specific registers

PMU counters and configuration registers are implemented as MSRs (Model Specific Registers). That means the number of counters and their width can vary from model to model, so you can’t rely on your CPU having any particular number of counters; you should always query that first, using cpuid.
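As a rough sketch of what that query returns on Intel processors: the architectural performance monitoring leaf (CPUID leaf 0AH) packs the counter counts and widths into bit fields of EAX and EDX. Python cannot execute cpuid directly, so the function below only decodes register values obtained some other way (for example from a cpuid dumping tool). The function name and the sample values are assumptions for illustration, and other vendors report this information differently.

```python
def decode_perfmon_leaf(eax, edx):
    """Decode EAX/EDX of Intel CPUID leaf 0AH (architectural performance monitoring)."""
    return {
        "version": eax & 0xFF,                  # EAX[7:0]   version ID
        "gp_counters": (eax >> 8) & 0xFF,       # EAX[15:8]  general-purpose counters per logical CPU
        "gp_width_bits": (eax >> 16) & 0xFF,    # EAX[23:16] width of general-purpose counters
        "fixed_counters": edx & 0x1F,           # EDX[4:0]   fixed-function counters
        "fixed_width_bits": (edx >> 5) & 0xFF,  # EDX[12:5]  width of fixed-function counters
    }


# Illustrative register values, not read from a real machine.
info = decode_perfmon_leaf(eax=0x07300404, edx=0x00000603)
print(info["gp_counters"], info["gp_width_bits"])        # 4 48
print(info["fixed_counters"], info["fixed_width_bits"])  # 3 48
```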


Author: Mark Zhu
An old developer