· ☕ 3 min

https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics

CPU mental model and simplest PMU counter

In a really simplified view, our processor is a set of components driven by a common clock.

There is a clock generator that sends pulses to every piece of the system to move everything to the next stage. The period between two pulses is called a cycle. If we add just a little bit of silicon and connect it to the pulse generator, we can count the number of cycles, yay!


· ☕ 1 min

If we don’t set the scaling governor policy to ‘performance’, the kernel can decide that it’s better to save power and throttle the CPU. Setting scaling_governor to ‘performance’ helps to avoid sub-nominal clocking. The Linux CPU frequency governors are documented in governors.txt (see Ref).

Here is how we can set it for all the cores:

# writing to sysfs needs root, hence sudo tee
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
  echo performance | sudo tee "$i" > /dev/null
done
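Here is a slightly more defensive version of the same loop, written as shell functions. It is a sketch, not a standard tool. The names set_governor and show_governors, and the SYSFS_CPU override, are hypothetical. SYSFS_CPU exists only so the loop can be dry-run against a scratch directory. On a real machine, leave it unset and run as root.

```shell
# Hypothetical helpers. SYSFS_CPU defaults to the real sysfs path and can be
# pointed at a scratch directory for a dry run.

# set_governor POLICY: write POLICY to every core's scaling_governor.
# The kernel rejects an unknown policy, so the write (and the function) fails.
set_governor() {
    local root=${SYSFS_CPU:-/sys/devices/system/cpu}
    local found=0 f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # glob did not match: no cpufreq driver loaded
        echo "$1" > "$f" || return 1
        found=1
    done
    [ "$found" -eq 1 ]            # non-zero exit if no governor file was found
}

# show_governors: print each governor in use and how many cores use it,
# so a core still left on "powersave" stands out.
show_governors() {
    local root=${SYSFS_CPU:-/sys/devices/system/cpu}
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor | sort | uniq -c
}
```

Typical use, as root, is set_governor performance followed by show_governors. A single output line means every core got the same policy.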

Ref

https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt


· ☕ 1 min
pool1-n104-vpod1-wpool1-n23:~/pmu-tools # x86info -c
x86info vVERSION
Found 80 identical CPUs
MP Configuration Table Header MISSING!

Extended Family: 0 Extended Model: 5 Family: 6 Model: 85 Stepping: 7
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Unknown model.
Processor name string (BIOS programmed): Intel(R) Xeon(R) Gold 6230N CPU @ 2.30GHz

Cache info
 L1 Data Cache: 32KB, 8-way associative, 64 byte line size
 L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
 L2 Unified Cache: 1024KB, 16-way associative, 64 byte line size
 L3 Unified Cache: 28160KB, 11-way associative, 64 byte line size
TLB info
 Instruction TLB: 2M/4M pages, fully associative, 8 entries
 Instruction TLB: 4K pages, 8-way associative, 64 entries
 Data TLB: 1GB pages, 4-way set associative, 4 entries
 Data TLB: 4KB pages, 4-way associative, 64 entries
 Shared L2 TLB: 4KB/2MB pages, 6-way associative, 1536 entries
 64 byte prefetching.
Total processor threads: 80
This system has 2 20-core processors with hyper-threading (2 threads per core) running at an estimated 2.30GHz
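Two numbers can be derived from the cache and TLB lines above. The first is the number of sets in each cache: size / (ways × line size). The second is the TLB reach, entries × page size, i.e. how much memory can be touched without a TLB miss. The helper names below are hypothetical. The inputs are taken from the x86info output.

```shell
# cache_sets SIZE_KB WAYS LINE_BYTES: sets = size / (ways * line size)
cache_sets() {
    echo $(( $1 * 1024 / ($2 * $3) ))
}

# tlb_reach_kb ENTRIES PAGE_KB: memory covered by ENTRIES translations, in KB
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

cache_sets 32 8 64        # L1d: 64 sets
cache_sets 1024 16 64     # L2: 1024 sets
cache_sets 28160 11 64    # L3: 40960 sets
tlb_reach_kb 64 4         # L1 DTLB, 4KB pages: 256 KB
tlb_reach_kb 1536 4       # shared L2 TLB, 4KB pages: 6144 KB (6 MB)
tlb_reach_kb 1536 2048    # shared L2 TLB, if every entry held a 2MB page: 3145728 KB (3 GB)
```

The L3 set count is not a power of two. That is consistent with a last-level cache split into per-core slices: 40960 = 20 × 2048. The jump from 6 MB to 3 GB of reach is the motivation behind the huge-pages references.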

Ref

Huge pages part 5: A deeper look at TLBs and costs
TLB and Java


· ☕ 4 min

What is a TLB

https://en.wikipedia.org/wiki/Translation_lookaside_buffer

A translation lookaside buffer (TLB) is a memory cache that is used to reduce the time taken to access a user memory location. It is a part of the chip’s memory-management unit (MMU). The TLB stores the recent translations of virtual memory to physical memory and can be called an address-translation cache. A TLB may reside between the CPU and the CPU cache, between the CPU cache and the main memory, or between the different levels of the multi-level cache.
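A TLB caches translations per page, not per byte. The high bits of a virtual address (the virtual page number) are looked up. The low bits (the offset within the page) pass through unchanged. Below is a minimal sketch of that split. It assumes 4KB pages by default and uses a hypothetical helper name and an arbitrary example address.

```shell
# split_va ADDR [PAGE_SHIFT]: print the virtual page number (what the TLB is
# keyed on) and the in-page offset (copied unchanged into the physical address).
# PAGE_SHIFT is 12 for 4KB pages and 21 for 2MB huge pages.
split_va() {
    local addr=$(( $1 ))
    local bits=${2:-12}
    local vpn=$(( addr >> bits ))
    local off=$(( addr & ((1 << bits) - 1) ))
    printf 'vpn=%#x offset=%#x\n' "$vpn" "$off"
}

split_va 0x7f12345678      # vpn=0x7f12345 offset=0x678
split_va 0x7f12345678 21   # vpn=0x3f891 offset=0x145678
```

With 2MB pages, the same address has 9 fewer page-number bits, so one TLB entry covers 512 times more memory.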


· ☕ 1 min

Turbo Boost

Intel Turbo Boost is a feature that automatically raises the CPU operating frequency when demanding tasks are running. It can be permanently disabled in the BIOS; check the FAQ for more information. To disable turbo on Linux, run as root:

# Intel
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# AMD
echo 0 > /sys/devices/system/cpu/cpufreq/boost

Also you might want to take a look at how it’s done in uarch-bench.
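The Intel and AMD knobs have opposite polarity: no_turbo=1 and boost=0 both mean turbo off. It can help to hide them behind one function. This sketch assumes that whichever of the two files exists is the one in use. The name set_turbo and the SYSFS_CPU override are hypothetical, not part of any tool.

```shell
# set_turbo on|off: toggle turbo through whichever sysfs knob this kernel exposes.
# SYSFS_CPU is a hypothetical override for dry runs; it defaults to the real path.
set_turbo() {
    local root=${SYSFS_CPU:-/sys/devices/system/cpu}
    local no_turbo boost
    case "$1" in
        on)  no_turbo=0; boost=1 ;;
        off) no_turbo=1; boost=0 ;;
        *)   echo "usage: set_turbo on|off" >&2; return 2 ;;
    esac
    if [ -w "$root/intel_pstate/no_turbo" ]; then
        # intel_pstate driver: inverted flag, 1 means turbo disabled
        echo "$no_turbo" > "$root/intel_pstate/no_turbo"
    elif [ -w "$root/cpufreq/boost" ]; then
        # cpufreq boost knob (used for AMD above): 0 means turbo disabled
        echo "$boost" > "$root/cpufreq/boost"
    else
        echo "no writable turbo knob under $root (not root, or unsupported driver?)" >&2
        return 1
    fi
}
```

For a measurement like the one below, run set_turbo off, then the benchmark, then set_turbo on.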

Example (single-threaded workload running on Intel® Core™ i5-8259U):

# TurboBoost enabled
$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
0
$ perf stat -e task-clock,cycles -- ./a.out
      11984.691958      task-clock (msec)         #    1.000 CPUs utilized
    32,427,294,227      cycles                    #    2.706 GHz
      11.989164338 seconds time elapsed
# TurboBoost disabled
$ echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
1
$ perf stat -e task-clock,cycles -- ./a.out
      13055.200832      task-clock (msec)         #    0.993 CPUs utilized
    29,946,969,255      cycles                    #    2.294 GHz
      13.142983989 seconds time elapsed

The average frequency is much higher with TurboBoost on: 2.706 GHz versus 2.294 GHz. The run finishes in 11.99 s instead of 13.14 s.
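The GHz figure perf prints matches cycles divided by task-clock (the time the task spent on a CPU), so the comparison is easy to recompute. Here is a small sketch with a hypothetical helper name, fed with the numbers from the two runs above.

```shell
# avg_ghz CYCLES TASK_CLOCK_MS: average frequency while the task was on-CPU.
# cycles / nanoseconds = cycles per ns = GHz; thousands separators are stripped.
avg_ghz() {
    local cycles
    cycles=$(printf '%s' "$1" | tr -d ,)
    awk -v c="$cycles" -v ms="$2" 'BEGIN { printf "%.3f GHz\n", c / (ms * 1e6) }'
}

avg_ghz 32,427,294,227 11984.691958   # TurboBoost on:  2.706 GHz
avg_ghz 29,946,969,255 13055.200832   # TurboBoost off: 2.294 GHz
```

The turbo run also used more cycles (32.4 G vs 29.9 G). So the wall-clock gain (about 9.6%) is smaller than the roughly 18% frequency difference.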


· ☕ 2 min

Normal TCP Close Phases

https://accedian.com/blog/close-tcp-sessions-diagnose-disconnections/

Figure 1 – Simplified TCP closing with FIN.


The standard way to close TCP sessions is to send a FIN packet, then wait for a FIN response from the other party.

  1. A sends a FIN packet and waits for a response; it can already release some resources, but it must wait for the other party’s reply (Fin Wait).
  2. B receives the FIN packet and acknowledges it; it must release resources and waits for the application level to close (Close Wait).
  3. B can now send its own FIN to A and then wait for its acknowledgement (Last Ack wait).
  4. A acknowledges B’s FIN but cannot fully close yet. It stays in Time Wait for twice the maximum segment lifetime. This lets delayed packets of the connection die out in the network, and lets A send the final ACK again if B retransmits its FIN.
  5. B eventually receives the final ACK and destroys (kills) the connection.

This works fine in a perfect world. However, what happens when one side of the conversation breaks? That’s why the Reset (RST) packet exists.
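The normal close sequence can be written as a small state machine using the RFC 793 state names. This is a simplified sketch. It covers only the orderly close described in the numbered steps and leaves out simultaneous close (CLOSING), RST and retransmissions. The function tcp_close_next and the event names are hypothetical.

```shell
# tcp_close_next STATE EVENT: next state of one endpoint during a normal close.
# Events: app_close = the application closes and a FIN is sent,
# recv_fin / recv_ack = segment received, timeout = the 2*MSL timer expires.
tcp_close_next() {
    case "$1:$2" in
        # active closer (A)
        ESTABLISHED:app_close) echo FIN-WAIT-1 ;;  # A sends FIN
        FIN-WAIT-1:recv_ack)   echo FIN-WAIT-2 ;;  # B acknowledged A's FIN
        FIN-WAIT-2:recv_fin)   echo TIME-WAIT ;;   # A sends the final ACK
        TIME-WAIT:timeout)     echo CLOSED ;;
        # passive closer (B)
        ESTABLISHED:recv_fin)  echo CLOSE-WAIT ;;  # B ACKs A's FIN
        CLOSE-WAIT:app_close)  echo LAST-ACK ;;    # B sends its own FIN
        LAST-ACK:recv_ack)     echo CLOSED ;;
        *) echo "unexpected event $2 in state $1" >&2; return 1 ;;
    esac
}

# Walk the active closer (A) through the sequence.
state=ESTABLISHED
for ev in app_close recv_ack recv_fin timeout; do
    state=$(tcp_close_next "$state" "$ev")
    echo "$ev -> $state"
done
```

On a live Linux host, ss -tan state time-wait lists the sockets currently sitting in step 4.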


· ☕ 1 min

https://accedian.com/blog/diagnose-tcp-connection-setup-issues/

A TCP connection is established with a 3-way handshake of SYN, SYN+ACK and ACK packets. From this handshake, we can extract a performance metric called Connection Time (CT), which summarizes how fast a session can be set up between a client and a server over a network. For more details, see the excellent Wikipedia article on the subject.

Figure 1 – How TCP handshake is analyzed
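As a worked example, Connection Time can be read off the timestamps of the three handshake packets in a client-side capture. This sketch assumes CT is measured from the client’s SYN to its final ACK. That may differ from the exact definition used by the article’s tooling. The helper name and the timestamps are made up for illustration.

```shell
# handshake_times T_SYN T_SYNACK T_ACK: client-side capture timestamps in seconds.
# SYN -> SYN+ACK is roughly one network round trip plus the server's SYN handling;
# SYN -> ACK is taken here (an assumption) as the connection time.
handshake_times() {
    awk -v syn="$1" -v synack="$2" -v ack="$3" 'BEGIN {
        printf "syn_to_synack=%.3f ms ct=%.3f ms\n", (synack - syn) * 1000, (ack - syn) * 1000
    }'
}

handshake_times 10.000000 10.020500 10.020600   # syn_to_synack=20.500 ms ct=20.600 ms
```

You can get a rough equivalent without a capture: curl -s -o /dev/null -w '%{time_namelookup} %{time_connect}\n' URL. time_connect is measured from the start of the request, so subtract time_namelookup to remove the DNS time.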


· ☕ 3 min

CAP

BASE

BASE is an abbreviation of three phrases: Basically Available, Soft state and Eventually consistent.

Replicated state machine

https://zhuanlan.zhihu.com/p/86999794

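The idea of a replicated state machine is that a distributed replicated-state-machine system consists of multiple replicas. Each replica is a state machine whose state is stored in a set of state variables. The state of the state machine can be changed by external commands, and only by them.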
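Replication works because of determinism: replicas that start from the same state and apply the same commands in the same order end up in the same state. Below is a toy sketch of that property. The command format (set KEY VALUE, add KEY N) is made up. The hypothetical apply_log helper stands in for one replica.

```shell
# apply_log: one replica. Reads commands from stdin and prints the final state
# as sorted KEY=VALUE lines. Only these commands change the state;
# unrecognized lines are ignored.
apply_log() {
    awk '
        $1 == "set" { state[$2] = $3 }
        $1 == "add" { state[$2] += $3 }
        END { for (k in state) print k "=" state[k] }
    ' | sort
}

log='set x 1
add x 4
set y 7'

replica_a=$(printf '%s\n' "$log" | apply_log)
replica_b=$(printf '%s\n' "$log" | apply_log)
[ "$replica_a" = "$replica_b" ] && echo "replicas agree: $replica_a"
```

Reordering the same commands changes the result. Running add x 4 before set x 1 yields x=1 instead of x=5. This is why consensus protocols such as Paxos and Raft focus on agreeing on a single order of the command log.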


· ☕ 4 min

Distributed consensus

Paxos

Multi-Paxos

Raft

EPaxos

Comparative analysis

https://developer.aliyun.com/article/768655

1 Understandability

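As is well known, Paxos is notoriously obscure: it is hard to understand and even harder to implement. Raft, by contrast, was designed with understandability and ease of implementation as its goals. It greatly lowered the barrier to using distributed consensus and made it accessible to ordinary engineers. As a result, Raft quickly gained favor after it was proposed and greatly advanced the engineering use of distributed consensus.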