Write something, lazybones.

· ☕ 1 min

Memory Manager Goals

  • Satisfy a Pod's memory requirements with the fewest NUMA nodes possible: Offer guaranteed memory (and hugepages) allocation over a minimum number of NUMA nodes for containers (within a pod).

  • In the long run, run all containers of a Pod on as few NUMA nodes as possible: Guaranteeing the affinity of memory and hugepages to the same NUMA node for the whole group of containers (within a pod). This is a long-term goal which will be achieved along with PR #1752 and the implementation of the hintprovider.GetPodLevelTopologyHints() API in the Memory Manager.
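To make "a minimum number of NUMA nodes" concrete, here is a minimal Python sketch that picks the smallest set of NUMA nodes whose combined free memory covers a request. The function name min_numa_nodes, the node sizes, and the lowest-node-ID tie-break are hypothetical. This is not the kubelet's algorithm, which works through topology hints and the Topology Manager. It only illustrates the goal stated above.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

def min_numa_nodes(free_bytes_by_node: Dict[int, int],
                   request_bytes: int) -> Optional[Tuple[int, ...]]:
    """Return the smallest tuple of NUMA node IDs whose combined free memory
    covers request_bytes, or None if all nodes together are not enough.
    Among equally small sets, the first in ascending node-ID order wins
    (a hypothetical tie-break rule)."""
    nodes = sorted(free_bytes_by_node)
    for size in range(1, len(nodes) + 1):      # try 1 node, then 2, ...
        for combo in combinations(nodes, size):
            if sum(free_bytes_by_node[n] for n in combo) >= request_bytes:
                return combo
    return None

GiB = 1024 ** 3
free = {0: 6 * GiB, 1: 10 * GiB, 2: 4 * GiB}   # hypothetical free memory per node
print(min_numa_nodes(free, 8 * GiB))    # (1,)
print(min_numa_nodes(free, 12 * GiB))   # (0, 1)
print(min_numa_nodes(free, 30 * GiB))   # None
```

The search is brute force, which is fine for the handful of NUMA nodes a typical server has. Trying smaller set sizes first guarantees the first match uses the fewest nodes.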


· ☕ 2 min

K8s Memory Manager

Requirements

Your Kubernetes server must be at or later than version v1.21. To check the version, enter kubectl version.

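To align memory resources with other requested resources in a Pod spec, the CPU Manager and the Topology Manager should also be enabled, with appropriate policies configured on the node.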

Starting from v1.22, the Memory Manager is enabled by default through the MemoryManager feature gate.
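The two version rules above (v1.21 or later is required, and the feature gate is on by default from v1.22) can be encoded in a small hypothetical Python helper. It assumes a version string in the usual "v1.22.3" form that kubectl version reports. "Enabled" here refers only to the feature gate. The kubelet's --memory-manager-policy still defaults to None, so it must be set to Static for guaranteed NUMA-aware allocation.

```python
import re
from typing import Tuple

def parse_version(v: str) -> Tuple[int, int]:
    """Parse a Kubernetes version such as 'v1.22.3' into (major, minor)."""
    m = re.match(r"v?(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return int(m.group(1)), int(m.group(2))

def memory_manager_status(server_version: str) -> str:
    """Classify Memory Manager availability using only the two rules above."""
    ver = parse_version(server_version)
    if ver < (1, 21):
        return "unsupported"
    if ver < (1, 22):
        return "feature gate must be enabled explicitly"
    return "enabled by default"

for v in ("v1.20.5", "v1.21.0", "v1.22.3"):
    print(v, "->", memory_manager_status(v))
```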


· ☕ 2 min

Topology Manager Scopes and Policies

Topology Manager provides two distinct knobs: scope and policy.

The scope defines the granularity at which resource alignment is performed (e.g. at the pod or container level), while the policy defines the actual strategy used to carry out the alignment (e.g. best-effort, restricted, single-numa-node).
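The policy step can be sketched as an admission decision on a merged topology hint. The scope decides whether hints are merged per container or for the whole pod, and is not modelled here. MergedHint and admit are hypothetical names. The sketch is simplified: it ignores how hints from providers are merged and treats a hint as just a set of NUMA nodes plus a "preferred" flag.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class MergedHint:
    affinity: FrozenSet[int]   # NUMA nodes the merged hint points at
    preferred: bool            # whether this is a preferred (optimal) placement

def admit(policy: str, hint: MergedHint) -> bool:
    """Simplified admission outcome for the Topology Manager policies."""
    if policy in ("none", "best-effort"):
        return True                    # never reject; best-effort just records the hint
    if policy == "restricted":
        return hint.preferred          # reject when no preferred alignment exists
    if policy == "single-numa-node":
        return hint.preferred and len(hint.affinity) == 1
    raise ValueError(f"unknown policy: {policy!r}")

two_nodes = MergedHint(frozenset({0, 1}), preferred=True)
print(admit("restricted", two_nodes))        # True
print(admit("single-numa-node", two_nodes))  # False
```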

Topology Manager Scopes

The Topology Manager can align resources in two distinct scopes: container (the default) and pod.


· ☕ 1 min

kubectl debug

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/#ephemeral-container

https://towardsdatascience.com/the-easiest-way-to-debug-kubernetes-workloads-ff2ff5e3cc75

kubectl explain pod.spec.ephemeralContainers

kubectl debug -it ephemeral-demo --image=busybox --target=ephemeral-demo

Process Namespace Sharing

https://towardsdatascience.com/the-easiest-way-to-debug-kubernetes-workloads-ff2ff5e3cc75

kubectl debug -it some-app --image=busybox --share-processes --copy-to=some-app-debug
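The two debugging modes above differ only in their flags. An ephemeral container can attach to a running Pod with --target. Alternatively, --copy-to debugs a copy of the Pod, and --share-processes enables process namespace sharing in that copy. The hypothetical Python helper below builds, but does not run, the argument list for either mode. The flags are the real kubectl ones used above. The function and its defaults are illustrative only.

```python
import shlex
from typing import List, Optional

def kubectl_debug_argv(pod: str, image: str = "busybox",
                       target: Optional[str] = None,
                       copy_to: Optional[str] = None,
                       share_processes: bool = False) -> List[str]:
    """Build (but do not execute) a `kubectl debug` command line."""
    argv = ["kubectl", "debug", "-it", pod, f"--image={image}"]
    if target:
        # Ephemeral container: target the processes of this container.
        argv.append(f"--target={target}")
    if copy_to:
        # Debug a copy of the Pod instead of the running Pod itself.
        argv.append(f"--copy-to={copy_to}")
        if share_processes:
            # Only meaningful together with --copy-to.
            argv.append("--share-processes")
    return argv

print(shlex.join(kubectl_debug_argv("ephemeral-demo", target="ephemeral-demo")))
print(shlex.join(kubectl_debug_argv("some-app", copy_to="some-app-debug",
                                    share_processes=True)))
```

The resulting list can be passed to subprocess.run if you actually want to launch the session.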


· ☕ 1 min
➜  2305 pidstat -t -p 2305 1
Linux 5.4.0-74-generic (labile-T30)     2021-06-24      _x86_64_        (2 CPU)

18:37:39      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
18:37:40    64055      2305         -    1.00    2.00   14.00    3.00   17.00     1  qemu-system-x86
18:37:40    64055         -      2305    1.00    1.00    0.00    3.00    2.00     1  |__qemu-system-x86
18:37:40    64055         -      2307    0.00    0.00    0.00    0.00    0.00     0  |__qemu-system-x86
18:37:40    64055         -      2312    0.00    0.00    0.00    0.00    0.00     0  |__IO mon_iothread
18:37:40    64055         -      2313    0.00    0.00    9.00    5.00    9.00     1  |__CPU 0/KVM
18:37:40    64055         -      2314    0.00    2.00    5.00    5.00    7.00     0  |__CPU 1/KVM
18:37:40    64055         -      2316    0.00    0.00    0.00    1.00    0.00     1  |__SPICE Worker
18:37:40    64055         -     76701    0.00    0.00    0.00    0.00    0.00     0  |__worker
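In this output the vCPU threads of the QEMU process are named "CPU n/KVM". The %guest column is the time a task spent running a virtual processor. The hypothetical Python parser below picks the busiest vCPU thread from a few rows of the output. It assumes the column layout shown (TID in the fourth column, Command last and possibly containing spaces) and the timestamp written as 18:37:40.

```python
from typing import Dict, List

SAMPLE = """\
18:37:39      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
18:37:40    64055      2305         -    1.00    2.00   14.00    3.00   17.00     1  qemu-system-x86
18:37:40    64055         -      2305    1.00    1.00    0.00    3.00    2.00     1  |__qemu-system-x86
18:37:40    64055         -      2312    0.00    0.00    0.00    0.00    0.00     0  |__IO mon_iothread
18:37:40    64055         -      2313    0.00    0.00    9.00    5.00    9.00     1  |__CPU 0/KVM
18:37:40    64055         -      2314    0.00    2.00    5.00    5.00    7.00     0  |__CPU 1/KVM
"""

# Column order as printed above (assumed).
FIELDS = ["time", "uid", "tgid", "tid", "usr", "system", "guest",
          "wait", "cpu_pct", "cpu", "command"]

def parse_thread_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict per thread row; skip headers and the per-process row."""
    rows = []
    for line in text.splitlines():
        # maxsplit keeps multi-word thread names such as "CPU 0/KVM" intact
        parts = line.split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS) or not parts[3].isdigit():
            continue  # TID is "TID" on the header and "-" on the TGID summary row
        row = dict(zip(FIELDS, parts))
        if row["command"].startswith("|__"):
            row["command"] = row["command"][3:]
        rows.append(row)
    return rows

threads = parse_thread_rows(SAMPLE)
busiest = max(threads, key=lambda r: float(r["guest"]))
print(busiest["tid"], busiest["command"], busiest["guest"])  # 2313 CPU 0/KVM 9.00
```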
perf kvm stat live

18:42:23.185815

Analyze events for all VMs, all VCPUs:

             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time 

           MSR_WRITE        648    60.67%     5.04%      0.47us 181016.17us    715.22us ( +-  44.28% )
                 HLT        207    19.38%    92.36%      2.05us 3684299.87us  40991.17us ( +-  43.93% )
  EXTERNAL_INTERRUPT         85     7.96%     1.43%      0.34us  43275.57us   1540.36us ( +-  50.18% )
    PREEMPTION_TIMER         78     7.30%     0.25%      0.66us  12804.28us    294.09us ( +-  69.79% )
   PENDING_INTERRUPT         49     4.59%     0.93%      0.60us  84909.22us   1735.07us ( +-  99.87% )
   PAUSE_INSTRUCTION          1     0.09%     0.00%      0.82us      0.82us      0.82us ( +-   0.00% )

Total Samples:1068, Total events handled time:9187522.33us.
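The two percentage columns measure different things. Samples% is a reason's share of exit events. Time% is its share of handled time, which the numbers suggest is samples × average time divided by the total. This relationship is inferred from the table, since it reproduces every percentage to two decimals; it is not taken from perf's documentation. The Python sketch below recomputes both columns. It shows that HLT makes up only about 19% of exits but about 92% of the time. This fits HLT exits typically being taken when the guest vCPU is idle and lasting until it is woken again (note the 3.68 s maximum).

```python
# (samples, average time per exit in µs) per exit reason, copied from the table above.
EXITS = {
    "MSR_WRITE": (648, 715.22),
    "HLT": (207, 40991.17),
    "EXTERNAL_INTERRUPT": (85, 1540.36),
    "PREEMPTION_TIMER": (78, 294.09),
    "PENDING_INTERRUPT": (49, 1735.07),
    "PAUSE_INSTRUCTION": (1, 0.82),
}

total_samples = sum(n for n, _ in EXITS.values())          # 1068
total_time_us = sum(n * avg for n, avg in EXITS.values())  # ~9187523 µs (9187522.33 reported)

# reason -> (Samples%, Time%)
shares = {
    reason: (100 * n / total_samples, 100 * n * avg / total_time_us)
    for reason, (n, avg) in EXITS.items()
}

for reason, (s_pct, t_pct) in shares.items():
    print(f"{reason:>20} {s_pct:6.2f}% {t_pct:6.2f}%")
```

The small gap between the recomputed and reported totals comes from the averages being rounded to two decimals in the table.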

· ☕ 1 min

Website docs

Github



· ☕ 2 min

https://grafana.com/docs/grafana/latest/variables/variable-types/global-variables/

$__range

Currently only supported for Prometheus and Loki data sources. This variable represents the range for the current dashboard. It is calculated as to - from. It has millisecond and second representations called $__range_ms and $__range_s.
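The arithmetic behind $__range_ms and $__range_s is simple. This hypothetical Python helper takes the dashboard's from/to timestamps in epoch milliseconds and applies "to - from". Rounding the seconds value to the nearest second is an assumption. The exact text Grafana substitutes for $__range itself is not modelled.

```python
from datetime import datetime, timezone

def range_vars(from_ms: int, to_ms: int) -> dict:
    """Derive $__range_ms and $__range_s from the dashboard time range."""
    range_ms = to_ms - from_ms            # "calculated as to - from"
    range_s = (range_ms + 500) // 1000    # assumption: round to the nearest second
    return {"__range_ms": range_ms, "__range_s": range_s}

# Example: a "last 6 hours" dashboard ending at 2021-06-24 12:00 UTC.
to = datetime(2021, 6, 24, 12, 0, tzinfo=timezone.utc)
frm = datetime(2021, 6, 24, 6, 0, tzinfo=timezone.utc)
print(range_vars(round(frm.timestamp() * 1000), round(to.timestamp() * 1000)))
# {'__range_ms': 21600000, '__range_s': 21600}
```

In a Prometheus panel the variable is typically used as a range selector, e.g. increase(http_requests_total[$__range]) (metric name hypothetical). This gives one value covering the whole dashboard range.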

$__interval

You can use the $__interval variable as a parameter to group by time (for InfluxDB, MySQL, Postgres, MSSQL), Date histogram interval (for Elasticsearch), or as a summarize function parameter (for Graphite).
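Roughly, $__interval spreads the dashboard range over the number of data points the panel can draw, which is about its width in pixels. It never goes below the configured minimum interval. The Python sketch below models only that core step, under those assumptions. Grafana additionally rounds the result to a "nice" step such as 15s or 1m, and that rounding is deliberately left out. The function name is hypothetical.

```python
def approx_interval_ms(range_ms: int, max_data_points: int,
                       min_interval_ms: int = 0) -> int:
    """Unrounded $__interval estimate: range / points, floored at the minimum."""
    raw = -(-range_ms // max_data_points)   # integer ceiling division
    return max(raw, min_interval_ms)

six_hours_ms = 6 * 60 * 60 * 1000
print(approx_interval_ms(six_hours_ms, 1000))           # 21600 (21.6 s per bucket)
print(approx_interval_ms(six_hours_ms, 1000, 30_000))   # 30000 (30 s minimum wins)
```

The result is what a query such as an InfluxDB GROUP BY time($__interval) would then bucket by, so the number of returned points stays close to what the panel can display.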


· ☕ 2 min

https://iximiuz.com/en/posts/prometheus-metrics-labels-time-series/

What is a time series in Prometheus?

Side note 1: Despite being born in the age of distributed systems, every Prometheus server node is autonomous, i.e., there is no distributed metric storage in the default Prometheus setup, and every node acts as a self-sufficient monitoring server with local metric storage. This simplifies a lot of things, including the following explanation, because we don't need to think about how to merge overlapping series from different Prometheus nodes. 😉
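Within a single Prometheus server, a time series is identified by its metric name plus its full set of labels. Internally, the name is stored as the special __name__ label. The toy Python store below illustrates that identity rule. ToyLocalStore and series_key are hypothetical names, and this is not how the Prometheus TSDB is implemented. Samples with the same label set, in any order, land in one series. A different label value creates a new series.

```python
from typing import Dict, FrozenSet, List, Tuple

SeriesKey = FrozenSet[Tuple[str, str]]

def series_key(metric: str, labels: Dict[str, str]) -> SeriesKey:
    """Identify a series by its full label set, with the name as __name__."""
    return frozenset({**labels, "__name__": metric}.items())

class ToyLocalStore:
    """Toy stand-in for one autonomous server's local metric storage."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[int, float]]] = {}

    def append(self, metric: str, labels: Dict[str, str],
               ts_ms: int, value: float) -> None:
        # Identical label sets (regardless of order) share one series.
        self.series.setdefault(series_key(metric, labels), []).append((ts_ms, value))

store = ToyLocalStore()
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1_000, 1.0)
store.append("http_requests_total", {"code": "200", "method": "GET"}, 2_000, 3.0)   # same series
store.append("http_requests_total", {"method": "POST", "code": "200"}, 2_000, 1.0)  # new series
print(len(store.series))  # 2
```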