Chris@82

Performance Counters
====================

FFTW measures execution time in the planning stage, optionally taking advantage
of hardware performance counters. This document describes the supported
counters and additional steps needed to enable each on different architectures.

See `./configure --help` for flags for enabling each supported counter.
See [kernel/cycle.h](kernel/cycle.h) for the code that accesses the counters.
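
To make the idea of timing with a tick counter concrete, here is a minimal sketch of measuring a routine by repeating it and keeping the smallest elapsed count. It is not FFTW's planner code: `now_ticks`, `min_elapsed_ticks`, and `workload` are hypothetical names, and the POSIX monotonic clock stands in for a hardware counter.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical tick source: nanoseconds from the POSIX monotonic clock.
   A hardware counter read would replace the body of this function. */
static inline uint64_t now_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Run fn() `repeats` times and return the smallest elapsed tick count.
   Taking the minimum discards runs disturbed by interrupts or cache misses. */
static inline uint64_t min_elapsed_ticks(void (*fn)(void), int repeats)
{
    uint64_t best = UINT64_MAX;
    assert(repeats > 0);
    for (int i = 0; i < repeats; ++i) {
        uint64_t t0 = now_ticks();
        fn();
        uint64_t t1 = now_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Hypothetical workload, used only for illustration. */
static volatile double sink;
static void workload(void)
{
    double s = 0.0;
    for (int i = 1; i <= 100000; ++i)
        s += 1.0 / i;
    sink = s;
}
```

A caller would use `min_elapsed_ticks(workload, 8)` and compare the result against the same measurement for an alternative routine. Only the relative order of the two counts matters, so the unit of the tick source is irrelevant.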

ARMv7-A (armv7a)
================

`CNTVCT`: Virtual Count Register in VMSA
--------------------------------------

A 64-bit counter, part of the Virtual Memory System Architecture (VMSA).
See Section B4.1.34 of the ARM Architecture Reference Manual, ARMv7-A/ARMv7-R.

For access from user mode, requires `CNTKCTL.PL0VCTEN == 1`, which must
be set in kernel mode on each CPU:

    #define CNTKCTL_PL0VCTEN 0x2 /* B4.1.26 in ARM Architecture Reference Manual */
    uint32_t r;
    asm volatile("mrc p15, 0, %0, c14, c1, 0" : "=r"(r)); /* read CNTKCTL */
    r |= CNTKCTL_PL0VCTEN;
    asm volatile("mcr p15, 0, %0, c14, c1, 0" :: "r"(r)); /* write CNTKCTL */

Source for a kernel module *that can be patched with the above code* is available at:
https://github.com/thoughtpolice/enable_arm_pmu
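
Once `PL0VCTEN` is set, user code can read the counter directly. The following is a sketch under that assumption, not a copy of `kernel/cycle.h`; the names `join_words` and `read_cntvct` are illustrative. The read function is compiled only on 32-bit ARM with a GCC-compatible compiler.

```c
#include <stdint.h>

/* Join the two 32-bit halves returned by a 64-bit coprocessor read. */
static inline uint64_t join_words(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

#if defined(__arm__) && defined(__GNUC__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Read CNTVCT from user mode. The instruction traps (typically raising
   SIGILL) unless CNTKCTL.PL0VCTEN is set on the CPU executing it. */
static inline uint64_t read_cntvct(void)
{
    uint32_t lo, hi;
    __asm__ volatile("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
    return join_words(lo, hi);
}
#endif
```

Because the enable bit is per CPU, a thread that migrates to a core where the bit was never set would still trap. Pinning the thread or enabling the bit on every core avoids this.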

`PMCCNTR`: Performance Monitors Cycle Count Register in VMSA
----------------------------------------------------------

A 32-bit counter, part of the Virtual Memory System Architecture (VMSA).
See Section B4.1.113 of the ARM Architecture Reference Manual, ARMv7-A/ARMv7-R.

For access from user mode, requires user-mode access to the PMU to be enabled
(`PMUSERENR.EN == 1`), which must be done from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR: E (enable) | X (export enable) */
    /* enable user-mode access to counters (PMUSERENR.EN) */
    asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
    /* program the PMU, then enable the cycle counter and event counters 0-3 */
    asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(PERF_DEF_OPTS));
    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));

Source for a kernel module containing the above code is available at:
[GitHub thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu)

More information:
http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html
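
Because `PMCCNTR` is only 32 bits wide on ARMv7-A, it wraps roughly every 4.3 seconds at 1 GHz, so interval arithmetic has to tolerate a wrap. The sketch below shows one way to do that. `elapsed_cycles32` and `read_pmccntr` are illustrative names, and the read is compiled only on 32-bit ARM with a GCC-compatible compiler.

```c
#include <stdint.h>

/* Elapsed cycles between two 32-bit samples. Unsigned subtraction is
   performed modulo 2^32, so one wrap between the samples is handled. */
static inline uint32_t elapsed_cycles32(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

#if defined(__arm__) && defined(__GNUC__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Read PMCCNTR from user mode. The instruction traps unless
   PMUSERENR.EN is set on the CPU executing it. */
static inline uint32_t read_pmccntr(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#endif
```

This is exact only if the measured interval is shorter than one full wrap; longer intervals need a wider counter such as `CNTVCT`.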

ARMv8-A (aarch64)
=================

`CNTVCT_EL0`: Counter-timer Virtual Count Register
------------------------------------------------

A 64-bit counter, part of the Generic Timer registers.
See Section D8.5.17 of the ARM Architecture Reference Manual, ARMv8-A.

For user-mode access, requires `CNTKCTL_EL1.EL0VCTEN == 1`, which
must be set from kernel mode on each CPU:

    #define CNTKCTL_EL0VCTEN 0x2
    uint64_t r; /* mrs/msr transfer a 64-bit X register */
    asm volatile("mrs %0, CNTKCTL_EL1" : "=r"(r)); /* read */
    r |= CNTKCTL_EL0VCTEN;
    asm volatile("msr CNTKCTL_EL1, %0" :: "r"(r)); /* write */

*WARNING*: The above code has not been tested.
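
`CNTVCT_EL0` ticks at the rate reported by `CNTFRQ_EL0`, not at the CPU clock rate, so counts can be converted to wall-clock time. The following sketch shows the conversion, assuming the firmware has programmed `CNTFRQ_EL0` correctly. `ticks_to_ns` and the two read helpers are illustrative names, and the reads are compiled only on AArch64 with a GCC-compatible compiler.

```c
#include <stdint.h>

/* Convert a tick count to nanoseconds for a counter running at freq_hz.
   Whole seconds and the remainder are converted separately so that the
   multiplication cannot overflow for frequencies below 2^32 Hz. */
static inline uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    const uint64_t ns_per_s = UINT64_C(1000000000);
    return (ticks / freq_hz) * ns_per_s
         + (ticks % freq_hz) * ns_per_s / freq_hz;
}

#if defined(__aarch64__) && defined(__GNUC__)
/* Read the virtual counter; requires CNTKCTL_EL1.EL0VCTEN == 1. */
static inline uint64_t read_cntvct_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    return v;
}

/* Read the counter frequency in Hz. */
static inline uint64_t read_cntfrq_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(v));
    return v;
}
#endif
```

For comparing candidate routines against each other, raw tick differences are enough; the conversion matters only when the result is reported in time units.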

`PMCCNTR_EL0`: Performance Monitors Cycle Count Register
------------------------------------------------------

A 64-bit counter, part of the Performance Monitors registers.
See Section D8.4.2 of the ARM Architecture Reference Manual, ARMv8-A.

For access from user mode, requires user-mode access to the PMU
(`PMUSERENR_EL0.EN == 1`), which must be set from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR_EL0: E (enable) | X (export enable) */
    /* enable user-mode access to counters */
    asm volatile("msr PMUSERENR_EL0, %0" :: "r"(1));
    /* program the PMU, then enable the cycle counter and event counters 0-3 */
    asm volatile("msr PMCR_EL0, %0" :: "r"(PERF_DEF_OPTS));
    asm volatile("msr PMCNTENSET_EL0, %0" :: "r"(0x8000000f));
    /* clear the cycle counter filter */
    asm volatile("msr PMCCFILTR_EL0, %0" :: "r"(0));

Source for a kernel module containing the above code is available at:
[GitHub rdolbeau/enable\_arm\_pmu](https://github.com/rdolbeau/enable_arm_pmu)
or in [Pull Request #2 at thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu/pull/2)