annotate src/fftw-3.3.8/README-perfcnt.md @ 169:223a55898ab9 tip default

Add null config files
author Chris Cannam <cannam@all-day-breakfast.com>
date Mon, 02 Mar 2020 14:03:47 +0000
parents bd3cc4d1df30
children
rev   line source
cannam@167 1 Performance Counters
cannam@167 2 ====================
cannam@167 3
cannam@167 4 FFTW measures execution time in the planning stage, optionally taking advantage
cannam@167 5 of hardware performance counters. This document describes the supported
cannam@167 6 counters and additional steps needed to enable each on different architectures.
cannam@167 7
cannam@167 8 See `./configure --help` for flags for enabling each supported counter.
cannam@167 9 See [kernel/cycle.h](kernel/cycle.h) for the code that accesses the counters.
cannam@167 10
cannam@167 11 ARMv7-A (armv7a)
cannam@167 12 ================
cannam@167 13
cannam@167 14 `CNTVCT`: Virtual Count Register in VMSA
cannam@167 15 --------------------------------------
cannam@167 16
cannam@167 17 A 64-bit counter, part of the Virtual Memory System Architecture (VMSA).
cannam@167 18 Section B4.1.34 in ARM Architecture Reference Manual ARMv7-A/ARMv7-R
cannam@167 19
cannam@167 20 For access from user mode, requires `CNTKCTL.PL0VCTEN == 1`, which must
cannam@167 21 be set in kernel mode on each CPU:
cannam@167 22
cannam@167 23     #define CNTKCTL_PL0VCTEN 0x2 /* B4.1.26 in ARM Architecture Reference */
cannam@167 24     uint32_t r;
cannam@167 25     asm volatile("mrc p15, 0, %0, c14, c1, 0" : "=r"(r)); /* read */
cannam@167 26     r |= CNTKCTL_PL0VCTEN;
cannam@167 27     asm volatile("mcr p15, 0, %0, c14, c1, 0" :: "r"(r)); /* write */
cannam@167 28
cannam@167 29 Kernel module source, *which can be patched with the above code*, is available at:
cannam@167 30 https://github.com/thoughtpolice/enable_arm_pmu
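
Once `CNTKCTL.PL0VCTEN` is set, the counter can be read from user mode with a single `MRRC` instruction. The following is a minimal sketch, not FFTW's own code (see kernel/cycle.h for that). The names `cntvct_join`, `read_cntvct` and `cntvct_elapsed` are hypothetical, and the non-ARM branch is only a placeholder so that the sketch compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 32-bit halves returned by MRRC into one 64-bit count. */
static inline uint64_t cntvct_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper (not an FFTW function): read CNTVCT in user mode.
   Assumes an ARMv7-A target with CNTKCTL.PL0VCTEN already set. */
static inline uint64_t read_cntvct(void)
{
#if defined(__arm__)
    uint32_t lo, hi;
    /* MRRC p15, 1, <Rt>, <Rt2>, c14: Rt = low word, Rt2 = high word */
    __asm__ volatile("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
    return cntvct_join(lo, hi);
#else
    return 0; /* placeholder: no such counter on this target */
#endif
}

/* Ticks between two samples of the 64-bit counter. */
static inline uint64_t cntvct_elapsed(uint64_t t0, uint64_t t1)
{
    assert(t1 >= t0); /* a 64-bit count is not expected to wrap in practice */
    return t1 - t0;
}
```

If the enable bit is not set, the `MRRC` instruction is expected to trap in user mode, so the kernel-side step above has to run first on every CPU the process may be scheduled on.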
cannam@167 31
cannam@167 32 `PMCCNTR`: Performance Monitors Cycle Count Register in VMSA
cannam@167 33 ----------------------------------------------------------
cannam@167 34
cannam@167 35 A 32-bit counter, part of the Virtual Memory System Architecture (VMSA).
cannam@167 36 Section B4.1.113 in ARM Architecture Reference Manual ARMv7-A/ARMv7-R
cannam@167 37
cannam@167 38 For access from user mode, requires user-mode access to PMU to be enabled
cannam@167 39 (`PMUSERENR.EN == 1`), which must be done from kernel mode on each CPU:
cannam@167 40
cannam@167 41     #define PERF_DEF_OPTS (1 | 16) /* PMCR.E (enable) | PMCR.X (export enable) */
cannam@167 42     /* enable user-mode access to counters */
cannam@167 43     asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
cannam@167 44     /* program the PMU, then enable the cycle counter (bit 31) and event counters 0-3 */
cannam@167 45     asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(PERF_DEF_OPTS));
cannam@167 46     asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));
cannam@167 47
cannam@167 48 Kernel module source with the above code is available at:
cannam@167 49 [GitHub thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu)
cannam@167 50
cannam@167 51 More information:
cannam@167 52 http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html
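
Because this counter is only 32 bits wide, it wraps around quickly. The sketch below shows one common way to compute an interval that survives a single wraparound, using unsigned subtraction modulo 2^32. The function names are hypothetical and this is not necessarily how FFTW handles it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two PMCCNTR samples.
   The result is reduced modulo 2^32, so it stays correct as long as
   the counter wrapped at most once between the two samples. */
static inline uint32_t pmccntr_elapsed(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Usage examples, written as assertions. */
static inline void pmccntr_elapsed_examples(void)
{
    /* no wraparound */
    assert(pmccntr_elapsed(100u, 250u) == 150u);
    /* the counter wrapped from 0xFFFFFFFF to 0 between the samples */
    assert(pmccntr_elapsed(0xFFFFFFF0u, 0x00000010u) == 0x20u);
}
```

As a rough guide, at an assumed 1 GHz core clock a 32-bit cycle counter wraps about every 4.3 seconds (2^32 / 10^9), so intervals longer than that cannot be recovered from two samples alone.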
cannam@167 53
cannam@167 54 ARMv8-A (aarch64)
cannam@167 55 =================
cannam@167 56
cannam@167 57 `CNTVCT_EL0`: Counter-timer Virtual Count Register
cannam@167 58 ------------------------------------------------
cannam@167 59
cannam@167 60 A 64-bit counter, part of the Generic Timer registers.
cannam@167 61 Section D8.5.17 in ARM Architecture Reference Manual ARMv8-A
cannam@167 62
cannam@167 63 For user-mode access, requires `CNTKCTL_EL1.EL0VCTEN == 1`, which
cannam@167 64 must be set from kernel mode for each CPU:
cannam@167 65
cannam@167 66     #define CNTKCTL_EL0VCTEN 0x2
cannam@167 67     uint32_t r;
cannam@167 68     asm volatile("mrs %0, CNTKCTL_EL1" : "=r"(r)); /* read */
cannam@167 69     r |= CNTKCTL_EL0VCTEN;
cannam@167 70     asm volatile("msr CNTKCTL_EL1, %0" :: "r"(r)); /* write */
cannam@167 71
cannam@167 72 *WARNING*: The above code has not been tested.
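
For the user-mode side, a read of `CNTVCT_EL0` is a single `MRS` instruction. The sketch below is illustrative only: `read_cntvct_el0` and `cntvct_el0_elapsed_since` are hypothetical names, it assumes `CNTKCTL_EL1.EL0VCTEN` is already set, and the fallback branch exists only so the code compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an FFTW function): read CNTVCT_EL0 in user mode. */
static inline uint64_t read_cntvct_el0(void)
{
#if defined(__aarch64__)
    uint64_t t;
    __asm__ volatile("mrs %0, CNTVCT_EL0" : "=r"(t));
    return t;
#else
    return 0; /* placeholder: no such register on this target */
#endif
}

/* Ticks elapsed since an earlier sample taken with read_cntvct_el0(). */
static inline uint64_t cntvct_el0_elapsed_since(uint64_t start)
{
    uint64_t now = read_cntvct_el0();
    assert(now >= start); /* the 64-bit count only moves forward */
    return now - start;
}
```

A caller would take one sample before the region of interest and pass it to `cntvct_el0_elapsed_since` afterwards. The result is in timer ticks, not CPU cycles.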
cannam@167 73
cannam@167 74 `PMCCNTR_EL0`: Performance Monitors Cycle Count Register
cannam@167 75 ------------------------------------------------------
cannam@167 76
cannam@167 77 A 64-bit counter, part of Performance Monitors.
cannam@167 78 Section D8.4.2 in ARM Architecture Reference Manual ARMv8-A
cannam@167 79
cannam@167 80 For access from user mode, requires user-mode access to the PMU to be enabled
cannam@167 81 (`PMUSERENR_EL0.EN == 1`), which must be set from kernel mode on each CPU:
cannam@167 82
cannam@167 83     #define PERF_DEF_OPTS (1 | 16) /* PMCR_EL0.E (enable) | PMCR_EL0.X (export enable) */
cannam@167 84     /* enable user-mode access to counters */
cannam@167 85     asm volatile("msr PMUSERENR_EL0, %0" :: "r"(1));
cannam@167 86     /* program the PMU, then enable the cycle counter (bit 31) and event counters 0-3 */
cannam@167 87     asm volatile("msr PMCR_EL0, %0" :: "r"(PERF_DEF_OPTS));
cannam@167 88     asm volatile("msr PMCNTENSET_EL0, %0" :: "r"(0x8000000f));
cannam@167 89     asm volatile("msr PMCCFILTR_EL0, %0" :: "r"(0));
cannam@167 90
cannam@167 91 Kernel module source with the above code is available at:
cannam@167 92 [GitHub rdolbeau/enable\_arm\_pmu](https://github.com/rdolbeau/enable_arm_pmu)
cannam@167 93 or in [Pull Request #2 at thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu/pull/2)
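
The magic numbers in the PMU snippets (`1 | 16` and `0x8000000f`) can be spelled out as named bits. The sketch below is one way to do that. The macro and function names are hypothetical, chosen here for illustration, while the bit positions follow the PMCR and PMCNTENSET register layouts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the bits used above. */
#define PMCR_E          (UINT32_C(1) << 0)   /* enable the counters */
#define PMCR_X          (UINT32_C(1) << 4)   /* enable export of events */
#define PMCNTENSET_C    (UINT32_C(1) << 31)  /* cycle counter (PMCCNTR) */
#define PMCNTENSET_P(n) (UINT32_C(1) << (n)) /* event counter n */

/* Same value as PERF_DEF_OPTS, i.e. (1 | 16). */
static inline uint32_t pmu_default_opts(void)
{
    return PMCR_E | PMCR_X;
}

/* Enable mask for the cycle counter plus the first n event counters. */
static inline uint32_t pmu_enable_mask(unsigned n_event_counters)
{
    assert(n_event_counters <= 31); /* bit 31 is reserved for the cycle counter */
    uint32_t mask = PMCNTENSET_C;
    for (unsigned i = 0; i < n_event_counters; i++)
        mask |= PMCNTENSET_P(i);
    return mask;
}
```

With four event counters, `pmu_enable_mask(4)` gives `0x8000000f`, the constant written to the count-enable register in the snippets above. Only bit 31 matters for reading the cycle counter.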