annotate src/fftw-3.3.8/README-perfcnt.md @ 82:d0c2a83c1364

Add FFTW 3.3.8 source, and a Linux build
author Chris Cannam
date Tue, 19 Nov 2019 14:52:55 +0000

Performance Counters
====================

FFTW measures execution time in the planning stage, optionally taking advantage
of hardware performance counters. This document describes the supported
counters and the additional steps needed to enable each one on different
architectures.

See `./configure --help` for the flags that enable each supported counter.
See [kernel/cycle.h](kernel/cycle.h) for the code that accesses the counters.

ARMv7-A (armv7a)
================

`CNTVCT`: Virtual Count Register in VMSA
--------------------------------------

A 64-bit counter, part of the Virtual Memory System Architecture (VMSA).
See section B4.1.34 in the ARM Architecture Reference Manual ARMv7-A/ARMv7-R.

For access from user mode, `CNTKCTL.PL0VCTEN` must be 1. This bit must be set
from kernel mode on each CPU:

    #define CNTKCTL_PL0VCTEN 0x2 /* bit 1; see B4.1.26 in the ARM Architecture Reference Manual */
    uint32_t r;
    asm volatile("mrc p15, 0, %0, c14, c1, 0" : "=r"(r)); /* read CNTKCTL */
    r |= CNTKCTL_PL0VCTEN;                                /* allow PL0 access to CNTVCT */
    asm volatile("mcr p15, 0, %0, c14, c1, 0" :: "r"(r)); /* write CNTKCTL */

Kernel module source *that can be patched with the above code* is available at:
[GitHub thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu)
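
Once `PL0VCTEN` is set, user code can read `CNTVCT` directly. The following is a minimal sketch, not FFTW's own code, which lives in `kernel/cycle.h`. It assumes GCC-style inline assembly, and `read_cntvct()` is a hypothetical name. On ARMv7-A it reads the 64-bit register with `MRRC p15, 1, <Rt>, <Rt2>, c14`. If the bit above is not set, that instruction traps and the process typically receives `SIGILL`. On other targets the sketch falls back to `clock_gettime(CLOCK_MONOTONIC)`, but only so that it compiles anywhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the ARMv7-A virtual count from user mode.
 * Requires CNTKCTL.PL0VCTEN == 1 (set by the kernel code above). */
static inline uint64_t read_cntvct(void)
{
#if defined(__ARM_ARCH_7A__)
    uint64_t v;
    /* ISB keeps the counter read from being performed early; MRRC
     * returns the low word in %Q0 and the high word in %R0. */
    __asm__ volatile("isb\n\tmrrc p15, 1, %Q0, %R0, c14" : "=r"(v));
    return v;
#else
    /* Fallback for this sketch only: monotonic nanoseconds, not CNTVCT. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Time a region by calling the function once before it and once after it, then subtracting the two readings. The difference counts ticks of the system counter frequency (`CNTFRQ`), not CPU cycles.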

`PMCCNTR`: Performance Monitors Cycle Count Register in VMSA
----------------------------------------------------------

A 32-bit counter, part of the Virtual Memory System Architecture (VMSA).
See section B4.1.113 in the ARM Architecture Reference Manual ARMv7-A/ARMv7-R.
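
`PMCCNTR` is only 32 bits wide, so it wraps quickly. At a hypothetical 1 GHz core clock, 2^32 cycles last about 4.3 seconds. The sketch below shows a wrap-tolerant difference. The function name is ours, not FFTW's, and the code relies on unsigned 32-bit arithmetic wrapping modulo 2^32:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two PMCCNTR readings.
 * Unsigned 32-bit arithmetic wraps modulo 2^32, so the result is
 * correct even if the counter wrapped once between the two reads.
 * It is wrong if the interval spans 2^32 cycles or more. */
static inline uint32_t pmccntr_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

Take, for example, a reading of `0xFFFFFFF0` followed by a reading of `0x00000010`. The function returns `0x20` (32) cycles.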

For access from user mode, user-mode access to the PMU must be enabled
(`PMUSERENR.EN == 1`). This must be done from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR.E (bit 0) | PMCR.X (bit 4) */
    /* enable user-mode access to counters: PMUSERENR.EN = 1 */
    asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
    /* program PMCR, then enable the cycle counter and event counters 0-3 */
    asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(PERF_DEF_OPTS)); /* PMCR */
    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));    /* PMCNTENSET */

Kernel module source with the above code is available at:
[GitHub thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu)

More information:
http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html
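
The constants in the kernel code above can be decoded bit by bit. The sketch below gives the bits names. The macro and function names are ours, not from FFTW or an ARM header. It rebuilds `0x8000000f` as the cycle counter (bit 31) plus event counters 0-3. For ARMv7-A builds only, it also shows the user-mode read of `PMCCNTR` (`MRC p15, 0, <Rt>, c9, c13, 0`) that this setup enables. GCC-style inline assembly is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the bits set by the kernel code above. */
#define PMCR_E        (1u << 0)  /* PMCR.E: enable counters */
#define PMCR_X        (1u << 4)  /* PMCR.X: export enable */
#define PMCNTEN_CYCLE (1u << 31) /* PMCNTENSET bit 31: cycle counter (PMCCNTR) */

/* Hypothetical helper: PMCNTENSET value enabling the cycle counter
 * and event counters 0..n-1. */
static inline uint32_t pmcntenset_mask(unsigned n)
{
    assert(n <= 31);
    return PMCNTEN_CYCLE | ((1u << n) - 1u);
}

#if defined(__ARM_ARCH_7A__)
/* User-mode read of PMCCNTR; traps unless PMUSERENR.EN == 1. */
static inline uint32_t read_pmccntr(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#endif
```

With four event counters, `pmcntenset_mask(4)` equals `0x8000000f`, which is the value written above. `PMCR_E | PMCR_X` equals `PERF_DEF_OPTS`.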

ARMv8-A (aarch64)
=================

`CNTVCT_EL0`: Counter-timer Virtual Count Register
------------------------------------------------

A 64-bit counter, part of the Generic Timer registers.
See section D8.5.17 in the ARM Architecture Reference Manual ARMv8-A.

For access from user mode, `CNTKCTL_EL1.EL0VCTEN` must be 1. This bit must be
set from kernel mode on each CPU:

    #define CNTKCTL_EL0VCTEN 0x2 /* bit 1 of CNTKCTL_EL1 */
    uint64_t r; /* system registers are accessed as 64-bit values */
    asm volatile("mrs %0, CNTKCTL_EL1" : "=r"(r)); /* read */
    r |= CNTKCTL_EL0VCTEN;                         /* allow EL0 access to CNTVCT_EL0 */
    asm volatile("msr CNTKCTL_EL1, %0" :: "r"(r)); /* write */

*WARNING*: The above code has not been tested.
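
With `EL0VCTEN` set, user code can read the counter with a single `MRS`. The following is a minimal sketch, not FFTW's `cycle.h` code. It assumes GCC/Clang inline assembly, and `read_cntvct_el0()` is a hypothetical name. On non-AArch64 targets it falls back to `clock_gettime(CLOCK_MONOTONIC)`, but only so that it compiles anywhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read CNTVCT_EL0 from EL0 (user mode).
 * Requires CNTKCTL_EL1.EL0VCTEN == 1; otherwise the MRS traps to EL1. */
static inline uint64_t read_cntvct_el0(void)
{
#if defined(__aarch64__)
    uint64_t v;
    /* ISB keeps the counter read from being performed early. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v));
    return v;
#else
    /* Fallback for this sketch only: monotonic nanoseconds, not CNTVCT_EL0. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

The counter ticks at the system counter frequency, which is readable from `CNTFRQ_EL0`, not at the CPU clock rate.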

`PMCCNTR_EL0`: Performance Monitors Cycle Count Register
------------------------------------------------------

A 64-bit counter, part of the Performance Monitors registers.
See section D8.4.2 in the ARM Architecture Reference Manual ARMv8-A.

For access from user mode, user-mode access to the PMU must be enabled
(`PMUSERENR_EL0.EN == 1`). This must be set from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR_EL0.E (bit 0) | PMCR_EL0.X (bit 4) */
    /* enable user-mode access to counters: PMUSERENR_EL0.EN = 1 */
    asm volatile("msr PMUSERENR_EL0, %0" :: "r"((uint64_t)1));
    /* program PMCR_EL0, then enable the cycle counter and event counters 0-3 */
    asm volatile("msr PMCR_EL0, %0" :: "r"((uint64_t)PERF_DEF_OPTS));
    asm volatile("msr PMCNTENSET_EL0, %0" :: "r"((uint64_t)0x8000000f));
    /* clear PMCCFILTR_EL0 so cycles are counted at EL0 and EL1 */
    asm volatile("msr PMCCFILTR_EL0, %0" :: "r"((uint64_t)0));

Kernel module source with the above code is available at
[GitHub rdolbeau/enable\_arm\_pmu](https://github.com/rdolbeau/enable_arm_pmu)
or in [Pull Request #2 at thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu/pull/2).
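
The zero written to `PMCCFILTR_EL0` above clears its filter bits. Setting bit 31 (P) would stop cycle counting at EL1, and setting bit 30 (U) would stop it at EL0. The sketch below builds that value and, on AArch64 builds only, reads `PMCCNTR_EL0` from user mode. That read traps unless `PMUSERENR_EL0.EN == 1`. The macro and function names are hypothetical, and GCC/Clang inline assembly is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PMCCFILTR_P (1u << 31) /* set: do not count cycles at EL1 */
#define PMCCFILTR_U (1u << 30) /* set: do not count cycles at EL0 */

/* Hypothetical helper: PMCCFILTR_EL0 value for the requested levels.
 * Other filter bits (for example, the EL2 one) are left at zero. */
static inline uint32_t pmccfiltr_value(int count_el1, int count_el0)
{
    return (count_el1 ? 0u : PMCCFILTR_P) | (count_el0 ? 0u : PMCCFILTR_U);
}

#if defined(__aarch64__)
/* Hypothetical helper: user-mode read of the 64-bit cycle counter. */
static inline uint64_t read_pmccntr_el0(void)
{
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0" : "=r"(v));
    return v;
}
#endif
```

`pmccfiltr_value(1, 1)` yields 0, which is the value the kernel code writes.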