Performance Counters
====================

FFTW measures execution time in the planning stage, optionally taking advantage
of hardware performance counters. This document describes the supported
counters and the additional steps needed to enable each one on different
architectures.

See `./configure --help` for the flags that enable each supported counter.
See [kernel/cycle.h](kernel/cycle.h) for the code that accesses the counters.

ARMv7-A (armv7a)
================

`CNTVCT`: Virtual Count Register in VMSA
----------------------------------------

A 64-bit counter, part of the Virtual Memory System Architecture.
Section B4.1.34 in ARM Architecture Reference Manual ARMv7-A/ARMv7-R

Access from user mode requires `CNTKCTL.PL0VCTEN == 1`, which must
be set in kernel mode on each CPU:

    #define CNTKCTL_PL0VCTEN 0x2 /* B4.1.26 in ARM Architecture Reference Manual */
    uint32_t r;
    asm volatile("mrc p15, 0, %0, c14, c1, 0" : "=r"(r)); /* read CNTKCTL */
    r |= CNTKCTL_PL0VCTEN;
    asm volatile("mcr p15, 0, %0, c14, c1, 0" :: "r"(r)); /* write CNTKCTL */

Kernel module source *which can be patched with the above code* is available at:
https://github.com/thoughtpolice/enable_arm_pmu

`PMCCNTR`: Performance Monitors Cycle Count Register in VMSA
------------------------------------------------------------

A 32-bit counter, part of the Virtual Memory System Architecture.
Section B4.1.113 in ARM Architecture Reference Manual ARMv7-A/ARMv7-R

Access from user mode requires user-mode access to the PMU to be enabled
(`PMUSERENR.EN == 1`), which must be done from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR.E (enable) | PMCR.X (export enable) */
    /* enable user-mode access to counters (PMUSERENR) */
    asm volatile("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
    /* program the PMU (PMCR) */
    asm volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(PERF_DEF_OPTS));
    /* enable the cycle counter (bit 31) and event counters 0-3 (PMCNTENSET) */
    asm volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(0x8000000f));

Kernel module source with the above code is available at:
[GitHub thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu)

More information:
http://neocontra.blogspot.com/2013/05/user-mode-performance-counters-for.html

ARMv8-A (aarch64)
=================

`CNTVCT_EL0`: Counter-timer Virtual Count Register
--------------------------------------------------

A 64-bit counter, part of Generic Registers.
Section D8.5.17 in ARM Architecture Reference Manual ARMv8-A

Access from user mode requires `CNTKCTL_EL1.EL0VCTEN == 1`, which
must be set from kernel mode on each CPU:

    #define CNTKCTL_EL0VCTEN 0x2
    uint64_t r; /* 64-bit so the operand matches the X register used by MRS/MSR */
    asm volatile("mrs %0, CNTKCTL_EL1" : "=r"(r)); /* read */
    r |= CNTKCTL_EL0VCTEN;
    asm volatile("msr CNTKCTL_EL1, %0" :: "r"(r)); /* write */

*WARNING*: The above code has not been tested.

`PMCCNTR_EL0`: Performance Monitors Cycle Count Register
--------------------------------------------------------

A 64-bit counter, part of Performance Monitors.
Section D8.4.2 in ARM Architecture Reference Manual ARMv8-A

Access from user mode requires user-mode access to the PMU to be enabled
(`PMUSERENR_EL0.EN == 1`), which must be done from kernel mode on each CPU:

    #define PERF_DEF_OPTS (1 | 16) /* PMCR_EL0.E (enable) | PMCR_EL0.X (export enable) */
    /* enable user-mode access to counters */
    asm volatile("msr PMUSERENR_EL0, %0" :: "r"(1));
    /* program the PMU */
    asm volatile("msr PMCR_EL0, %0" :: "r"(PERF_DEF_OPTS));
    /* enable the cycle counter (bit 31) and event counters 0-3 */
    asm volatile("msr PMCNTENSET_EL0, %0" :: "r"(0x8000000f));
    /* clear the cycle counter filter */
    asm volatile("msr PMCCFILTR_EL0, %0" :: "r"(0));

Kernel module source with the above code is available at:
[GitHub rdolbeau/enable\_arm\_pmu](https://github.com/rdolbeau/enable_arm_pmu)
or in [Pull Request #2 at thoughtpolice/enable\_arm\_pmu](https://github.com/thoughtpolice/enable_arm_pmu/pull/2)
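
Reading a counter from user mode (sketch)
=========================================

The sections above cover only the kernel-side enabling step. As a rough
illustration of the user-mode side, the sketch below reads the virtual count
register and computes an elapsed interval for a 32-bit counter such as the
ARMv7-A `PMCCNTR`. It is not the code in `kernel/cycle.h`. The names
`SKETCH_USE_ARM_CNTVCT`, `sketch_read_ticks` and `sketch_elapsed32` are
hypothetical and exist only for this example, and the `clock()` fallback is a
placeholder so that the file compiles on other architectures.

    #include <stdint.h>
    #include <time.h>

    /* Hypothetical opt-in macro (not an FFTW configure flag): define
       SKETCH_USE_ARM_CNTVCT only after user-mode access to the counter
       has been enabled from kernel mode as described above. */
    static inline uint64_t sketch_read_ticks(void)
    {
    #if defined(SKETCH_USE_ARM_CNTVCT) && defined(__aarch64__)
        uint64_t t;
        __asm__ __volatile__("mrs %0, CNTVCT_EL0" : "=r"(t));
        return t;
    #elif defined(SKETCH_USE_ARM_CNTVCT) && defined(__arm__)
        uint32_t lo, hi;
        /* 64-bit read of CNTVCT: MRRC p15, 1, <Rt>, <Rt2>, c14 */
        __asm__ __volatile__("mrrc p15, 1, %0, %1, c14" : "=r"(lo), "=r"(hi));
        return ((uint64_t)hi << 32) | lo;
    #else
        /* Placeholder fallback: processor time from standard C. */
        return (uint64_t)clock();
    #endif
    }

    /* Elapsed ticks for a 32-bit counter. Unsigned arithmetic modulo 2^32
       gives the right answer across a single wrap-around. */
    static inline uint32_t sketch_elapsed32(uint32_t start, uint32_t end)
    {
        return (uint32_t)(end - start);
    }

A caller would take one reading before and one after the code being timed and
subtract them. For a 32-bit counter, the interval must be shorter than one full
wrap of the counter for the difference to be meaningful.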