How to find the utilization of the hardware performance counters for a specific function (block of c

Time:12-15

Problem: I want to measure hardware performance counters (IPC, branch mispredictions, cache miss rate, memory bandwidth, TLB misses, CPU-GPU communication performance) for a specific part of the code in an algorithm that trains on CPU and GPU. Is there a metric I can use to find out the utilization of these counters?

Also, is it possible with perf to collect hardware performance statistics for only part of a Python program's execution?

Note: I initially checked performance-features from pip, but could not find the perf counters I need from the list above. Can someone tell me whether there is a metric I can use to calculate the utilization of the perf counters?

CodePudding user response:

PAPI is certainly what you are looking for. It is a relatively portable performance API for reading hardware counters, and it is available from Python through the python_papi package. PAPI lets you analyse only a given part of a program (by defining the section with start-stop annotations). It also supports GPUs. To start with, you can read this research paper about it. Good luck with your PhD.
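To make the start-stop idea concrete, here is a minimal sketch that assumes the low-level interface of python_papi (`pypapi.papi_low`) and the standard PAPI preset events. The helper names `measure_region` and `derived_metrics` are hypothetical, not part of PAPI. Which presets are actually available depends on your CPU and on the installed python_papi version, so treat the event list as an assumption to check (for example with PAPI's `papi_avail` tool).

```python
def measure_region(fn, event_names=("PAPI_TOT_INS", "PAPI_TOT_CYC")):
    """Run fn() and return raw counter values for that region only.

    Hypothetical helper: assumes python_papi is installed and that the
    requested preset events exist on this machine.
    """
    # Imported lazily so the rest of the module works without python_papi.
    from pypapi import papi_low as papi
    from pypapi import events

    papi.library_init()
    event_set = papi.create_eventset()
    for name in event_names:
        papi.add_event(event_set, getattr(events, name))

    papi.start(event_set)            # start annotation
    try:
        fn()                         # the code region of interest
    finally:
        values = papi.stop(event_set)  # stop annotation, returns the counts
        papi.cleanup_eventset(event_set)
        papi.destroy_eventset(event_set)
    return dict(zip(event_names, values))


def derived_metrics(counts):
    """Turn raw counts into ratios such as IPC (hypothetical helper)."""
    metrics = {}
    cycles = counts.get("PAPI_TOT_CYC", 0)
    if cycles and "PAPI_TOT_INS" in counts:
        # Instructions per cycle = retired instructions / elapsed cycles.
        metrics["ipc"] = counts["PAPI_TOT_INS"] / cycles
    branches = counts.get("PAPI_BR_INS", 0)
    if branches and "PAPI_BR_MSP" in counts:
        # Fraction of branch instructions that were mispredicted.
        metrics["branch_mispredict_rate"] = counts["PAPI_BR_MSP"] / branches
    return metrics
```

A call such as `derived_metrics(measure_region(train_step))`, where `train_step` is a placeholder for your own function, would then report IPC for that function alone. Note that hardware can usually count only a few events at the same time, so you may need several runs with different event sets.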
