About Intel-VTune Cores

Time:11-13

I am new to Intel VTune, so I have a general question.

I am trying to profile an application with VTune and would like to know which cores VTune itself runs on.

How many cores does VTune take up while profiling an application?

Does it depend on OS?

CodePudding user response:

Collecting data from hardware PMU events just requires a bit of work in interrupt handlers on the cores running the code being profiled. That's intentionally fairly light-weight: the interrupt only triggers when a counter wraps around. Each such interrupt produces a "sample" if you're running something equivalent to perf record instead of perf stat: the CPU has to associate that event with an instruction address, even for events like cycles where the CPU is busy with hundreds of instructions in flight.
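As a rough illustration of the counter-overflow idea, here is a toy simulation in Python. It is only a sketch: the function name sample_on_overflow, the trace format, and the labels hot_loop and cold_path are made up for this example, and real sampling happens in hardware and kernel interrupt handlers, not in user code like this.

```python
from collections import Counter

def sample_on_overflow(trace, period):
    """Toy model of event-based sampling.

    trace  -- sequence of code addresses (here: labels), one counted event each
    period -- number of events between two "overflow interrupts"
    """
    samples = Counter()
    counter = 0
    for address in trace:
        counter += 1
        if counter == period:
            # The counter "wrapped": attribute one sample to the address
            # that was executing when the overflow happened.
            samples[address] += 1
            counter = 0
    return samples

# Hypothetical workload: 9 events in hot_loop for every 1 in cold_path.
trace = (["hot_loop"] * 9 + ["cold_path"]) * 1000

# 97 is chosen so that samples do not always land on the same spot
# of the 10-event loop in this toy model.
samples = sample_on_overflow(trace, period=97)
print(samples["hot_loop"], samples["cold_path"])  # 93 10
```

The work per sample is tiny (one increment of a table entry), and the samples end up distributed roughly in proportion to where the events occurred, which is what makes this approach cheap on the cores being profiled.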

A profiler will adjust the wrapping threshold to generate events with a useful frequency (so you get some samples even for rarer events, but for common events you're not spending all the CPU time handling interrupts).
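The threshold adjustment can be sketched like this. It is a simplified model under my own assumptions, not VTune's or perf's actual algorithm; next_period and its parameters are hypothetical names chosen for the example.

```python
def next_period(current_period, events_seen, interval_s, target_hz, min_period=1):
    """Pick a new overflow threshold (events per sample) so that the
    event rate observed over the last interval would yield roughly
    target_hz samples per second."""
    event_rate = events_seen / interval_s  # events per second
    if event_rate == 0:
        # Nothing was counted, so there is nothing to base a new period on.
        return current_period
    return max(min_period, round(event_rate / target_hz))

# Common event: about 3e9 cycles per second, aiming for 1000 samples/s.
print(next_period(1_000_000, 3_000_000_000, 1.0, 1000))  # 3000000

# Rare event: about 50,000 occurrences per second, same target rate.
print(next_period(1_000_000, 50_000, 1.0, 1000))  # 50
```

With a fixed period of one million, the rare event would produce almost no samples, while a period of 50 on the cycles event would spend nearly all the CPU time in interrupt handlers. Scaling the period to the event rate avoids both.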

I don't know if VTune does any real-time visualization of that data while a profile is being collected; if so, that would happen in the VTune process itself, on whatever core(s) the OS scheduler happens to run it on.

CodePudding user response:

VTune is certainly capable of running "in the background" (via time-sharing on some of the cores) while it is monitoring a job that uses all the cores. Some versions handle this with no problems, while other versions (in combination with specific kernel versions) have struggled when monitoring jobs that use all the cores (i.e., large numbers of missed samples). (Given the insane number of cores & threads in recent processors, this is not surprising at all.)

From my observations, VTune does not do any significant post-processing while the job is running (not even compression of the output files). The "report" is usually run after the job under test is finished, but it can also be deferred to an independent step to be run at your convenience. The visualization and post-processing are done by the "viewer", not by the "collector", so they do not compete with the application under investigation for processing resources. This applies to the "classic" version of VTune that saves the results to local files.

There is also a version that runs a web server to host the "viewer" functions via a web browser, but I have never tried to make that one work in my environment.
