Channel: Intel® Software - Intel® VTune™ Profiler (Intel® VTune™ Amplifier)
Viewing all 699 articles

Intel VTune Profiler Installation


Hi,

 

I would like to install Intel VTune Profiler on my CentOS 7 machine. As far as I know, the latest version does not work on CentOS 6, but CentOS 7 is fine, isn't it?

Please help me with extracting the installation package to a writable directory using the following command:

tar -xzf vtune_profiler_<version>.tar.gz

I am struggling with the version part. Which version should I put in? In other words, what should the complete command be?
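To make the shape of the command concrete: `<version>` is not typed literally; it stands for the exact version string in the file name you downloaded, which `ls` will show you. The sketch below uses a made-up build number (2020.1.0.607630) and creates a stand-in tarball purely for demonstration, so the extraction step can be shown end to end:

```shell
# Demo setup only: fabricate a package with a hypothetical version string.
mkdir -p demo && cd demo
mkdir -p vtune_profiler_2020.1.0.607630
tar -czf vtune_profiler_2020.1.0.607630.tar.gz vtune_profiler_2020.1.0.607630
rm -r vtune_profiler_2020.1.0.607630

# The actual step from the install guide, with the placeholder filled in.
# Substitute whatever file name `ls vtune_profiler_*.tar.gz` shows you.
tar -xzf vtune_profiler_2020.1.0.607630.tar.gz
ls -d vtune_profiler_2020.1.0.607630
```

After extraction, the installer script inside the extracted directory is what you run next.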

 

Thanks in advance :-)

 

BR
Bobby ! 


My scenario and Intel VTune Profiler.


I am working on Ceph (https://en.wikipedia.org/wiki/Ceph_(software)), an open-source software-defined storage platform.

- Through git I have cloned Ceph in my home folder.
- Built its dependencies
- And compiled Ceph in debug mode.

For development work on Ceph, a utility script, vstart.sh, deploys a fake local cluster.

Against this deployed cluster, a developer can write READ and WRITE tests, i.e. client test programs. These tests are compiled with GCC in the build folder, producing executables. Some of my test codes are in C and some in C++.

Here I would like to bring Intel VTune Profiler in my workflow. I would like to do profiling of Ceph through my READ and WRITE test codes. Profiling of call functions, loops, etc.

And I am using a single virtual machine (Linux CentOS 7). Ceph is written mostly in C++.

My questions:

- Does Intel VTune Profiler fit my scenario?

- If yes, given my scenario, where exactly should Intel VTune Profiler be installed?

- The executables of the READ and WRITE test codes are in the build folder, i.e. /home/user/ceph/build. How can I launch VTune Profiler in this case?

- Does Intel VTune Profiler support C executables?
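For the launch question, VTune can be pointed directly at an executable in the build tree, or attached to an already-running process. A minimal sketch; the binary name `read_test` and the daemon PID are hypothetical placeholders:

```shell
# Launch mode: VTune starts the test binary and profiles it.
vtune -collect hotspots -result-dir r_read_test \
      -- /home/user/ceph/build/bin/read_test

# Attach mode: profile a Ceph daemon that the vstart.sh cluster already runs.
# vtune -collect hotspots -target-pid <pid-of-ceph-osd> -result-dir r_osd
```

Hotspots collection works for both C and C++ executables; compiling with debug info (-g) gives readable function and source-line attribution.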

 

Looking forward to your help.

Vtune Data microarchitecture analysis metrics vs varying CPU frequency


Hi,
I ran a code at different frequencies and collected VTune data (on an 8280 processor, RHEL 7) using Microarchitecture Exploration analysis. I understand that VTune (v2020) can be used to identify the portions of code that underutilize the given hardware resources of a processor. I did this experiment in order to see how the application responds to variation of a particular hardware component, or which hardware component limits the scaling of this application (e.g. memory frequency, CPU frequency, etc.).

So, I gathered the data at various frequencies (acpi-cpufreq) and followed the metrics breakdown trail of the numbers shown in red on the VTune GUI:
  1: Back End Bound --> 2: (Memory Bound, Core Bound) --> 3: DRAM Bound --> 4: (Memory Bandwidth, Memory Latency) --> 5: Local DRAM.

I noticed that:
a) Back-End Bound = Memory Bound + Core Bound, e.g. 62% of clockticks = 42% + 20%
b) Memory Bound ~= L1 Bound + L2 Bound + L3 Bound + DRAM Bound + Store Bound (42% ~= 8% + 3% + 2% + 20% + 6%)
c) DRAM Bound < Memory Bandwidth + Memory Latency (20% < 28% + 10%)
d) Memory Latency << Local DRAM + Remote DRAM + Remote Cache (10% << 97% + 2% + 1%)

Q1: What could be the reason behind the subcategory totals exceeding the category value for c and d?
      For c and d I was expecting something like DRAM Bound = Memory Bandwidth + Memory Latency.

Q2: On increasing the CPU frequency, I got the following from VTune for DRAM Memory Bandwidth:
    1GHz    - 28   % of Clockticks
    1.4GHz - 37 %
    1.8GHz -  42 %
    2GHz    -  42.5 %
    2.6GHz -  42.8 %
    2.7GHz -  42.9 %
    2.7+boost enabled - 41.7 %
- The number of CPU stalls (for DRAM) stops increasing once the frequency exceeds 1.8 GHz, and I am now looking for the reason behind this behaviour.
I expected that with higher frequency, stalls would grow, as more CPU cycles/pipeline slots would be wasted due to data unavailability.
I am focusing on the metrics highlighted in red. As the cache-bound clockticks were almost constant (0.2-0.4% increase in each of L1, L2, L3, Store) across all the frequencies mentioned above, could I say that a larger cache will not help here? That would be contrary to what is mentioned here.

 

Q3: I noted that on varying the frequencies, the Vector Capacity Usage (FPU) stays constant at around 70%, which from the explanation here means that 70% of my floating-point computations executed on VPU units (the rest were scalar).
Also, here I can see that there are different types of execution units which can process 256-bit data. Is it possible to see a breakdown of the floating-point operations, e.g. how many used 256-bit FP MUL, how many used 256-bit FP ADD, etc.?

 

Q4: Are 256-bit FP ADD / 256-bit FP MUL and FMA different? If yes, on which port does the front-end unit dispatch the uops for FMA? I can't see an FMA unit in the block diagram.
 
Please let me know if more information is required from my end, or if any of the questions mentioned above are vague/unclear.

2 problems with Threading analysis


Hello!

I am trying to profile a C++ application with OpenMP using Intel VTune Profiler, and I have two troubles with Threading analysis.
1) Even for fairly short runs (there are different modes and options in this application, so I can vary the execution time), I reach the limit of 1000 MB of collected data within a few minutes.
2) Even when I run the application for a short enough time that the data limit isn't reached, finalization of the results freezes after data collection (see the "freeze" picture).

Please tell me, what should I do to solve these problems?

Many thanks! :)

P.S. Hotspots analysis, for example, runs relatively correctly. I have Windows 10 and NetBeans with the MinGW compiler.

UPD: I've accidentally discovered application options with which Threading analysis runs correctly (these options correspond to the most efficient and shortest execution). But the problems I mentioned above are still interesting to me, because I want to run Threading analysis with different options.
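One knob that bears on problem 1: the 1000 MB ceiling is the configurable result-size limit, which can be raised or disabled (VTune's own documentation describes the -data-limit option, and it appears in command lines elsewhere in this forum). A sketch; the application name is a placeholder:

```shell
# -data-limit is in MB; 0 disables the cap entirely (results can get large).
vtune -collect threading -data-limit 0 -- my_app.exe
```

The same limit is adjustable in the GUI's analysis configuration, so the trade-off between collection size and run length can be tuned either way.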

Attachment: freeze.png (image/png, 17.94 KB)

Running finalize on different host from profiling


Hi. I need to run some code to be profiled on a special set of machines, but because they are a limited resource, I would like to do the finalization on a different machine. All of the machines have essentially the same setup (same OS, same file systems mounted, same libraries). However, when I try to finalize on a different host, VTune detects that the host is different and doesn't automatically find the libraries and debug information, even though all of the information is in exactly the same place. The application is big and complicated, with hundreds of libraries and code spread over a large directory tree; it doesn't seem like I can just specify a top-level directory and have the tool find everything. Is there a way I can tell VTune to behave the same as if I were running finalize on the same host?
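One possible split, sketched under the assumption that the result directory is copied (or visible via the shared file system) between machines: collect without finalizing on the scarce machines, then finalize on the other host while pointing VTune explicitly at the binary and symbol locations. The paths below are placeholders:

```shell
# On the profiling machine: skip finalization at collection time.
vtune -collect hotspots -finalization-mode=none -r my_result -- ./my_app

# On the other host: finalize, telling VTune where to find modules/symbols.
vtune -finalize -r my_result \
      -search-dir /path/to/app/libs \
      -search-dir /path/to/debug/info
```

Multiple -search-dir options can be given, which may sidestep the host-mismatch auto-detection, though whether it fully restores same-host behaviour would need verifying.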

Thanks,

Ben

Speeding up finalize


Hi. I'm wondering if there are any tricks for speeding up finalization. My jobs usually run for 4-6 hours, of which maybe one or two hours have profiling enabled. However, finalization can take 2-4 days. I've tried limiting the sampling rate and the total data stored, but even then it still usually takes 10x longer to finalize than to profile (I have a sense that this didn't use to be the case when I last used VTune a few years ago; perhaps I was using an older version?). I'm currently using VTune 2019. If it would make a big difference I could try to get it upgraded, but the tools are managed centrally, so that's not always easy. I'm hoping there are some things I can do to bring the finalization time down without losing too much in the way of profiling coverage.
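If your VTune build supports it (the -finalization-mode option exists in recent releases; I'm not certain about every 2019 build, so check `vtune -help collect`), lightening or deferring finalization is one lever, sketched below with placeholder names:

```shell
# Reduced finalization work at collection time:
vtune -collect hotspots -finalization-mode=fast -r r001 -- ./my_app

# Or skip it entirely and re-finalize later, on a less contended machine:
# vtune -collect hotspots -finalization-mode=none -r r001 -- ./my_app
# vtune -finalize -r r001
```

Fast mode trades some symbol-resolution completeness for finalization speed, so it fits the "without losing too much coverage" goal better than dropping sampling further.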

Thanks,

Ben

 

Interpretation of profiling results


Hello!

I am trying to profile a C++ application with OpenMP using Intel VTune Profiler, and I've run Hotspots analysis plus Threading analysis in both user-mode and hardware-based sampling modes (see the "hotspots", "user" and "hardware" pictures, plus the "threads" picture from the hardware-based analysis).

I have several questions about the results of these analyses, and I ask for your help.

1) What do these results generally mean? If I'm not mistaken, Hotspots analysis revealed that most of the time was spent usefully, and then the Threading analyses show the opposite.
2) What is the Semaphore object in the user-mode Threading analysis?
3) Why does one thread carry such a heavy load ("threads" picture)? Most of the work is done in the parallel region.

What should I do to increase parallelism of this application?

I've read the documentation (https://software.intel.com/en-us/vtune-help-windows-targets) but still can't understand what's happening in my case.

The algorithm of the application is simple:

#pragma omp parallel num_threads(8)
{
    if (myID == 0) {
        <master thread job>
    }
    #pragma omp for schedule(static)
        <parallel cycle>
    if (myID == 0) {
        <master thread job>
    }
}

Many thanks! :)

P.S. I have Windows 10 and NetBeans with MinGW compiler

cache hit/miss rate calculation - cascadelake platform


Hi,
I ran Microarchitecture Exploration analysis on an 8280 processor, and I am looking for usage metrics related to cache utilization, such as L1, L2 and L3 hit/miss rates (total L1 misses / total L1 requests, ..., total L3 misses / total L3 requests) for the overall application. I was unable to see these on the VTune GUI summary page, and from this article it seems I may have to figure them out using a "custom profile".
From the explanation here (for Sandy Bridge), it seems we have the following for calculating cache hit/miss rates for demand requests:

Demand Data L1 Miss Rate => cannot calculate.

Demand Data L2 Miss Rate =>
(sum of all types of L2 demand data misses) / (sum of L2 demand data requests) =>
(MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS) / (L2_RQSTS.ALL_DEMAND_DATA_RD)

Demand Data L3 Miss Rate =>
L3 demand data misses / (sum of all types of demand data L3 requests) =>
MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS / (MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)
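To make the L3 formula concrete, here is a tiny worked example with made-up event counts (the numbers are purely illustrative, not from a real collection):

```shell
# Hypothetical event counts plugged into the L3 demand-data miss-rate formula:
llc_hit=900000      # MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS
xsnp_hit=30000      # MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS
xsnp_hitm=20000     # MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS
llc_miss=50000      # MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS

# Denominator: all demand data requests that reached L3 (hits + misses).
total=$((llc_hit + xsnp_hit + xsnp_hitm + llc_miss))
miss_rate=$(awk -v m="$llc_miss" -v t="$total" 'BEGIN { printf "%.3f", m / t }')
echo "L3 demand data miss rate: $miss_rate"   # 0.050 with these counts
```

The same pattern (misses divided by misses-plus-hits) applies at each cache level once the right event counts are in hand.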

Q1: As that post was for Sandy Bridge and I am using Cascade Lake, I wanted to ask whether there is any change in the formulas above for the newer platform, and whether there are events that have changed or been added on the newer platform which could help to calculate:
- the L1 demand data hit/miss rate
- the L1, L2, L3 prefetch and instruction hit/miss rates
Also, in this post here, the events mentioned for getting the cache hit rates do not include the ones mentioned above (e.g. MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS):

amplxe-cl -collect-with runsa -knob event-config=CPU_CLK_UNHALTED.REF_TSC,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,MEM_LOAD_UOPS_RETIRED.L1_MISS_PS,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS,MEM_UOPS_RETIRED.ALL_LOADS_PS,MEM_UOPS_RETIRED.ALL_STORES_PS,MEM_LOAD_UOPS_RETIRED.L2_HIT_PS:sa=100003,MEM_LOAD_UOPS_RETIRED.L2_MISS_PS -knob collectMemBandwidth=true -knob dram-bandwidth-limits=true -knob collectMemObjects=true

 

Q2: What will be the formula to calculate cache hit/miss rates with the aforementioned events?

Q3: Is it possible to get a few of these metrics (like MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS, ...) from the raw data of the uarch-exploration analysis which I already ran via:

mpirun -np 56 -ppn 56 amplxe-cl -collect uarch-exploration -data-limit 0 -result-dir result_uarchexpl -- $PWD/app.exe

So, would the following be the correct way to run the custom analysis via the command line?

mpirun -np 56 -ppn 56 amplxe-cl -collect-with runsa -data-limit 0 -result-dir result_cacheexpl -knob event-config=MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS,MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS,MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS,L2_RQSTS.ALL_DEMAND_DATA_RD,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS,CPU_CLK_UNHALTED.REF_TSC,MEM_LOAD_UOPS_RETIRED.L1_HIT_PS,MEM_LOAD_UOPS_RETIRED.L1_MISS_PS,MEM_LOAD_UOPS_RETIRED.L3_HIT_PS,MEM_LOAD_UOPS_RETIRED.L3_MISS_PS,MEM_UOPS_RETIRED.ALL_LOADS_PS,MEM_UOPS_RETIRED.ALL_STORES_PS,MEM_LOAD_UOPS_RETIRED.L2_HIT_PS:sa=100003,MEM_LOAD_UOPS_RETIRED.L2_MISS_PS  -- $PWD/app.exe

(Please let me know if I need to use more or different events for the cache hit calculations.)

Q4: I noted that to calculate the cache miss rates, I need to get/view the data as "Hardware Event Counts", not as "Hardware Event Sample Counts" (https://software.intel.com/en-us/forums/vtune/topic/280087). How do I ensure this via the VTune command line, given that I generate the summary via:

vtune -report summary -report-knob show-issues=false -r <my_result_dir>.

Let me know if I need to use a different command line to generate results/event values for the custom analysis type.
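For Q4, one option worth trying (I believe a hw-events report type exists in recent VTune versions, but verify with `vtune -help report` on your install): it reports per-event values rather than the summary metrics. A sketch:

```shell
# Dump per-event hardware counts from an existing result directory.
vtune -report hw-events -r <my_result_dir>
```

If that report is available, its columns give the raw counts needed to plug into the hit/miss-rate formulas above, without relying on the GUI's count/sample-count toggle.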

 


Intel VTune Profiler for time spent per function call


Hi amazing support :-)

 

I have a question. Does Intel VTune Profiler give an option to see the time per call, I mean the time per function call: the time in seconds spent in each function? Is there any option that can be used on the command line that would help me find the time, in seconds, spent in each function of my C++ application?
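The per-function CPU time (though not necessarily a per-call average out of the box) can be pulled from a Hotspots result on the command line. A sketch, with a placeholder application name; the -group-by values accepted by your version can be checked with `vtune -help report hotspots`:

```shell
# Collect, then report CPU time broken down per function:
vtune -collect hotspots -r r_hs -- ./my_cpp_app
vtune -report hotspots -r r_hs -group-by function
```

Dividing that time by a call count, if you need time-per-call, would require call counts from elsewhere (e.g. instrumentation), since sampling-based hotspots reports aggregate time.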

 

Thanks in advance !

Intel VTune Profiler on Amazon bare metal instances


Hi,

Can we use Intel VTune Profiler on  Amazon bare metal instances?

 

Ubuntu 20.04 python3.5/python3.8 failure


I have installed Intel Parallel Studio XE 2020 on Ubuntu 20.04.

I am trying to run the following command:

vtune -run-pass-thru=--no-altstack -collect hotspots -- /usr/bin/python3 test.py

which calls python3.8 (the system default). VTune immediately stops, and the system generates the crash shown in the attached picture.

When I run it with vtune -run-pass-thru=--no-altstack -collect hotspots -- /usr/bin/python3.5 test.py, it runs fine. However, the software I want to profile is compiled against python3.8, so I cannot profile it now.

How can I solve this problem without recompiling the software with python3.5?

Cheers.

test.py:

import time

time.sleep(60)

System:

cmake version 3.16.3

gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0

Ubuntu 20.04 LTS.

vtune_profiler_2020.1.0.607630/

Attachment: vtune_crash.png (image/png, 94.38 KB)

Bad parallelism on a laptop


Hello!

I have very big problems with parallelism on my laptop. I've written a simple OpenMP parallel program like:

#pragma omp parallel num_threads(8)
#pragma omp for schedule(static)
<arithmetics>

and then Hotspots analysis revealed that parallelism is very bad. The debug version runs for a relatively long time, but it has at least some parallelism (I can't attach pictures due to an error on this site :( ). The release version is much faster, but it has no parallelism at all. By the way, Microarchitecture Exploration analysis revealed that I have many hardware-level problems (for example, bad Front-End Bound metrics).

I have no idea why parallelism is so bad even for a simple arithmetic program :( The laptop model is an Acer Nitro AN515-51.

P.S. I have Windows 10 and NetBeans with MinGW compiler.

Sampling drivers not loaded correctly


For three days I've been trying to make hardware event-based sampling work in Intel VTune, but at this point I cannot figure out what the problem is.

I'm running Ubuntu 18.04 on a MacBook Pro with an Intel i7-4870HQ, 2.5 GHz. It's not a virtual machine; the system is natively booted, and I'm using rEFInd to select the OS at startup. When I run uname -r I get: 5.3.0-51-generic.

I installed VTune as described on the installation page. I also checked that the kernel configuration matches the requirements for hardware event-based sampling, and I set kptr_restrict to 0.

To open vtune I run sudo ./vtune-gui from /opt/intel/vtune_profiler_2020.1.0.607630/bin64/.

The binary that gets profiled is compiled with gcc and using the -g and -O3 flags.

However, when I try running the VTune profiler with hardware event-based sampling, I get the warning displayed in the image at the bottom. Before, I also had warnings like "cannot locate 'vtssoo.ko'" and "cannot locate debugging information for the linux kernel". Now they're gone, but I'm not sure whether they are actually solved.

I think these warnings boil down to the drivers. I run ./insmod-sep -q from /opt/intel/vtune_profiler_2020.1.0.607630/sepdk/src and I get this output:

pax driver is loaded and owned by group "vtune" with file permissions "660".
socperf3 driver is not correctly loaded.
sep5 driver is not correctly loaded.
socwatch driver is loaded.
vtsspp driver is loaded and owned by group "vtune" with file permissions "660".

Two drivers are not correctly loaded. Following the instructions to build the drivers didn't help either; I still get this result.
A colleague of mine has the same setup (same OS and version), but he's using a Lenovo rather than a MacBook Pro. In his case, after the installation he could already run hardware event-based sampling without warnings and with all the useful information.
I'm starting to think it might be something specific to the MacBook Pro build. What could the problem be? I would really appreciate any help, because I really need this feature. Thanks a lot for your effort and time.

perf_event_paranoid setting


Hi all,

I got a warning message during VTune profiling:
"amplxe: Warning: Only user space will be profiled due to credentials lack. Consider changing /proc/sys/kernel/perf_event_paranoid file for enabling kernel space profiling."
The current perf_event_paranoid value is 2.

I would like to use analyses including hotspots and hpc-performance in VTune Profiler, Advisor and Tracer.
In the current situation, are there any methods to use the profiler without changing the perf_event_paranoid file? Because I am not an administrator of the system, I can't modify the file.
To ask the administrator, I need solid evidence for modifying the file. Is there any document containing the recommended perf_event_paranoid setting?
Also, in my case, what is the recommended value (0 or 1)?
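For reference, checking and (as an administrator) changing the setting looks like this. As a rule of thumb, lower values permit broader profiling (1 allows kernel-space profiling for unprivileged users, 0 additionally allows CPU-wide measurement); confirm the exact value VTune recommends against the driverless-collection section of its documentation:

```shell
# Check the current value (2 = unprivileged users get user-space profiling only):
cat /proc/sys/kernel/perf_event_paranoid

# Administrator-only: allow kernel-space profiling until the next reboot...
sudo sysctl -w kernel.perf_event_paranoid=1

# ...or persistently, via a drop-in sysctl configuration file:
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
```

Without the sysctl change (or the VTune sampling drivers installed), user-space-only profiling remains the ceiling for a non-root user.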

Thank you in advance.

VTune empty Report issue with a Gromacs workload


I have encountered an issue with VTune 2021.1-beta06 on DevCloud: it completes profiling a workload with a specific dataset, but when I try to create reports, they are empty. If I run the workload with a different, smaller dataset, the reports work fine.

- Operating system: Ubuntu 18.04.4 LTS, gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

Steps to recreate:

1. Download Gromacs: wget http://ftp.gromacs.org/pub/gromacs/gromacs-2020.2.tar.gz

2. Follow the installation steps specified in the following guide (cmake step shown below): http://manual.gromacs.org/2020.2/install-guide/index.html

cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=<path to local install dir>   (e.g. /home/u42144/software/gromacs)

3. Download the dataset then extract: wget https://repository.prace-ri.eu/ueabs/GROMACS/1.2/GROMACS_TestCaseA.tar.gz

4. Run VTune with the workload: vtune -collect hotspots -r v-hotspots -- gmx mdrun -s ion_channel.tpr

5. Create hotspots report: vtune -report hotspots -results-dir v-hotspots

 

example hotspots report output:

Function  CPU Time  CPU Time:Effective Time  CPU Time:Effective Time:Idle  CPU Time:Effective Time:Poor  CPU Time:Effective Time:Ok  CPU Time:Effective Time:Ideal  CPU Time:Effective Time:Over  CPU Time:Spin Time  CPU Time:Spin Time:Imbalance or Serial Spinning  CPU Time:Spin Time:Lock Contention  CPU Time:Spin Time:Other  CPU Time:Overhead Time  CPU Time:Overhead Time:Creation  CPU Time:Overhead Time:Scheduling  CPU Time:Overhead Time:Reduction  CPU Time:Overhead Time:Atomics  CPU Time:Overhead Time:Other  Module  Function (Full)  Source File  Start Address
--------  --------  -----------------------  ----------------------------  ----------------------------  --------------------------  -----------------------------  ----------------------------  ------------------  -----------------------------------------------  ----------------------------------  ------------------------  ----------------------  -------------------------------  ---------------------------------  --------------------------------  ------------------------------  ----------------------------  ------  ---------------  -----------  -------------
 

 


Vtune stuck in Finalizing Results


Every one of my captures leads to VTune getting stuck during the Finalizing Results phase.

I see the 'data collection is completed successfully' message, and then the progress bar gets stuck. There's no specific pattern to where it stops; it could be while reading the trace file or on a DLL used by the program. I tried leaving the process running for hours to see if it would eventually finish: no luck.

To get past this, I usually have to kill VTune from the Task Manager, re-open VTune, and load the capture on which it got stuck. At that point the resolution takes only a few seconds and I'm able to navigate my capture normally.

This happens whether I use VTune standalone or within Visual Studio (which I stopped doing, since this process usually meant my instance of VS would need to be killed as well).

I'm on Windows 10 1809 (17763.1158)

and using Vtune 2020 (build 605129)

I have been plagued by this behavior for a while and on various versions of VTune. I have to carefully pick a version on which I can apply the workaround described above; for example, I tried 2020 Patch 1, but with it I'm not able to kill the VTune instance and resume from the capture, so I had to revert to the version I'm currently using.

Any suggestions would be greatly appreciated. 

 

Intel VTune Profiler Installation on Intel Machine and Prerequisites


Hi,

Earlier I was using the Intel VTune profiler on an AMD machine. I could use user-mode sampling for my application, but since it was an AMD machine, hardware sampling was not possible. Now I am using an Intel machine, and I would really like to use all profiling options.

I am using Intel VTune on my virtual machine through VirtualBox. My machine details follow:

- OS:                  Centos 7

-  uname -r :      3.10.0-1127.el7.x86_64

- virt-what:        virtual box kvm 

- model name:     Intel(R) Core(TM) i7-7500U CPU

 

While installing Intel VTune Profiler 2020 Update 1, I get the following two messages in the Prerequisites step (also attached to this thread):

- The system is running in the virtual environment. Sampling drivers will not be installed.

- Kernel source directory is not found. Unable to build the sampling drivers.

 

I have also executed the command below, but I still get these two messages in the Prerequisites installation step:

sudo yum install kernel-devel-3.10.0-1127.el7.x86_64
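One thing to double-check: the kernel-devel package must match the running kernel exactly, which the following standard pattern guarantees (run inside the VM):

```shell
# Install headers for exactly the kernel that is currently running:
sudo yum install -y "kernel-devel-$(uname -r)"
# Verify that the kernel source directory the driver build looks for now exists:
ls /usr/src/kernels/"$(uname -r)"
```

If the `ls` succeeds but the installer still reports a missing kernel source directory, that path is what you would enter under Advanced Options during installation.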

 

Having said that, I could proceed with the installation, but I don't want to use user-mode sampling only; I would like to have all profiling options. And I think that unless I resolve these two messages in the prerequisites, I won't be able to use all profiling options.

Also, as you can see in the attached file, the prerequisites step says to "set up this parameter in the Advanced Options -> Driver Build Options dialog". The question also is: where do I find this option?

 

I would be really grateful if you could please help me resolve these two messages.

 

Thanks in advance !!

Attachment: IntelVtune.JPG (image/jpeg, 84.66 KB)

VTune manual remote command run parameters and local analysis ?


Greetings:

I am running VTune locally on my Apple laptop, attempting to analyze a remote system which runs Linux. However, due to system configuration and administration requirements within my company, I am unable to configure a remote Linux target via SSH for VTune.

Is there a way to determine the command line flags that the VTune GUI would have tried to run via SSH on the remote system? I would like to log into the remote Linux host and run that exact command manually, then download the resulting data on to my local workstation to analyze the results with the VTune GUI.

Ideally, I'd like to ask VTune on the remote host to attach to a specific PID (and all its child processes/threads), extract all the data that the GUI was expecting with my configuration settings, and then copy the resulting data that was collected to my local workstation.

Is this possible? Is there documentation explaining how to do this and how to import the results? It would be amazing if the VTune GUI had a display somewhere that said "Run this command on the remote host if you can't setup an SSH target: [command]"
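In the meantime, a manual equivalent can be sketched like this, using the documented attach-to-PID mode of the VTune CLI; the PID and paths are placeholders:

```shell
# On the remote Linux host: attach to a running process and collect.
vtune -collect hotspots -target-pid <PID> -r /tmp/r_remote

# Then copy the entire result directory back to the laptop, e.g.:
# scp -r remote-host:/tmp/r_remote .
# and open it from the local VTune GUI.
```

Whether this reproduces the exact flags the GUI would have sent over SSH is uncertain, but a result directory collected this way can be opened locally for analysis.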

Thanks!

questions about memory access/hotspots analysis and sample exe files.


Hi,

In order to learn how Vtune works, I am trying to analyze assembly commands’ running time of the pre-build "matrix.exe". I have several problems:

  • When performing Memory Access analysis, I get running times of matrix.exe's functions, but I cannot see some of the function names (e.g. some of the matrix.exe functions appear as func@0X1400345, while other functions appear with their actual names: "init", "multiply", etc.).
  • When performing Hotspots analysis, the output does contain all the function names. However, now all the running times are '0' or "unknown".

I would like to know what I am doing wrong and how to fix it.

 

Another question: I want to demonstrate VTune's capabilities to my colleagues, but I am having problems analyzing my original C++ exe files. I would like to know where I can find existing sample files, just like matrix.exe, whose analysis will probably give reasonable and informative results.

 

Thanks!

Unable to profile program with clang-9 and libc++ with VTune


Steps to reproduce:

Compile the following program:

#include <iostream>

int main() {
    std::cout << "Hello World"<< std::endl;
    return 0;
}

with

clang++-9 hello_world.cpp --stdlib=libc++  -g -O2 -o hello_world.exe

Attempting to profile with VTune results in the following error:

Source/pin/elfio/img_elf.cpp: ProcessSectionHeaders: 809: unknown section type 0x6fff4c04 for sec[124,.deplibs] in /usr/lib/x86_64-linux-gnu/libc++.so.1

 

# Product Version

Intel VTune Profiler 2020 Update 1

Product Build 607630

 

# System info:

Collection and Platform Info
    Application Command Line:    /home/andrew/projects/fuel3d/work/hello_world.exe 
    Operating System:    5.3.0-53-generic NAME="Ubuntu"

VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"

    CPU
        Name:    Intel(R) microarchitecture code named Coffeelake
        Frequency:    3.6 GHz 
        Logical CPU Count:    16
 

 

 
