Profiling with Perf and HotSpot
For years I used the valgrind tool suite to find memory bugs, and also for profiling with tools like callgrind and cachegrind. While those are indeed super powerful and I wouldn't want to miss them, they are really slow and not particularly well suited for big production runs.
Perf is a profiling tool for Linux and is part of the kernel tools. It is a great way to collect sampling-based statistics on threaded model runs.
Using it is straightforward. In my case, for example:
perf record --call-graph dwarf -o /tmp/perf.data -- mpirun -np 2 bin/ex_plex_rrtmg_fish -out /tmp/o.h5 -solver 5_8 -skip_load_LUT -Nx 16 -Ny 17 -Nz 20 -dx 1e3 -dz 5e2
which records performance data together with the call graph (--call-graph dwarf), writes the output to a local partition (-o /tmp/perf.data) so as not to overwhelm the home directory, and runs a model example with 2 MPI processes.
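For a quick sanity check of the recording before opening any GUI, a plain-text summary in the terminal is usually enough. This is only a minimal sketch: it assumes the /tmp/perf.data path from the command above, and the 50-line cutoff is an arbitrary choice of mine.

```shell
# Path taken from the perf record command above
PERF_DATA=/tmp/perf.data

if [ -r "$PERF_DATA" ]; then
    # --stdio prints plain text instead of starting the interactive TUI;
    # --no-children sorts by a function's own cost rather than the cost
    # accumulated from its callees.
    perf report -i "$PERF_DATA" --stdio --no-children | head -n 50
else
    echo "no recording found at $PERF_DATA" >&2
fi
```

If the top entries are the functions you expected to be hot, the recording is probably fine and worth a closer look.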
You could use perf report, but in this case I prefer a GUI to investigate the data. There are tools that convert perf.data so that it can be read into kcachegrind, which I know from reading valgrind output, but I couldn't get them to run. Instead I stumbled over HotSpot, which is also really nice but can be a pain to get running. To make that process easier, I grabbed the AppImage from https://appimage.github.io/hotspot/.
Download the AppImage, make it executable, and start it with
chmod +x Hotspot.AppImage
./Hotspot.AppImage
and you are good to go.
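To jump straight into the recording from above, the data file can be passed on the command line. A small sketch, assuming the AppImage is saved as Hotspot.AppImage in the current directory (the downloaded file usually has a version in its name) and that the recording sits at /tmp/perf.data:

```shell
# Both paths are assumptions; adjust them to your setup
HOTSPOT=./Hotspot.AppImage
PERF_DATA=/tmp/perf.data

if [ -x "$HOTSPOT" ] && [ -r "$PERF_DATA" ]; then
    # Open the recording directly instead of picking it in the start dialog
    "$HOTSPOT" "$PERF_DATA"
else
    echo "need an executable $HOTSPOT and a readable $PERF_DATA" >&2
fi
```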