Heaptrack - A Heap Memory Profiler for Linux

Tue, 12/02/2014 - 19:25

Hello everyone,

with a tingly feeling in my belly, I’m happy to announce heaptrack, a heap memory profiler for Linux. Over the last couple of months I’ve worked on this new tool in my free time. What started as a “what if” experiment quickly became such a promising tool that I couldn’t stop working on it, at the cost of neglecting my physics master’s thesis (who needs that anyways, eh?). In the following, I’ll show you how to use this tool, and why you should start using it.

A faster Massif?

Massif, from the Valgrind suite, is an invaluable tool for me. Paired with my Massif-Visualizer, I found and fixed many problems in applications that lead to excessive heap memory consumption. There are some issues with Massif though:

  • It is relatively slow. Especially for multi-threaded applications the overhead is large, as Valgrind serializes the code execution. In the end, this sometimes prevents one from using Massif altogether, as running an application for hours is impractical. I know that we at KDAB sometimes had to resort to overnight or even over-the-weekend Massif sessions in the hope of analyzing elusive heap memory consumption issues.
  • It is not easy to use. Sure, running valgrind --tool=massif <your app> is simple, but most of the time, the resulting data will be too coarse. Frequently, one has to play around to find the correct parameters to pass to --depth, --detailed-freq and --max-snapshots. Paired with the above, this is cumbersome. Oh, and don’t forget to pass --smc-check=all-non-file when your application uses a JIT engine internally. Forget that, and your Massif session will eventually abort.
  • The output is only written at the end. When you try to debug an issue that takes a long time to show up, it would be useful to regularly inspect the current Massif data. Maybe the problem is already apparent and we can stop the debug session? With Massif, this is not an option, as it only writes the output data at the end, when the debuggee stops.

With these issues in mind, I often wondered whether there isn’t a better alternative. To track the heap memory consumption, all we need to intercept are the calls to allocation functions like malloc and free. The rest of what Valgrind is doing is not required, so shouldn’t it be possible to write a custom tracker with the help of the LD_PRELOAD trick that solves the issues above? But we also need backtraces, and we need to get them quickly, as malloc & friends are called extremely often. How is that possible?
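
To make the LD_PRELOAD idea concrete, here is a minimal sketch of an interposer, assuming Linux and glibc. It is not heaptrack’s code: it only counts calls to malloc and free and forwards them to glibc’s own allocator through the glibc-specific __libc_malloc/__libc_free entry points, which sidesteps the dlsym bootstrap problem. The library name libcountalloc.so and the counter names are made up for illustration; calloc, realloc and the other “friends” are not wrapped here.

```cpp
// Minimal malloc/free interposer (sketch; assumes Linux + glibc).
// Build and inject, e.g.:
//   g++ -O2 -shared -fPIC countalloc.cpp -o libcountalloc.so
//   LD_PRELOAD=./libcountalloc.so ./yourapp
#include <atomic>
#include <cstddef>

// glibc's own allocator entry points (glibc-specific assumption). Calling them
// directly avoids the dlsym(RTLD_NEXT, "malloc") bootstrap problem, since dlsym
// may itself allocate.
extern "C" void* __libc_malloc(std::size_t size);
extern "C" void __libc_free(void* ptr);

// Plain atomics never allocate, so they are safe to touch from inside malloc.
std::atomic<std::size_t> g_mallocCalls{0};
std::atomic<std::size_t> g_freeCalls{0};
std::atomic<std::size_t> g_bytesRequested{0};

extern "C" void* malloc(std::size_t size) noexcept
{
    g_mallocCalls.fetch_add(1, std::memory_order_relaxed);
    g_bytesRequested.fetch_add(size, std::memory_order_relaxed);
    // A real tracker would record a raw backtrace and append an event here.
    return __libc_malloc(size);
}

extern "C" void free(void* ptr) noexcept
{
    if (ptr != nullptr)
        g_freeCalls.fetch_add(1, std::memory_order_relaxed);
    __libc_free(ptr);
}
```

Relaxed atomics keep the hook cheap and let threads run in parallel, in contrast to Valgrind’s serialized execution.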

The Shoulders of Giants

For a long time, I did not know any solution to the backtrace problem. But early this year, a colleague of mine told me that vogl also uses the LD_PRELOAD trick to overload the OpenGL functions and has the ability to grab backtraces. Apparently it was also quite efficient, so I had a look at what it’s doing, and indeed, there I found my holy grail: libunwind, paired with a patched libbacktrace and some (to me) esoteric Linux C APIs. Combined, this makes it possible to efficiently grab backtraces with libunwind and delay the interpretation of DWARF debug symbols until later. Without the example code in vogl, I’d never have come up with this - so many thanks to Valve for releasing the source code on GitHub!
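
The key split is “capture cheaply now, symbolize later”. A minimal sketch of that split, using glibc’s <execinfo.h> backtrace() as a stand-in for libunwind’s unw_backtrace(), which takes the same (void** buffer, int size) arguments. The RawTrace type and function names are illustrative only, and backtrace_symbols() only consults the dynamic symbol table; heaptrack instead resolves addresses with libbacktrace and DWARF data in a separate process.

```cpp
#include <execinfo.h>
#include <cstdlib>
#include <string>
#include <vector>

struct RawTrace {
    void* ips[64];  // raw instruction pointers, innermost frame first
    int size = 0;
};

// Hot path: no symbol lookup and no heap allocation, just copy return addresses.
// Caveat (per the glibc docs): the first backtrace() call may load libgcc and
// thereby allocate, so a real malloc hook would warm it up beforehand.
RawTrace captureRawTrace()
{
    RawTrace trace;
    trace.size = backtrace(trace.ips, static_cast<int>(sizeof trace.ips / sizeof trace.ips[0]));
    return trace;
}

// Cold path, run later: translate the addresses into human-readable strings.
std::vector<std::string> symbolize(const RawTrace& trace)
{
    std::vector<std::string> lines;
    char** names = backtrace_symbols(trace.ips, trace.size);
    if (names == nullptr)
        return lines;
    for (int i = 0; i < trace.size; ++i)
        lines.emplace_back(names[i]);
    std::free(names);  // a single free() releases the whole array
    return lines;
}
```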

Introducing heaptrack

From here on, the rest was mostly plumbing. heaptrack consists of five parts:

  • libheaptrack_preload.so: The shared library that is injected into the debuggee application using the LD_PRELOAD trick. It overloads malloc & friends, grabs a backtrace of raw instruction pointers with unw_backtrace and writes it all to the file specified via the DUMP_HEAPTRACK_OUTPUT environment variable. Additionally, dlopen and dlclose are overridden and trigger the collection of runtime information on shared libraries, which is required to later translate the instruction pointer addresses with DWARF debug information. Finally, a timer is started, which allows us to correlate allocations and memory consumption with real time.
  • libheaptrack_inject.so: Similar to the preload variant, this library is used for attaching to an already running process. Frequently, I found myself wondering why an application’s heap memory consumption suddenly increases. Neither Massif nor any other tool I know of can attach at runtime, but heaptrack now can!
  • heaptrack_interpret: This process reads the output of libheaptrack.so over stdin and annotates the instruction pointer addresses with DWARF debug symbols with the help of libbacktrace. The annotated data stream is then sent to stdout. I recommend gzip‘ing it to save disk space, as the data files can easily consume hundreds of megabytes otherwise. The resulting data file is then “final”, meaning you can transfer it to any other machine, as no further machine-dependent processing is required.
  • heaptrack: To simplify the process, there is a small shell script which combines libheaptrack_preload.so and heaptrack_interpret. It launches the arguments passed to it as a process with the correct LD_PRELOAD environment. The output of libheaptrack.so is transmitted directly to a heaptrack_interpret process with the help of mkfifo. Finally, the heaptrack_interpret output is compressed on the fly and stored to disk. This is the tool you want to use:

    $ heaptrack yourapp [your arguments...]
    starting application, this might take some time...
    output will be written to /home/milian/heaptrack.yourapp.12345.gz
    ...
    Heaptrack finished! Now run the following to investigate the data:

    heaptrack_print /home/milian/heaptrack.yourapp.12345.gz | less
  • heaptrack_print: Similar to ms_print, this process analyzes the output of heaptrack_interpret. It has many features, which I’ll outline below. You can run it at any time on the output file that heaptrack creates, and it supports transparent decompression of gzip‘ed files. The output is written directly to the CLI, which is often cumbersome to interpret. I plan to work on a proper heaptrack-visualizer in the future.
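
Why watch dlopen and dlclose at all? A raw instruction pointer only means something together with the list of loaded modules and their load addresses. A minimal sketch of collecting that list with glibc’s dl_iterate_phdr(), one Linux API that fits the job (this post does not say which calls heaptrack uses). Module, snapshotModules, findModule and relativeAddress are names made up for this sketch.

```cpp
#include <link.h>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Module {
    std::string path;      // "" for the main executable
    std::uintptr_t base;   // load bias (dlpi_addr)
    std::uintptr_t start;  // lowest address covered by a PT_LOAD segment
    std::uintptr_t end;    // one past the highest such address
};

// dl_iterate_phdr() callback: records one entry per loaded ELF object.
static int collectModule(dl_phdr_info* info, std::size_t /*size*/, void* data)
{
    auto* modules = static_cast<std::vector<Module>*>(data);
    Module m{info->dlpi_name ? info->dlpi_name : "", info->dlpi_addr, UINTPTR_MAX, 0};
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD)
            continue;
        const std::uintptr_t segStart = info->dlpi_addr + ph.p_vaddr;
        const std::uintptr_t segEnd = segStart + ph.p_memsz;
        if (segStart < m.start)
            m.start = segStart;
        if (segEnd > m.end)
            m.end = segEnd;
    }
    if (m.start < m.end)
        modules->push_back(m);
    return 0;  // 0 = keep iterating
}

// Re-run this whenever a dlopen/dlclose is observed.
std::vector<Module> snapshotModules()
{
    std::vector<Module> modules;
    dl_iterate_phdr(collectModule, &modules);
    return modules;
}

const Module* findModule(const std::vector<Module>& modules, std::uintptr_t ip)
{
    for (const Module& m : modules)
        if (ip >= m.start && ip < m.end)
            return &m;
    return nullptr;
}

// Module-relative address: what can later be looked up in that module's DWARF
// data, independent of where the module happened to be loaded.
std::uintptr_t relativeAddress(const Module& m, std::uintptr_t ip)
{
    return ip - m.base;
}
```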

The temporary file format of the libheaptrack.so output, as well as the permanent one written by heaptrack_interpret, is currently undocumented. It’s plain text, though, and should be easy to decipher, esp. with the source code at hand.

Note that heaptrack, contrary to Massif, does not do any aggregation of the data. It only minimizes the data files by not printing the same backtrace information repeatedly. But each individual malloc or free call, together with the function arguments, will be tracked. This allows some extremely interesting insights into the heap usage of a debuggee, as we can later analyze the data to find all of the following:

  • heap memory consumption: this is what Massif does, and often the most interesting
  • number of calls to allocation functions: usually you’d need a profiler like Valgrind’s Callgrind to figure out where you frequently allocate memory. Heaptrack gives you that information as well, and much more quickly. I have already used this data in many places to get rid of temporary memory allocations. This is extremely worthwhile: not only are memory allocations relatively slow, your performance also benefits from “secondary” effects. When you reuse memory, the chances are much higher that it is already cached, and cache misses are often the biggest slow-down in current applications.
  • total amount of memory allocated, ignoring deallocations: Not so useful, but sometimes interesting, and it nicely complements the call count data when hunting for temporary memory allocations
  • leaked memory: Even without the fancy analysis of Valgrind’s memcheck tool to distinguish between still reachable, possibly lost and definitely lost memory, heaptrack can give you a quick look at what memory has not been freed when the debuggee stopped.
  • histogram of allocation sizes over the number of calls: So far, one can only export the raw data and plot it manually.
  • …: Your ideas are welcome - I’m confident that many more insights can be found in heaptrack’s data.
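
The deduplication mentioned above (“not printing the same backtrace information repeatedly”) can be sketched as backtrace interning: each distinct backtrace is written once under a numeric id, and every later event refers to that id. The line format below is hypothetical and only illustrates the idea; it is not heaptrack’s actual (undocumented) file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical line format, for illustration only (not heaptrack's real format):
//   t <id> <ip> <ip> ...   a backtrace, written only the first time it is seen
//   + <size> <id> <ptr>    an allocation that refers to a backtrace by id
//   - <ptr>                a deallocation
// Addresses are written in hex, sizes and ids in decimal.
class TraceWriter {
public:
    explicit TraceWriter(std::ostream& out) : out_(out) {}

    void allocation(std::size_t size, const std::vector<std::uintptr_t>& trace, std::uintptr_t ptr)
    {
        const std::size_t id = intern(trace);  // may emit a "t" line first
        out_ << "+ " << size << ' ' << id << ' ' << std::hex << ptr << std::dec << '\n';
    }

    void deallocation(std::uintptr_t ptr)
    {
        out_ << "- " << std::hex << ptr << std::dec << '\n';
    }

private:
    std::size_t intern(const std::vector<std::uintptr_t>& trace)
    {
        auto it = ids_.find(trace);
        if (it != ids_.end())
            return it->second;  // already written: just reuse the id
        const std::size_t id = ids_.size();
        ids_.emplace(trace, id);
        out_ << "t " << id << std::hex;
        for (std::uintptr_t ip : trace)
            out_ << ' ' << ip;
        out_ << std::dec << '\n';
        return id;
    }

    std::ostream& out_;
    std::map<std::vector<std::uintptr_t>, std::size_t> ids_;
};
```

Note that intern() is called before the “+” line is started, so a new “t” line never ends up in the middle of an event line.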

NOTE: Just like other profilers and tools, heaptrack relies on the DWARF debug information in your application. If you try to analyze a stripped release build without debug symbols, you’ll have a hard time making sense of it.

Using heaptrack_print

Assume we have run heaptrack on an application and now want to evaluate the obtained data. heaptrack_print is the tool to do that, but it’s relatively cumbersome to use (plain ASCII output, not even an ncurses GUI!). Thus, I explain the output here, so that you can make sense of it. Do take a look at the --help output as well.

Calls to Allocation Functions

Enabled by default, disable via -a / --print-allocators 0.

The output below the MOST CALLS TO ALLOCATION FUNCTIONS header is a list of the top 10 locations that call memory allocation functions. By default, the backtraces are merged. For example, code similar to this:

  void asdf() { new int; }
  void bar() { asdf(); }
  void laaa() { bar(); asdf(); }

will produce output like this when laaa() is called ten times from main():

  MOST CALLS TO ALLOCATION FUNCTIONS
  11 calls to allocation functions with 44B peak consumption from
  asdf()
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:24
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  10 calls with 40B peak consumption from:
  bar()
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:36
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  laaa()
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:41
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  main
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:103
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  1 calls with 4B peak consumption from:
  bar()
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:36
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  laaa()
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:41
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  main
  at /ssd/milian/projects/kde4/heaptrack/tests/test.cpp:105
  in /ssd/milian/projects/.build/kde4/heaptrack/tests/test_cpp
  ...

Here, the backtraces are merged on the location of the new int allocation in asdf(), and all sub-traces are displayed beneath. Since heaptrack_print sorts the data, you can just read its output from the top to find the top 10 hotspots of allocation functions. You can disable backtrace merging with -m / --merge-backtraces 0.
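
The merging step can be sketched as a group-by on the innermost frame, keeping the remaining caller chains as sub-traces and sorting groups by call count. In this sketch the symbolized frame string stands in for the allocation location; the types and function name are illustrative and not taken from heaptrack_print.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A symbolized backtrace, innermost frame (the one calling malloc/new) first.
using Trace = std::vector<std::string>;

struct MergedSite {
    std::string location;                    // innermost frame shared by all sub-traces
    std::size_t calls = 0;                   // allocation calls from this location
    std::map<Trace, std::size_t> subTraces;  // remaining caller chain -> call count
};

// Groups allocation events by their innermost frame and sorts the groups by
// call count, hottest first.
std::vector<MergedSite> mergeByLocation(const std::vector<Trace>& allocationTraces)
{
    std::map<std::string, MergedSite> byLocation;
    for (const Trace& trace : allocationTraces) {
        if (trace.empty())
            continue;
        MergedSite& site = byLocation[trace.front()];
        site.location = trace.front();
        ++site.calls;
        ++site.subTraces[Trace(trace.begin() + 1, trace.end())];
    }
    std::vector<MergedSite> result;
    for (auto& entry : byLocation)
        result.push_back(std::move(entry.second));
    std::sort(result.begin(), result.end(),
              [](const MergedSite& a, const MergedSite& b) { return a.calls > b.calls; });
    return result;
}
```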

Peak Memory Consumption

Enabled by default, disable with -p / --print-peaks 0.

To decrease your memory consumption, you need to decrease the peak memory consumption. Under the PEAK MEMORY CONSUMERS caption, heaptrack_print shows the top ten hotspots, sorted by the peak size in bytes. This can look like the following:

  PEAK MEMORY CONSUMERS
  3.98MB peak memory consumed over 37473 calls from
  QString::realloc(int)
  in /usr/lib/libQtCore.so.4
  1.04MB over 4 calls from:
  QString::append(QString const&)
  in /usr/lib/libQtCore.so.4
  0x7fa9ce54bf73
  in /usr/lib/libQtCore.so.4
  0x7fa9ce54c5ee
  in /usr/lib/libQtCore.so.4
  QTextStream::readAll()
  in /usr/lib/libQtCore.so.4
  Kate::Script::readFile(QString const&, QString&)
  at /ssd/milian/projects/kde4/kate/part/script/katescripthelpers.cpp:82
  in /ssd/milian/projects/compiled/kde4/lib/libkatepartinterfaces.so.4
  Kate::Script::require(QScriptContext*, QScriptEngine*)
  at /ssd/milian/projects/kde4/kate/part/script/katescripthelpers.cpp:289
  in /ssd/milian/projects/compiled/kde4/lib/libkatepartinterfaces.so.4
  0x7fa9bac7d228
  in /usr/lib/libQtScript.so.4
  ...

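
One plausible way to derive per-site peaks from the raw event stream is to keep, for each allocation site, the bytes that are currently live and remember the maximum that value ever reached. This is a sketch under that assumed definition; heaptrack_print’s exact definition of a site’s peak may differ. Site ids are plain integers here for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

struct SiteStats {
    std::size_t calls = 0;     // number of allocations from this site
    std::int64_t current = 0;  // bytes from this site that are still live
    std::int64_t peak = 0;     // largest value `current` ever reached
};

class PeakTracker {
public:
    void onAlloc(std::uintptr_t ptr, std::int64_t size, int site)
    {
        live_[ptr] = Live{size, site};
        SiteStats& s = sites_[site];
        ++s.calls;
        s.current += size;
        if (s.current > s.peak)
            s.peak = s.current;
        total_ += size;
        if (total_ > totalPeak_)
            totalPeak_ = total_;
    }

    void onFree(std::uintptr_t ptr)
    {
        auto it = live_.find(ptr);
        if (it == live_.end())
            return;  // e.g. free(nullptr), or memory allocated before tracking began
        sites_[it->second.site].current -= it->second.size;
        total_ -= it->second.size;
        live_.erase(it);
    }

    const std::map<int, SiteStats>& sites() const { return sites_; }
    std::int64_t totalPeak() const { return totalPeak_; }

private:
    struct Live {
        std::int64_t size;
        int site;
    };
    std::unordered_map<std::uintptr_t, Live> live_;  // pointer -> still-live allocation
    std::map<int, SiteStats> sites_;                  // site id -> statistics
    std::int64_t total_ = 0;
    std::int64_t totalPeak_ = 0;
};
```
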
Massif Compatibility

Pass a file path to -M / --print-massif. Tune output with --massif-threshold and --massif-detailed-freq.

heaptrack_print, since yesterday, also supports converting the heaptrack data to the Massif file format. This can then be visualized with my Massif-Visualizer. The resulting files are relatively large, as much more detailed snapshots are included. I optimized the visualizer a bit as well to speed up the evaluation of these files. It is worth it, though! Since the time axis uses real time, it is much easier to correlate it with the actual runtime behavior of your application (note: you can configure Massif to also use “real time”, but due to its high overhead, the results are still confusing and not much different from the instruction count). The higher level of detail also makes it simpler to interpret the results. Note, though, that the converter currently has no code to ensure the peak is not missed, which can be seen in the images below. I plan to add this eventually.


[Figure: two panels, “heaptrack” and “Massif”]

Comparison of heaptrack and Massif on the same workload shows the much higher level of detail. Overall, the results are consistent, but note that heaptrack uses real time whereas Massif defaults to instruction count for the time axis (abscissa). Also, the Massif file generated by heaptrack currently misses the peak, which Massif itself tracks accurately.
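
The missed peak is a sampling artifact: emitting only every n-th data point can skip the maximum. A minimal sketch of a converter that always emits the peak sample as well. It writes only the Massif fields needed for snapshots without a heap tree (heap_tree=empty), based on my understanding of the Massif output format described in the Valgrind manual. The Sample type and the cmd text are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Sample {
    std::uint64_t timeMs;     // wall-clock time since start
    std::uint64_t heapBytes;  // total live heap bytes at that time
};

// Emits every `stride`-th sample as a Massif snapshot, plus the sample with the
// largest heap size, so the peak can never be skipped.
void writeMassif(std::ostream& out, const std::vector<Sample>& samples, std::size_t stride)
{
    if (stride == 0)
        stride = 1;
    out << "desc: (none)\n"
        << "cmd: converted-from-heaptrack\n"  // free-form text, hypothetical value
        << "time_unit: ms\n";

    std::size_t peakIndex = 0;
    for (std::size_t i = 1; i < samples.size(); ++i)
        if (samples[i].heapBytes > samples[peakIndex].heapBytes)
            peakIndex = i;

    std::size_t snapshot = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (i % stride != 0 && i != peakIndex)
            continue;
        out << "#-----------\n"
            << "snapshot=" << snapshot++ << '\n'
            << "#-----------\n"
            << "time=" << samples[i].timeMs << '\n'
            << "mem_heap_B=" << samples[i].heapBytes << '\n'
            << "mem_heap_extra_B=0\n"
            << "mem_stacks_B=0\n"
            << "heap_tree=empty\n";
    }
}
```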

Memory Leaks

Disabled by default, enable with -l / --print-leaks 1.

The leaks reported by heaptrack are simply all calls to malloc & friends which were never free‘d afterwards. It is not possible to do a “still reachable” or “possibly lost” analysis as Valgrind’s memcheck tool does. Still, it is often quite helpful. Note, though, that heaptrack does not support suppression files, which are crucial here, as otherwise you’ll often see leaks reported inside libc and other external libraries, many of which are intentional.
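
A minimal sketch of that leak definition (allocations with no matching free), combined with the kind of suppression filtering the paragraph says is missing. The substring-based suppression list is a hypothetical illustration, not a heaptrack feature.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Allocation {
    std::size_t size;
    std::vector<std::string> trace;  // symbolized frames, innermost first
    bool freed;
};

// Leak = allocation never freed. Hypothetical suppressions: skip an allocation
// if any of its frames contains one of the given patterns.
std::vector<const Allocation*> reportLeaks(const std::vector<Allocation>& allocations,
                                           const std::vector<std::string>& suppressions)
{
    std::vector<const Allocation*> leaks;
    for (const Allocation& a : allocations) {
        if (a.freed)
            continue;
        bool suppressed = false;
        for (const std::string& frame : a.trace)
            for (const std::string& pattern : suppressions)
                if (frame.find(pattern) != std::string::npos)
                    suppressed = true;
        if (!suppressed)
            leaks.push_back(&a);
    }
    return leaks;
}
```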

Size Histogram

Disabled by default, enable by passing an output file to -H / --print-histogram.

The size histogram gives an insight into whether you potentially could benefit from a pool allocator or similar optimization technique. heaptrack_print just writes the raw data to the output file you specify. With octave or gnuplot, you can then evaluate this manually, yielding a graph such as the following:


[Figure: Double-logarithmic size histogram obtained from a heaptrack run on Kate.]

Note how many allocations below 8 bytes are done by this application. All of these waste memory, as the value itself could easily be stored in the space required for a single pointer on a 64-bit machine. For those interested, most of these allocations come from small strings, since Qt’s QString class has no small-string optimization (yet; it is planned for Qt 6). In the future, the heaptrack data could be analyzed such that it directly points you to the culprits of such memory waste.
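
A minimal sketch of the histogram idea, with power-of-two buckets that suit a double-logarithmic plot and a counter for the “smaller than a pointer” allocations discussed above. The raw format heaptrack_print writes is not described in this post, so the two-column “bucket count” layout is an assumption; gnuplot can plot it with, e.g., set logscale xy followed by plot "hist.txt" using 1:2 with boxes.

```cpp
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>

class SizeHistogram {
public:
    void add(std::size_t size)
    {
        // Bucket k covers sizes in [2^k, 2^(k+1)); size 0 also lands in bucket 0.
        const unsigned maxBucket = sizeof(std::size_t) * 8 - 1;
        unsigned k = 0;
        while (k < maxBucket && (std::size_t{1} << (k + 1)) <= size)
            ++k;
        ++buckets_[k];
        if (size < sizeof(void*))
            ++smallerThanPointer_;  // value could have fit into the pointer itself
    }

    // Two columns per line: bucket lower bound in bytes, number of allocations.
    void write(std::ostream& out) const
    {
        for (const auto& entry : buckets_)
            out << (std::size_t{1} << entry.first) << ' ' << entry.second << '\n';
    }

    std::size_t smallerThanPointer() const { return smallerThanPointer_; }

private:
    std::map<unsigned, std::size_t> buckets_;
    std::size_t smallerThanPointer_ = 0;
};
```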

Try it out

So far, I developed this tool mostly to scratch my own itch. I demoed it to some colleagues, but until yesterday, some essential features were missing. Now, I think, it is ready for a wider audience. If you are interested, try it out - I’m interested in your feedback:

  git clone git://anongit.kde.org/heaptrack
  cd heaptrack
  mkdir build
  cd build
  cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
  make install

This should be all that is required to get heaptrack up and running. It depends on Boost (for heaptrack_print and heaptrack_interpret) and a recent libunwind (for libheaptrack.so). If in doubt, compile libunwind from source as well, as I fixed one significant performance issue there on my platform. Thus, if heaptrack is extremely slow, please try to update libunwind first. Also note that I have another patch for libunwind in the pipeline to increase the DWARF cache, which improves the runtime performance of heaptrack even further.

Furthermore, I took the liberty of using C++11 features wherever I needed them. You will need a recent compiler to build heaptrack. CMake should tell you if your compiler is too old.

Also note again that this tool currently only works on Linux. With some work, it might be possible to port it to other Unix-like platforms. Personally, I won’t spend time on this, as it is not worth it for me. I develop cross-platform Qt applications, and can thus easily investigate the memory consumption on a Linux host.

Platform-wise, I have built and tested this code only on x86-64. I hope it also works fine on 32-bit x86 as well as ARM, but I’ll have to test it.

A note on Performance
CPU Overhead

I do not have any reliable benchmark, but I still want to share some rough estimates of the overhead of heaptrack and compare it with Massif. In heaptrack’s source tree, you can find e.g. tests/threaded.cpp, which allocates and frees memory repeatedly and in parallel with multiple threads. With perf stat, we can estimate the worst-case overhead of heaptrack with this test:

Baseline
  $ perf stat -r 5 ./tests/threaded

  Performance counter stats for './tests/threaded' (5 runs):

  147.544073 task-clock (msec) # 1.736 CPUs utilized ( +- 6.18% )
  563 context-switches # 0.004 M/sec ( +- 6.12% )
  735 cpu-migrations # 0.005 M/sec ( +- 0.78% )
  910 page-faults # 0.006 M/sec ( +- 4.38% )
  235,074,081 cycles # 1.593 GHz ( +- 10.63% ) [71.15%]
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
  156,034,336 instructions # 0.66 insns per cycle ( +- 6.64% ) [91.21%]
  35,155,936 branches # 238.274 M/sec ( +- 3.27% ) [90.29%]
  366,564 branch-misses # 1.04% of all branches ( +- 8.44% ) [86.60%]

  0.084972509 seconds time elapsed ( +- 6.20% )

Averaged over five runs, this test finishes in about 85 ms and executes roughly 156 million instructions.

heaptrack
  $ perf stat -r 5 heaptrack ./tests/threaded

  Performance counter stats for 'heaptrack ./tests/threaded' (5 runs):

  2126.580121 task-clock (msec) # 2.120 CPUs utilized ( +- 4.10% )
  60,137 context-switches # 0.028 M/sec ( +- 10.29% )
  4,853 cpu-migrations # 0.002 M/sec ( +- 7.48% )
  106,589 page-faults # 0.050 M/sec ( +- 0.11% )
  5,398,514,290 cycles # 2.539 GHz ( +- 4.01% ) [55.40%]
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
  5,403,905,664 instructions # 1.00 insns per cycle ( +- 1.23% ) [87.56%]
  1,154,188,099 branches # 542.744 M/sec ( +- 1.58% ) [75.83%]
  22,868,779 branch-misses # 1.98% of all branches ( +- 4.20% ) [76.43%]

  1.003186573 seconds time elapsed ( +- 2.40% )

With heaptrack, the test application runs considerably slower: according to perf stat, roughly 12 times slower (1.00 s vs. 0.085 s elapsed). Furthermore, we now execute ca. 5.4 billion instructions (about 35 times as many) and incur more than 100 times as many page faults, etc. pp.

Massif
  $ perf stat -r 5 valgrind --tool=massif ./tests/threaded

  Performance counter stats for 'valgrind --tool=massif ./tests/threaded' (5 runs):

  2589.948615 task-clock (msec) # 1.022 CPUs utilized ( +- 0.21% )
  11,318 context-switches # 0.004 M/sec ( +- 1.23% )
  7,168 cpu-migrations # 0.003 M/sec ( +- 1.73% )
  8,856 page-faults # 0.003 M/sec ( +- 0.18% )
  6,178,853,885 cycles # 2.386 GHz ( +- 0.19% ) [50.14%]
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
  9,692,798,770 instructions # 1.57 insns per cycle ( +- 0.69% ) [81.01%]
  2,311,709,276 branches # 892.570 M/sec ( +- 0.26% ) [77.11%]
  29,323,133 branch-misses # 1.27% of all branches ( +- 0.38% ) [77.79%]

  2.534584622 seconds time elapsed ( +- 0.27% )

With Massif, the situation is even worse. It serializes all threads, as evidenced by the task-clock report, which shows that only one CPU is utilized. Overall, it is roughly 2.5 times slower than heaptrack and also executes nearly twice as many instructions.

This result is quite promising in favor of heaptrack. In many other tests, the test applications also feel much more responsive when running under heaptrack compared to Massif. But YMMV, so take this with a grain of salt.

Memory Overhead

Also be aware that heaptrack not only slows down your application, but also adds considerable memory overhead, both in-process (libheaptrack.so) and out-of-process (heaptrack_interpret). In a non-scientific measurement of the memory consumption of kwrite showing a medium-sized text file, I obtained the following numbers for the total memory used after the file is loaded:

  • Baseline: 26.1MB
  • heaptrack: 39.3MB + 19.2MB = 58.5MB
  • Massif: 264.9MB

So again, heaptrack seems to be significantly leaner compared to Massif, but YMMV.

What’s left to do?

I will probably not spend much more time on heaptrack in the coming months, but rather hope to finally be able to concentrate on finishing my studies. Around the middle of next year, after a long vacation, I then plan to start working on the following (if no one beats me to it by then):

  • do a proper release: I plan to move this tool to KDE’s extragear, which involves a code review. Once that is settled, I will release a first version and hope for packagers to distribute it.
  • heaptrack-gui: Generating massif.out files and looking at them in my Massif-Visualizer is nice, but inefficient and only shows a fraction of the data we have available. Thus, a proper GUI application is required to show all of the data in a heaptrack output file. Additionally, it could visualize the data as it comes in, giving you the ability to track the heap behavior of your application in real-time!
  • public API: heaptrack does not support custom allocators yet. To support this, a simple API could be added similar to Valgrind’s Client Request API.
  • I/O profiling etc.: The technique used for heap profiling can also be used to profile I/O, mutex lock contention and more.
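
Why would custom allocators need an API? A malloc interposer only sees the big chunk a pool grabs up front, not the individual objects handed out from it. A minimal sketch of what such client-side reporting could look like. The hook struct, its names and the pool are entirely hypothetical: heaptrack offers no such API at this point, and this only illustrates the idea in the bullet above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reporting hooks, loosely in the spirit of Valgrind's client
// requests. A profiler would install real callbacks; otherwise they stay null.
struct AllocationHooks {
    void (*onAlloc)(void* ptr, std::size_t size) = nullptr;
    void (*onFree)(void* ptr) = nullptr;
};
AllocationHooks g_hooks;

// Fixed-size object pool: one up-front buffer, objects handed out from a free
// list, each reported through the hooks. Alignment handling omitted for brevity.
class ObjectPool {
public:
    ObjectPool(std::size_t objectSize, std::size_t count)
        : objectSize_(objectSize), storage_(objectSize * count)
    {
        for (std::size_t i = count; i-- > 0;)
            freeList_.push_back(storage_.data() + i * objectSize_);
    }

    void* allocate()
    {
        if (freeList_.empty())
            return nullptr;
        unsigned char* p = freeList_.back();
        freeList_.pop_back();
        if (g_hooks.onAlloc)
            g_hooks.onAlloc(p, objectSize_);
        return p;
    }

    void deallocate(void* p)
    {
        if (p == nullptr)
            return;
        if (g_hooks.onFree)
            g_hooks.onFree(p);
        freeList_.push_back(static_cast<unsigned char*>(p));
    }

private:
    std::size_t objectSize_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> freeList_;
};
```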

Note that stack memory consumption cannot be profiled this way. If you need to look at it, use Massif (with --stacks=yes).

Reinventing the Wheel

Initially, I thought heaptrack was unique in what it does. Over time, I realized that this is not quite the case. Google’s gperftools has a similar tool, and there is libmemusage.so and many others like it. Thankfully, none of them gives as much data as heaptrack while still being efficient. So my time was not wasted. And I learned a ton in the process. I invite everyone to inspect my code and give suggestions. So far, it is only about 1.6 kLOC, but probably a bit lacking on the documentation side. I’ll improve this over time, I hope.

I also tried to implement this tool with perf probe, but could not get it to work reliably. The perf script support still lacks the ability to run native code, which is crucial here to get high performance. Additionally, perf requires root access in order to use user-space probes on e.g. malloc and friends in libc.so. This is not practical - heaptrack and LD_PRELOAD work as-is just fine.

Thanks

To wrap up this lengthy blog post, I want to express my deepest gratitude again to all those who made this tool possible. In no particular order:

  • Julian Seward and the Valgrind team: This tool suite will always remain invaluable to me. Without it, I’d never have been able to cross-check the results of heaptrack reliably. While I’ll probably use Massif less and less, I will still use it to verify that the results obtained by heaptrack are correct. And the error-checking tools in the Valgrind suite, like memcheck, helgrind or drd, are still unmatched in their quality.
  • Michael Sartain, Peter Lohrman and Valve: Without the code in vogl, I’d still be out on the hunt for an efficient scheme to obtain backtraces, or would be clueless how to translate the instruction pointer addresses with DWARF debug symbols.
  • Arun Sharma, Lassi Tuura and the libunwind team: The core tool to actually get the backtrace. Many thanks for this fast, easy-to-use library!
  • GCC team: Not only an excellent compiler, but also the source of the libbacktrace library, which does the ELF/DWARF heavy lifting to translate raw instruction pointer addresses.
  • My colleagues at KDAB: Fruitful discussions with them led to solutions for many of my problems over the last months. And thanks for the pre-alpha testing!
  • all future contributors: Patches welcome! :)

UPDATE 09/12/14: Please use the new, official heaptrack repository at git://anongit.kde.org/heaptrack. My scratch repository has been updated to notify users about this change.

UPDATE 10/12/14: Updated the memory “benchmark” data to reflect recent changes in heaptrack_interpret, which led to significantly lower memory overhead.

Comments

Having trouble Tue, 04/26/2016 - 17:24 — Anonymous

Having trouble building

CMake Error at /usr/lib64/boost/Boost.cmake:536 (message): The imported target “boost_date_time-static” references the file

    "/usr/lib64/lib64/libboost_date_time.a"

but this file does not exist. Possible reasons include:

  • The file was deleted, renamed, or moved to another location.

  • An install or uninstall procedure did not complete successfully.

  • The installation package was faulty and contained

    “/usr/lib64/boost/Boost.cmake”

    but not all the files it references.

Call Stack (most recent call first):
  /usr/lib64/boost/BoostConfig.cmake:28 (include)
  /opt/cmake/cmake-3.5.2-Linux-x86_64/share/cmake-3.5/Modules/FindBoost.cmake:245 (find_package)
  CMakeLists.txt:16 (find_package)

What distro is that? Sounds Thu, 04/28/2016 - 11:34 — Milian Wolff

What distro is that? Sounds like a setup issue with CMake / Boost on your system - completely unrelated to heaptrack itself.

Looking forward to trying Tue, 04/26/2016 - 15:13 — Anonymous

Looking forward to trying this out but I cannot get your project:

  $ git clone git://anongit.kde.org/heaptrack
  Initialized empty Git repository in /opt/heaptrack/.git/
  anongit.kde.org[0: 31.216.41.69]: errno=Connection timed out
  anongit.kde.org[0: 5.39.11.196]: errno=Connection timed out
  fatal: unable to connect a socket (Connection timed out)

Odd, sounds like a temporary Tue, 04/26/2016 - 15:15 — Milian Wolff

Odd,

sounds like a temporary network glitch, maybe in one of the KDE git mirrors. Please try again. I just tested it and it does work for me right now… In the worst case, you can also use the GitHub mirror at https://github.com/kde/heaptrack

thank you https worked Tue, 04/26/2016 - 15:41 — Anonymous

Thank you, https worked fine… likely a firewall issue - I am inside my company’s network

I really enjoy using Thu, 02/25/2016 - 12:52 — Anonymous

I really enjoy using heaptrack, thanks a lot for it! I am currently looking at “memory leak” problems where the memory is actually wrapped safely in shared_ptrs but somehow the shared_ptrs never go out of scope and thus the memory is not freed. Is it possible to find out which reference is still around using heaptrack somehow? By exchanging ref/deref functions?

Thanks for the feedback! But Thu, 02/25/2016 - 13:33 — Milian Wolff

Thanks for the feedback!

But I’m afraid to tell you that currently I have no plan how to extend heaptrack to track reference counting bugs. I completely agree that it would be really nice to have, but I think it would need a separate tool, or a separate view in the GUI, to display these issues. And of course, first we must get the data… Overloading C++ symbols via LD_PRELOAD is possible, albeit ugly, and also only works for non-inline exported symbols afaik. None of this is probably true for the shared_ptr refing code… What would be possible, of course, is to custom-patch the STL header to call an extern C function for every ref/deref, then hijack that in heaptrack. It’s doable, but lots of work, of course.

If you are interested, I’d certainly be available to mentor you in writing a reftrack tool which leverages a large part of the existing heaptrack code to grab backtraces etc. pp.

I could probably use template Thu, 02/25/2016 - 14:37 — Anonymous

I could probably use template specialization or the fact that all shared pointers shared_ptr<foo> are actually typedef as fooPtr in this code to trigger some special function as well. But there is no easy way to request a stacktrace dump from heaptrack I guess?

if you just want to (ab)use Thu, 02/25/2016 - 17:34 — Milian Wolff

if you just want to (ab)use the existing code, you’d have to do the following:

  • introduce a libheaptrack.so which only wraps libheaptrack.{cpp,h}
  • link your code against that library
  • initialize heaptrack via heaptrack_init (see existing code for the preload and inject libs for how to do that). essentially you just want to give it a file name where to write the result file to
  • then call heaptrack_malloc and heaptrack_free for every ref/deref in your code
  • then pipe the result file through heaptrack_interpret and save the final result, e.g. something like

    path/to/heaptrack_interpret < yourfile.trace | gzip > reftrack.gz
  • beware: the result file can be quite large. if that is an issue, adapt heaptrack.sh.cmake to your needs. It does the above on-the-fly to save disk space.

  • then open reftrack.gz in heaptrack_gui

If you need more help, let us please continue this discussion via email (blog comments are tedious imo). You can reach me via mail@milianw.de.

Thanks!

I have a program that Fri, 02/05/2016 - 17:45 — Anonymous

I have a program that allocates space for large data sets, processes them and deletes them. It also keeps some small amount of data in memory about files that have been processed. What I think is happening is that some of the metadata is being allocated at the top of the heap, which means that when the large data sets are deleted, the heap cannot shrink.

Does HeapTrack (or any other heap analysis tool) have the ability to help me find which allocations are at the top of the heap and preventing freed memory being returned to the system?

One could add such features, Fri, 02/05/2016 - 17:52 — Milian Wolff

One could add such features, but as of now, no - heaptrack won’t easily answer this question. I do hope to add this capability in the future, though, but it may be a bit problematic to access the data about loaded pages efficiently, maybe by parsing /proc/<pid>/smaps for every timestamp or some such…. Anyhow, some notes on your problem, as I understand it:

a) Most malloc implementations will use separate pages for large allocations and keep small allocations out of such regions. Thus, once the large area is freed, it should be handed back to the kernel. If I understood you correctly, you have a pool allocator allocating large chunks, right? Once you free that large chunk, it should thus be released again.

b) If, on the other hand, you have lots of small allocations, you can easily run into https://sourceware.org/bugzilla/show_bug.cgi?id=14827. Can you try to add a call to malloc_trim(0) to your code - does that have a significant effect?

c) If none of this helps, have a look at the output from malloc_info: it outputs metrics about the different arenas, what it considers fragmented, etc. pp. That should answer your question.

HTH
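
To try suggestions b) and c) from the reply above, a minimal glibc-only sketch: malloc_trim(0) asks glibc to return free memory to the kernel, and malloc_info() dumps per-arena statistics as XML. Capturing the XML via open_memstream() is just one convenient choice; the helper names are made up.

```cpp
#include <malloc.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// glibc-specific: ask malloc to give free memory back to the kernel.
// Returns true if any memory was actually released.
bool trimHeap()
{
    return malloc_trim(0) == 1;
}

// Captures glibc's malloc_info() report (XML, per-arena statistics) as a string.
std::string mallocInfoXml()
{
    char* buffer = nullptr;
    std::size_t length = 0;
    std::FILE* stream = open_memstream(&buffer, &length);
    if (stream == nullptr)
        return {};
    malloc_info(0, stream);  // the options argument must currently be 0
    std::fclose(stream);     // flushes and finalizes buffer/length
    std::string xml(buffer != nullptr ? buffer : "", length);
    std::free(buffer);
    return xml;
}
```

Comparing the malloc_info() output before and after freeing the large data sets (and after trimHeap()) should show whether the freed memory stays parked in an arena.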

Is it possible to elaborate Mon, 12/21/2015 - 22:34 — Anonymous

Could you elaborate on how to compile the GUI part? I have tried to compile it on openSUSE Leap 42.1 and failed miserably while compiling the kdiagram, kcoreaddons, ki18n, … etc. dependencies. Also, would it be possible to compile the GUI part on Ubuntu 14.04 LTS? Thanks a lot.

What exactly are your issues Wed, 12/23/2015 - 00:39 — Milian Wolff

What exactly are your issues when you compiled the dependencies?

Nevermind, I have managed to Mon, 12/28/2015 - 10:28 — Anonymous

Never mind, I have managed to run the GUI part on Ubuntu 14.04 LTS! For those who are interested, here is what I did:

  sudo apt-get install libqt5opengl5-dev libpolkit-qt-1-dev libxslt1-dev libqt5x11extras5-dev qttools5-dev libsparsehash-dev libqt5svg5-dev qtscript5-dev

The problem is that my distro comes with Qt 5.2.1 and I didn’t want to update it myself, so for each dependency I had to check out a commit from before the Qt 5.3.0 update.

  git clone https://github.com/KDE/kcoreaddons.git
  git -C kcoreaddons checkout f8e8360e33b052d2716167399f660bddeb6d2de6

  git clone https://github.com/KDE/ki18n.git
  git -C ki18n checkout 7646dc9a9bda3b2f9dd0be1e9f1bf0169fba0710

  git clone https://github.com/KDE/kitemmodels.git
  git -C kitemmodels checkout 8ce7f030ae22927d4f04e5dc5175abce6bd887e8

  git clone https://github.com/KDE/threadweaver.git
  git -C threadweaver checkout ddf1c4b7f64c33ad7fbd1b770cf0703c73b6275b

  git clone https://github.com/KDE/kauth.git
  git -C kauth checkout 4878a94b5e44b5dc79dd14efb7c21e28fb09eeea

  git clone https://github.com/KDE/kcodecs.git
  git -C kcodecs checkout dbbd4d0c3980d4fbdb989cec25fe39fa280e6298

  git clone https://github.com/KDE/kconfig.git
  git -C kconfig checkout 30d5270305a196a452579e2a45068f5c744fee0c

  git clone https://github.com/KDE/karchive.git
  git -C karchive checkout 1816049a3316c9f93e9722d68bf007edcca0ec8c

  git clone https://github.com/KDE/kdoctools.git
  git -C kdoctools checkout 741507710068cfcd7ceaa2d331bbea92b32dbe61

  git clone https://github.com/KDE/kguiaddons.git
  git -C kguiaddons checkout 8a774f8d9845f852af66771cb9cea897bbe34910

  git clone https://github.com/KDE/kwidgetsaddons.git
  git -C kwidgetsaddons checkout eaa8db04232e66f1cea7b9efccfad435c6c0fb60

  git clone https://github.com/KDE/kconfigwidgets.git
  git -C kconfigwidgets checkout 7d98e905c5dc26e1adcc4a62be96dd305e0d06e6

  git clone https://github.com/KDE/kdiagram.git

Thanks for this great project, Milian; it has already helped me a lot. Ali

Here’s the lazy version: set Tue, 03/15/2016 - 12:13 — Anonymous

Here’s the lazy version:

  set -e

  PKG="
  kcoreaddons,f8e8360e33b052d2716167399f660bddeb6d2de6
  ki18n,7646dc9a9bda3b2f9dd0be1e9f1bf0169fba0710
  kitemmodels,8ce7f030ae22927d4f04e5dc5175abce6bd887e8
  threadweaver,ddf1c4b7f64c33ad7fbd1b770cf0703c73b6275b
  kauth,4878a94b5e44b5dc79dd14efb7c21e28fb09eeea
  kcodecs,dbbd4d0c3980d4fbdb989cec25fe39fa280e6298
  kconfig,30d5270305a196a452579e2a45068f5c744fee0c
  karchive,1816049a3316c9f93e9722d68bf007edcca0ec8c
  kdoctools,741507710068cfcd7ceaa2d331bbea92b32dbe61
  kguiaddons,8a774f8d9845f852af66771cb9cea897bbe34910
  kwidgetsaddons,eaa8db04232e66f1cea7b9efccfad435c6c0fb60
  kconfigwidgets,7d98e905c5dc26e1adcc4a62be96dd305e0d06e6
  kdiagram,master
  "

  for i in $PKG; do
    name=${i%%,*}   # framework name, before the comma
    rev=${i#*,}     # commit or branch, after the comma
    git clone https://github.com/KDE/$name.git
    cd $name
    git checkout $rev
    mkdir build && cd build
    cmake ..
    make -j8
    sudo make install
    cd ../..
  done

Btw, I did regular cmake & Mon, 12/28/2015 - 10:36 — Anonymous

Btw, I did regular cmake & make & make install for above all projects, nothing special. For example:

  git clone https://github.com/KDE/kcoreaddons.git
  cd kcoreaddons
  git checkout f8e8360e33b052d2716167399f660bddeb6d2de6
  mkdir build && cd build
  cmake ..
  make
  sudo make install

It would be very good if someone could make a Debian package for it. Ali

Hi there! First of all, Mon, 11/16/2015 - 13:52 — Anonymous

Hi there!

First of all, great project, looks very promising! Can’t wait to try it. However, I get the following error after building heaptrack on Fedora 18, x86_64 (don’t ask why I’m using F18, not my choice):

“starting application, this might take some time… ERROR: ld.so: object ‘/usr/local/lib/heaptrack/libheaptrack_preload.so.1.0.0’ from LD_PRELOAD cannot be preloaded: ignored.”

Permissions look fine. I’m using Qt 5.5, but don’t think this causes an issue. Any idea what’s wrong?

Hm I’ve never seen this Mon, 11/16/2015 - 14:05 — Milian Wolff

Hm I’ve never seen this issue. My only guess would be that you are mixing ABIs, could you compare the output of file on the libheaptrack_preload.so.1.0.0 and on your application binary? Maybe it’s a mixture of 32bit and 64bit?

Hi, Is the —massif option to Sat, 11/14/2015 - 16:56 — Anonymous

Hi,

Is the --massif option to heaptrack_print supposed to be so slow? I have a 7.2 MB heaptrack .gz output file, and heaptrack_print has been chugging on it for 17 minutes and counting. (Judging from the file offset, it’s about two thirds of the way through.)

You can speed it up by Sat, 11/14/2015 - 19:10 — Milian Wolff

You can speed it up by increasing the detailed frequency:

  heaptrack_print --massif-detailed-freq 100 ...

Massif’s text format is super inefficient to create. Try heaptrack_gui, which is much faster, if you have access to the KF5 dependencies to build it.

Cheers

heaptrack_gui was much Sun, 11/15/2015 - 02:29 — Anonymous

heaptrack_gui was much faster, yes. Unfortunately it needed something like 400 MB of KDE dependencies for showing a graph and a tabview :-) And even though load time is no longer measured in hours, it’s still measured in minutes. (I had a run that required 20 minutes or so.)

Perhaps it would be faster if the heaptrack format were binary instead of requiring repeated regex application for parsing?

Hey there, can you tell me Sun, 11/15/2015 - 16:09 — Milian Wolff

Hey there,

can you tell me what dependencies make up the 400MB? I wouldn’t expect such a large size impact for the few frameworks I use. Maybe you installed the full KF5/plasma environment? That is not required, only these KF5 packages and their dependencies and devel packages are required:

  CoreAddons I18n ItemModels ThreadWeaver ConfigWidgets

For the charts you additionally need KDiagram/KChart which also only has minimal Qt 5 dependencies. On Arch, this amounts to less than 200MB, including Qt 5 and its dependencies. Excluding Qt 5, it’s just ~20MB.

Regarding regex parsing: I’m not doing any regular expression parsing anywhere. Why did you think I do that?

KDiagram does not have Wed, 01/06/2016 - 03:05 — Anonymous

KDiagram does not have minimal dependencies at all, really. I installed all the -dev packages I needed to compile it (on Debian); that’s where the 400 MB number comes from. The KF5 packages alone are 40 MB. Then you need Boost, which is 230 MB including all dependencies I didn’t have already… the list goes on. Of course, you are free to use whatever dependencies you want to (it is your software, after all), but it feels overkill for what the GUI actually seems to be doing.

The reason why I think you’re doing regex parsing is that regex functions showed up really high when I profiled heaptrack_gui to figure out what was taking so much time. (This was during one of the really long loads; I don’t have that data around anymore.) Perhaps it’s an indirect call somehow?

KDiagram only requires Qt 5 Wed, 01/06/2016 - 14:21 — Milian Wolff

KDiagram only requires Qt 5 and extra-cmake-modules. It does not require boost. Heaptrack itself does use some boost in a few places, outside of the GUI part that uses Qt 5.

Anyhow, if you have suggestions on how to improve the situation then I’m all ears. But currently, the file size of build dependencies is really of no concern to me.

Regarding regex hotspot: Please show me a callgraph that you got from your profiler. Also feel free to grep both heaptrack and kdiagram for regular expression classes (QRegExp, QRegularExpression) - they are not used. And I’ve profiled heaptrack_gui a lot in order to make it faster - never have I seen regular expressions pop up anywhere, let alone as a hotspot.

Wait, the default is with no Sun, 11/15/2015 - 11:37 — Anonymous

Wait, the default is with no optimization? You need to add an incantation (-DCMAKE_BUILD_TYPE=RelWithDebInfo) to get optimization, and it will hide the compile line from you by default unless you give it VERBOSE=1? No wonder there’s cmake hate in the comments…

I’ve now pushed a commit that Sun, 11/15/2015 - 15:57 — Milian Wolff

I’ve now pushed a commit that defaults to RelWithDebInfo when no CMAKE_BUILD_TYPE is specified.

The GUI seems to depend on a Mon, 10/26/2015 - 04:03 — Anonymous

The GUI seems to depend on a bunch of KDE stuff that’s not available for Ubuntu 14.04. Is it feasible to get it working, or should I just learn to love the CLI?

Also, to offset some of the Mon, 10/26/2015 - 04:12 — Anonymous

Also, to offset some of the negativity in the comments beneath mine: thanks for making a great tool! Yeah, it’s a little tricky to build the GUI, but that should improve with time. I might just use massif until then, but I’d never have known about massif either if it weren’t for you.

Note that you can also Tue, 10/27/2015 - 12:48 — Milian Wolff

Note that you can also generate flame graphs with heaptrack_print. Alternatively, you can copy the heaptrack.FOO.PID.tgz files to another machine with a more modern Linux distribution where you have access to the packages required to build heaptrack_gui.

Cheers

I don’t know how you managed Sun, 10/25/2015 - 19:38 — Anonymous

I don’t know how you managed to build this thing. You require .cmake files for Qt 5, but Debian does not provide them, not even with the qt5-dev package (and not with cmake-data either). CMake is a terrible idea. It’s even worse than SCons. Thanks to you, I know never to use it in any of my projects.

A bit of googling helps: Tue, 10/27/2015 - 12:46 — Milian Wolff

A bit of googling helps: http://askubuntu.com/questions/374755/what-package-do-i-need-to-build-a-… Yes, this also applies to Debian.

You just had to use the Sun, 10/25/2015 - 18:49 — Anonymous

You just had to use the absolute latest version of Qt instead of just going with Qt 4. Now, instead of just being able to run “cmake”, I have to spend hours fighting with Aptitude to get Qt 5 to install without deleting half my system first. Thanks a lot.

Exactly, I had to use Qt 5 Tue, 10/27/2015 - 12:47 — Milian Wolff

Exactly, I had to use Qt 5, which has been available for three years now. I also had to use it because it’s more fun than Qt 4. And I do this in my free time, after all. You are welcome, glad that you like it as much as I do!

Resolving dependencies in Mon, 03/28/2016 - 09:12 — Anonymous

Resolving dependencies in Aptitude is NOT fun at all.

And I never was able to build your project, because the file “FindQt5.cmake” apparently only exists on your hard drive and nowhere else, and your project can’t build without it. I came back here because now I’m trying to explain why CMake sucks balls on Reddit.

Qt 5 itself ships the Tue, 03/29/2016 - 16:31 — Milian Wolff

Qt 5 itself ships the required cmake files, but it is not shipping FindQt5.cmake, but rather a Qt5/Qt5Config.cmake. See e.g. the contents of qtbase5-dev on Debian: https://packages.debian.org/search?searchon=contents&keywords=Qt5Config….

I’m pretty sure it’s similar on Ubuntu. So instead of blaming “CMake sucks balls”, maybe you should start understanding the tools and how they operate first.

Got it built and running Fri, 03/27/2015 - 03:03 — Anonymous

Got it built and running (somewhat)

Recursive hang in libunwind, which (indirectly) calls malloc. Any ideas? This is on Ubuntu 14.04 x64.

Do you have more input on how Sat, 04/18/2015 - 14:02 — Milian Wolff

Do you have more input on how to reproduce this issue? Maybe you can tell me what FOSS project you tried it on?

During installation process I Tue, 03/17/2015 - 19:12 — Anonymous

During installation, I had a problem building the heaptrack_print target (while executing the ‘make install’ command). It showed up as multiple errors like these:

  /usr/local/include/boost/iostreams/filter/gzip.hpp:165: undefined reference to 'boost::iostreams::zlib::okay'
  /usr/local/include/boost/iostreams/filter/zlib.hpp:122: undefined reference to 'boost::iostreams::zlib::default_compression'

The solution was to add {boostRoot}/libs/iostreams/src/zlib.cpp and {boostRoot}/libs/iostreams/src/gzip.cpp as sources to the link command in {heaptrackRoot}/build/CMakeFiles/heaptrack_print.dir/link.txt. Linking the zlib library via ‘-lz’ is also necessary.

For me, the complete {heaptrackRoot}/build/CMakeFiles/heaptrack_print.dir/link.txt file looks like this:

  /usr/bin/c++ -lz -std=c++11 -Wall -Wpedantic -O2 -g -DNDEBUG
  CMakeFiles/heaptrack_print.dir/heaptrack_print.cpp.o /home/kobak/boost_1_55_0/libs/iostreams/src/gzip.cpp
  /home/kobak/boost_1_55_0/libs/iostreams/src/zlib.cpp -o heaptrack_print -rdynamic /usr/local/lib/libboost_system.so
  /usr/local/lib/libboost_filesystem.so /usr/local/lib/libboost_iostreams.so /usr/local/lib/libboost_program_options.so
  -Wl,-rpath,/usr/local/lib:

After that, the ‘make install’ command executes without errors. I hope the above instructions will be useful for someone.

PS. Are you planning to add stack profiling to heaptrack?

Best regards, Rafal

I also had a lot of undefined Thu, 12/10/2015 - 15:24 — Anonymous

I also had a lot of undefined references to the boost libraries and zlib at the step where heaptrack_print is built. My problem was an ABI incompatibility between gcc 5.2 and clang 3.6 on Ubuntu 15.10, as clang was set as the default compiler. After setting gcc as the default with

  sudo update-alternatives --config c++

and choosing g++, heaptrack compiled flawlessly.

This is odd, I explicitly Sat, 04/18/2015 - 14:08 — Milian Wolff

This is odd, I explicitly require the boost iostreams component and link against it. And for me libboost_iostreams.so.1.57.0 links dynamically to libz.so.1 already. What distro do you use? What does ldd say for you?

Regarding your PS: no, stack profiling is out of scope for heaptrack. I see no way to achieve it with the current approach. If you need that, use Massif from the Valgrind suite.

Cheers

problem appears to be with Tue, 06/30/2015 - 00:14 — Anonymous

The problem appears to be with the compilation of libsharedprint in CMakeLists.txt.

I did the following and compilation succeeds without having to modify the generated link.txt file as the earlier poster did.

  1. Remove the STATIC keyword from “add_library(sharedprint STATIC accumulatedtracedata.cpp)”.
  2. Add the following line immediately after “add_library…”: target_link_libraries(sharedprint ${Boost_LIBRARIES})

tested on ubuntu 14.04.2, libboost 1.54.0, gcc 4.8.2

tony

I’ve added a potential fix - Thu, 07/02/2015 - 23:19 — Milian Wolff

I’ve added a potential fix - could you test again please?

FWIW, I had the same issue. Fri, 06/26/2015 - 04:18 — Anonymous

FWIW, I had the same issue. Also in the new heaptrack_gui I had to do that as well as delete the option for no exceptions… Does heaptrack have a project page?

What kind of project page do Thu, 07/02/2015 - 23:52 — Milian Wolff

What kind of project page do you have in mind?

Doesn’t compile on armhf, Tue, 03/03/2015 - 14:17 — Anonymous

Doesn’t compile on armhf, ping me (tsdgeos) when you’re back from holidays if you read this :D

FWIW, there’s also Sat, 12/20/2014 - 09:39 — Anonymous

FWIW, there’s also https://github.com/jrfonseca/memtrail but I confess it still has some rough edges. — Jose

Is it working on uCLinux Fri, 12/19/2014 - 09:30 — Anonymous

Does it work on uClinux targets?

I don’t know, try it out! Fri, 12/19/2014 - 12:17 — Milian Wolff

I don’t know, try it out!

This is a very interesting Sun, 12/07/2014 - 14:32 — Anonymous

This is a very interesting project, Milian. Thanks for sharing it!

This is really cool Milian. Wed, 12/03/2014 - 22:48 — Anonymous

This is really cool, Milian. I’ve been using gperftools for this work right now. I modified it to take snapshots whenever the resident size goes up by a specified amount and then take diffs of those; I’ve been using this on our TF2 dedicated servers. I will definitely keep an eye on your work with heaptrack, though. Thanks much! -Mike (Mike Sartain from Valve)

FYI: jemalloc has a built in Wed, 12/03/2014 - 12:32 — Anonymous

FYI: jemalloc has a built-in statistical heap profiler that adds very little overhead and works well with programs with gigantic heaps. It generates pprof files, so you can use the same viewer tool as with gperftools. It’s fast enough, and its overhead is low enough, that it can be turned on in production.

Thanks, very interesting! But Wed, 12/03/2014 - 14:15 — Milian Wolff

Thanks, very interesting! But to quote from https://github.com/jemalloc/jemalloc/wiki/Use-Case:-Heap-Profiling:

Walking the call stack to capture a backtrace is typically quite computationally intensive. Therefore it is infeasible to use precise leak checking for long-lived, heavily loaded applications. Statistical sampling of allocations makes it possible to keep the computational overhead low, yet get a general idea of how the application utilizes memory.

So they use sampling to speed up the process. heaptrack could do the same to speed it up even further, but IMO its overhead is low enough that we don’t need this. Getting the raw backtrace with libunwind is pretty fast. And since we do the DWARF annotation in a separate process, potentially at a different time, and also delay the actual interpretation of the data, the runtime overhead is small.

I’d be interested to see a comparison between such a sampling based method and heaptrack. Similar to perf or VTune, I assume that the sampling method will also find the hotspots. But it cannot give accurate heap memory measurements, nor hard numbers on the allocation calls or the like. Since with heaptrack we really get all data about heap allocations, we can do all sorts of analyses afterwards, and I’m not sure you can do all of that with the results one obtains from sampling.

W00t… I have long been Tue, 12/02/2014 - 20:14 — Anonymous

W00t… I have long been googling “linux heap profiler” to find a tool to investigate ever-growing memory usage in KDE components, but Massif was not able to give me the data I needed (or I was too dumb to interpret it).

Massive thanks! :)
