Profiling

» Heaptrack 1.0.0

Tue, 02/28/2017 - 15:18

Hey all :)

I’ve finally managed to release heaptrack properly! The first stable release, v1.0.0, is available for download: https://download.kde.org/stable/heaptrack/1.0.0/src/

You can find more information on the official release announcement over on the KDAB page: https://www.kdab.com/heaptrack-v1-0-0-release/

If you want to read more about what heaptrack is, check out the README.md or have a look at the initial announcement of heaptrack, now three years old!

Cheers, happy profiling!

» Heaptrack - A Heap Memory Profiler for Linux

Tue, 12/02/2014 - 19:25

Hello everyone,

with a tingly feeling in my belly, I’m happy to announce heaptrack, a heap memory profiler for Linux. Over the last couple of months I’ve worked on this new tool in my free time. What started as a “what if” experiment quickly became such a promising tool that I couldn’t stop working on it, at the cost of neglecting my physics master’s thesis (who needs that anyway, eh?). In the following, I’ll show you how to use this tool, and why you should start using it.

» Akademy 2014 - Come to my Profiling 101 Workshop!

Tue, 08/26/2014 - 19:08

Hello all!

I have the pleasure of attending Akademy this year again. From my past experience, I’m really looking forward to having a good time again. Lots of hacking, meeting known and unknown faces, drinking beer and socializing ahead! I also love that it’s in a (to me) new country again, and I wonder what I will see of the Czech Republic and Brno!

This year, the conference schedule is a bit different from past years. Not only do we have the usual two days packed with interesting talks and keynotes; this year there will also be workshops on the third day! These are more in-depth sessions which will hopefully teach the audience some new skills, be it QML, mobile development, testing, or … profiling :) Yours truly has the honor of holding a one-hour Profiling 101 workshop.


I’m going to Akademy and will hold a Profiling 101 Workshop

I welcome all of you to attend my presentation. My current plan is to do some live demoing of how I profile and optimize code. For that purpose, I just wrote a (really slow and badly written) word count test app. I pushed the sources to kde:scratch/mwolff/akademy-2014.git. If you plan to join my workshop, I encourage you to download the sources and take a shot at optimizing it. I tried my best to write slow code this time, to leave plenty of opportunity for optimizations :) There is a lot of low-hanging fruit in the code. I’m confident that I’ll be able to teach you some more advanced tips and tricks on how to improve a Qt application’s performance. We’ll see in the end who can come up with the fastest version :)

During my workshop, I’ll investigate the performance of the word count app with various tools: on the one hand, this should teach you how to use the powerful existing open-source tools such as Linux perf and the Valgrind suite. I will also show you Intel VTune though, as it is still unparalleled in many aspects and available free of charge for non-commercial usage on Linux. Then, I’ll present a few of my own tools, such as heaptrack. If you have never heard of some of these tools, go try them out before Akademy!

I’ll see what else I can fit in, and maybe I’ll extend my akademy-2014.git scratch repository with more examples over the next few days.

Bye, hope to see you soon!

» Improving Massif-Visualizer For Large Data Files

Fri, 03/16/2012 - 15:42

As I just wrote in another article, Massif is an invaluable tool. The [Visualizer](https://projects.kde.org/massif-visualizer) I wrote is well appreciated and widely used, as far as I can see.

A few days ago though, I did a very long (~16h) Massif run on an application, which resulted in a 204MB massif.out data file. This proved to be a very good stress test for my visualizer, and it triggered me to spend some time on optimizing it. The results are pretty nice, I think, so look forward to Massif-Visualizer 0.4:

Reduced Memory Consumption

Yeah, meta eh? Just how I like it! I’ve used Massif to improve the memory consumption of Massif-Visualizer, and analyzed the data in the Visualizer of course… :)

Initial Version
fig. 1: initial memory consumption of the visualizer

The initial version of my visualizer took ~470MB of memory to load the 204MB data file mentioned above. 80% of that was required for QString allocations in the call graph of each detailed snapshot, i.e. the function signatures and locations. See fig. 1 for the details.

QString to QByteArray
fig. 2: `QByteArray` instead of `QString`: 50% less memory

Thomas McGuire gave me the tip of using QByteArray instead, since the Massif callgraph data is plain ASCII. We can convert the data to QString where required, saving us essentially 50% of the memory consumption. You can see that applied in fig. 2. It was simple to implement and already reduced the memory consumption considerably.
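The trick boils down to storing the raw 8-bit data and widening it only at the point of display. Here is a minimal standard-library sketch of the idea, with std::string standing in for QByteArray and std::u16string for QString’s UTF-16 storage; TreeNode and displayLabel are hypothetical names, not the visualizer’s actual code:

```cpp
#include <cassert>
#include <string>

// Keep the parsed Massif data as raw 8-bit strings: one byte per ASCII
// character instead of the two bytes a UTF-16 string type would need.
struct TreeNode {
    std::string label; // ASCII function signature from the massif.out file
};

// Widen to UTF-16 only on demand, e.g. when the label is handed to the UI.
// For ASCII input, each char maps directly to one char16_t code unit.
inline std::u16string displayLabel(const TreeNode& node) {
    return std::u16string(node.label.begin(), node.label.end());
}
```

The savings come from the storage side: the wide copy is short-lived and only created for labels that are actually shown.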

Implicit Sharing
fig. 3: leveraging implicit sharing

I committed the above, thinking that was it. But thanks to the awesome people in the KDE community, this time André Wöbbeking, I was thankfully proven wrong: he commented on my commit, suggesting that I try leveraging the implicit sharing of Qt containers such as QByteArray. After all, the strings we have here are function signatures and file locations, which are repeated quite often. Especially when you have recursion in your call tree, or when the same functions are encountered again and again across Massif snapshots, you can potentially save a lot of memory by leveraging implicit sharing.
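The effect of implicit sharing can be imitated in the standard library with a simple interning pool. This is only a sketch of the principle (the class and names are hypothetical, not what the actual commit does), with std::unordered_set playing the role a set of implicitly shared QByteArrays would play in Qt:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hand out a reference to a single stored copy per distinct string, so
// repeated function signatures cost one allocation instead of thousands.
// unordered_set is node-based, so the references stay valid on insert.
class StringPool {
public:
    const std::string& intern(const std::string& s) {
        return *m_pool.insert(s).first; // one shared copy per distinct string
    }
    std::size_t uniqueCount() const { return m_pool.size(); }

private:
    std::unordered_set<std::string> m_pool;
};
```

Every tree node then stores a cheap handle to the pooled copy, so recursion and repeated snapshots add no extra string memory.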

Personally, I’m surprised to see just how much this gains in this case! See fig. 3, where the string allocations are almost completely gone from the Massif log! Now only the tree node allocations, and the containers storing them, are visible in the memory log - something I do not plan to reduce further.

If you are interested in how this was implemented, take a look at commit 4be5dad13fb.

Final Notes

I think this shows quite nicely how to improve the memory consumption of an application. If you want to verify my results, I’ve uploaded the Massif log files. Remember that you can open compressed files seamlessly in Massif-Visualizer. The massif.out.data.bz2 file contains the test data of the 16h Massif run.

You should probably use the latest Massif-Visualizer code though, since I’ve also optimized its performance considerably compared to the last released version, 0.3. Furthermore, data files are now loaded in the background, showing a nice progress bar in the meantime. If you open the big data file in 0.3, you’ll notice why I decided to optimize the visualizer :)

An interesting thing to note, btw., is that the Callgrind data format compresses file names and function signatures, yielding much smaller data files and reducing KCacheGrind’s memory consumption, especially since it automatically leverages the implicit sharing of Qt’s string classes.

Now it is probably time to stop slacking and get back to work-work :) I do have quite a few more ideas for the next Massif-Visualizer release though; an export functionality for the graphs is especially high on my TODO list!

» Tracking Memory Consumption Using Pmap

Fri, 03/16/2012 - 14:55

Massif is a really nifty and powerful tool. The caveat, of course, is that it slows down the application considerably; I’ve seen anything up to a factor of 100… I see no alternative to Massif when it comes to investigating where your memory problems come from. But if you just want to see whether you have a problem at all, tracking the total memory consumption should suffice.

A few days ago, I came across pmap on Stack Overflow, which makes it easy to track the RSS memory consumption of an application using the -x switch. Of course I had to write some bash magic to automate this process and visualize the data using Gnuplot! Behold:

memory consumption of a PhantomJS script over ~30min
usage

It’s simple, really: track_memory.sh $(pidof myapp).

The default timeout between snapshots is ~1s; you can pass a different timeout as the second parameter. sleep also accepts float values such as 0.1, which yields more snapshots for fast-running apps.

You can also run show_memory.sh mem.log.$(pidof myapp) while you are still tracking the memory. The gnuplot window that appears allows you to update the data intermittently, to zoom in, and to create images such as the one above.

Note: This kind of memory usage tracking costs nearly nothing; your application continues to run at full speed. Also be aware that this only shows the RSS memory consumption. Massif will always give you better, more detailed and accurate results. Still, I think this should already give you an idea of how your application behaves. If the graph goes up and up, you probably have a memory leak! Then it’s time to run Memcheck and/or Massif to find the issue and fix it!

track_memory.sh

You can find the most-recent version on GitHub: https://github.com/milianw/shell-helpers/blob/master/track_memory.sh

```bash
#!/bin/bash

#
# track memory of given application, identified by PID,
# using pmap -x, to show RSS and Dirty memory usage.
#
# visualization can later on be done with the
# show_memory.sh script.
#

pid=$1
sleep=$2

if [[ "$sleep" == "" ]]; then
  sleep=1
fi

if [[ "$(ps -p $pid | grep $pid)" == "" ]]; then
  echo "cannot find program with pid $pid"
  echo "track_memory.sh PID [SLEEP_TIMEOUT]"
  echo
  echo "example: track_memory.sh \$(pidof someapp) 0.1"
  exit
fi

logfile=mem.log.$pid

echo "# $(ps -o command= -p $pid)" > $logfile
echo "# $sleep" >> $logfile

cat $logfile

while [[ "$(ps -p $pid | grep $pid)" != "" ]]; do
  echo "snapshot " $pid
  pmap -x $pid | tail -n1 >> $logfile
  echo "$sleep"
  sleep $sleep
done

echo "done tracking, visualizing"
$(dirname $0)/show_memory.sh "$logfile"
```
show_memory.sh

You can find the most-recent version on GitHub: https://github.com/milianw/shell-helpers/blob/master/show_memory.sh

```bash
#!/bin/bash

#
# visualize memory consumption over time
# as recorded by pmap / track_memory.sh
# script
#

logfile=$1

if [ ! -f "$logfile" ]; then
  echo "cannot find memory logfile: $1"
  echo
  echo "usage: show_memory.sh LOGFILE"
  echo
  echo "example: show_memory.sh mem.log.12345"
  exit
fi

title=$(head -n1 "$logfile")
timeout=$(head -n2 "$logfile" | tail -n1)

title=${title/\# /}
timeout=${timeout/\# /}

# total:
# '$logfile' using 3 w lines title 'Kbytes', \

gnuplot -p -e "
set title '$title';
set xlabel 'snapshot ~${timeout}s';
set ylabel 'memory consumption in kB';
set key bottom right;
plot \
'$logfile' using 4 w lines title 'RSS' lt 1, \
'$logfile' using 4 smooth bezier w lines title 'RSS (smooth)' lt 7, \
'$logfile' using 5 w lines title 'Dirty' lt 2, \
'$logfile' using 5 smooth bezier w lines title 'Dirty (smooth)' lt 3;
"
```
Future

The above is nice, but I wonder whether this kind of utility should be added to KSysGuard: it already allows you to track the total memory consumption of your system, yet I did not find a way to track just a single application and visualize its memory consumption.

» VTune and KDE

Fri, 09/09/2011 - 21:09

Hey all,

It’s been some time since I last blogged. My TODO list is ever increasing and I have taken up my day job at KDAB again. Among other things, I attended a marketing talk by Edmund Preiss. He actually made that marketing talk interesting, not least thanks to his huge knowledge of the business, after ~20 years of working for Intel. Probably the most important piece of information I took away is this:

VTune is available free-of-charge under a non-commercial license

Yes, you heard right. Take these links:

  • Intel’s non-commercial offering

    note this entry from the FAQ:

    What does noncommercial mean?
    Non-commercial means that you are not getting compensated in any form for the products and/or services you develop using these Intel® Software Products.

  • Register for free license

  • Register for Download Access

you’ll need the serial number that gets sent to you via email after registering for the license

  • install VTune and profile the hell out of KDE/FOSS software and improve it all!

speeding up KDevelop

Personally, I did the latter for KDevelop over the last two days, and the results are astonishing. I just tested today’s results, and an unscientific time kdevelop -s lotsofprojects - wait until parsing finishes - stop showed a roughly 50% decrease in time, from ~12min to ~6min. Yes, a whopping 50% - try it out for yourself and see how big the gain is. Don’t forget to wipe the DUChain cache first though (e.g. by setting the environment variable CLEAR_DUCHAIN_DIR=1).

Why VTune rocks

I’m a huge fan of the Valgrind tool suite, but it is simply too slow for profiling some things, like opening ten medium to large projects in KDevelop and looking at the parsing speed. This can easily take a few minutes as-is, but under Valgrind it would take ages. With VTune on the other hand, thanks to its sampling-based approach, I don’t really notice any slowdown.

Then you might have heard of the new perf profiling utility in the Linux kernel. It is also sampling-based, but sadly requires special compile options on 64-bit (-fno-omit-frame-pointer) to get usable call stacks, and the UI is horrible; I haven’t found anything worthwhile with it so far…

VTune on the other hand has an incredible GUI, which makes profiling a joy. You can look at call stacks top-down or bottom-up, visualize locks and waits, easily find hotspots, … I’m stunned. Especially the utilities for looking at multi-threaded performance (of e.g. KDevelop) beat every single other performance tool I have ever tested. Oh, and did I mention that you can attach to an app at runtime, analyze something, and detach again?

Seriously, Intel: You just found a new fan boy in me. Thanks for giving this tool away for free for us “I hack on this tool in my spare time, yet still want it to perform nicely” people :) And kudos to the VTune developers - I’m blown away by it!

I really hope more people in the KDE community will try out VTune and try to improve the performance of our apps, I bet there is lots of potential!

Pitfalls

There are some negative aspects to VTune though: first of all, its UI sometimes freezes. I wonder whether the developers shouldn’t maybe spend some time analyzing the tool itself ;-)

The biggest gripe though is that VTune does not work everywhere. I tried to run it on my Arch box, but sadly Linux 3.0 is not yet supported by VTune. It worked like a charm on two Ubuntu boxes with some 2.6.x kernel though.

This also means that I have no idea whether, and how well, VTune works on non-Intel CPUs. I think some of it should work nicely: I did not install any of the kernel modules, for example, which would be required for hardcore low-level CPU profiling. The feature set I praised so much above should hence be available on e.g. AMD CPUs as well. But well, this remains to be tested.

So, I’m now drinking a well-deserved beer and looking positively into the future of a fast KDevelop/KDE :)

bye

» Should all callgrind bottlenecks be optimized?

Thu, 12/09/2010 - 19:12

Hey all,

I’d like to have some feedback from you. Consider this code:

```cpp
#include <iostream>
#include <cstring>

using namespace std;

struct List {
    List(int size) {
        begin = new int[size];
        memset(begin, 0, size * sizeof(int));
        end = begin + size;
    }
    ~List() {
        delete[] begin;
    }
    int at(int i) const {
        return begin[i];
    }
    int size() const {
        // std::cout << "size called" << std::endl;
        return end - begin;
    }
    int& operator[](int i) {
        return begin[i];
    }

private:
    int* begin;
    int* end;
};

int main() {
    const int s = 1000000;
    for (int reps = 0; reps < 1000; ++reps) {
        List l(s);
        List l2(s);
        // version 1
        for (int i = 0; i < l.size(); ++i) {
        // version 2
        // for (int i = 0, c = l.size(); i < c; ++i) {
            l2[i] = l.at(i);
        }
    }
    return 0;
}
```

If you run this through Callgrind, you’ll see quite some time being spent in l.size(); the compiler doesn’t seem to optimize that away. Now, fixing this “bottleneck” is simple: look at version 2. That way, l.size() will only be called once, and you’ll save quite a few instructions according to Callgrind.

Now, my first impression was: yes, let’s fix this! On the other hand, this optimization is not really noticeable in terms of user experience. So my question is: is it worth it? Should everything one sees in Callgrind that is easily avoidable and optimizable (like the code above) be optimized?

I ask because QTextEngine, for example, doesn’t use the optimized version, and I wonder whether I should create a merge request for that. According to Callgrind the difference is noticeable: one of my test cases shows ~8% of the time being spent in QVector<QScriptItem>::size() (via QTextEngine::setBoundary()). In Kate the difference is even bigger, with ~16% of the time being spent in QList<QTextLayout::FormatRange>::size() via QTextEngine::format(). Hence I’d say: yes, let’s optimize that. I just wonder whether it’s noticeable in the end.
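Stripped of Qt, the pattern under discussion is just hoisting a loop bound. A small Qt-free sketch with an instrumented size function (countedSize and the copy helpers are hypothetical, for illustration only) makes the difference in call counts observable:

```cpp
#include <cstddef>
#include <vector>

// Count how often the bound is evaluated, mimicking the commented-out
// "size called" output in the List example above.
static int g_sizeCalls = 0;

static std::size_t countedSize(const std::vector<int>& v) {
    ++g_sizeCalls;
    return v.size();
}

// version 1: the bound is re-evaluated on every iteration
int copyNaive(const std::vector<int>& src, std::vector<int>& dst) {
    g_sizeCalls = 0;
    for (std::size_t i = 0; i < countedSize(src); ++i)
        dst[i] = src[i];
    return g_sizeCalls;
}

// version 2: the bound is hoisted out of the loop
int copyHoisted(const std::vector<int>& src, std::vector<int>& dst) {
    g_sizeCalls = 0;
    for (std::size_t i = 0, c = countedSize(src); i < c; ++i)
        dst[i] = src[i];
    return g_sizeCalls;
}
```

For a 100-element vector, version 1 evaluates the bound 101 times (once per successful check plus the final failing one), version 2 exactly once; whether that translates into a user-visible win is exactly the question above.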

Bye

EDIT: See this comment thread for an answer.

» Profiling Rocks - KDevelop CMake Support now 20x faster

Wed, 03/31/2010 - 01:24

I just need to get this out quickly:

We were aware that KDevelop’s CMake support was slow. Too slow, actually. It was profiled months ago, and after a quick look that turned up QRegExp, the matter was dropped for fear of having to rewrite the whole parser properly, without using QRegExp. Which, btw., is still a good idea of course.

But well, today I felt like doing some more tinkering. I mean, I managed to optimize KDevelop’s Cpp support recently (parsing Boost’s huge generated template headers, like e.g. vector200.hpp, is now 30% faster). I managed to make KGraphViewer usable for the huge call graphs I produce in Massif Visualizer. So how hard could it be to make KDevelop’s CMake support at least a bit faster, eh?

Well, an hour and two commits later, I had found and fixed two bottlenecks. Both were related to QRegExp. Neither was in the actual parser; instead it was the part that evaluates CMake files, especially the STRING(...) function. So even if we had used a proper parser generator, this would still have been slow.

The first one was the typical “don’t reinvent the wheel” kind of commit, which already made the CMake support two times faster for projects that use FindQt4.cmake, i.e. any Qt or KDE project. Not bad, right? Well, while I fixed that, I saw that KDevelop tried to do a regular-expression replacement on the output of qmake --help; that couldn’t be right, could it? With the help of Andreas and Aleix we found the bug in the parser, and fixing that made the CMake support 10 times faster.

So yeah, CMake projects using Qt or KDE should now open a whopping 20 times faster in KDevelop :)

I really love KCacheGrind and Valgrind’s callgrind; again they proved to be the most awesome tools one can imagine! If you are interested in the callgrind files:

  1. without optimization
  2. first optimization
  3. second optimization

Note: with KCacheGrind from trunk you can open these compressed files transparently :)

» Massif Visualizer - now with user interaction

Sat, 03/13/2010 - 16:55

Just a quick status update: Massif Visualizer now reacts to user input. Meaning: you can click on the graph and the corresponding item in the tree view gets selected, and vice versa. It’s a bit buggy since KDChart is not reliable in what it reports, but it works quite well already.

Furthermore, the colors should be better now, peaks are labeled (more readable on bright color schemes, I’m afraid to say…), the legend is shown, …

Now let’s see how I can make the tree view more useful!

» Transparent loading of compressed Callgrind files in KCacheGrind

Thu, 03/11/2010 - 23:43

Hey everyone!

I just committed an (imo) insanely useful feature for KCacheGrind: transparent loading of compressed Callgrind files. Finally one does not have to keep those Callgrind files around uncompressed, hogging lots of space. And what is even more important: it’s much easier to share these files now, as you can send or upload them as .gz or, better yet, .bz2 and open them directly. The KDE architecture just rocks :) So in KDE 4.5 the best profiling visualizer just got better :D

In related news: I’m currently spending my time as an intern at KDAB creating an application to visualize Massif data. If you are interested, check out the sources on Gitorious: http://gitorious.org/massif-visualizer

It’s still pretty limited in what it offers, yet it is probably already more useful than the plain ASCII graph that ms_print generates:


Visualization of a Massif output file

This is very much WIP, but the visuals are somewhat working now. I plan to make the whole graph react to user input, i.e. zoomable, click to show details about snapshots, show information about the heap items that make up the stacked part of the diagram, …

Also very high on my wish list is some kind of integration with the KCacheGrind libraries, to reuse its nice features like call graphs, cost maps, and so on - all the features that make KCacheGrind such an insanely useful application.

Oh and remember: Never do performance optimizations without checking the facts first ;-)