callgrind Syndicate content

warning: Creating default object from empty value in /www/htdocs/w0065fc9/milianw/modules/taxonomy/taxonomy.pages.inc on line 33.

» An optimization kata - profiling 101 at Akademy 2014

Tue, 09/09/2014 - 15:43

Yesterday my Profiling 101 workshop took place at this years Akademy in Brno. The room was packed and I got good feedback, so I hope you all learned something new :)

During my workshop, I showed you how to improve the performance of a word-count application which also creates a word histogram and finds the longest word of a file. I tried to put as many performance bottlenecks as possible into the original code base, which you can find here:

  1. git clone git@git.kde.org:scratch/mwolff/akademy-2014.git

Instead of uploading my useless slides full of meme images, instead I’m now pushing my optimized code branch. I urge everyone to review the commits I did and read the individual commit messages (Note: read this log from bottom to top). There are many useful tips and tricks in there. I furthermore plan to create a techbase article with the most important notes on how to use profilers for a given job. I’ll write another blog post once I’m done with that.

Furthermore, if you want to learn profiling, I think my scratch repo up there is a good coding kata. Branch off from the master branch and create your own optimized one. Use profilers such as Valgrind callgrind and Linux perf for CPU runtime. Try out Massif and heaptrack for memory.

I hope together we can make KDE software much faster. There are probably many low-hanging fruit throughout our large codebase. If you have any question, please do ask me.

Cheers, enjoy the rest of Akademy. Many thanks to the organizers, sponsors and the KDE e.V.!

» Should all callgrind bottlenecks be optimized?

Thu, 12/09/2010 - 19:12

Hey all,

I’d like to have some feedback from you. Consider this code:

  1. #include <iostream>
  2. #include <memory.h>
  3.  
  4. using namespace std;
  5.  
  6. struct List {
  7. List(int size) {
  8. begin = new int[size];
  9. memset(begin, 0, size);
  10. end = begin + size;
  11. }
  12. ~List() {
  13. delete[] begin;
  14. }
  15. int at(int i) const {
  16. return begin[i];
  17. }
  18. int size() const {
  19. // std::cout << "size called" << std::endl;
  20. return end - begin;
  21. }
  22. int& operator[](int i) {
  23. return begin[i];
  24. }
  25.  
  26. private:
  27. int* begin;
  28. int* end;
  29. };
  30.  
  31. int main() {
  32. const int s = 1000000;
  33. for (int reps = 0; reps < 1000; ++reps) {
  34. List l(s);
  35. List l2(s);
  36. // version 1
  37. for ( int i = 0; i < l.size(); ++i ) {
  38. // version 2
  39. // for ( int i = 0, c = l.size(); i < c; ++i ) {
  40. l2[i] = l.at(i);;
  41. }
  42. }
  43. return 0;
  44. }

If you run this through callgrind, you’ll see quite some time being spent in l.size(), the compiler doesn’t seem to optimize that away. Now, fixing this “bottleneck” is simple, look at version 2. That way, l.size() will only be called once and you’ll save quite some instructions according to callgrind.

Now, my first impression was: Yes, lets fix this! On the other hand, this optimization is not really that noticable in terms of user-experience. So my question is: Is it worth it? Should everything one sees in callgrind that is easily avoidable and optimizable (like the stuff above) be optimized?

I ask because QTextEngine e.g. doesn’t use the optimized version and I wonder whether I should create a merge request for that. According to callgrind the difference is noticeable: One of my testcases shows ~8% of the time being spent in QVector<QScriptItem>::size() (via QTextEngine::setBoundary()). In Kate the difference is even bigger with ~16% of the time being spent in QList<QTextLayout:.FormatRange>::size() via QTextEngine::format(). Hence I’d say: yes, lets optimize that. I just wonder whether it’s noticeably in the end.

Bye

EDIT: See this comment thread for an answer.

» Transparent loading of compressed Callgrind files in KCacheGrind

Thu, 03/11/2010 - 23:43

Hey everyone!

I just committed an (imo) insanely useful feature for KCacheGrind: Transparent loading of compressed Callgrind files. Finally one does not have to keep those Callgrind files around uncompressed, hogging up lots of space. And what is even more important: It’s much easier to share these files now, as you can send or upload them as .gz or better yet .bz2 and open them directly. KDE architecture just rocks :) So in KDE 4.5 the best profiling visualizer just got better :D

In related news: I’m spending my time as intern at KDAB currently by creating an application to visualize Massif. If you are interested, check the sources out on gitorious: http://gitorious.org/massif-visualizer

It’s still pretty limited in what it offers, yet is probably already more useful than the plain ASCII graph that ms_print generates:


Visualization of a Massif output file

This is very WIP but the visuals are somewhat working now. I plan to make the whole graph react on user input, i.e. zoomable, click to show details about snapshots, show information about the heap items that make up the stacked part of the diagram, …

Also very high on my wish list is some kind of interaction with the KCacheGrind libraries, to reuse it’s nice features like callgraphs, cost maps, etc. pp. you name it :) All these features that make KCacheGrind such an insanely useful application.

Oh and remember: Never do performance optimizations without checking the facts first ;-)