callgrind

» Should all callgrind bottlenecks be optimized?

Thu, 12/09/2010 - 19:12

Hey all,

I’d like to have some feedback from you. Consider this code:

  #include <cstring>
  #include <iostream>

  using namespace std;

  struct List {
    List(int size) {
      begin = new int[size];
      // memset counts bytes, not elements
      memset(begin, 0, size * sizeof(int));
      end = begin + size;
    }
    ~List() {
      delete[] begin;
    }
    int at(int i) const {
      return begin[i];
    }
    int size() const {
      // std::cout << "size called" << std::endl;
      return end - begin;
    }
    int& operator[](int i) {
      return begin[i];
    }

  private:
    int* begin;
    int* end;
  };

  int main() {
    const int s = 1000000;
    for (int reps = 0; reps < 1000; ++reps) {
      List l(s);
      List l2(s);
      // version 1
      for ( int i = 0; i < l.size(); ++i ) {
      // version 2
      // for ( int i = 0, c = l.size(); i < c; ++i ) {
        l2[i] = l.at(i);
      }
    }
    return 0;
  }

If you run this through callgrind, you’ll see quite some time being spent in l.size(); the compiler doesn’t seem to optimize that away. Fixing this “bottleneck” is simple: look at version 2. That way, l.size() is called only once per inner loop instead of once per iteration, and you save quite a few instructions according to callgrind.
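Before comparing costs, it helps to confirm that the two loop forms do the same thing. Here is a minimal sketch of that check. It uses std::vector<int> in place of the List struct above, and the function names copy_size_each_iteration and copy_size_hoisted are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Version 1: src.size() is evaluated in every loop condition check.
// Assumes dst is at least as large as src.
void copy_size_each_iteration(const std::vector<int>& src, std::vector<int>& dst) {
    assert(dst.size() >= src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i];
    }
}

// Version 2: src.size() is evaluated once and cached in a local variable.
// Assumes dst is at least as large as src.
void copy_size_hoisted(const std::vector<int>& src, std::vector<int>& dst) {
    assert(dst.size() >= src.size());
    for (std::size_t i = 0, c = src.size(); i < c; ++i) {
        dst[i] = src[i];
    }
}
```

Calling both from a small main, building once without optimization and once with it (for example -O0 versus -O2 with GCC or Clang), and running each binary under valgrind --tool=callgrind should show whether the size() calls survive. In optimized builds a small inline accessor like this is usually inlined, so the measured difference may shrink or vanish.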

Now, my first impression was: Yes, let’s fix this! On the other hand, this optimization is not really that noticeable in terms of user experience. So my question is: Is it worth it? Should everything one sees in callgrind that is easily avoidable and optimizable (like the stuff above) be optimized?

I ask because QTextEngine, for example, doesn’t use the optimized version and I wonder whether I should create a merge request for that. According to callgrind the difference is noticeable: One of my testcases shows ~8% of the time being spent in QVector<QScriptItem>::size() (via QTextEngine::setBoundary()). In Kate the difference is even bigger, with ~16% of the time being spent in QList<QTextLayout::FormatRange>::size() via QTextEngine::format(). Hence I’d say: yes, let’s optimize that. I just wonder whether it’s noticeable in the end.

Bye

EDIT: See this comment thread for an answer.

» Transparent loading of compressed Callgrind files in KCacheGrind

Thu, 03/11/2010 - 23:43

Hey everyone!

I just committed an (imo) insanely useful feature for KCacheGrind: Transparent loading of compressed Callgrind files. Finally one does not have to keep those Callgrind files around uncompressed, hogging up lots of space. And what is even more important: It’s much easier to share these files now, as you can send or upload them as .gz or better yet .bz2 and open them directly. KDE architecture just rocks :) So in KDE 4.5 the best profiling visualizer just got better :D
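For the curious, one common way to recognize a compressed file without trusting its extension is to look at its first bytes. The sketch below is only an illustration of that idea and not KCacheGrind’s actual code; the enum Compression and the function detect_compression are hypothetical names. It relies on gzip streams starting with the bytes 0x1f 0x8b and bzip2 streams starting with “BZh”:

```cpp
#include <string>

enum class Compression { None, Gzip, Bzip2 };

// Guess the compression format from the first bytes of a file.
// gzip streams start with 0x1f 0x8b; bzip2 streams start with "BZh".
// Anything else is treated as uncompressed.
Compression detect_compression(const std::string& head) {
    if (head.size() >= 2 &&
        static_cast<unsigned char>(head[0]) == 0x1f &&
        static_cast<unsigned char>(head[1]) == 0x8b) {
        return Compression::Gzip;
    }
    if (head.size() >= 3 && head.compare(0, 3, "BZh") == 0) {
        return Compression::Bzip2;
    }
    return Compression::None;
}
```

A loader could read the first few bytes of the file, pass them to detect_compression, and then pick a matching decompressing reader before handing the data to the normal parser.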

In related news: I’m currently spending my time as an intern at KDAB creating an application to visualize Massif data. If you are interested, check out the sources on Gitorious: http://gitorious.org/massif-visualizer

It’s still pretty limited in what it offers, yet is probably already more useful than the plain ASCII graph that ms_print generates:

Visualization of a Massif output file
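To give an idea of the data behind such a graph, here is a minimal sketch of pulling (time, heap size) pairs out of a Massif output file. It is not the massif-visualizer code. The struct Snapshot and the function parse_snapshots are hypothetical names, and the sketch assumes the file contains snapshot=, time= and mem_heap_B= lines as I remember the format, so check it against a real file:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Snapshot {
    unsigned long long time = 0;        // value of the "time=" line
    unsigned long long heap_bytes = 0;  // value of the "mem_heap_B=" line
};

// Collect one Snapshot per "snapshot=" line and fill in the fields
// that follow it. All other lines (header, heap tree) are ignored.
std::vector<Snapshot> parse_snapshots(std::istream& in) {
    std::vector<Snapshot> result;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("snapshot=", 0) == 0) {
            result.push_back(Snapshot{});
        } else if (!result.empty() && line.rfind("time=", 0) == 0) {
            result.back().time = std::stoull(line.substr(5));
        } else if (!result.empty() && line.rfind("mem_heap_B=", 0) == 0) {
            result.back().heap_bytes = std::stoull(line.substr(11));
        }
    }
    return result;
}
```

Plotting heap_bytes over time for the returned snapshots gives roughly the total-heap curve of the graph. The per-allocation breakdown that makes up the stacked part would need the heap tree lines as well, which this sketch skips.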

This is very WIP but the visuals are somewhat working now. I plan to make the whole graph react to user input, i.e. zoomable, click to show details about snapshots, show information about the heap items that make up the stacked part of the diagram, …

Also very high on my wish list is some kind of interaction with the KCacheGrind libraries, to reuse its nice features like callgraphs, cost maps and so on, you name it :) All the features that make KCacheGrind such an insanely useful application.

Oh and remember: Never do performance optimizations without checking the facts first ;-)