An optimization kata - profiling 101 at Akademy 2014

Yesterday my Profiling 101 workshop took place at this years Akademy in Brno. The room was packed and I got good feedback, so I hope you all learned something new :)

During my workshop, I showed you how to improve the performance of a word-count application which also creates a word histogram and finds the longest word of a file. I tried to put as many performance bottlenecks as possible into the original code base, which you can find here:

    git clone

Instead of uploading my useless slides full of meme images, instead I’m now pushing my optimized code branch. I urge everyone to review the commits I did and read the individual commit messages ( Note : read this log from bottom to top). There are many useful tips and tricks in there. I furthermore plan to create a techbase article with the most important notes on how to use profilers for a given job. I’ll write another blog post once I’m done with that.

Furthermore, if you want to learn profiling, I think my scratch repo up there is a good coding kata. Branch off from the master branch and create your own optimized one. Use profilers such as Valgrind callgrind and Linux perf for CPU runtime. Try out Massif and heaptrack for memory.

I hope together we can make KDE software much faster. There are probably many low-hanging fruit throughout our large codebase. If you have any question, please do ask me.

Cheers, enjoy the rest of Akademy. Many thanks to the organizers, sponsors and the KDE e.V.!


Want to comment? Send me an email!

Comment by Anonymous (2015-02-12 08:09:00)

Your website is so cool. I am impressed by the info that you have on this site. The main issues raised regarding the equipment

Comment by Anonymous (2014-09-13 22:21:00)

Hi Milian,

I’ve looked very closely at this code now and can only optimize it slightly. However, theoretical this could still be a lot faster. First the minor optimizations i found:

note: all of the below is based on revision b8932c0. I don’t use the QVector version which is the latest commit you made.

  • you use toLower on a per-word basis. You might as well do that on the full line. It saves quite some to-lowers and reduces the code (slightly). You get ~2 percent improvement with that
  • instead of reading per line, you could also read chunks of data (just the QIODevice::read function) with a bigger buffer. Lets say 128kb. I’ve tried that and it “seems” like you get another ~2 percent improvement by doing that. However, you will get into trouble if your chunk of data ends in the middle of a word.. Probably why the test cases fail for me if i take this approach :) Adding bookkeeping to get this right is probably going to make it slower.

So how can this still be a lot faster then anything you or i have right now? Here’s the case. I’m benchmarking the wordcount header function with just the file: data/pg1184.txt. If i specifically disable (comment out) the wordcounting logic then you get the time it takes to actually load the file and count the number of lines. In my case that takes: 47,670,729 instructions (yes, i confirmed that the file is actually being read and not optimized out by the compiler). If i enable the wordcount again i get well over 10x more instructions: 566,620,546 (my baseline).

566,620,546 - 47,670,729 = 518,949,817 instructions that are (roughly) being spend in just counting the words. Much of the instructions are being spend in QString and QHash. This means that there is still a lot of room for improvement. And i think i know how to get some of that potential improvement. The data structure that is perfect for this kind of job is any kind of tree structure really. QHash (and QVector) are just not suited for this anymore when optimizing this far. Sadly we don’t have a “QTree” or “QRadix”… I’m going to try my (wip!) Radix tree [1, 2] on this (which i probably have to modify a bit) and see if it improves performance.

I will report back when i have some results.

[1] [2]

Comment by Anonymous (2014-09-09 20:20:00)

How do I clone your repo? I.e. is there anonymous access?

Comment by Milian Wolff (2014-09-11 02:02:00)

Try this:



Comment by Anonymous (2014-09-09 19:01:00)

If you could post the slides, I’d be very happy :)

Comment by Milian Wolff (2014-09-11 02:04:00)

The slides only contained meme images. The actual content of the talk can be found in the commit log.

Comment by Anonymous (2014-09-09 18:34:00)

Instead of reading the log from bottom to top, one could also run git log -- reverse on the optimized branch ;-)