BlogAn optimization kata - profiling 101 at Akademy 2014 Syndicate content

Tue, 09/09/2014 - 15:43

Yesterday my Profiling 101 workshop took place at this years Akademy in Brno. The room was packed and I got good feedback, so I hope you all learned something new :)

During my workshop, I showed you how to improve the performance of a word-count application which also creates a word histogram and finds the longest word of a file. I tried to put as many performance bottlenecks as possible into the original code base, which you can find here:

  1. git clone

Instead of uploading my useless slides full of meme images, instead I’m now pushing my optimized code branch. I urge everyone to review the commits I did and read the individual commit messages (Note: read this log from bottom to top). There are many useful tips and tricks in there. I furthermore plan to create a techbase article with the most important notes on how to use profilers for a given job. I’ll write another blog post once I’m done with that.

Furthermore, if you want to learn profiling, I think my scratch repo up there is a good coding kata. Branch off from the master branch and create your own optimized one. Use profilers such as Valgrind callgrind and Linux perf for CPU runtime. Try out Massif and heaptrack for memory.

I hope together we can make KDE software much faster. There are probably many low-hanging fruit throughout our large codebase. If you have any question, please do ask me.

Cheers, enjoy the rest of Akademy. Many thanks to the organizers, sponsors and the KDE e.V.!


Your website is so cool. I am Thu, 02/12/2015 - 08:09 — Anonymous

Your website is so cool. I am impressed by the info that you have on this site. The main issues raised regarding the equipment

Hi Milian, I’ve looked very Sat, 09/13/2014 - 22:21 — Anonymous

Hi Milian,

I’ve looked very closely at this code now and can only optimize it slightly. However, theoretical this could still be a lot faster. First the minor optimizations i found:

note: all of the below is based on revision b8932c0. I don’t use the QVector version which is the latest commit you made.

  • you use toLower on a per-word basis. You might as well do that on the full line. It saves quite some to-lowers and reduces the code (slightly). You get ~2 percent improvement with that
  • instead of reading per line, you could also read chunks of data (just the QIODevice::read function) with a bigger buffer. Lets say 128kb. I’ve tried that and it “seems” like you get another ~2 percent improvement by doing that. However, you will get into trouble if your chunk of data ends in the middle of a word.. Probably why the test cases fail for me if i take this approach :) Adding bookkeeping to get this right is probably going to make it slower.

So how can this still be a lot faster then anything you or i have right now? Here’s the case. I’m benchmarking the wordcount header function with just the file: data/pg1184.txt. If i specifically disable (comment out) the wordcounting logic then you get the time it takes to actually load the file and count the number of lines. In my case that takes: 47,670,729 instructions (yes, i confirmed that the file is actually being read and not optimized out by the compiler). If i enable the wordcount again i get well over 10x more instructions: 566,620,546 (my baseline).

566,620,546 - 47,670,729 = 518,949,817 instructions that are (roughly) being spend in just counting the words. Much of the instructions are being spend in QString and QHash. This means that there is still a lot of room for improvement. And i think i know how to get some of that potential improvement. The data structure that is perfect for this kind of job is any kind of tree structure really. QHash (and QVector) are just not suited for this anymore when optimizing this far. Sadly we don’t have a “QTree” or “QRadix”… I’m going to try my (wip!) Radix tree [1, 2] on this (which i probably have to modify a bit) and see if it improves performance.

I will report back when i have some results.

[1] [2]

How do I clone your repo? Tue, 09/09/2014 - 20:20 — Anonymous

How do I clone your repo? I.e. is there anonymous access?

Try Thu, 09/11/2014 - 02:02 — Milian Wolff

Try this:

  1. git://


If you could post the slides, Tue, 09/09/2014 - 19:01 — Anonymous

If you could post the slides, I’d be very happy :)

The slides only contained Thu, 09/11/2014 - 02:04 — Milian Wolff

The slides only contained meme images. The actual content of the talk can be found in the commit log.

Instead of reading the log Tue, 09/09/2014 - 18:34 — Anonymous

Instead of reading the log from bottom to top, one could also run git log -- reverse on the optimized branch ;-)

Post new comment

  • You can use Markdown syntax to format and style the text. Also see Markdown Extra for tables, footnotes, and more.
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>, <pre>.
  • Web page addresses and e-mail addresses turn into links automatically.

More information about formatting options