Where Profiling Sucks

Ok, you should know by now that I love profiling and making things faster. Yet there’s always a “but”. For me it’s blocking syscalls, or anything else that makes the app “slow” for the user but doesn’t show up in Callgrind, because the instruction fetch (Ir) cost doesn’t go up while the process just sits there waiting.

The usual suspects are of course locks (of which we have quite a lot in KDevelop) or QProcess calls with waitForFinished() or similar… You won’t see them in any Callgrind profile. Does anyone know a way to achieve that? Something that makes Callgrind increase the Ir cost of a blocking function call depending on how long it blocks? Or some other tool that would show me these?
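Until a proper tool turns up, the crude workaround is to measure wall-clock time by hand around a suspicious call. Here is a minimal sketch of that idea in plain C++; `wallClockTime` and `blockingCall` are names made up for this example, and the sleep is only a stand-in for a contended lock or a waitForFinished() call:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: returns the wall-clock time a callable takes.
// Unlike an instruction count, this includes time spent blocked.
template <typename Callable>
std::chrono::milliseconds wallClockTime(Callable&& call)
{
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}

// Stand-in for a blocking call: it executes almost no instructions,
// yet the user waits for it.
void blockingCall()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling `wallClockTime(blockingCall)` reports roughly the 50 ms the thread spent asleep, which is exactly the cost an instruction-count profile does not attribute to the function. It only helps once you already suspect a call, of course.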

And if you are interested: I was still able to find the cause of the slow parsing of Custom Make Manager projects (Qt, Linux Kernel, …) in KDevelop: the cache in the IncludePathResolver never hit, since an operator== was improperly implemented ;-) I really wonder how we could have missed that for so long! I’ve also added some more changes that should make it much faster to parse projects that rely on the IncludePathResolver. I was personally now able to parse 10,000 files of the Linux Kernel in about 9.5 minutes. This is roughly a third of the Kernel, so I’d get to a total of approx. 30 minutes. Compare that to the 2.5 hours for 5% that one of our users reported ;-)
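To illustrate how a bad operator== can silently turn a cache into dead weight, here is a small sketch. This is not the actual KDevelop code: the key types, the `requestId` field and `countHits` are all invented for the example, and the real bug may have looked quite different. The assumption here is that the comparison takes a field into account that differs on every lookup:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache key. requestId differs on every lookup and should
// not take part in the comparison.
struct BrokenKey {
    std::string directory;
    int requestId;
    // Bug: also compares requestId, so two lookups never compare equal.
    bool operator==(const BrokenKey& other) const
    {
        return directory == other.directory && requestId == other.requestId;
    }
};

struct FixedKey {
    std::string directory;
    int requestId;
    // Only the directory identifies a cache entry.
    bool operator==(const FixedKey& other) const
    {
        return directory == other.directory;
    }
};

// Hash on the directory only, for both key types.
template <typename Key>
struct KeyHash {
    std::size_t operator()(const Key& key) const
    {
        return std::hash<std::string>()(key.directory);
    }
};

// Looks up the same directory twice and returns the number of cache hits.
template <typename Key>
int countHits()
{
    std::unordered_map<Key, int, KeyHash<Key>> cache;
    int hits = 0;
    for (int i = 0; i < 2; ++i) {
        const Key key{"/usr/src/linux", i};
        if (cache.find(key) != cache.end())
            ++hits;
        else
            cache.emplace(key, i); // miss: do the "expensive" work and store it
    }
    return hits;
}
```

With `BrokenKey` the second lookup misses and the cache just keeps growing, while `FixedKey` gets the expected hit. Nothing crashes and nothing looks wrong in a functional test; the program is simply slow, which is probably why this kind of bug survives for so long.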



Comment by Anonymous (not verified) (2010-04-08 05:47:00)

Try Systemtap…

Comment by Anonymous (not verified) (2010-04-07 22:16:00)

Compuware Devpartner - it instruments code at compile time, and is thus able to measure the time taken to execute a function, which is sufficient to diagnose IO bottlenecks. Only for Windows, though.

Comment by Christoph Bartoschek (not verified) (2010-04-06 23:42:00)

At least oprofile is not capable of profiling the waiting time of a program. I doubt that there is a single profiler that is able to profile the waiting time.

Comment by stativ (not verified) (2010-04-06 07:40:00)

Try Tau [1], it’s probably one of the best free profilers out there. But it’s a pain to make it work (I got it working once, but then I bought a new computer and I haven’t got back to it). The nice thing is that it supports several levels of instrumentation (source code, external library (that one works similarly to Callgrind, I think)…). Some are faster, some are more precise, some are “easy” to use. And it’s quite robust.


Comment by Benjamin Otte (not verified) (2010-04-05 20:42:00)

You could try sysprof, it shows traces into the kernel.

Comment by Justin Noel (not verified) (2010-04-05 20:32:00)

Use OProfile, or its commercial cousin Zoom by RotateRight. OProfile uses hardware ticks or kernel timers to do profiling, so it requires only loading a kernel module rather than running an emulator like Valgrind. OProfile will also give you a profile of the system at large, so you can keep an eye on what the X server is doing while your client app is running. Pretty cool. I used to use Zoom because it would get me call graphs, but there may be a conversion utility available now to convert OProfile output to the Callgrind format. That would allow you to keep using KCacheGrind.

Published on April 05, 2010.