performance Syndicate content

» VTune and KDE

Fri, 09/09/2011 - 21:09

Hey all,

been some time since I blogged last time. My TODO list is ever increasing and I took my day job at KDAB up again. Among others, I attended a marketing talk by Edmund Preiss. He actually made that marketing talk interesting, not least by his huge knowledge in the business, thanks to ~20 years of working for Intel. Probably the most important info I got out of it is this:

VTune is available free-of-charge under a non-commercial license

Yes, you heard right. Take these links:

  • Intel’s non-commercial offering

    note this entry from the FAQ:

    What does noncommercial mean?
    Non-commercial means that you are not getting compensated in any form for the products and/or services you develop using these Intel® Software Products.

  • Register for free license

  • Register for Download Access

    you’ll need the serial number that gets send to you via email after registering for the license

  • install VTune and profile the hell out of KDE/FOSS software and improve it all!

speeding up KDevelop

Personally I did the latter for KDevelop the last two days, and the results are astonishing. I just tested the results from today and an unscientific time kdevelop -s lotsofprojects - wait until parsing finished - stop showed roughly 50% decrease in time, from ~12min to ~6min. Yes, a whopping 50% - try it out for yourself and see how big the gain is. Don’t forget to whipe the DUChain cache though (i.e. via setting the environment variable CLEAR_DUCHAIN_DIR=1).

Why VTune rocks

I’m a huge fan of the Valgrind toolsuite, but it is simply too slow for profiling some things. Like opening ten medium to big sized projects in KDevelop and taking a look at the parsing speed. This can easily take a few minutes, but in Valgrind it would take ages. With VTune on the other hand, thanks to it’s sampling based approach, I don’t really notice the performance delay.

Then you might have heard of the new perf profiling utility in the Linux kernel. It is also sampling based, but sadly requires special compile options on 64 Bit (-fno-omit-frame-pointers), and the UI is horrible, I haven’t found anything worthwhile with it so far…

VTune on the other hand has an incredible GUI, which makes profiling a joy. You can look at call stacks top-down or bottom-up, visualize locks and waits, easily find hotspots, … I’m blasted. Especially the utilities to look at multi threaded performance (of e.g. KDevelop) kills every single other performance tool I have ever tested. Oh and did I mention that you can attach to an app at runtime, analyze some thing, and detach again?

Seriously, Intel: You just found a new fan boy in me. Thanks for giving this tool away for free for us “I hack on this tool in my spare time, yet still want it to perform nicely” people :) And kudos to the VTune developers - I’m blown away by it!

I really hope more people in the KDE community will try out VTune and try to improve the performance of our apps, I bet there is lots of potential!

Pitfalls

There are some negative aspects to VTune though: First of all it’s UI is sometimes freezing. I wonder if the developers should not maybe spent some time on analyzing the tool itself ;-)

The biggest gripe though is that VTune does not work everywhere. I tried to run it on my Arch box, but sadly Linux 3.0 is not supported by VTune yet. It worked like a charm on two Ubuntu boxes with some 2.6.X kernel though.

This also means that I have no idea if, and how, VTune works on non-Intel CPUs. I think some of it works nicely. I did not install any of the Kernel modules for examples, which would be required for hardcore lowlevel CPU profiling. I think the same feature set I praised so much above, should hence be available on e.g. AMD CPUs. But well, this is left to be tested.

So, I’m now drinking a well deserved beer and look positively into the future of a fast KDevelop/KDE :)

bye

» Should all callgrind bottlenecks be optimized?

Thu, 12/09/2010 - 19:12

Hey all,

I’d like to have some feedback from you. Consider this code:

  1. #include <iostream>
  2. #include <memory.h>
  3.  
  4. using namespace std;
  5.  
  6. struct List {
  7. List(int size) {
  8. begin = new int[size];
  9. memset(begin, 0, size);
  10. end = begin + size;
  11. }
  12. ~List() {
  13. delete[] begin;
  14. }
  15. int at(int i) const {
  16. return begin[i];
  17. }
  18. int size() const {
  19. // std::cout << "size called" << std::endl;
  20. return end - begin;
  21. }
  22. int& operator[](int i) {
  23. return begin[i];
  24. }
  25.  
  26. private:
  27. int* begin;
  28. int* end;
  29. };
  30.  
  31. int main() {
  32. const int s = 1000000;
  33. for (int reps = 0; reps < 1000; ++reps) {
  34. List l(s);
  35. List l2(s);
  36. // version 1
  37. for ( int i = 0; i < l.size(); ++i ) {
  38. // version 2
  39. // for ( int i = 0, c = l.size(); i < c; ++i ) {
  40. l2[i] = l.at(i);;
  41. }
  42. }
  43. return 0;
  44. }

If you run this through callgrind, you’ll see quite some time being spent in l.size(), the compiler doesn’t seem to optimize that away. Now, fixing this “bottleneck” is simple, look at version 2. That way, l.size() will only be called once and you’ll save quite some instructions according to callgrind.

Now, my first impression was: Yes, lets fix this! On the other hand, this optimization is not really that noticable in terms of user-experience. So my question is: Is it worth it? Should everything one sees in callgrind that is easily avoidable and optimizable (like the stuff above) be optimized?

I ask because QTextEngine e.g. doesn’t use the optimized version and I wonder whether I should create a merge request for that. According to callgrind the difference is noticeable: One of my testcases shows ~8% of the time being spent in QVector<QScriptItem>::size() (via QTextEngine::setBoundary()). In Kate the difference is even bigger with ~16% of the time being spent in QList<QTextLayout:.FormatRange>::size() via QTextEngine::format(). Hence I’d say: yes, lets optimize that. I just wonder whether it’s noticeably in the end.

Bye

EDIT: See this comment thread for an answer.

» PHP "is_whitespace" performance

Fri, 08/22/2008 - 00:09

Easy question: What is the fastest way to determine if a string in PHP is whitespace-only?

Easy answer: !preg_match('[^\s]', $string);

Read on for the explanation:

» profile.class.php

Tue, 06/24/2008 - 18:29

Every now and then I want to profile a given part of PHP code. For example I want to quickly check wether my current changeset to GeSHi works faster or is horribly slower. For a big change I’ll stick to Xdebug and KCachegrind. But for a quick overview? Overkill in my eyes.

Say hello to profile.class.php, a simple timer class for PHP5 which you can use to get an overview about where how much time is spent. This is in no way a scientific method nor should you take the results of a single run as a basis for you decisions.

I’ve set an emphasize on an easy API so you don’t have to pollute your code with arbitrary hoops and whistles.

UPDATE: You can find the current updated source in the SVN repo of GeSHi.