performance

» Heaptrack - Attaching to Running Process

Tue, 12/09/2014 - 23:02

Hello all,

I’m happy to be back so soon with a status update on heaptrack: It is now possible to attach to an already running process!

Thanks to the great help from Celelibi on StackOverflow, I managed to achieve this important goal. Once you know what to do, it is actually extremely simple to patch a running process. I use GDB to attach to the process, then call dlopen to load a special heaptrack library for runtime injection. Then I call an initialization function which takes the desired output file as a parameter, and finally detach GDB. To actually overwrite malloc & friends, one can leverage dl_iterate_phdr and the public ELF API on Linux systems to find dynamic sections that reference one of our target symbols in their global offset table (GOT). These entries can then be rewritten to point to our custom hooks. Some refactoring later, which stabilized the shutdown sequence to allow multiple heaptrack attach/detach sequences, we can now do this:

  heaptrack -p $(pidof <yourapp>)
  # wait
  ^C
  heaptrack_print heaptrack.<yourapp>.$$.gz | less

This is a great help when you want to investigate why the memory consumption of your application suddenly rises. No need to restart the app, just attach heaptrack and wait for some time, then kill it and run heaptrack_print on the output file.
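To make the GOT part a bit more tangible, here is a minimal sketch of the first half of that idea: walking all loaded ELF objects with dl_iterate_phdr and locating their dynamic sections. It only counts the PLT relocation entries per object and does not rewrite anything. It is not heaptrack's actual code; the names LoadedObject, visitObject and listLoadedObjects are made up for illustration, and it assumes Linux with glibc.

```cpp
#include <link.h> // dl_iterate_phdr, dl_phdr_info, ElfW (Linux/glibc)

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch, not part of heaptrack.
struct LoadedObject {
    std::string name;           // empty for the main executable
    std::size_t pltRelocations; // number of PLT relocation entries
};

// Callback invoked by dl_iterate_phdr once per loaded ELF object.
static int visitObject(dl_phdr_info* info, std::size_t /*size*/, void* data)
{
    assert(data);
    auto* objects = static_cast<std::vector<LoadedObject>*>(data);

    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& phdr = info->dlpi_phdr[i];
        if (phdr.p_type != PT_DYNAMIC) {
            continue;
        }
        // The dynamic section is mapped at load base + virtual address.
        const auto* dyn = reinterpret_cast<const ElfW(Dyn)*>(info->dlpi_addr + phdr.p_vaddr);

        std::size_t pltRelBytes = 0;
        std::size_t entrySize = sizeof(ElfW(Rela));
        for (; dyn->d_tag != DT_NULL; ++dyn) {
            if (dyn->d_tag == DT_PLTRELSZ) {
                // total size in bytes of the PLT relocation table
                pltRelBytes = dyn->d_un.d_val;
            } else if (dyn->d_tag == DT_PLTREL) {
                // tells us whether the entries are Rel or Rela records
                entrySize = (dyn->d_un.d_val == DT_RELA) ? sizeof(ElfW(Rela)) : sizeof(ElfW(Rel));
            }
        }
        objects->push_back({info->dlpi_name ? info->dlpi_name : "", pltRelBytes / entrySize});
    }
    return 0; // returning 0 continues the iteration
}

// Collects one entry per loaded object that has a dynamic section.
std::vector<LoadedObject> listLoadedObjects()
{
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(&visitObject, &objects);
    return objects;
}
```

Calling listLoadedObjects() from a dynamically linked program should return the executable itself plus its shared libraries. An injector would go one step further: resolve each relocation's symbol name and overwrite the matching GOT slot, which is left out here on purpose.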

Please try this new feature and send me bug reports and feedback.

Cheers

» Heaptrack - A Heap Memory Profiler for Linux

Tue, 12/02/2014 - 19:25

Hello everyone,

with a tingly feeling in my belly, I’m happy to announce heaptrack, a heap memory profiler for Linux. Over the last couple of months I’ve worked on this new tool in my free time. What started as a “what if” experiment quickly became such a promising tool that I couldn’t stop working on it, at the cost of neglecting my physics master’s thesis (who needs that anyways, eh?). In the following, I’ll show you how to use this tool, and why you should start using it.

» An optimization kata - profiling 101 at Akademy 2014

Tue, 09/09/2014 - 15:43

Yesterday my Profiling 101 workshop took place at this year’s Akademy in Brno. The room was packed and I got good feedback, so I hope you all learned something new :)

During my workshop, I showed you how to improve the performance of a word-count application which also creates a word histogram and finds the longest word of a file. I tried to put as many performance bottlenecks as possible into the original code base, which you can find here:

  git clone git@git.kde.org:scratch/mwolff/akademy-2014.git

Instead of uploading my useless slides full of meme images, I’m now pushing my optimized code branch. I urge everyone to review the commits I did and read the individual commit messages (Note: read this log from bottom to top). There are many useful tips and tricks in there. I furthermore plan to create a techbase article with the most important notes on how to use profilers for a given job. I’ll write another blog post once I’m done with that.

Furthermore, if you want to learn profiling, I think my scratch repo up there is a good coding kata. Branch off from the master branch and create your own optimized one. Use profilers such as Valgrind callgrind and Linux perf for CPU runtime. Try out Massif and heaptrack for memory.
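If you want a feeling for the task before cloning the repo, it can be sketched in a few lines. The following is a hypothetical minimal baseline and not the code from the workshop repository; WordStats and analyze are names made up for illustration, and words are simply split on whitespace.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Result of one pass over the input (hypothetical type for this sketch).
struct WordStats {
    std::size_t totalWords = 0;
    std::unordered_map<std::string, std::size_t> histogram;
    std::string longestWord;
};

// Counts words, builds a histogram and tracks the longest word.
// On ties, the first word of maximal length wins.
WordStats analyze(const std::string& text)
{
    WordStats stats;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word) { // operator>> splits on whitespace
        ++stats.totalWords;
        ++stats.histogram[word];
        if (word.size() > stats.longestWord.size()) {
            stats.longestWord = word;
        }
    }
    return stats;
}
```

Even a small version like this gives a profiler something to show: the per-word string handling and the hash map lookups are the natural places to start measuring.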

I hope together we can make KDE software much faster. There are probably many low-hanging fruit throughout our large codebase. If you have any questions, please do ask me.

Cheers, enjoy the rest of Akademy. Many thanks to the organizers, sponsors and the KDE e.V.!

» Akademy 2014 - Come to my Profiling 101 Workshop!

Tue, 08/26/2014 - 19:08

Hello all!

I have the pleasure of attending Akademy again this year. From my past experience, I’m really looking forward to having a good time again. Lots of hacking, meeting known and unknown faces, drinking beer and socializing ahead! I also love that it’s in a (to me) new country again, and wonder what I will see of the Czech Republic and Brno!

This year, the conference schedule is a bit different from the past years. Not only do we have the usual two days packed with interesting talks and keynotes. No - this year there will also be workshops on the third day! These are more in-depth talks which hopefully teach the audience some new skills, be it QML, mobile development, testing, or … profiling :) Yours truly has the honor to hold a one-hour Profiling 101 workshop.


I’m going to Akademy and will hold a Profiling 101 Workshop

I welcome all of you to attend my presentation. My plan, currently, is to do some live demoing of how I profile and optimize code. For that purpose, I just wrote a (really slow and badly written) word count test-app. I pushed the sources to kde:scratch/mwolff/akademy-2014.git. If you plan to join my workshop, I encourage you to download the sources and take a shot at optimizing it. I tried my best to write slow code this time, to leave plenty of opportunity for optimizations :) There are many low-hanging fruits in the code. I’m confident that I’ll be able to teach you some more advanced tips and tricks on how you can improve a Qt application’s performance. We’ll see in the end who can come up with the fastest version :)

During my workshop, I’ll investigate the performance of the wordcount app with various tools: On one hand this should teach you how to use the powerful existing open source tools such as Linux perf and the Valgrind suite. I will also show you Intel VTune though, as it is still unparalleled in many aspects and available free-of-charge for non-commercial usage on Linux. Then, I’ll present a few of my own tools to you, such as heaptrack. If you have never heard of some of these tools, go try them out before Akademy!

I’ll see what else I’ll fit in and maybe I’ll extend my akademy-2014.git scratch repository with more examples over the next days.

Bye, hope to see you soon!

» Apps on Speed

Tue, 11/19/2013 - 17:07

Hey all,

since some people asked me: The slides to my extended Apps on Speed talk from this year’s Qt DevDays Berlin are available for download. If you are interested, get them here: http://devdays.kdab.com/wp-content/uploads/2013/11/qt-dd-2013-apps-on-sp…

I hope you liked that talk. I certainly had fun presenting it and discussing the contents with various attendees later on. I have now quite some ideas on how to extend the talk even further.

The slides of the other presentations are also available. Stay tuned for the video recordings of DevDays Berlin, I’m sure they will be accessible soonish :)

Edit: The video is now available! Enjoy http://www.youtube.com/watch?v=C5EPt50Kgmc

» Akademy 2013 - A Blast!

Sat, 07/20/2013 - 01:23

Wow…

I’ve been gone for eight days and returned just a few hours ago to Berlin. It doesn’t feel like that. The last days went by in a blur of awesomeness! The reason why I didn’t write a single blog post in between is just that I never had a spare minute for that. I arrived on Thursday and instantly enjoyed the warmth of Spain / the Basque country and had a tasty and cheap Menu del Dia at a local restaurant with fellow KDABians and other KDE friends. Then just a few hours later the first party started, near the old district of the city - amazing! More and more hackers and helpers arrived, the atmosphere was once again so good. The social aspect of this year’s Akademy was without comparison in my opinion - seriously: Hats off to the local team, you did an amazing job!

While the social events on the following days have been just as awesome or even awesomer to awesomest - I especially enjoyed the day trip and jumping into the ocean! - the technical side of Akademy delivered just as well: My favorite talk this year was Mirko’s about ThreadWeaver, which we heavily use in KDevelop. His roadmap and polished API look much better than what we have nowadays and should allow for much nicer code which might even perform better - kudos!

Similarly, I liked Volker’s talk about Expression Templates and Kevin Krammer’s presentation of Declarative Widgets a lot. Both of them are colleagues of mine, so the contents weren’t that new to me - yet hearing it all in a concise and entertaining manner is always worth it. The crowd also seemed to enjoy it. Martin Grässlin’s talk about being the 1% corner case was also highly entertaining and gave a very interesting insight into the problems he tackles day to day.

There have been other, less technical talks, which I also appreciated greatly: Kevin Otten’s visionary roadmap for KDE as a community or Till’s highly entertaining presentation of BlackBerry. Which brings me to the sponsors - many thanks! Without them, this year’s event would surely not have been as good as it was!

Oh boy, I already wrote a lot, yet only covered the first three days… After the AGM and presentations on the weekend followed a full week of highly educational BoFs - both around KDE topics (such as KF5, KDevelop, …) and “plain” Qt during the Qt Contributor Summit. This was my first time attending the QtCS and I definitely want to see more of this! Discussing the future of QtWebKit and learning more about what’s cooking in QtCore was certainly worth it. Being in contact with the QtCreator and QML guys also helps from a tooling point of view in general and from a KDevelop pov in particular. Oh and we got a nice BlackBerry Z10 phone - many thanks for that!

The afternoons are mostly a blur - I mostly remember lots of Foosball, Socializing, Drinking, meeting Friends of Old and New, Eating, Partying etc. pp.

Anyhow, I think I need to stop here.

tl;dr; Thank you local KDE team for organizing such an awesome Akademy + QtCS 2013! Thank you Sponsors for making this possible!

PS: All of you who attended talks on the weekend: Go and rate them! The speakers will love you and provide you with even better talks next year! Go to either the page for the talks on Saturday or the talks on Sunday, then pick the sessions you attended and finally hit the “Feedback” link!

PPS: I definitely have to come back to the Basque country, the countryside looked beautiful and Bilbao alone is worth the trip! And I didn’t even have time to visit the Guggenheim…

Cheers, see you next year you insane awesome crowd of KDE people!

» VTune and KDE

Fri, 09/09/2011 - 21:09

Hey all,

been some time since I blogged last time. My TODO list is ever increasing and I took my day job at KDAB up again. Among others, I attended a marketing talk by Edmund Preiss. He actually made that marketing talk interesting, not least by his huge knowledge in the business, thanks to ~20 years of working for Intel. Probably the most important info I got out of it is this:

VTune is available free-of-charge under a non-commercial license

Yes, you heard right. Take these links:

  • Intel’s non-commercial offering

    note this entry from the FAQ:

    What does noncommercial mean?
    Non-commercial means that you are not getting compensated in any form for the products and/or services you develop using these Intel® Software Products.

  • Register for free license

  • Register for Download Access

    you’ll need the serial number that gets sent to you via email after registering for the license

  • install VTune and profile the hell out of KDE/FOSS software and improve it all!

speeding up KDevelop

Personally I did the latter for KDevelop the last two days, and the results are astonishing. I just tested the results from today and an unscientific time kdevelop -s lotsofprojects - wait until parsing finished - stop showed a roughly 50% decrease in time, from ~12min to ~6min. Yes, a whopping 50% - try it out for yourself and see how big the gain is. Don’t forget to wipe the DUChain cache though (i.e. via setting the environment variable CLEAR_DUCHAIN_DIR=1).

Why VTune rocks

I’m a huge fan of the Valgrind toolsuite, but it is simply too slow for profiling some things. Like opening ten medium to big sized projects in KDevelop and taking a look at the parsing speed. This can easily take a few minutes, but in Valgrind it would take ages. With VTune on the other hand, thanks to its sampling-based approach, I don’t really notice the performance delay.

Then you might have heard of the new perf profiling utility in the Linux kernel. It is also sampling based, but sadly requires special compile options on 64 Bit (-fno-omit-frame-pointer), and the UI is horrible. I haven’t found anything worthwhile with it so far…

VTune on the other hand has an incredible GUI, which makes profiling a joy. You can look at call stacks top-down or bottom-up, visualize locks and waits, easily find hotspots, … I’m blasted. Especially the utilities to look at multi-threaded performance (of e.g. KDevelop) kill every single other performance tool I have ever tested. Oh and did I mention that you can attach to an app at runtime, analyze something, and detach again?

Seriously, Intel: You just found a new fan boy in me. Thanks for giving this tool away for free for us “I hack on this tool in my spare time, yet still want it to perform nicely” people :) And kudos to the VTune developers - I’m blown away by it!

I really hope more people in the KDE community will try out VTune and try to improve the performance of our apps, I bet there is lots of potential!

Pitfalls

There are some negative aspects to VTune though: First of all its UI sometimes freezes. I wonder if the developers should not maybe spend some time on analyzing the tool itself ;-)

The biggest gripe though is that VTune does not work everywhere. I tried to run it on my Arch box, but sadly Linux 3.0 is not supported by VTune yet. It worked like a charm on two Ubuntu boxes with some 2.6.X kernel though.

This also means that I have no idea if, and how, VTune works on non-Intel CPUs. I think some of it works nicely. I did not install any of the kernel modules, for example, which would be required for hardcore low-level CPU profiling. I think the same feature set I praised so much above should hence be available on e.g. AMD CPUs. But well, this is left to be tested.

So, I’m now drinking a well deserved beer and look positively into the future of a fast KDevelop/KDE :)

bye

» Should all callgrind bottlenecks be optimized?

Thu, 12/09/2010 - 19:12

Hey all,

I’d like to have some feedback from you. Consider this code:

  #include <cstring>
  #include <iostream>

  using namespace std;

  struct List {
    List(int size) {
      begin = new int[size];
      // memset counts bytes, not elements
      memset(begin, 0, size * sizeof(int));
      end = begin + size;
    }
    ~List() {
      delete[] begin;
    }
    int at(int i) const {
      return begin[i];
    }
    int size() const {
      // std::cout << "size called" << std::endl;
      return end - begin;
    }
    int& operator[](int i) {
      return begin[i];
    }

  private:
    int* begin;
    int* end;
  };

  int main() {
    const int s = 1000000;
    for (int reps = 0; reps < 1000; ++reps) {
      List l(s);
      List l2(s);
      // version 1
      for ( int i = 0; i < l.size(); ++i ) {
      // version 2
      // for ( int i = 0, c = l.size(); i < c; ++i ) {
        l2[i] = l.at(i);
      }
    }
    return 0;
  }

If you run this through callgrind, you’ll see quite some time being spent in l.size(); the compiler doesn’t seem to optimize that away. Now, fixing this “bottleneck” is simple, look at version 2. That way, l.size() will only be called once and you’ll save quite some instructions according to callgrind.

Now, my first impression was: Yes, let’s fix this! On the other hand, this optimization is not really that noticeable in terms of user experience. So my question is: Is it worth it? Should everything one sees in callgrind that is easily avoidable and optimizable (like the stuff above) be optimized?

I ask because QTextEngine e.g. doesn’t use the optimized version and I wonder whether I should create a merge request for that. According to callgrind the difference is noticeable: One of my testcases shows ~8% of the time being spent in QVector<QScriptItem>::size() (via QTextEngine::setBoundary()). In Kate the difference is even bigger with ~16% of the time being spent in QList<QTextLayout::FormatRange>::size() via QTextEngine::format(). Hence I’d say: yes, let’s optimize that. I just wonder whether it’s noticeable in the end.

Bye

EDIT: See this comment thread for an answer.

» PHP "is_whitespace" performance

Fri, 08/22/2008 - 00:09

Easy question: What is the fastest way to determine if a string in PHP is whitespace-only?

Easy answer: !preg_match('/[^\s]/', $string);

Read on for the explanation:

» profile.class.php

Tue, 06/24/2008 - 18:29

Every now and then I want to profile a given part of PHP code. For example I want to quickly check whether my current changeset to GeSHi works faster or is horribly slower. For a big change I’ll stick to Xdebug and KCachegrind. But for a quick overview? Overkill in my eyes.

Say hello to profile.class.php, a simple timer class for PHP5 which you can use to get an overview of where and how much time is spent. This is in no way a scientific method nor should you take the results of a single run as a basis for your decisions.

I’ve put an emphasis on an easy API so you don’t have to pollute your code with arbitrary hoops and whistles.

UPDATE: You can find the current updated source in the SVN repo of GeSHi.