MW

Kate/KDevelop Sprint 2014: First Days

Hey all! Greetings from the joint Kate/KDevelop sprint at the Blue Systems office in Barcelona!

I only arrived yesterday but already I have great news for you: After months of work I finally merged the sharedurls branches into master for KDevelop/KDevplatform etc. pp. There I worked on a optimization in our handling of file paths.

The status quo up until know was the following: When importing a folder as a project in KDevelop, we filled a model with every file and folder in the project (recursively). For every item we also stored its path as a KUrl to the potentially remote location. KUrl and QUrl are awesome when you have to work with paths and urls, but as soon as you store potentially thousands of them at the same time it becomes quite inefficient. Assume e.g. you open /foo/bar/blub/ which contains /foo/bar/blub/bla.h. When you use KUrl/QUrl to store these paths, you cannot share any memory between the two, as internally basically a QString is used. Thus, when you import deep folder trees or folders with many files, you’ll waste a lot of memory for common sub-paths. Furthermore, due to the amount of allocations required, reading the tree is pretty slow.

So in the sharedurls branch I worked on a internal replacement for KUrl in KDevelop: It’s called KDevelop::Path and is a glorified QVector<QString> with convenience API to simulate a KUrl and simplify porting. Every entry in the vector contains a path segment. It leverages QStrings implicit sharing to minimize the memory overhead. Furthermore, when you parse a tree structure recursively, all you do is copying vectors and appending strings to them - which is rather cheap as a QString is a small handle structure.

So all in all this should greatly improve the performance of opening projects in KDevelop. Especially for large sessions containing thousands of files (eg.: Qt 4, multiple Qt 5 modules, LibreOffice, Kernel, WebKit, …) the new code is much faster and consumes less memory. I’ve seen time savings in the order of multiple seconds in total as well as memory consumption going down in the order of 100MB.

While this sounds like a fairy-tale, I have to admit that it was/is a lot of work: By using a custom class, you have to convert to KUrl/QUrl or QString quite often when interacting with existing API. This of course is costly and potentially marginalizes or even pessimises the potential performance gains. Hence one needs to pay special attention and port code such that it minimizes these conversions. As such I can only recommend anyone doing something like that when you have similar extreme usecase. For a normal file browser or web browser I doubt the you’ll gain much if anything.

So please compile the current master branches and take a look for yourself. My tests and benchmarks look all good, yet I might have overlooked something. If you spot any regressions, please shout!

Now that this is mostly done and polished, I’ll continue working on Clang integration in KDevelop. Stay tuned for the next blog entry about that topic :) And already a huge thank you to Aleix Pol for organizing this sprint, to Blue Systems for having us, and to the KDE e.V. for sponsoring the trip and accommodation!

Comments

Want to comment? Send me an email!

Comment by Frank Reininghaus (not verified) (2014-01-22 00:52:00)

Your work on these issues is greatly appreciated! I always enjoy reading your posts and try to draw some inspiration from them. Sharing parts of the paths is a very clever idea!

When I read your comment about expensive “full path” to/from “segmented path” conversions, I had the following idea, would might make these conversions cheaper. In some situations, it might even reduce the memory usage further:

Instead of a QVector, just store two strings, such that each conversion only involves a split into, or a concatenation of, two strings:

    QString parentPath; // everything before the last slash; shared with other objects
    QString fileName;

This would require 2*8 bytes for the QStrings’ d pointers plus the contents of parentPath (usually shared with many other objects) plus the contents of fileName.

The QVector approach needs, depending on the level N in the file system hierarchy, 8 bytes for the QVector d pointer, 16 bytes for QVector’s bookkeeping, 8*N bytes for the QStrings’ d pointers stored inside the QVector, plus the contents of the strings (usually shared, except for the last part).

Which approach is more memory-efficient will depend on N and also on the average length of the segments - if they are very long, then sharing the individual segments might be better than sharing the total “parentPath”, but I would guess that there are many situations where the “split into two strings only” approach wins.

But this is just an idea. One would have to run some benchmarks to see how it works.

Comment by Milian Wolff (2014-01-22 01:40:00)

It would certainly be better and use less memory but it would also complicate the code further which I want to avoid at this point. And anything before the last slash is not enough, as you’d then waste memory when you have something like this:

    /foo
    /foo/bar (shares /foo)
    /foo/bar/asdf (would copy /foo/bar and share that)

Hm thinking about it this might actually work instead:

    union { KUrl url; Path path };
    // if empty, just use url, otherwise use Path as parent and append this string
    QString appended;

But again, it would complicate the code. Esp. when doing something like cd or parent. The path as it is right now is really simple code-wise and I like that :)

Published on January 22, 2014.