Improving KDevelop-PG-Qt

Good news everyone :)


After years of pretty much no documentation (except looking at the sources… ), The_User aka. Jonathan Schmidt-Dominé started documenting the parser generator that is used for most KDevelop language plugins (java, python, php, …). You can find it here:

It has to be improved and more examples have to be put in there, but it’s already a huge improvement over the situation before…


In related news I did some profiling on the parsing of the quite big file that includes all internal PHP declarations (i.e. all functions, classes, definitions,…). It drops in at a whopping 1.9M, with ~80k lines. Well, turns out that this showed a pretty easy to fix bottleneck in KDevelop-PG’s LocationTable, which used to use a linear lookup algorithm. Profiling showed that nearly 75% was spent in that function. But I used the past tense for a reason:

I replaced it with an algorithm that combines a relative search (i.e. relative to the last lookup) with a binary search fallback. That’s comparatively blazingly fast. I added some benchmarks to KDevelop-PG-Qt that proofs that (benchmark below run with release mode build):

    ********* Start testing of KDevPG::Benchmarks *********
    Config: Using QTest library 4.5.2, Qt 4.5.2
    PASS   : KDevPG::Benchmarks::initTestCase()
    RESULT : KDevPG::Benchmarks::positionAt():"initial, linear":
         1,184.0 msec per iteration (total: 11840, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"initial, random":
         1,008.5 msec per iteration (total: 10085, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"relative, linear":
         1.3 msec per iteration (total: 14, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"relative, random":
         1,185.0 msec per iteration (total: 11850, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"binary, linear":
         31.1 msec per iteration (total: 312, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"binary, random":
         39.2 msec per iteration (total: 392, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"binary & relative, linear":
         0.8 msec per iteration (total: 8, iterations: 10)
    RESULT : KDevPG::Benchmarks::positionAt():"binary & relative, random":
         40.2 msec per iteration (total: 402, iterations: 10)
    PASS   : KDevPG::Benchmarks::positionAt()
    PASS   : KDevPG::Benchmarks::cleanupTestCase()
    Totals: 3 passed, 0 failed, 0 skipped
    ********* Finished testing of KDevPG::Benchmarks *********

In our “realworld” phpfunctions.php example the DUChain building process got twice as fast (in debug mode though, but still).

We should probably get lost of the LocationTable altogether and use an existing container (QList, QVector, QLinkedList,… or any STL variant of them). But this would mean more profiling, and well, lets see when we get to it… But just got a reply on the KDevelop list showing interest on this, so maybe someone does that eventually!

Profiling rocks, and KCacheGrind is such a great application. I love such visualization. Well done! And the QTestLib benchmark utilities are also very solid, nice!

PS: KDevelop beta6 is in the pipelines and we’ll release a beta1 of the PHP plugin together with it. Packagers and users rejoice :)


Want to comment? Send me an email!

Comment by Roberto Raggi (not verified) (2009-10-27 00:38:00)

By the way, I’m not sure that caching the result of positionAt is a good idea, you really want positionAt() to be reentrant so it can be invoked from different threads without locking.

ciao robe

Comment by Milian Wolff (2009-10-27 01:19:00)

Good point, thanks for pointing this out.

Comment by Milian Wolff (2009-10-27 01:21:00)

actually, is using currentLine “safe”? or should that also be avoided?

looks like I’ll have some reading to do…

Comment by Milian Wolff (2009-10-27 01:35:00)

looking at it: isn’t the algorithm thread-unsafe just by accessing the lines-array? I mean if I call newline() or resize() from another thread anything could happen, right?

Comment by Roberto Raggi (not verified) (2009-10-27 11:42:00)

Well, LocationTable::newline/resize and LocationTable::position_at have different use cases. It is true that newline() and resize() are not reentrant, but it is also true that you call these methods only from the parsing phase so there is really no reason to lock or make them reentrant.

After the parsing phase you will probably run many visitors on the AST and there is a good chance that those visitors will call positionAt(), that’s why you want to keep positionAt reentrant.

ciao robe

Comment by Roberto Raggi (not verified) (2009-10-26 20:34:00)

Hmm, I’m pretty sure I was using a binary search there (std::lower_bound)…

search for position_at

ciao robe

Comment by Milian Wolff (2009-10-27 00:07:00)

Oh nice!

I’ll add a benchmark for your code, if it’s comparable (or faster, which I hope) I’ll use it instead, since I don’t want to reinvent the wheel. Thanks Roberto! Btw. pretty good job on the PG, thanks for that :)

Comment by Niko (not verified) (2009-10-26 19:38:00)

yay for a twice fast php support!

Performance tuning is fun :D