PHP "is_whitespace" performance
Easy question: What is the fastest way to determine if a string in PHP is whitespace-only?
Easy answer: `!preg_match('/[^\s]/', $string);` (see [preg_match](http://www.php.net/preg_match))
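Here is a minimal sketch of the easy answer wrapped in a helper. The name `is_whitespace()` is hypothetical, since PHP has no built-in of that name. It assumes PHP 7 or later for the type declarations and uses `\S`, which is the same character class as `[^\s]`. PCRE patterns in PHP need delimiters. If you write the plain `"[^\s]"`, PHP treats the square brackets themselves as the delimiters and compiles `^\s`. That pattern only asks whether the string *starts* with whitespace.

```php
<?php
// Hypothetical helper: true if $string contains no non-whitespace byte.
function is_whitespace(string $string): bool
{
    // preg_match() returns 1 as soon as it finds a match, 0 if there is none
    // (false on error), so it can stop at the first non-whitespace byte.
    // The slashes are the pattern delimiters and are required.
    return preg_match('/\S/', $string) === 0;
}

var_dump(is_whitespace(" \n\t "));   // bool(true)
var_dump(is_whitespace(" \n\ta "));  // bool(false)
var_dump(is_whitespace(""));         // bool(true)

// Without slashes the brackets become the delimiters, so "[^\s]" is
// compiled as ^\s ("does the string start with whitespace?").
var_dump(!preg_match("[^\s]", " \n\t "));  // bool(false), although it is all whitespace
```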
Read on for the explanation:
I used the codeblocks feature of my profile.class.php. Here's the testcase:
```php
<?php
require 'profile.class.php';

$iterations = 10000;

// First array: label => PHP expression to time.
profile::codeblocks(array(
    'trim(long string) == ""'            => 'trim($long) == ""',
    'trim(long string) === ""'           => 'trim($long) === ""',
    'rtrim(long string) == ""'           => 'rtrim($long) == ""',
    'rtrim(long string) === ""'          => 'rtrim($long) === ""',
    'ltrim(long string) == ""'           => 'ltrim($long) == ""',
    'ltrim(long string) === ""'          => 'ltrim($long) === ""',
    '!preg_match("[^\s]", long string)'  => '!preg_match("[^\s]", $long)',
    'ctype_space(long string)'           => 'ctype_space($long)',
    'trim(short string) == ""'           => 'trim($short) == ""',
    'trim(short string) === ""'          => 'trim($short) === ""',
    'rtrim(short string) == ""'          => 'rtrim($short) == ""',
    'rtrim(short string) === ""'         => 'rtrim($short) === ""',
    'ltrim(short string) == ""'          => 'ltrim($short) == ""',
    'ltrim(short string) === ""'         => 'ltrim($short) === ""',
    '!preg_match("[^\s]", short string)' => '!preg_match("[^\s]", $short)',
    'ctype_space(short string)'          => 'ctype_space($short)',
), array(
    // Second array: variables available inside the expressions ($long, $short).
    'long'  => str_repeat(" \n\t ", 500) . "a" . str_repeat(" \n\t ", 500),
    'short' => str_repeat(" \n\t ", 5) . "a" . str_repeat(" \n\t ", 5),
), $iterations);

profile::print_results(profile::flush());
```
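The testcase needs profile.class.php, which is not reproduced here. As a rough stand-alone alternative, the sketch below times similar expressions with plain `microtime(true)`. Some assumptions apply:

- `time_it()` is a hypothetical helper.
- It requires PHP 7.4+ for the arrow functions, with the ctype extension available.
- It uses the delimited pattern `'/\S/'`.

Every check runs through a closure, which adds roughly the same overhead to each candidate. Absolute numbers will therefore differ from the table below.

```php
<?php
// Hypothetical helper (not part of profile.class.php): runs $check
// $iterations times on $subject and returns elapsed wall-clock seconds.
function time_it(callable $check, string $subject, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $check($subject);
    }
    return microtime(true) - $start;
}

$iterations = 10000;
$subjects = [
    'long'  => str_repeat(" \n\t ", 500) . "a" . str_repeat(" \n\t ", 500),
    'short' => str_repeat(" \n\t ", 5) . "a" . str_repeat(" \n\t ", 5),
];
// label => closure that answers "is $s whitespace-only?"
$checks = [
    'trim($s) === ""'         => fn(string $s): bool => trim($s) === '',
    'ltrim($s) === ""'        => fn(string $s): bool => ltrim($s) === '',
    'rtrim($s) === ""'        => fn(string $s): bool => rtrim($s) === '',
    '!preg_match("/\S/", $s)' => fn(string $s): bool => !preg_match('/\S/', $s),
    'ctype_space($s)'         => fn(string $s): bool => ctype_space($s),
];

$results = [];
foreach ($subjects as $name => $subject) {
    foreach ($checks as $label => $check) {
        $results["$label [$name]"] = time_it($check, $subject, $iterations);
    }
}

asort($results); // fastest first
foreach ($results as $label => $seconds) {
    printf("%-40s %.6fs\n", $label, $seconds);
}
```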
Here is a typical result. The usual fluctuations apply, but the overall picture is reproducible:
```text
=== profile results ===
Timer                              | Time Diff | Time Deviation | Mem Diff | Mem Deviation
--------------------------------------------------------------------------------------------------------------------
ctype_space(short string)          | 0.149848s | 100.00%        | 676B     | +0.40%
rtrim(short string) == ""          | 0.157706s | 105.24%        | 648B     | +0.39%
trim(short string) == ""           | 0.158682s | 105.90%        | 640B     | +0.34%
ltrim(short string) == ""          | 0.164026s | 109.46%        | 640B     | +0.36%
!preg_match("[^\s]", long string)  | 0.167598s | 111.85%        | 756B     | +0.43%
!preg_match("[^\s]", short string) | 0.187268s | 124.97%        | 764B     | +0.42%
ctype_space(long string)           | 0.205380s | 137.06%        | 792B     | +0.48%
ltrim(long string) == ""           | 0.258338s | 172.40%        | 528B     | +0.28%
trim(long string) == ""            | 0.346503s | 231.24%        | 660B     | +0.34%
rtrim(long string) == ""           | 0.387900s | 258.86%        | 720B     | +0.41%
--------------------------------------------------------------------------------------------------------------------
```
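Summary: the [preg_match](http://www.php.net/preg_match)() approach performs almost the same for long and short strings. It is about 18% slower than [trim](http://www.php.net/trim)() for short strings (0.187s vs. 0.159s) but roughly twice as fast for long strings (0.168s vs. 0.347s). Also interesting: [ctype_space](http://www.php.net/ctype_space)() is the fastest check for short strings, yet it falls behind preg_match() for long ones (0.205s vs. 0.168s).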
The explanation is simple as well. [preg_match](http://www.php.net/preg_match)() (and [ctype_space](http://www.php.net/ctype_space)() as well) can stop at the first non-whitespace character it finds and return a result. [trim](http://www.php.net/trim)() has to walk through the beginning of the string, which is comparable to the other two functions, but then it also has to do the same for the end of the string. Of course [ltrim](http://www.php.net/ltrim)() is faster than trim(), since it only needs to go through half the string.
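Here is a simplified model that puts numbers on this argument. It counts how many bytes each strategy inspects before it can answer. It is written in plain PHP and is not PHP's actual C implementation of trim() or PCRE. The helper names are hypothetical, and the whitespace set is limited to the three characters used in the test strings.

```php
<?php
// Simplified model, NOT PHP's internals: count inspected bytes.
// Whitespace set limited to the characters in the test strings.
const WS = " \n\t";

// Hypothetical helper: left-to-right scan that stops at the first
// non-whitespace byte (what preg_match('/\S/') or ctype_space() can do,
// and what ltrim() does on the left side).
function scanned_from_left(string $s): int
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if (strpos(WS, $s[$i]) === false) {
            return $i + 1; // inspected bytes 0..$i, then stopped
        }
    }
    return $len; // all whitespace: every byte was inspected
}

// Hypothetical helper: right-to-left scan, as rtrim() has to do.
function scanned_from_right(string $s): int
{
    return scanned_from_left(strrev($s));
}

$long = str_repeat(" \n\t ", 500) . "a" . str_repeat(" \n\t ", 500);

printf("early exit / ltrim: %d bytes\n", scanned_from_left($long));
printf("rtrim:              %d bytes\n", scanned_from_right($long));
// Summing both sides is only valid because $long contains a non-whitespace byte.
printf("trim (both ends):   %d bytes\n", scanned_from_left($long) + scanned_from_right($long));
```

For the long test string, the model reports 2001 bytes for a one-sided scan and 4002 bytes for trim(). In other words, trim() does twice the scanning.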
What is not obvious to me is why there is such a large gap between [ltrim](http://www.php.net/ltrim)() and the faster alternatives. It is also completely unclear to me why [rtrim](http://www.php.net/rtrim)() is reproducibly the slowest function in the mix!
Hope this clears things up for some of you.
PS: Keep these numbers in perspective: each expression was run thousands of times ($iterations in the testcase) just to get measurable timings. In an average application it will not make any difference which of the above functions you use. But parsers like GeSHi tend to call such functions thousands of times, and there it could make a difference.
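Speed aside, the candidates do not always give the same answer. The small check below covers a few edge cases and assumes the ctype extension is available. By default, trim() strips NUL bytes but not form feeds, and ctype_space() returns false for an empty string. `\s` matches a form feed but not a NUL byte.

```php
<?php
// Edge cases where the "is whitespace-only?" checks disagree.
$cases = [
    'empty'     => "",
    'form feed' => "\f",
    'NUL byte'  => "\0",
    'spaces'    => "  ",
];

foreach ($cases as $name => $s) {
    printf(
        "%-10s trim: %-5s ctype_space: %-5s preg_match: %s\n",
        $name,
        var_export(trim($s) === '', true),        // default trim set: " \t\n\r\0\x0B"
        var_export(ctype_space($s), true),        // false for "" (since PHP 5.1)
        var_export(!preg_match('/\S/', $s), true) // \s includes \f, not \0
    );
}
```

Only the all-spaces case gets the same answer from all three checks.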
Comments
Comment by Daniel Hahler (2009-02-12 15:11:00)
The following appears to be shorter / faster(?) (uppercase “\S” means “any non-whitespace character”):
or
.