PHP "is_whitespace" performance

Fri, 08/22/2008 - 00:09

Easy question: What is the fastest way to determine if a string in PHP is whitespace-only?

Easy answer: !preg_match('/[^\s]/', $string);

Read on for the explanation:

I used the codeblocks feature of my profile.class.php; here’s the test case:

  <?php
  require 'profile.class.php';

  $iterations = 10000;

  // Note: PCRE treats [ and ] as bracket-style delimiters, so the pattern
  // string "[^\s]" below is compiled as ^\s, not as a negated character class.
  profile::codeblocks(array(
      'trim(long string) == ""' => 'trim($long) == ""',
      'trim(long string) === ""' => 'trim($long) === ""',
      'rtrim(long string) == ""' => 'rtrim($long) == ""',
      'rtrim(long string) === ""' => 'rtrim($long) === ""',
      'ltrim(long string) == ""' => 'ltrim($long) == ""',
      'ltrim(long string) === ""' => 'ltrim($long) === ""',
      '!preg_match("[^\s]", long string)' => '!preg_match("[^\s]", $long)',
      'ctype_space(long string)' => 'ctype_space($long)',
      'trim(short string) == ""' => 'trim($short) == ""',
      'trim(short string) === ""' => 'trim($short) === ""',
      'rtrim(short string) == ""' => 'rtrim($short) == ""',
      'rtrim(short string) === ""' => 'rtrim($short) === ""',
      'ltrim(short string) == ""' => 'ltrim($short) == ""',
      'ltrim(short string) === ""' => 'ltrim($short) === ""',
      '!preg_match("[^\s]", short string)' => '!preg_match("[^\s]", $short)',
      'ctype_space(short string)' => 'ctype_space($short)',
  ), array(
      // Both test strings hold a single non-whitespace character in the middle.
      'long' => str_repeat(" \n\t ", 500) . "a" . str_repeat(" \n\t ", 500),
      'short' => str_repeat(" \n\t ", 5) . "a" . str_repeat(" \n\t ", 5),
  ), $iterations);

  profile::print_results(profile::flush());

As an example, I get the following result (the usual fluctuations apply, though the gist is reproducible):

  === profile results ===

  Timer                              | Time Diff | Time Deviation | Mem Diff | Mem Deviation
  ------------------------------------------------------------------------------------------
  ctype_space(short string)          | 0.149848s | 100.00%        | 676B     | +0.40%
  rtrim(short string) == ""          | 0.157706s | 105.24%        | 648B     | +0.39%
  trim(short string) == ""           | 0.158682s | 105.90%        | 640B     | +0.34%
  ltrim(short string) == ""          | 0.164026s | 109.46%        | 640B     | +0.36%
  !preg_match("[^\s]", long string)  | 0.167598s | 111.85%        | 756B     | +0.43%
  !preg_match("[^\s]", short string) | 0.187268s | 124.97%        | 764B     | +0.42%
  ctype_space(long string)           | 0.205380s | 137.06%        | 792B     | +0.48%
  ltrim(long string) == ""           | 0.258338s | 172.40%        | 528B     | +0.28%
  trim(long string) == ""            | 0.346503s | 231.24%        | 660B     | +0.34%
  rtrim(long string) == ""           | 0.387900s | 258.86%        | 720B     | +0.41%
  ------------------------------------------------------------------------------------------

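Summary: the preg_match() way has nearly constant performance for both long and short strings. In the run shown above it is somewhat slower for short strings (about 18% slower than trim() and 25% slower than ctype_space()), but roughly 1.5 to 2.3 times as fast as the trim() family on long strings. Also interesting: ctype_space() is the fastest option for short strings but is not faster than preg_match() for long ones.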

The explanation is simple as well: preg_match() (and ctype_space() as well) can break at the first available non-whitespace character to yield a result. trim() needs to traverse the beginning of the string (comparable to the other two functions) but then also needs to do the same for the end of the string. Of course ltrim() is faster since it only needs to go through half the string.
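
A point worth checking alongside this explanation: PHP accepts bracket-style regex delimiters, so the pattern string "[^\s]" from the test case is compiled as ^\s (delimited by [ and ]) rather than as a negated character class. That would let preg_match() return after inspecting only the first character, which may account for its near-constant timing. The sketch below contrasts the two forms; the function names are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper names; neither appears in the original post.

// Pattern string as used in the test case: "[" and "]" act as bracket-style
// delimiters, so PCRE sees ^\s ("starts with a whitespace character").
function is_whitespace_undelimited(string $s): bool
{
    return !preg_match('[^\s]', $s);
}

// With explicit "/" delimiters the character class [^\s] survives intact,
// and the match succeeds only if some non-whitespace character exists.
function is_whitespace_delimited(string $s): bool
{
    return !preg_match('/[^\s]/', $s);
}

// Print both verdicts side by side for a few sample inputs.
foreach (["   \n\t ", " a ", "a", ""] as $s) {
    printf(
        "%-12s undelimited=%s delimited=%s\n",
        json_encode($s),
        var_export(is_whitespace_undelimited($s), true),
        var_export(is_whitespace_delimited($s), true)
    );
}
```

With explicit delimiters the check has to scan up to the first non-whitespace character, so its timings would need to be measured again rather than read off the table above. Note also that the delimited form treats the empty string as whitespace-only, as trim($s) == "" does, whereas ctype_space("") returns false.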

What is not really obvious to me is why there is such a relatively large difference between ltrim() and the faster alternatives. It is also totally unclear to me why rtrim() is reproducibly the slowest function in the mix!

Hope this clears things up for some of you.

PS: NOTE: Never forget your numbers. I applied each function 15k times. In your average application it will not make any difference which of the above functions you use. But parsers like GeSHi tend to use such functions thousands of times, and there it could make a difference.
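
For anyone who wants to rerun the comparison without profile.class.php, here is a minimal stand-in that assumes plain microtime() timing around closures. The helper time_expression() and the $checks array are hypothetical and not part of my profiling class. The preg_match() pattern is written with explicit / delimiters, and each closure call adds a constant overhead, so absolute numbers will not match the table above.

```php
<?php
// Hypothetical stand-in for profile::codeblocks(); not the original class.
// Runs $check on $input $iterations times and returns the elapsed seconds.
function time_expression(callable $check, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $check($input);
    }
    return microtime(true) - $start;
}

// Same test strings as in the test case: one "a" surrounded by whitespace.
$inputs = [
    'short' => str_repeat(" \n\t ", 5) . "a" . str_repeat(" \n\t ", 5),
    'long'  => str_repeat(" \n\t ", 500) . "a" . str_repeat(" \n\t ", 500),
];

$checks = [
    'trim'        => function (string $s): bool { return trim($s) === ''; },
    'ltrim'       => function (string $s): bool { return ltrim($s) === ''; },
    'rtrim'       => function (string $s): bool { return rtrim($s) === ''; },
    'preg_match'  => function (string $s): bool { return !preg_match('/[^\s]/', $s); },
    'ctype_space' => function (string $s): bool { return ctype_space($s); },
];

$iterations = 10000;
$timings = [];
foreach ($inputs as $label => $input) {
    foreach ($checks as $name => $check) {
        $timings["$name($label string)"] = time_expression($check, $input, $iterations);
    }
}

// Fastest first, mirroring the ordering of the results table.
asort($timings);
foreach ($timings as $name => $seconds) {
    printf("%-28s %.6fs\n", $name, $seconds);
}
```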

Comments

Thu, 02/12/2009 - 15:11, Daniel Hahler (not verified)

The following appears to be shorter and possibly faster (“\S” (uppercase) means “no whitespace”):
<?php !preg_match('/\S/', $string) ?>
or
<?php preg_match('/^\s+$/', $string) ?>.
