UTF-8 Wordwrap

Tue, 07/18/2006 - 00:06

If you use UTF-8 in your PHP projects you may want to use wordwrap(). But that function can’t handle multibyte characters and may mess up your text.
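
A minimal sketch of the problem, assuming a default PHP setup: wordwrap() counts bytes rather than characters, so text with two-byte characters wraps too early, and with $cut = true a character can even be split in half. The sample strings are made up for illustration; the "\xC3\xA4" escape spells out the UTF-8 bytes of "ä" so the result does not depend on the file encoding.

```php
<?php
// "ä" is two bytes in UTF-8; written as an escape so the file encoding doesn't matter.
$a = "\xC3\xA4";

// Five characters, but nine bytes: wordwrap() considers the line too long.
$early = wordwrap($a.$a.' '.$a.$a, 5, "\n");
// The ASCII equivalent fits into the width and is left alone.
$ascii = wordwrap('aa aa', 5, "\n");

// With $cut = true, a three-byte limit falls in the middle of the second "ä".
$broken = wordwrap($a.$a.$a, 3, "\n", true);

var_dump($early === $a.$a."\n".$a.$a); // bool(true): wrapped although only 5 characters long
var_dump($ascii === 'aa aa');           // bool(true): unchanged
var_dump(preg_match('//u', $broken));   // bool(false): the result is no longer valid UTF-8
```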

Don’t be annoyed - help is near!

The only PHP UTF-8 wordwrap function I found was the one by tjomi4 at yeap dot lv in the notes of the PHP manual. I took it and improved it a bit:

  1. Completely the same syntax as the original wordwrap function: string utf8_wordwrap(string $str, integer $width, string $break [, bool $cut]);
  2. The $cut parameter is supported (tjomi4’s function only supports $cut = true).
    But be careful: I use regular expression word boundaries (\b) for this feature. I’m not sure if this works everywhere!
  3. The function uses the multibyte (mbstring) extension, if installed, to count the string length.
  4. The regular expression inside the while loop is shorter and uses preg_match() instead of preg_replace(). That should improve performance and prevent a strange bug (Compilation failed: regular expression too large).
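
Point 3 can be tried in isolation. The sketch below uses a hypothetical helper name, utf8_strlen_sketch(), which is not part of the function itself. It counts characters with mb_strlen() when the mbstring extension is available and otherwise counts every byte that can start a UTF-8 character, which is the same fallback the function further down uses.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original function.
function utf8_strlen_sketch($str) {
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: count ASCII bytes and lead bytes, skip continuation bytes (0x80-0xBF).
    return preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $unused);
}

$word = "Gr\xC3\xBC\xC3\x9Fe"; // "Grüße" spelled out as UTF-8 bytes

var_dump(strlen($word));             // int(7): bytes
var_dump(utf8_strlen_sketch($word)); // int(5): characters
```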

But enough talk, I present to you:

UTF-8 Wordwrap for PHP
  1. /**
  2.  * wordwrap for utf8 encoded strings
  3.  *
  4.  * @param string $str
  5.  * @param integer $width
  6.  * @param string $break
  7.  * @param bool $cut
  8.  * @return string
  9.  * @author Milian Wolff <mail@milianw.de>
  10.  */
  11. function utf8_wordwrap($str, $width, $break, $cut = false) {
  12.     if (!$cut) {
  13.         $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
  14.     } else {
  15.         $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
  16.     }
  17.     if (function_exists('mb_strlen')) {
  18.         $str_len = mb_strlen($str,'UTF-8');
  19.     } else {
  20.         $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);
  21.     }
  22.     $while_what = ceil($str_len / $width);
  23.     $i = 1;
  24.     $return = '';
  25.     while ($i < $while_what) {
  26.         preg_match($regexp, $str,$matches);
  27.         $string = $matches[0];
  28.         $return .= $string.$break;
  29.         $str = substr($str, strlen($string));
  30.         $i++;
  31.     }
  32.     return $return.$str;
  33. }
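
The core of both regular expressions is the group (?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+), which matches one UTF-8 character: either a single ASCII byte, or a lead byte followed by its continuation bytes. The following sketch, with made-up sample strings, shows how the pieces behave. The \b result assumes PCRE's default "C" locale tables, where bytes above 0x7F do not count as word characters; that assumption is pinned with setlocale() in the code and may not hold in other locales.

```php
<?php
$char = '(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+)';

// "a", "ä" (2 bytes) and "€" (3 bytes) are each matched as one unit.
preg_match_all('#'.$char.'#', "a\xC3\xA4\xE2\x82\xAC", $chars);
var_dump(count($chars[0])); // int(3)

// $cut = true: exactly $width characters from the start, here 3 of "äöüß".
preg_match('#^'.$char.'{3}#', "\xC3\xA4\xC3\xB6\xC3\xBC\xC3\x9F", $fixed);
var_dump(strlen($fixed[0])); // int(6): three two-byte characters

// $cut = false: at least $width characters, then the nearest word boundary (\b).
// Without the /u modifier \b works on bytes, so under the "C" locale it also
// sees a boundary between "r" and "ü" and stops inside the word "Grüße".
setlocale(LC_CTYPE, 'C');
preg_match('#^'.$char.'{1,}\b#U', "Gr\xC3\xBC\xC3\x9Fe gut", $word);
var_dump($word[0]); // string(2) "Gr"
```

This byte-wise view of \b is one reason the author's warning about word boundaries is worth taking seriously for text with non-ASCII letters.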

Comments

Thank you very much! Fri, 04/04/2014 - 20:28 — Anonymous (not verified)

Thank you very much!

Thank you, your function made Thu, 08/22/2013 - 22:40 — Hazem Noor (not verified)

Thank you, your function made my life better :-)

I got the error "Notice: Undefined offset: 0 in C:\xampplite\htdocs\www\1.php on line 21", and the fix is:

    function utf8_wordwrap($str, $width, $break, $cut = false) {
        if (!$cut) {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
        } else {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
        }

        $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);

        $while_what = ceil($str_len / $width);
        $i = 1;
        $return = '';
        while ($i < $while_what) {
            $i++;
            preg_match($regexp, $str, $matches);
            if(isset($matches[0])) {
                $string = $matches[0];
                $return .= $string.$break;
                $str = substr($str, strlen($string));
            }
        }

        return $return.$str;
    }

Hi, shouldn’t be mb_substr Mon, 02/14/2011 - 04:16 — Fosfor (not verified)

Hi, shouldn’t it be mb_substr and mb_strlen at line 29? But it doesn’t work for me even with the mb_ functions: it mixes \r\n and \n (I passed \r\n as $break), breaks after 13 characters on an 18-character line with 75 passed as $width, and cuts words in the middle (short words, with fewer characters than $width)… Sorry, but this function is unusable; I’ll keep searching…

aaaaaaaaaaaaaaaaaaaaaaaaaaaaa Tue, 02/09/2010 - 20:36 — Anonymous (not verified)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

this comment shows Tue, 02/09/2010 - 21:50 — Milian Wolff

this comment shows beautifully that Drupal doesn’t do wordwrap in comments ;-)

What does Mon, 02/08/2010 - 09:30 — Anonymous (not verified)

What does #^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){ mean?

I’m still getting Notice: Fri, 09/18/2009 - 00:03 — Martin (not verified)

I’m still getting Notice: Undefined offset: 0 on line 27 in your example. Do you know where the problem is? Thx.

Looks like the preg_match Fri, 09/18/2009 - 00:50 — Milian Wolff

Looks like the preg_match line above doesn’t match anything.

It’s been too long since I used this function myself, sorry. I won’t try to fix this myself.
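
A minimal sketch of how that can happen, assuming the remaining text has become shorter than $width: the non-cut pattern demands at least $width characters, so preg_match() returns 0 and leaves $matches empty, and reading $matches[0] then raises the notice. The width of 10 and the sample text are made up for illustration.

```php
<?php
$width  = 10;
$regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';

// Only nine characters are left, fewer than $width, so the pattern cannot match.
$found = preg_match($regexp, 'too short', $matches);

var_dump($found);             // int(0)
var_dump(isset($matches[0])); // bool(false): $matches[0] would be an undefined offset
```

In non-cut mode each pass may consume more than $width characters, while the number of passes is fixed in advance from the total length, so the loop can reach a remainder this short before it ends.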

Excellent work. Thanks for Sat, 05/05/2007 - 01:57 — Andrew (not verified)

Excellent work. Thanks for the improvements. Now, my problem is that URLs get cut as well. It would be nice to have a full text-treatment function that could take text mixed with URLs, wrap the URLs with link tags, and then word-cut the text (including the link’s inner text, i.e. the URL).

So for example:

  1. aaaaaaaaaaaaaaaaaaa http://gooooooooooooooooooogle.com jortewaofnweafwa

would become

  1. aaaaaa aaaaaaaaaaaa <a href="http://goooooooooooooooooooooogle.com">http:&#x200B;//gooooooooooooooooogle.com</a> jortewaofn weafwa

I’m not sure I understand the current function enough to attempt it myself…WordPress’s make_clickable is a good start, but not with the word-cutting.

-Andrew

I think this should help: Thu, 04/10/2008 - 04:55 — Anonymous (not verified)

That’s hard. I’d say you Sat, 05/05/2007 - 14:26 — Milian Wolff

That’s hard. I’d say you should combine both functions and do more or less the following:

  1. look for long links
  2. -> save links in an array
  3. -> replace links with shorter tokens
  4. wordwrap
  5. -> replace tokens with long links
  6. make_clickable

The question is what these tokens should look like. I’d say you could take some infrequently used chars (e.g. »«|) followed by the key of the link array, or something like that. Dunno.

Maybe I’ll try to code something like this later on, but please try it yourself first, as I lack the time at the moment.
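
The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: the helper name wrap_keeping_links() is made up, links are found with a deliberately simple pattern (http:// or https:// up to the next whitespace), the tokens are built from the control characters \x02 and \x03 rather than »«| so that the byte-based wordwrap() sees them as short ASCII words, and the final make_clickable step is left out because it is a WordPress function.

```php
<?php
// Hypothetical helper; name, URL pattern and token format are assumptions.
function wrap_keeping_links($text, $width, $break = "\n") {
    $map = array();

    // Steps 1-3: find links, store them, and put a short token in their place.
    $text = preg_replace_callback('#https?://\S+#', function ($m) use (&$map) {
        $token = "\x02".count($map)."\x03";
        $map[$token] = $m[0];
        return $token;
    }, $text);

    // Step 4: wrap the shortened text (plain wordwrap() here, so ASCII only).
    $text = wordwrap($text, $width, $break, true);

    // Step 5: swap the tokens back for the original links.
    return $map ? strtr($text, $map) : $text;
}

$url = 'http://gooooooooooooooooooogle.com';
$in  = 'aaaaaaaaaaaaaaaaaaa '.$url.' jortewaofnweafwa';

var_dump(strpos(wordwrap($in, 10, "\n", true), $url));         // bool(false): the link was cut
var_dump(strpos(wrap_keeping_links($in, 10), $url) !== false); // bool(true): the link survived
```

A token that is glued to other non-space text (for example a link in parentheses) could still end up in a word longer than $width and be cut, so a real implementation would need to guard against that.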
