UTF-8 Wordwrap
If you use UTF-8 in your PHP projects you may want to use [wordwrap](http://www.php.net/wordwrap)(). But that function counts bytes rather than characters, so it can’t handle multibyte characters and may mess up your text.
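To make the problem concrete, here is a small sketch that runs the built-in wordwrap() on a string of two-byte characters. The sample string and the validity check via preg_match() with the /u modifier are choices made for this illustration only.

```php
<?php
// "ä" is two bytes in UTF-8 (0xC3 0xA4), so six of them are 12 bytes long.
$word = str_repeat("\xC3\xA4", 6);

// wordwrap() measures $width in bytes. With $cut = true it splits this
// word every 3 bytes, which lands in the middle of a character.
$wrapped = wordwrap($word, 3, "\n", true);

foreach (explode("\n", $wrapped) as $line) {
    // preg_match() with the /u modifier only returns 1 for valid UTF-8.
    echo preg_match('//u', $line) === 1 ? "valid" : "invalid", "\n";
}
```

Every line of the wrapped result ends up as invalid UTF-8, because each cut falls inside a two-byte sequence.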
Don’t be annoyed - help is near!
The only PHP UTF-8 wordwrap function I found was the one by tjomi4 at yeap dot lv in the notes of the PHP manual. I took it and improved it a bit:
- Exactly the same syntax as the original wordwrap function:
  string utf8_wordwrap(string $str, integer $width, string $break [, bool $cut]);
- The $cut parameter is supported (tjomi4’s function only supports $cut = true). But be careful: I use regular expression word boundaries (\b) for this feature. I’m not sure if this works everywhere!
- The function uses the multibyte extension, if installed, for counting the string length.
- The regular expression inside the while loop is shorter and uses [preg_match](http://www.php.net/preg_match)() instead of [preg_replace](http://www.php.net/preg_replace)(). That should improve performance and prevent a strange bug (Compilation failed: regular expression too large).
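The length counting mentioned in the list can be sketched roughly like this. The helper name utf8_char_count is made up for this example, and the fallback pattern is a simplified variant (it counts every byte that is not a continuation byte), not the exact expression used in the function.

```php
<?php
// Hypothetical helper for illustration: counts UTF-8 characters and
// prefers the mbstring extension when it is installed.
function utf8_char_count($str) {
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: count every byte that is not a continuation byte
    // (continuation bytes are in the range 0x80-0xBF).
    return preg_match_all('/[^\x80-\xBF]/', $str, $unused);
}

$text = "Gr\xC3\xBC\xC3\x9Fe"; // "Grüße": 5 characters, 7 bytes

echo strlen($text), "\n";          // length in bytes
echo utf8_char_count($text), "\n"; // length in characters
```

Both branches assume the input is valid UTF-8; neither of them validates it.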
But enough talk, I present to you:
UTF-8 Wordwrap for PHP
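/**
 * wordwrap for utf8 encoded strings
 *
 * @param string  $str    the input string
 * @param integer $width  the line width in characters
 * @param string  $break  the string inserted at each line break
 * @param bool    $cut    if true, break after exactly $width characters, even inside a word
 * @return string
 * @author Milian Wolff <mail@milianw.de>
 */
function utf8_wordwrap($str, $width, $break, $cut = false) {
    if (!$cut) {
        // at least $width characters, then on to the next word boundary
        $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
    } else {
        // exactly $width characters
        $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
    }
    if (function_exists('mb_strlen')) {
        $str_len = mb_strlen($str, 'UTF-8');
    } else {
        // count the lead bytes, i.e. one per character
        $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);
    }
    $while_what = ceil($str_len / $width);
    $i = 1;
    $return = '';
    while ($i < $while_what) {
        // stop if nothing matches, otherwise $matches[0] would be undefined
        if (!preg_match($regexp, $str, $matches)) {
            break;
        }
        $string = $matches[0];
        $return .= $string.$break;
        $str = substr($str, strlen($string));
        $i++;
    }
    return $return.$str;
}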
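For comparison, here is a minimal sketch of a different approach: greedy, word-by-word wrapping that counts characters with the /u modifier. It is not the function above. The names utf8_len and utf8_wordwrap_sketch are made up for this example, and it assumes valid UTF-8 input, a $width of at least 1 and words separated by single spaces.

```php
<?php
// Hypothetical helper: character count via the /u (UTF-8) modifier.
function utf8_len($s) {
    return (int) preg_match_all('/./us', $s, $unused);
}

// Sketch only: greedy word-by-word wrapping. Existing line breaks in
// $str get no special handling.
function utf8_wordwrap_sketch($str, $width = 75, $break = "\n", $cut = false) {
    $lines = array();
    $line = '';
    foreach (explode(' ', $str) as $word) {
        if ($cut && utf8_len($word) > $width) {
            // Split an overlong word into chunks of $width characters.
            $chunks = preg_split('/(.{' . (int) $width . '})/us', $word, -1,
                PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        } else {
            $chunks = array($word);
        }
        foreach ($chunks as $chunk) {
            $candidate = ($line === '') ? $chunk : $line . ' ' . $chunk;
            if ($line !== '' && utf8_len($candidate) > $width) {
                $lines[] = $line; // the current line is full, start a new one
                $line = $chunk;
            } else {
                $line = $candidate;
            }
        }
    }
    $lines[] = $line;
    return implode($break, $lines);
}

$text = "Gr\xC3\xBC\xC3\x9Fe aus \xC3\x96sterreich"; // "Grüße aus Österreich"
echo utf8_wordwrap_sketch($text, 10), "\n";
```

With a width of 10 this prints "Grüße aus" and "Österreich" on two lines. The byte-based wordwrap() would already break after the first word, since "Grüße aus" is 11 bytes long.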
Comments
Comment by Anonymous (not verified) (2014-04-04 20:28:00)
Thank you very much!
Comment by Hazem Noor (not verified) (2013-08-22 22:40:00)
Thank you, your function made my life better :-)
I got that error ” Notice: Undefined offset: 0 in C:\xampplite\htdocs\www\1.php on line 21 ” and the fix is
Comment by Fosfor (not verified) (2011-02-14 04:16:00)
Hi, shouldn’t there be mb_substr and mb_strlen at line 29? But it is not working for me even with the mb_ functions. Mixing \r\n and \n (passed \r\n as $break), breaking after 13 characters on an 18-char line with 75 passed as $width, cutting words in the middle (short words, fewer chars than $width)… Sorry, but this function is unusable, I’ll keep searching…
Comment by Anonymous (not verified) (2010-02-09 20:36:00)
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Comment by Milian Wolff (2010-02-09 21:50:00)
this comment shows beautifully that Drupal doesn’t do wordwrap in comments ;-)
Comment by Anonymous (not verified) (2010-02-08 09:30:00)
Comment by Milian Wolff (2010-02-08 14:37:00)
See for yourself:
http://en.wikipedia.org/wiki/Regular_expressions
http://en.wikipedia.org/wiki/Hexadecimal
Comment by Martin (not verified) (2009-09-18 00:03:00)
I’m still getting Notice: Undefined offset: 0 on line 27 in your example. Do you know where the problem is? Thx.
Comment by Milian Wolff (2009-09-18 00:50:00)
Looks like the preg_match line above doesn’t match anything.
It’s been too long since I used this function myself, sorry. I won’t try to fix this up myself.
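A minimal sketch of how that situation can be detected, assuming the pattern from the non-cut branch with a width of 10. One way to get there seems to be an earlier piece that ran past the width, which leaves fewer characters in the remainder than the pattern requires.

```php
<?php
// Same pattern shape as the non-cut branch, here with a width of 10.
$regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){10,}\b#U';

// Only 5 characters are left, but the pattern needs at least 10.
$str = 'short';

if (preg_match($regexp, $str, $matches) === 1) {
    $piece = $matches[0];
} else {
    // No match: $matches is an empty array, so reading $matches[0]
    // would raise the "Undefined offset: 0" notice. Keep the rest as is.
    $piece = $str;
}

echo $piece, "\n";
```

Checking the return value of preg_match() before reading $matches[0] avoids the notice.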
Comment by Andrew (not verified) (2007-05-05 01:57:00)
Excellent work. Thanks for the improvements. Now, my problem is that URLs get cut as well. It would be nice to have a full text treatment function that could take text mixed with URLs, wrap the URLs with link tags, and then word-cut the text (including the link’s inner text [the url]).
So for example:
would become
I’m not sure I understand the current function enough to attempt it myself… WordPress’s make_clickable is a good start, but not with the word-cutting.
-Andrew
Comment by Anonymous (not verified) (2008-04-10 04:55:00)
I think this should help: http://www.greywyvern.com/code/php/htmlwrap.phps
Comment by Milian Wolff (2007-05-05 14:26:00)
That’s hard. I’d say you should combine both functions and do more or less the following:
The question is what these tokens should look like. I’d say you could take some infrequently used chars (e.g. »«|) and the key of the link array afterwards or something. Dunno. Maybe I’ll try to code something like this later on, but please try it yourself first, as I lack time currently.