
UTF-8 Wordwrap

If you use UTF-8 in your PHP projects, you may want to use [wordwrap](http://www.php.net/wordwrap)(). But that function counts bytes rather than characters, so it can’t handle multibyte characters and may mess up your text.
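
To make the problem concrete, here is a minimal sketch. It assumes a test string of four `ä` characters, written as byte escapes so the result does not depend on the file encoding, and uses `preg_match()` with the `/u` modifier as a simple UTF-8 validity check.

```php
<?php
// Four "ä" characters; each one is the two bytes C3 A4 in UTF-8.
$text = "\xC3\xA4\xC3\xA4\xC3\xA4\xC3\xA4";

// wordwrap() measures the width in bytes, so a width of 3
// cuts in the middle of the second character.
$wrapped = wordwrap($text, 3, "\n", true);
$parts = explode("\n", $wrapped);

foreach ($parts as $part) {
    // preg_match() with /u returns false when the subject is not valid UTF-8.
    $valid = preg_match('//u', $part) !== false;
    echo bin2hex($part), ' ', ($valid ? 'valid' : 'broken'), "\n";
}
```

Any chunk reported as broken starts or ends in the middle of a character, which is exactly what a UTF-8 aware wordwrap has to avoid.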

Don’t be annoyed - help is near!

The only PHP UTF-8 wordwrap function I found was the one by tjomi4 at yeap dot lv in the notes of the PHP manual. I took it and improved it a bit:

  1. Exactly the same syntax as the original wordwrap function: string utf8_wordwrap(string $str, integer $width, string $break [, bool $cut]);
  2. The $cut parameter is supported (tjomi4’s function only supports $cut = true).
    But be careful: I use regular expression word boundaries (\b) for this feature. I’m not sure if this works everywhere!

  3. The function uses the multibyte extension if installed for counting the string length
  4. The regular expression inside the while loop is shorter and uses [preg_match](http://www.php.net/preg_match)() instead of [preg_replace](http://www.php.net/preg_replace)(). That should improve performance and prevent a strange bug (Compilation failed: regular expression too large)
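
Point 3 can be sketched on its own. The helper name `utf8_length` is hypothetical and only used for illustration. The fallback pattern is the one from the function below: it counts every byte that starts a character and skips continuation bytes.

```php
<?php
/**
 * Count the characters of a UTF-8 string.
 * utf8_length() is a hypothetical helper name used for this sketch.
 */
function utf8_length($str) {
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: count ASCII bytes and lead bytes only;
    // continuation bytes (80-BF) are not counted.
    return preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $unused);
}

$text = "Gr\xC3\xBC\xC3\x9Fe"; // "Grüße" as UTF-8 bytes

echo strlen($text), "\n";      // number of bytes
echo utf8_length($text), "\n"; // number of characters
```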

But enough talk, I present to you:

UTF-8 Wordwrap for PHP

    /**
     * wordwrap for utf8 encoded strings
     *
     * @param string $str
     * @param integer $width
     * @param string $break
     * @param bool $cut
     * @return string
     * @author Milian Wolff <mail@milianw.de>
     */
    function utf8_wordwrap($str, $width, $break, $cut = false) {
        if (!$cut) {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
        } else {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
        }
        if (function_exists('mb_strlen')) {
            $str_len = mb_strlen($str,'UTF-8');
        } else {
            $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);
        }
        $while_what = ceil($str_len / $width);
        $i = 1;
        $return = '';
        while ($i < $while_what) {
            preg_match($regexp, $str,$matches);
            $string = $matches[0];
            $return .= $string.$break;
            $str = substr($str, strlen($string));
            $i++;
        }
        return $return.$str;
    }

Comments


Comment by Anonymous (not verified) (2014-04-04 20:28:00)

Thank you very much!

Comment by Hazem Noor (not verified) (2013-08-22 22:40:00)

Thank you, your function made my life better :-)

I got the error “Notice: Undefined offset: 0 in C:\xampplite\htdocs\www\1.php on line 21”, and the fix is:

    function utf8_wordwrap($str, $width, $break, $cut = false) {
        if (!$cut) {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.',}\b#U';
        } else {
            $regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
        }

        $str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);

        $while_what = ceil($str_len / $width);
        $i = 1;
        $return = '';
        while ($i < $while_what) {
            $i++;
            preg_match($regexp, $str, $matches);
            if (isset($matches[0])) {
                $string = $matches[0];
                $return .= $string.$break;
                $str = substr($str, strlen($string));
            }
        }

        return $return.$str;
    }

Comment by Fosfor (not verified) (2011-02-14 04:16:00)

Hi, shouldn’t it be mb_substr and mb_strlen at line 29? But it is not working for me even with the mb_ functions: it mixes \r\n and \n (I passed \r\n as $break), breaks after 13 characters on an 18-character line with 75 passed as $width, and cuts words in the middle (short words, fewer characters than $width)… Sorry, but this function is unusable; I’ll keep searching…

Comment by Anonymous (not verified) (2010-02-09 20:36:00)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Comment by Milian Wolff (2010-02-09 21:50:00)

This comment shows beautifully that Drupal doesn’t do wordwrap in comments ;-)

Comment by Anonymous (not verified) (2010-02-08 09:30:00)

What does #^(?:[\x00-\x7F][\xC0-\xFF][\x80-\xBF]+){ mean?

Comment by Milian Wolff (2010-02-08 14:37:00)

See for yourself:

http://en.wikipedia.org/wiki/Regular_expressions
http://en.wikipedia.org/wiki/Hexadecimal
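
In short, the group in that pattern describes one UTF-8 character: either a single ASCII byte, or a lead byte followed by one or more continuation bytes. Anchored with `^` and repeated `{N}` times, it matches the first N characters of a string. A small sketch, assuming a width of 3 and a made-up test string:

```php
<?php
// One UTF-8 character: an ASCII byte (00-7F), or a lead byte (C0-FF)
// followed by one or more continuation bytes (80-BF).
$char = '(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+)';

// Anchored at the start of the string and repeated exactly 3 times,
// the pattern matches the first three characters.
$regexp = '#^' . $char . '{3}#';

$text = "\xC3\xA4\xC3\xB6\xC3\xBC abc"; // "äöü abc" as UTF-8 bytes
preg_match($regexp, $text, $matches);

echo strlen($matches[0]), " bytes\n"; // three characters, but more than three bytes
echo bin2hex($matches[0]), "\n";
```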

Comment by Martin (not verified) (2009-09-18 00:03:00)

I’m still getting Notice: Undefined offset: 0 on line 27 in your example. Do you know where the problem is? Thx.

Comment by Milian Wolff (2009-09-18 00:50:00)

Looks like the preg_match line above doesn’t match anything.

It’s been too long since I used this function myself, sorry. I won’t try to fix this myself.
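
The missing match can be reproduced in isolation. The sketch below assumes $cut = false, a width of 5 and a made-up input string. Without cutting, a match may run past $width to reach the next word boundary, so the remaining text can end up shorter than $width while the loop still expects another line.

```php
<?php
$width = 5;
// Same pattern as in utf8_wordwrap() for $cut = false.
$regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){' . $width . ',}\b#U';

$str = 'abcdefghijklmnopqrst uv'; // 23 characters, so the loop would run 4 times

// First pass: the match has to run on to the next word boundary,
// so it consumes the whole 20-character word instead of 5 characters.
preg_match($regexp, $str, $matches);
$first = $matches[0];
$str = substr($str, strlen($first));

// Second pass: only " uv" is left, which is shorter than $width,
// so nothing matches and $matches is empty.
$found = preg_match($regexp, $str, $matches);

var_dump($first, $str, $found, $matches);
```

Reading `$matches[0]` after the second pass is what triggers the notice, so the result of `preg_match()` should be checked (for example with `isset($matches[0])`) before it is used.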

Comment by Andrew (not verified) (2007-05-05 01:57:00)

Excellent work. Thanks for the improvements. Now, my problem is that URLs got cut as well. It would be nice to have a full text treatment function that could take text mixed with URLs, wrap the URLs with link tags, and then word-cut the text (including the link’s inner text [the url])

So for example:

    aaaaaaaaaaaaaaaaaaa http://gooooooooooooooooooogle.com jortewaofnweafwa

would become

    aaaaaa aaaaaaaaaaaa <a href="http://goooooooooooooooooooooogle.com">http:&#x200B;//gooooooooooooooooogle.com</a> jortewaofn weafwa

I’m not sure I understand the current function enough to attempt it myself…WordPress’s make_clickable is a good start, but not with the word-cutting.

-Andrew

Comment by Anonymous (not verified) (2008-04-10 04:55:00)

I think this should help: http://www.greywyvern.com/code/php/htmlwrap.phps

Comment by Milian Wolff (2007-05-05 14:26:00)

That’s hard. I’d say you should combine both functions and do more or less the following:

    look for long links
    -> save links in an array
    -> replace links with shorter tokens
    wordwrap
    -> replace tokens with long links
    make_clickable

The question is what these tokens should look like. I’d say you could take some infrequently used chars (e.g. »«|) followed by the key of the link array, or something like that. Dunno.
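
A rough sketch of the save, replace, wrap and restore steps, under several assumptions: the function name `wrap_keep_links` is hypothetical, the tokens use the control bytes \x01 and \x02 instead of the characters suggested above, links are anything matching `https?://\S+`, and plain wordwrap() stands in for a UTF-8 aware wrapper. Turning the links into anchors (make_clickable) is left out.

```php
<?php
/**
 * Wrap text but keep URLs in one piece.
 * wrap_keep_links() is a hypothetical name used for this sketch.
 */
function wrap_keep_links($text, $width, $break = "\n") {
    $links = array();

    // 1. Save each link in an array and replace it with a short token
    //    built from its array key, e.g. "\x01" . "0" . "\x02".
    $text = preg_replace_callback('#https?://\S+#', function ($m) use (&$links) {
        $links[] = $m[0];
        return "\x01" . (count($links) - 1) . "\x02";
    }, $text);

    // 2. Wrap the shortened text, cutting words that are too long.
    $text = wordwrap($text, $width, $break, true);

    // 3. Replace the tokens with the original links again.
    return preg_replace_callback('#\x01(\d+)\x02#', function ($m) use ($links) {
        return $links[(int) $m[1]];
    }, $text);
}

$url = 'http://gooooooooooooooooooogle.com';
$input = 'aaaaaaaaaaaaaaaaaaa ' . $url . ' jortewaofnweafwa';
echo wrap_keep_links($input, 10), "\n";
```

Because the token is much shorter than the link, the wrapper treats it as an ordinary short word and never cuts it. The restored link can therefore make its line longer than $width.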

Maybe I’ll try to code something like this later on, but please try it yourself first, as I’m short on time at the moment.

Published on May 05, 2007.