html2text.php 1.1

Sun, 07/23/2006 - 03:03

Update: Use Markdownify, it’s the successor to html2text.

I changed my html2text.php function, and it now handles elements that cannot be expressed in Markdown better. Previously, something like <p class="foobar">...</p> would have resulted in <p>...</p>. Now such elements (whose attributes cannot be ported to Markdown) are left as plain HTML.
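
The rule described above can be illustrated with a small sketch. It is written in Python and is not the code of html2text.php; the class and function names are hypothetical, and it assumes well-nested input in which every tag is closed. A <p> without attributes becomes a Markdown paragraph, while any element carrying attributes is passed through as HTML so that nothing is lost:

```python
from html.parser import HTMLParser


class ParagraphConverter(HTMLParser):
    """Toy converter (hypothetical, not html2text.php itself)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.kept_as_html = []  # stack: was the matching start tag kept as HTML?

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not attrs:
            # A bare <p> can be expressed in Markdown, so drop the tag.
            self.kept_as_html.append(False)
        else:
            # Attributes (or other tags) have no Markdown form: keep the HTML.
            self.kept_as_html.append(True)
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        kept = self.kept_as_html.pop() if self.kept_as_html else True
        # Either re-emit the closing tag or end the Markdown paragraph.
        self.out.append(f"</{tag}>" if kept else "\n\n")

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html):
    parser = ParagraphConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(to_markdown('<p>Hello</p><p class="foobar">World</p>'))
```

Here the first paragraph is reduced to its text, and the second one keeps its class attribute because it stays in HTML form.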

Additionally, I made some changes that should improve performance.

Download

Get it while it’s hot: html2text.php 1.1 (.tar.gz ~ 120.9 KB)

Known Bugs

Yes, there are some, which I’ll try to fix in the next few days (note: to make the bugs easier to see, I describe what happens when you convert HTML to Markdown and back to HTML):

  • If a parent element (e.g. <table>) gets parsed and a child <tr>, <td> or <th> has attributes, those attributes are ignored and dropped. Workaround: add an attribute to the parent element (e.g. a class or id).
  • If you give a single <li> element in the middle of a list some attributes, it won’t lose them, but the result is not well-formed HTML:

    <ul><li>abc</li> <li class="foo">bar</li> </ul>

    will result in:

    <ul> <li>abc <li class="foo">bar</li></li> </ul>
  • <pre><code some="attrib"> will result in <pre><code><code some="attrib">
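
A round-trip problem like the nested <li> case above can be caught with a small check on the output. The following is only a sketch in Python, not part of html2text.php; the names are hypothetical, and it merely flags an <li> opened directly inside another, still open <li>:

```python
from html.parser import HTMLParser


class NestedLiDetector(HTMLParser):
    """Flags an <li> that opens while the innermost open element is an <li>."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.nested_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and self.stack and self.stack[-1] == "li":
            self.nested_li = True
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def has_directly_nested_li(html):
    detector = NestedLiDetector()
    detector.feed(html)
    detector.close()
    return detector.nested_li


before = '<ul><li>abc</li> <li class="foo">bar</li> </ul>'
after = '<ul> <li>abc <li class="foo">bar</li></li> </ul>'
print(has_directly_nested_li(before), has_directly_nested_li(after))
```

Running such a check on the result of an HTML to Markdown to HTML conversion would flag the second form while accepting the original list.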

For more information, read the bottom of the html2text.php page.
