Update: Use Markdownify; it’s the successor to html2text.
I changed my html2text.php function, and it now handles elements that can't be expressed in Markdown better. Previously something like
<p class="foobar">...</p> would have resulted in
<p>...</p>. Now such elements (which couldn't be ported to Markdown without losing their attributes) are left as plain HTML.
Additionally, I made some changes that should improve performance.
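The attribute rule described above can be illustrated with a small sketch. This is not the code from html2text.php; it is a toy Python version that only handles flat <p> elements, and the names `ParagraphConverter` and `convert` are made up for the illustration. A <p> without attributes becomes a Markdown paragraph, while a <p> with attributes is passed through as HTML.

```python
import html
from html.parser import HTMLParser


class ParagraphConverter(HTMLParser):
    """Toy converter (hypothetical): handles only flat <p> elements with text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []            # converted chunks, in document order
        self.keep_html = False   # True while inside a <p> that has attributes

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        self.keep_html = bool(attrs)
        if self.keep_html:
            # Attributes have no Markdown equivalent: emit the tag verbatim.
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        # Text inside a preserved element must stay valid HTML, so re-escape it.
        self.out.append(html.escape(data, quote=False) if self.keep_html else data)

    def handle_endtag(self, tag):
        if tag != "p":
            return
        # Close the raw element, or end the Markdown paragraph with a blank line.
        self.out.append("</p>\n\n" if self.keep_html else "\n\n")
        self.keep_html = False


def convert(source):
    parser = ParagraphConverter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out).strip()


print(convert('<p>plain</p><p class="foobar">...</p>'))
```

The point of the design is that the decision is made per element: as soon as an element carries information Markdown cannot hold, it is left alone instead of being silently stripped.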
Get it while it’s hot: html2text.php 1.1 (.tar.gz ~ 120.9 KB)
Yes, there are some bugs, which I’ll try to fix in the next few days (note: to point out the bugs more clearly, I describe what happens if you convert HTML to Markdown and back to HTML):
- If the parent element (e.g. <table>) gets parsed and a child <tr>, <td> or <th> has attributes, these attributes are ignored and dropped. Workaround: add an attribute to the parent element (e.g. a class or id).
- If you give a single <li> element in the middle of a list some attributes, it won't lose them, but the output is not well-formed HTML:
<ul><li>abc</li> <li class="foo">bar</li> </ul>
Will result in:
```
<ul> <li>abc <li class="foo">bar</li></li> </ul>
```
- <pre><code some="attrib"> will result in
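The <li> bug above produces an <li> opened directly inside another <li>, and that pattern can be caught mechanically in a round-trip test. The following is only a sketch in Python, not part of html2text.php; `NestedLiChecker` and `has_directly_nested_li` are hypothetical names, and the list of void elements is deliberately incomplete.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestedLiChecker(HTMLParser):
    """Records every <li> whose direct parent is another <li>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # (line, column) of each offending <li>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "li" and self.stack and self.stack[-1] == "li":
            self.problems.append(self.getpos())
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def has_directly_nested_li(source):
    checker = NestedLiChecker()
    checker.feed(source)
    checker.close()
    return bool(checker.problems)


before = '<ul><li>abc</li> <li class="foo">bar</li> </ul>'
after = '<ul> <li>abc <li class="foo">bar</li></li> </ul>'
print(has_directly_nested_li(before), has_directly_nested_li(after))
```

A properly nested list such as <ul><li>a<ul><li>b</li></ul></li></ul> is not flagged, because the inner <li> has a <ul> as its direct parent.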
For more information, see the bottom of the html2text.php site.