Yes, I’m finally gearing up for the release of my html2text.php successor, dubbed Markdownify. I’m testing extensively and using the MDTest suite to find potential regressions. I really enjoy writing little CLI scripts in PHP; it just works like a charm.
Here’s an example of what my test suite currently looks like:
To the left is the original input (HTML), in the middle is the generated Markdown, and to the right is HTML again, this time generated via PHP Markdown by Michel Fortin. The pretty colors mark changes between the two HTML versions. I use PEAR Text_Diff for this, plus a little of my own code. But since all of the existing diff engines for Text_Diff took ages on the Markdown documentation (~400 lines, AFAIR), I wrote a Text_Diff engine that uses [shell_exec()](http://www.php.net/shell_exec) and GNU diff. This is blazingly fast and works like a charm! You can get the source code over at pastebin.org. Also take a look at the feature request I made. Dunno if this was the correct place for that…
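For readers curious what "shell_exec() plus GNU diff" can look like, here is a minimal sketch. It is not the engine from the pastebin link and does not plug into Text_Diff; the function name `gnu_diff` is made up for illustration, and it assumes PHP 7+ and a `diff` binary on the PATH.

```php
<?php
/**
 * Diff two strings by shelling out to GNU diff.
 * Hypothetical helper for illustration; not the original Text_Diff engine.
 */
function gnu_diff(string $from, string $to): string
{
    // diff works on files, so both inputs go into temporary files first.
    $fromFile = tempnam(sys_get_temp_dir(), 'diff');
    $toFile   = tempnam(sys_get_temp_dir(), 'diff');
    file_put_contents($fromFile, $from);
    file_put_contents($toFile, $to);

    // escapeshellarg() guards against odd characters in the temp paths.
    $cmd = 'diff ' . escapeshellarg($fromFile) . ' ' . escapeshellarg($toFile);
    $output = shell_exec($cmd);

    unlink($fromFile);
    unlink($toFile);

    // shell_exec() returns null when the command prints nothing,
    // which is what diff does for identical inputs.
    return (string) $output;
}
```

A real Text_Diff engine would still have to parse this output into the library's own edit objects; the sketch only covers the part that makes it fast, namely handing the comparison to GNU diff.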
continue reading...
Yes! I’ve finally done it. I’ve moved my website to Drupal, which is so much better than my old 3co stuff. There are tons of great modules out there, and those I couldn’t find for Drupal 6, which was recently released, I ported. Well, not all of them; there are still some I’m really looking forward to. At the top of my list are definitely the spam module and the Akismet module. Minutes after my move I got my first spam comment…
Well, let’s see how I might get involved in Drupal development. I have already filed some patches for the following modules:
- Marksmarty: better GeSHi support and some other minor things, but it doesn’t seem to be what the maintainer wants, so I’ll have to move it into a separate module. Also some work to split SmartyPants and Markdown into distinct modules. Furthermore, I’ve added support for PHP Markdown’s no-markup mode. This needs some more work as well. Maybe it will be dropped again and the pristine Drupal HTML filter will be used instead; let’s see.
continue reading...
I’ve just released a second Markdownify beta with better PHP 4 support and some other small bug fixes. You can download it from SourceForge.
continue reading...
I’ve finally completed the Markdownify website. I’ve also released the first beta; here is the news text from SourceForge:
This is the first beta release of Markdownify - the HTML to Markdown converter for PHP.
It is very stable and should handle nearly all features of the Markdown and Markdown Extra syntax. Only two things are missing:
- “Markdown inside block elements” for Markdownify Extra
These two things will be added before the first “stable” release. Additionally some performance improvements will hopefully be added.
You are encouraged to use this release in your web applications. Please let me know if you find any bugs. Also a code review by anyone would be very much appreciated!
Download it now
continue reading...
A few days ago I started a complete rewrite of html2text. It now uses a new HTML parser (also written by me), which should make the whole HTML cleanup process obsolete. The generic XML parser currently in use dies on invalid XHTML; with my parser it should be possible to handle errors and parse HTML 4.01 documents without any regex magic beforehand.
You’ll hear more of this in about a week as I’ll be on vacation until the 24th.
continue reading...
Update: Use Markdownify, it’s the successor to html2text.
I just released html2text version 1.3, which sports a ton of bug fixes. Most notably, all features of PHP Markdown Extra are now fully supported, including footnotes and abbreviations.
Also, wrapping should now work as intended, and inline links (like `<foo@bar.com>`) won’t be converted to block links (like `[foo@bar.com](mailto:foo@bar.com)`).
In the next version I’ll add some more options, especially one for disabling PHP Markdown Extra support. I’ll also clean up the code a bit.
Merry Christmas to you!
continue reading...
Update: Use Markdownify, it’s the successor to html2text.
I changed my html2text.php function, and it now handles non-markdownable elements better. Previously something like `<p class="foobar">...</p>` would have resulted in `<p>...</p>`. Now these elements (which can’t be expressed in Markdown without losing information) will be left as plain HTML.
Additionally, I made some changes that should improve performance.
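The rule described here can be sketched as a small check: an element is converted only if Markdown can express both the tag and every attribute on it; otherwise it stays as HTML. This is only an illustration, not the code from html2text.php. The `MARKDOWNABLE` table and the `is_markdownable` function are hypothetical names, the table is deliberately incomplete, and PHP 7+ is assumed.

```php
<?php
// Hypothetical, incomplete table: tag => attributes that Markdown can express.
const MARKDOWNABLE = [
    'p'      => [],
    'em'     => [],
    'strong' => [],
    'a'      => ['href', 'title'],
    'img'    => ['src', 'alt', 'title'],
];

/**
 * Decide whether an element can be converted without losing information.
 * $attributes is an associative array of attribute name => value.
 */
function is_markdownable(string $tag, array $attributes): bool
{
    $tag = strtolower($tag);
    if (!array_key_exists($tag, MARKDOWNABLE)) {
        return false; // unknown tag: keep it as HTML
    }
    // Any attribute outside the allowed list would be dropped by a conversion.
    $names = array_map('strtolower', array_keys($attributes));
    $extra = array_diff($names, MARKDOWNABLE[$tag]);
    return count($extra) === 0;
}
```

With this table, `is_markdownable('p', ['class' => 'foobar'])` is false, so the paragraph would be passed through as HTML, while a bare `<p>` would be converted.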
Download
Get it while it’s hot: html2text.php 1.1 (.tar.gz ~ 120.9 KB)
Known Bugs
Yes, there are some, and I’ll try to fix them in the next few days (note: to make the bugs easier to see, I simply describe what happens when you convert HTML to Markdown and back to HTML):
continue reading...