Transcoding module

One of the new features in mod_proxy_html 3.0 is improved i18n support: it adds the character sets supported by apr_xlate (normally iconv) to those already supported by libxml2.

In generalising this for other filter modules, I’ve decided to split it out into a new transcoding module. It will be tied to libxml2 applications, and will be usable both before and after any libxml2-based content filter. For maximum efficiency, it will only handle charsets that are not supported by libxml2.
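To make the "only handle what libxml2 can't" rule concrete, here is a rough sketch in Python rather than in the module's own C. It is only an illustration: the set of natively supported charsets is an assumption (the real set depends on how libxml2 was built), and the names `NATIVE_CHARSETS`, `needs_transcoding` and `prepare_for_parser` are hypothetical, not anything in mod_xml2enc.

```python
import codecs

# Assumption: charsets the parser is taken to handle by itself.
# The real list depends on the libxml2 build; this one is illustrative.
NATIVE_CHARSETS = {"utf-8", "utf-16", "iso8859-1", "ascii"}


def canonical(charset):
    """Normalise aliases such as 'latin1' or 'US-ASCII' to one name."""
    return codecs.lookup(charset).name


def needs_transcoding(charset):
    """True when the charset is outside the assumed native set."""
    return canonical(charset) not in NATIVE_CHARSETS


def prepare_for_parser(body, charset):
    """Return (bytes, charset) in a form the parser can consume.

    Natively supported input is passed through untouched, so the
    extra conversion cost is only paid when it is actually needed.
    """
    if not needs_transcoding(charset):
        return body, charset
    return body.decode(charset).encode("utf-8"), "utf-8"
```

A KOI8-R document would be converted to UTF-8 here, while an ISO-8859-1 one would be handed to the parser unchanged.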

It will also support additional preprocessing fixups that experience has shown to be necessary. That includes adjusting charset declarations that are invalidated by transcoding, and fixing tag-soup problems that screw up libxml2’s htmlParser.
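The charset-declaration fixup can be illustrated with another small Python sketch. Once a document has been transcoded, a `<meta>` element that still names the old charset is wrong, so it needs rewriting. The regular expression and the function name `fix_charset_declaration` are hypothetical and deliberately simplistic; this is not how the module does it, just the idea.

```python
import re

# Matches the charset token in either <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb'(<meta\b[^>]*?\bcharset\s*=\s*["\']?)([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def fix_charset_declaration(html, new_charset):
    """Rewrite the first declared charset so it matches the new encoding.

    Documents without a declaration are returned unchanged.
    """
    replacement = new_charset.encode("ascii")
    return META_CHARSET.sub(lambda m: m.group(1) + replacement, html, count=1)
```

Working on bytes rather than decoded text keeps the fixup independent of the transcoding step, at the cost of assuming the declaration itself is ASCII-compatible.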

It won’t do anything useful yet, but I’ve committed mod_xml2enc as a work-in-progress to svn at apache.webthing.com. When ready, it’ll borrow from several existing modules, and replace transcoding and preprocessing functions in them.

Posted on December 18, 2007, in apache, html, xml.
