Making mod_proxy_html smarter

I’ve just had a good hacking session on mod_proxy_html (version 3.0-dev of course; 2.x isn’t getting major new features).

I had contemplated adding DTD support using the code from mod_publisher.  But that’s OTT for a specifically-HTML module.  Instead, I’ve added the capability to check HTML conformance to HTML4/XHTML1, using the HTML knowledge built into libxml2.  And in doing so, I recollect hacking up that little bit of libxml2 myself back when I was developing AccessValet:-)

So now a server admin can enable checking against either the current or the legacy (X)HTML standard (the legacy, aka transitional, DTD additionally allows deprecated markup).  If checking is enabled, any elements or attributes that aren’t in the selected standard are dropped from the output, and each one is logged at loglevel DEBUG.  It’ll also complain if an HTML element is missing a REQUIRED attribute (e.g. ALT on an image), though of course it can’t fix that.
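To make the current/legacy distinction concrete, here is a minimal sketch in C of that kind of attribute check. It is not the mod_proxy_html code and it does not call libxml2: the `elem_desc` type, the abridged `img` table and the function names are all made up for illustration. libxml2’s own HTML tables are far more complete.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status values for an attribute on a given element. */
typedef enum { ATTR_UNKNOWN, ATTR_DEPRECATED, ATTR_VALID, ATTR_REQUIRED } attr_status;

/* Hypothetical per-element description: three NULL-terminated name lists. */
typedef struct {
    const char *name;
    const char *const *attrs_opt;   /* allowed in both current and legacy */
    const char *const *attrs_depr;  /* allowed in legacy (transitional) only */
    const char *const *attrs_req;   /* must be present */
} elem_desc;

/* Abridged, hand-written table for <img>; an assumption for this example. */
static const char *const img_opt[]  = { "width", "height", "usemap", "ismap", "longdesc", NULL };
static const char *const img_depr[] = { "align", "border", "hspace", "vspace", NULL };
static const char *const img_req[]  = { "src", "alt", NULL };
static const elem_desc IMG_DESC = { "img", img_opt, img_depr, img_req };

/* Return 1 if name occurs in the NULL-terminated list, else 0. */
static int in_list(const char *const *list, const char *name)
{
    if (list == NULL)
        return 0;
    for (; *list != NULL; ++list)
        if (strcmp(*list, name) == 0)
            return 1;
    return 0;
}

/* Classify one attribute against the element description. */
attr_status classify_attr(const elem_desc *elem, const char *attr)
{
    assert(elem != NULL && attr != NULL);
    if (in_list(elem->attrs_req, attr))  return ATTR_REQUIRED;
    if (in_list(elem->attrs_opt, attr))  return ATTR_VALID;
    if (in_list(elem->attrs_depr, attr)) return ATTR_DEPRECATED;
    return ATTR_UNKNOWN;
}

/* Return 1 to keep the attribute, 0 to drop it.
 * Deprecated attributes survive only when legacy checking is selected. */
int attr_allowed(const elem_desc *elem, const char *attr, int legacy)
{
    switch (classify_attr(elem, attr)) {
    case ATTR_REQUIRED:
    case ATTR_VALID:
        return 1;
    case ATTR_DEPRECATED:
        return legacy ? 1 : 0;
    default:
        return 0;
    }
}

/* Return the first required attribute absent from the NULL-terminated
 * list of attributes actually present, or NULL if none is missing. */
const char *missing_required(const elem_desc *elem, const char *const *present)
{
    const char *const *req;
    assert(elem != NULL);
    for (req = elem->attrs_req; req != NULL && *req != NULL; ++req)
        if (!in_list(present, *req))
            return *req;
    return NULL;
}
```

With this table, `border` on an `img` is kept under legacy checking and dropped under current checking, an unknown attribute is dropped either way, and an `img` carrying only `src` is reported as missing `alt`. As in the post, the missing attribute can only be reported, not repaired.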

I’m contemplating also supporting context checking, so it’ll fix up elements that are valid, but appear in a context where they’re not valid.  That’s something libxml2 can fix (up to a point) as well as log.  But that’s rather more overhead to implement, because it means saving state over the SAX callbacks.
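The extra state is essentially a stack of currently open elements that survives from one SAX callback to the next. Here is a rough C sketch of the idea; the `ctx_state` type, the tiny content-model table and the function names are hypothetical, and it only rejects misplaced children rather than repairing them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH     64
#define ELEM_NAME_LEN 16

/* State carried across SAX callbacks: names of the open elements. */
typedef struct {
    char open[MAX_DEPTH][ELEM_NAME_LEN];
    int  depth;
} ctx_state;

/* Hypothetical content rule: a parent and the children it may contain. */
typedef struct {
    const char *parent;
    const char *const *children;  /* NULL-terminated */
} content_rule;

/* Tiny illustrative table; a real one would cover every element. */
static const char *const ul_kids[] = { "li", NULL };
static const char *const tr_kids[] = { "td", "th", NULL };
static const content_rule RULES[] = {
    { "ul", ul_kids },
    { "ol", ul_kids },
    { "tr", tr_kids },
};

/* Return 1 if child may appear directly inside parent.
 * Parents without a rule accept anything in this sketch. */
static int child_allowed(const char *parent, const char *child)
{
    size_t i;
    const char *const *k;
    for (i = 0; i < sizeof RULES / sizeof RULES[0]; ++i) {
        if (strcmp(RULES[i].parent, parent) != 0)
            continue;
        for (k = RULES[i].children; *k != NULL; ++k)
            if (strcmp(*k, child) == 0)
                return 1;
        return 0;
    }
    return 1;
}

/* Call from the startElement callback.
 * Returns 1 if the element is accepted (and pushed), 0 if it is
 * out of context, too deeply nested, or has an over-long name. */
int ctx_start(ctx_state *st, const char *name)
{
    assert(st != NULL && name != NULL);
    if (st->depth >= MAX_DEPTH || strlen(name) >= ELEM_NAME_LEN)
        return 0;
    if (st->depth > 0 && !child_allowed(st->open[st->depth - 1], name))
        return 0;
    strcpy(st->open[st->depth++], name);  /* length checked above */
    return 1;
}

/* Call from the endElement callback: pop only if the name matches
 * the innermost open element. A real filter would also have to
 * remember which start tags it rejected. */
void ctx_end(ctx_state *st, const char *name)
{
    assert(st != NULL && name != NULL);
    if (st->depth > 0 && strcmp(st->open[st->depth - 1], name) == 0)
        st->depth--;
}
```

The stack has to live in the filter’s per-request context rather than in a local variable, since each callback returns before the next piece of markup arrives.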

Posted on November 20, 2006, in apache, html.
