<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>niq's soapbox &#187; html</title>
	<atom:link href="http://bahumbug.wordpress.com/category/html/feed/" rel="self" type="application/rss+xml" />
	<link>http://bahumbug.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<pubDate>Thu, 03 Jul 2008 22:59:23 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
	<language>en</language>
			<item>
		<title>Sanitising user-contributed markup</title>
		<link>http://bahumbug.wordpress.com/2008/04/20/sanitising-user-contributed-markup/</link>
		<comments>http://bahumbug.wordpress.com/2008/04/20/sanitising-user-contributed-markup/#comments</comments>
		<pubDate>Sun, 20 Apr 2008 21:33:12 +0000</pubDate>
		<dc:creator>niq</dc:creator>
		
		<category><![CDATA[apache]]></category>

		<category><![CDATA[html]]></category>

		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://bahumbug.wordpress.com/?p=392</guid>
		<description><![CDATA[At ApacheCon, I once again encountered the argument sanitising markup is difficult, with an explanation of how easy it is to evade pattern-matching filters with tricks like reordering, whitespace, and embedded comments.  I protested that this kind of difficulty comes from using the wrong tools, and the problem largely goes away if you use [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>At ApacheCon, I once again encountered the argument <em>sanitising markup is difficult</em>, with an explanation of how easy it is to evade pattern-matching filters with tricks like reordering, whitespace, and embedded comments.  I protested that this kind of difficulty comes from using the wrong tools, and the problem largely goes away if you use markup-aware tools.</p>
<p>On April 10<sup>th</sup> <a href="http://bahumbug.wordpress.com/2008/04/10/putting-ones-money-where-ones-mouth-is/">I promised a note on this</a> (though that promise came from a separate conversation at ApacheCon, and in a different context to the security issue).  Today I&#8217;ve delivered on that promise, with a <a href="http://www.apachetutor.org/dev/online-edit">brief technical note</a>.  I expect to refer to it in future whenever the subject arises.</p>
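<p>To make the point concrete, here is a minimal sketch of a markup-aware whitelist filter.  It is not the approach from the technical note: it uses Python&#8217;s standard html.parser purely as a stand-in for a real markup parser, and the whitelist (ALLOWED), the permitted URL schemes (SAFE_SCHEMES) and the sanitise() helper are all hypothetical names chosen for illustration.  Because the filter only ever sees parsed elements and attributes, reordering, odd whitespace and embedded comments give an attacker nothing to hide behind.</p>

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration: tag -> attributes allowed on it.
ALLOWED = {"p": set(), "em": set(), "strong": set(), "a": {"href"}}
# Assumption: only absolute http(s) links are acceptable.
SAFE_SCHEMES = ("http://", "https://")


class Sanitiser(HTMLParser):
    """Re-serialise only whitelisted elements and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # The parser hands us a lowercased tag name and parsed attributes,
        # so case tricks and stray whitespace have already been normalised.
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped on output, so it can never turn back into markup.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions are dropped:
    # the inherited handlers for them do nothing.


def sanitise(markup):
    parser = Sanitiser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

<p>Feeding it a fragment that carries an onclick attribute, a mixed-case script element, an embedded comment and a javascript: link should leave only the whitelisted p and a elements and their text.  A production filter would need a far more careful whitelist than this one.</p>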
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/bahumbug.wordpress.com/392/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/bahumbug.wordpress.com/392/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/bahumbug.wordpress.com/392/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bahumbug.wordpress.com/392/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/bahumbug.wordpress.com/392/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/bahumbug.wordpress.com/392/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/bahumbug.wordpress.com/392/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/bahumbug.wordpress.com/392/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/bahumbug.wordpress.com/392/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/bahumbug.wordpress.com/392/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/bahumbug.wordpress.com/392/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/bahumbug.wordpress.com/392/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bahumbug.wordpress.com&blog=471959&post=392&subd=bahumbug&ref=&feed=1" /></div>]]></content:encoded>
			<wfw:commentRss>http://bahumbug.wordpress.com/2008/04/20/sanitising-user-contributed-markup/feed/</wfw:commentRss>
	
		<media:content url="http://a.wordpress.com/avatar/bahumbug-128.jpg" medium="image">
			<media:title type="html">niq</media:title>
		</media:content>
	</item>
		<item>
		<title>Transcoding module</title>
		<link>http://bahumbug.wordpress.com/2007/12/18/transcoding-module/</link>
		<comments>http://bahumbug.wordpress.com/2007/12/18/transcoding-module/#comments</comments>
		<pubDate>Tue, 18 Dec 2007 23:52:59 +0000</pubDate>
		<dc:creator>niq</dc:creator>
		
		<category><![CDATA[apache]]></category>

		<category><![CDATA[html]]></category>

		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://bahumbug.wordpress.com/2007/12/18/transcoding-module/</guid>
		<description><![CDATA[One of the new features in mod_proxy_html 3.0 is improved i18n support, adding character sets supported by apr_xlate (normally iconv) to those supported by libxml2.
In generalising this for other filter modules, I&#8217;ve decided to split it out into a new transcoding module.  It will be tied to libxml2 applications, and will be usable both [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>One of the new features in <a href="http://apache.webthing.com/mod_proxy_html/">mod_proxy_html 3.0</a> is improved i18n support, adding character sets supported by apr_xlate (normally <a href="http://www.gnu.org/software/libiconv/">iconv</a>) to those supported by <a href="http://xmlsoft.org/">libxml2</a>.</p>
<p>In generalising this for other filter modules, I&#8217;ve decided to split it out into a new transcoding module.  It will be tied to libxml2 applications, and will be usable both before and after any libxml2-based content filter.  For maximum efficiency, it will only handle charsets that are not supported by libxml2.</p>
<p>It will also support additional preprocessing fixups that experience has shown necessary.  That includes adjusting charset declarations that are invalidated by transcoding, and fixing  tag-soup problems that screw up libxml2&#8217;s htmlParser.</p>
<p>It won&#8217;t do anything useful yet, but I&#8217;ve committed mod_xml2enc as a work-in-progress to <a href="http://apache.webthing.com/svn/apache/">svn at apache.webthing.com</a>.  When ready, it&#8217;ll borrow from several existing modules, and replace transcoding and preprocessing functions in them.</p>
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/bahumbug.wordpress.com/319/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/bahumbug.wordpress.com/319/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/bahumbug.wordpress.com/319/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bahumbug.wordpress.com/319/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/bahumbug.wordpress.com/319/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/bahumbug.wordpress.com/319/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/bahumbug.wordpress.com/319/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/bahumbug.wordpress.com/319/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/bahumbug.wordpress.com/319/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/bahumbug.wordpress.com/319/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/bahumbug.wordpress.com/319/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/bahumbug.wordpress.com/319/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bahumbug.wordpress.com&blog=471959&post=319&subd=bahumbug&ref=&feed=1" /></div>]]></content:encoded>
			<wfw:commentRss>http://bahumbug.wordpress.com/2007/12/18/transcoding-module/feed/</wfw:commentRss>
	
		<media:content url="http://a.wordpress.com/avatar/bahumbug-128.jpg" medium="image">
			<media:title type="html">niq</media:title>
		</media:content>
	</item>
		<item>
		<title>mod_proxy_html 3.0</title>
		<link>http://bahumbug.wordpress.com/2006/12/25/mod_proxy_html-30-2/</link>
		<comments>http://bahumbug.wordpress.com/2006/12/25/mod_proxy_html-30-2/#comments</comments>
		<pubDate>Mon, 25 Dec 2006 02:51:30 +0000</pubDate>
		<dc:creator>niq</dc:creator>
		
		<category><![CDATA[apache]]></category>

		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://bahumbug.wordpress.com/2006/12/25/mod_proxy_html-30-2/</guid>
		<description><![CDATA[I&#8217;ve just announced a public dev version of mod_proxy_html, incorporating a range of updates.  That means it works nicely for me, and I&#8217;d like the outside world to start test-driving it.
First, there&#8217;s much better internationalisation support.

A charset not supported by libxml2 can be aliased to a supported one.
A charset that is neither supported directly nor [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve just announced a public dev version of <a href="http://apache.webthing.com/mod_proxy_html">mod_proxy_html</a>, incorporating a range of updates.  That means it works nicely for me, and I&#8217;d like the outside world to start test-driving it.</p>
<p>First, there&#8217;s much better internationalisation support.</p>
<ul>
<li>A charset not supported by libxml2 can be aliased to a supported one.</li>
<li>A charset that is neither supported directly nor aliased will be converted to Unicode using apr_xlate (an iconv wrapper).</li>
<li>A default input encoding (for totally unlabelled contents) can be configured.</li>
<li>Output can be filtered through apr_xlate to a server admin&#8217;s desired encoding.</li>
</ul>
<p>Second, support for rewriting proprietary HTML variants is now configurable.  Indeed, the definitions of all link and event attributes are now delegated to httpd.conf, and an example configuration is supplied, defining the links and events in W3C HTML 4.01 and XHTML 1.0.</p>
<p>When I <a href="http://bahumbug.wordpress.com/2006/11/07/mod_proxy_html-30/">announced it here</a> I got two requests, one of which was easy to satisfy.  You can now override its refusal to run when not in a proxy context, or when the input isn&#8217;t HTML.  This is of course at your own risk, and is there to help deal with broken backends.</p>
<p>This is one of a number of new fixes available for broken backends.  Others include an option to <a href="http://bahumbug.wordpress.com/2006/10/12/mod_proxy_html-revisited/">ignore leading junk</a>, and the capability to strip out bogus or deprecated markup and output cleaned-up HTML or XHTML.</p>
<p>Finally, Version 3 introduces more flexible configuration.  It now supports variable interpolation in ProxyHTMLURLMap rules, and allows an additional clause making application of individual rules conditional on an environment variable.  So configuration can now be dynamic - e.g. driven by mod_rewrite -  when &lt;Location&gt; / &lt;LocationMatch&gt; sections aren&#8217;t sufficiently flexible.</p>
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/bahumbug.wordpress.com/93/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/bahumbug.wordpress.com/93/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/bahumbug.wordpress.com/93/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bahumbug.wordpress.com/93/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/bahumbug.wordpress.com/93/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/bahumbug.wordpress.com/93/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/bahumbug.wordpress.com/93/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/bahumbug.wordpress.com/93/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/bahumbug.wordpress.com/93/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/bahumbug.wordpress.com/93/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/bahumbug.wordpress.com/93/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/bahumbug.wordpress.com/93/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bahumbug.wordpress.com&blog=471959&post=93&subd=bahumbug&ref=&feed=1" /></div>]]></content:encoded>
			<wfw:commentRss>http://bahumbug.wordpress.com/2006/12/25/mod_proxy_html-30-2/feed/</wfw:commentRss>
	
		<media:content url="http://a.wordpress.com/avatar/bahumbug-128.jpg" medium="image">
			<media:title type="html">niq</media:title>
		</media:content>
	</item>
		<item>
		<title>Making mod_proxy_html smarter</title>
		<link>http://bahumbug.wordpress.com/2006/11/20/making-mod_proxy_html-smarter/</link>
		<comments>http://bahumbug.wordpress.com/2006/11/20/making-mod_proxy_html-smarter/#comments</comments>
		<pubDate>Mon, 20 Nov 2006 00:51:37 +0000</pubDate>
		<dc:creator>niq</dc:creator>
		
		<category><![CDATA[apache]]></category>

		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://bahumbug.wordpress.com/2006/11/20/making-mod_proxy_html-smarter/</guid>
		<description><![CDATA[I&#8217;ve just had a good hacking session on mod_proxy_html (version 3.0-dev of course; 2.x isn&#8217;t getting major new features).
I had contemplated adding DTD support using the code from mod_publisher.  But that&#8217;s OTT for a specifically-HTML module.  Instead, I&#8217;ve added the capability to check HTML conformance to HTML4/XHTML1, using the HTML knowledge built into libxml2.  And [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve just had a good hacking session on mod_proxy_html (version 3.0-dev of course; 2.x isn&#8217;t getting major new features).</p>
<p>I had contemplated adding DTD support using the code from mod_publisher.  But that&#8217;s OTT for a specifically-HTML module.  Instead, I&#8217;ve added the capability to check HTML conformance to HTML4/XHTML1, using the HTML knowledge built into libxml2.  And in doing so, I recollect hacking up that little bit of libxml2 myself back when I was developing AccessValet <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>So now a server admin can enable checking against either current or legacy (X)HTML standards (the difference being that the legacy - aka transitional - DTD allows deprecated markup).  If checking is enabled, then any bogus crap will be dropped from the output, and each removal logged at loglevel DEBUG.  It&#8217;ll also complain if an HTML element is missing a REQUIRED attribute (e.g. ALT on an image), though of course it can&#8217;t fix that.</p>
<p>I&#8217;m contemplating also supporting context checking, so it&#8217;ll fix up elements that are valid, but appear in a context where they&#8217;re not valid.  That&#8217;s something libxml2 <em>can</em> fix (up to a point) as well as log.  But that&#8217;s rather more overhead to implement, because it means saving state over the SAX callbacks.</p>
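<p>As a rough illustration of what saving state over the callbacks involves, here is a sketch in Python rather than C.  It is not mod_proxy_html code: html.parser stands in for libxml2&#8217;s SAX interface, and the tiny content model (REQUIRED_PARENT), the VOID set and the check() helper are hypothetical simplifications.  The stack of open elements is the state that has to survive from one callback to the next, and this version only reports problems rather than fixing them.</p>

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified content model: child -> permitted parents.
REQUIRED_PARENT = {
    "li": {"ul", "ol"},
    "tr": {"table", "tbody", "thead", "tfoot"},
    "td": {"tr"},
    "th": {"tr"},
}
# Elements that never have an end tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class ContextChecker(HTMLParser):
    """Flag elements that are valid in themselves but appear in the wrong place."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements: the state kept between callbacks
        self.problems = []  # (element, actual parent) pairs

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        allowed = REQUIRED_PARENT.get(tag)
        if allowed is not None and parent not in allowed:
            self.problems.append((tag, parent))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating tag soup;
        # an end tag with no matching open element is simply ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check(markup):
    checker = ContextChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

<p>A list item inside a paragraph, for example, is reported with the paragraph as its offending parent, while the same item inside a list passes silently.</p>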
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/bahumbug.wordpress.com/54/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/bahumbug.wordpress.com/54/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/bahumbug.wordpress.com/54/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/bahumbug.wordpress.com/54/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/bahumbug.wordpress.com/54/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/bahumbug.wordpress.com/54/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/bahumbug.wordpress.com/54/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/bahumbug.wordpress.com/54/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/bahumbug.wordpress.com/54/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/bahumbug.wordpress.com/54/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/bahumbug.wordpress.com/54/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/bahumbug.wordpress.com/54/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=bahumbug.wordpress.com&blog=471959&post=54&subd=bahumbug&ref=&feed=1" /></div>]]></content:encoded>
			<wfw:commentRss>http://bahumbug.wordpress.com/2006/11/20/making-mod_proxy_html-smarter/feed/</wfw:commentRss>
	
		<media:content url="http://a.wordpress.com/avatar/bahumbug-128.jpg" medium="image">
			<media:title type="html">niq</media:title>
		</media:content>
	</item>
	</channel>
</rss>