sed in apache

April 28, 2008 at 6:49 pm | In apache |

We have a history of general-purpose sed-like filtering in apache.  In chronological order:

  1. sed-like filter for Apache 1.3.
  2. sed with mod_ext_filter
  3. mod_line_edit
  4. mod_substitute
  5. mod_sed

This represents a genuine progression. The first is limited by the apache 1.x architecture, which means it can’t, in general, be used with dynamic or proxied content. The second is not so limited, but incurs a big performance penalty, because mod_ext_filter runs an external sed process and pipes the data through it. The third and fourth are similar: both support general-purpose search and replace in apache’s output (mod_line_edit is designed primarily for use in a proxy but works anywhere; mod_substitute has no such preference).
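
For the curious, here is why line-oriented filtering can work on streamed (dynamic or proxied) content at all. An output filter sees the body in arbitrary chunks, so a match can straddle a chunk boundary. The Python below is only a sketch of the usual fix, not code from mod_line_edit or mod_substitute (those are C modules working on APR bucket brigades): hold back the incomplete last line of each chunk, and substitute only on complete lines. The name `substitute_stream` is made up for illustration.

```python
import re
from typing import Iterable, Iterator


def substitute_stream(chunks: Iterable[bytes], pattern: bytes,
                      replacement: bytes) -> Iterator[bytes]:
    """Apply a regex substitution, line by line, to a body arriving in chunks.

    A chunk may end mid-line, so the incomplete tail of each chunk is held
    back until the rest of that line arrives (or the stream ends).
    """
    regex = re.compile(pattern)
    pending = b""
    for chunk in chunks:
        pending += chunk
        cut = pending.rfind(b"\n") + 1      # 0 if there is no newline yet
        if cut:
            complete, pending = pending[:cut], pending[cut:]
            yield b"".join(regex.sub(replacement, line)
                           for line in complete.splitlines(keepends=True))
    if pending:                             # final line without a newline
        yield regex.sub(replacement, pending)
```

With the body split as `[b"Hello wor", b"ld\nfoo world\nbye ", b"world"]`, replacing `world` with `apache` yields `b"Hello apache\nfoo apache\nbye apache"`, whereas a naive per-chunk substitution would miss the first `world`. The price is that a pathologically long line gets buffered whole, which a real filter would want to cap.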

The fifth, mod_sed, is new, and appears to be another big advance on any of its predecessors. Whereas mod_line_edit and mod_substitute are described as sed-like, mod_sed is the real thing: sed itself embedded in an apache filter. mod_sed developer Basant Kukreja (whom I am privileged to have as a colleague at Sun) has updated the original sed code to be thread-safe and reentrant and to use APR pools, and has hooked it into a filter. That alone means it can run much more complex operations than basic search-and-replace, such as addressed commands, line deletion and multi-command scripts. But it has yet more to offer: mod_sed, unlike its predecessors, can filter input (request bodies) as well as output.
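
To make concrete what "the real thing" buys over plain search-and-replace, here is a toy interpreter, in Python, for a hypothetical three-command subset of sed: `s/re/rep/` (first match), `s/re/rep/g` (every match) and `/re/d` (delete matching lines). It is not mod_sed's code, nor its configuration syntax; mod_sed runs the real sed engine. In this sketch, replacements follow Python's `re.sub` conventions (so `&` is not special), and patterns can't contain `/`.

```python
import re
from typing import Callable, List


def compile_script(commands: List[str]) -> Callable[[str], str]:
    """Compile a tiny, hypothetical subset of sed into a text transformer.

    Supported commands (not full sed; patterns may not contain "/"):
      s/REGEX/REPL/    replace the first match on each line
      s/REGEX/REPL/g   replace every match on each line
      /REGEX/d         delete lines that match REGEX
    """
    ops = []
    for cmd in commands:
        m = re.fullmatch(r"s/([^/]*)/([^/]*)/(g?)", cmd)
        if m:
            count = 0 if m.group(3) else 1      # re.sub: count=0 means "all"
            ops.append(("s", re.compile(m.group(1)), m.group(2), count))
            continue
        m = re.fullmatch(r"/([^/]*)/d", cmd)
        if m:
            ops.append(("d", re.compile(m.group(1)), None, 0))
            continue
        raise ValueError(f"unsupported command: {cmd!r}")

    def run(text: str) -> str:
        out = []
        for line in text.splitlines():
            deleted = False
            for kind, regex, repl, count in ops:
                if kind == "d":
                    if regex.search(line):
                        deleted = True      # like sed: skip the remaining commands
                        break
                else:
                    line = regex.sub(repl, line, count=count)
            if not deleted:
                out.append(line)
        # Every surviving line is written back with a trailing newline.
        return "".join(line + "\n" for line in out)

    return run
```

Because the compiled script is just a function from text to text, the same script could be pointed at a response body or a request body, which is the symmetry mod_sed adds by filtering input as well as output. For example, `compile_script(["/^debug:/d", "s/monday/MON/g"])` turns `"debug: x\nmonday monday\n"` into `"MON MON\n"`.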

So what’s the cost of this extra power? Well, it’s bleeding-edge, with all that implies. And it’s bigger than its more limited competitors. But in terms of performance it appears to hold its own comfortably against any of them. Nor is it any more complex to configure. Well, I’m suitably impressed!

The question is, what next? I think for general-purpose filtering, mod_sed may be as good as it gets[1]. I’m wondering if this can now move in new directions:

  • Can we come up with a framework for plugging syntax modules into mod_sed (in the way that vim, for example, can be made syntax-aware)? And if so, could it move into the space of markup-aware modules like mod_proxy_html and mod_publisher, or indeed mod_highlighter?
  • Can we usefully apply mod_security-like rulesets with mod_sed to make a powerful filter for untainting input and preventing information disclosure? If so, the fact that it streams I/O will offer major performance advantages over scanning request and response bodies with mod_security, in situations where filtering is considered sufficient. This is something I’ve had at the back of my mind for years, but mod_sed offers a more powerful starting point than has hitherto been available.
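
As a rough illustration of the second idea, here is a Python sketch of a streaming, rule-driven filter. The two rules (masking something shaped like a 16-digit card number, and rejecting a body containing `<script`) and all the names (`Rule`, `apply_rules`, `Rejected`) are hypothetical, not taken from mod_security or mod_sed. The point is the shape: each line is checked and passed on as soon as it arrives, so, assuming the input itself streams, memory use is bounded by the longest line rather than the whole body.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Rule:
    """A hypothetical filtering rule: a regex and what to do when it matches."""
    name: str
    pattern: re.Pattern
    action: str                 # "mask" rewrites the match; "reject" refuses the body
    mask: str = "[REDACTED]"    # replacement text used by "mask" rules


class Rejected(Exception):
    """Raised when a "reject" rule matches, so the caller can abort the body."""


def apply_rules(lines: Iterable[str], rules: List[Rule]) -> Iterator[str]:
    """Filter a body line by line, yielding each line once every rule has run.

    Only the current line is held in memory, unlike a scanner that buffers
    the whole request or response body before deciding anything.
    """
    for lineno, line in enumerate(lines, start=1):
        for rule in rules:
            if rule.action == "reject" and rule.pattern.search(line):
                raise Rejected(f"rule {rule.name!r} matched on line {lineno}")
            if rule.action == "mask":
                line = rule.pattern.sub(rule.mask, line)
        yield line


# Illustrative rules only; a real ruleset would need far more care.
RULES = [
    # 16 digits, optionally separated by single spaces or hyphens.
    Rule("card-number", re.compile(r"\b(?:\d[ -]?){15}\d\b"), "mask"),
    Rule("script-tag", re.compile(r"<script\b", re.IGNORECASE), "reject"),
]
```

The generator also makes the trade-off visible: by the time a "reject" rule fires, earlier lines may already have been passed on, so a streaming filter can cut a body short but cannot turn it into a clean error page. That is one reason this approach suits cases where filtering, rather than full inspection, is considered sufficient.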

[1] Yeah, right: you could probably do the same thing with Perl and have truly the ultimate text processor, at the cost of much more bloat. But mod_perl doesn’t offer that out of the box. Ditto other scripting modules, as far as I know.
