sed in apache

We have a history of general-purpose sed-like filtering in apache.  In chronological order:

  1. sed-like filter for Apache 1.3
  2. sed with mod_ext_filter
  3. mod_line_edit
  4. mod_substitute
  5. mod_sed

This represents a genuine progression. The first is limited by the apache 1.x architecture, which means it can’t in general be used with dynamic or proxied content. The second has no such limitation, but incurs a big performance penalty. The third and fourth are similar, and support general-purpose search and replace in apache’s output (mod_line_edit is designed primarily for use in a proxy but works anywhere; mod_substitute has no such preference).

The fifth, mod_sed, is new, and appears to be a big advance on all of its predecessors. Whereas mod_line_edit and mod_substitute are described as sed-like, mod_sed is the real thing: sed itself embedded in an apache filter. mod_sed developer Basant Kukreja (whom I am privileged to have as a colleague at Sun) has updated the original sed code to be thread-safe and reentrant and to use APR pools, and has hooked it into a filter. That alone means it can run much more complex operations than basic search-and-replace. But it has yet more to offer: mod_sed, unlike its predecessors, can filter input as well as output.

So what’s the cost of this extra power? Well, it’s bleeding-edge, with all that implies. And it’s bigger than its more limited competitors. But in terms of performance it appears to hold its own comfortably against any competition. Neither is it any more complex to configure. Well, I’m suitably impressed!

The question is, what next? I think for general-purpose filtering, mod_sed may be as good as it gets[1]. I’m wondering if this can now move in new directions:

  • Can we come up with a framework to plug syntax modules into mod_sed (as we can, for example, make vim syntax-aware)? And if so, could it move into the space of markup-aware modules like mod_proxy_html and mod_publisher, or indeed mod_highlighter?
  • Can we usefully apply mod_security-like rulesets with mod_sed to make a powerful untainting and information-disclosure filter? If so, the fact that it streams I/O will offer major performance advantages over scanning request and response bodies with mod_security, for situations where filtering is considered sufficient. This is something I’ve had at the back of my mind for years, but mod_sed offers a more powerful starting point than has hitherto been available.
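
To make the streaming point concrete, here is a minimal sketch in Python of a line-oriented filter that applies a ruleset to data as it arrives, without ever holding the whole body in memory. It is an illustration of the idea only, not mod_sed's code or API: the function name stream_substitute, the (pattern, replacement) rule format and the sample rule are all hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# A rule is a compiled bytes pattern plus its replacement (hypothetical format).
Rule = Tuple[re.Pattern, bytes]


def stream_substitute(chunks: Iterable[bytes], rules: List[Rule]) -> Iterator[bytes]:
    """Apply every rule to each complete line as soon as that line is available."""
    pending = b""  # tail of the data seen so far that has no newline yet
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest waits.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            for pattern, replacement in rules:
                line = pattern.sub(replacement, line)
            yield line + b"\n"
    if pending:
        # Flush a final line that was not terminated by a newline.
        for pattern, replacement in rules:
            pending = pattern.sub(replacement, pending)
        yield pending


if __name__ == "__main__":
    # Hypothetical information-disclosure rule: mask anything after "secret=".
    rules = [(re.compile(rb"secret=\w+"), b"secret=***")]
    body = [b"id=1 sec", b"ret=abc\nok\nsecret=", b"xyz"]
    for piece in stream_substitute(body, rules):
        print(piece)
```

Note that the match in the first line straddles two chunks, so the filter has to carry the incomplete tail forward until its newline arrives. Memory use is bounded by the longest line rather than by the size of the body, which is where the advantage over scanning a fully buffered request or response would come from.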

[1] Yeah, right, you could probably do the same thing with perl and have truly the ultimate text processor, at the cost of much more bloat. But mod_perl does no such thing. Ditto other scripting modules, as far as I know.

[UPDATE] Since the above was written, mod_sed has been donated to the ASF and will be included as standard in Apache HTTPD 2.3/2.4 releases.

Posted on April 28, 2008, in apache.