mod_proxy_html 3.1-dev

June 18, 2008 at 12:31 am | In apache | No Comments

I’ve just started hacking a new mod_proxy_html update.

The rationale is that all the internationalisation support in 3.0 should be shared by other filter modules, without having to replicate lots of code.  That’s why I wrote mod_xml2enc, which does a better job of it.

That left me with the job of converting mod_proxy_html to use mod_xml2enc as an inevitable corollary.  That’s what I’ve just started on.  The outcome, when it’s ready, will be a mod_proxy_html 3.1 that’s delegated its i18n support, and is thus quite a bit smaller and simpler than 3.0.

Sometime soon I’ll be packaging these and other modules for Sun’s webstack.

Apache 2.2.9

June 13, 2008 at 11:43 pm | In apache | No Comments

Just a heads-up to anyone who hasn’t seen it elsewhere. Apache 2.2.9 is now an official release, and should be available from the mirrors by the time you read this, if it isn’t already.

There’s nothing earth-shattering in this release (for most users, at any rate). There’s a security fix for an issue that probably affects no one in real life: a possible DoS attack not from a client, but from a backend server being proxied by apache. There are a number of minor bugfixes and enhancements, documented in the CHANGES file. Users will probably find the biggest difference is some improvement and rationalisation of the configuration. That’ll manifest itself in some irritating niggles going away, rather than in any radical change.

The most serious bug fixed is probably a race condition that could cause segfaults under load in the worker and event MPMs. That, together with the minor bugfixes and configuration improvements, makes a worthwhile upgrade, but not one of great urgency.

sed in apache

April 28, 2008 at 6:49 pm | In apache | No Comments

We have a history of general-purpose sed-like filtering in apache.  In chronological order:

  1. sed-like filter for Apache 1.3.
  2. sed with mod_ext_filter
  3. mod_line_edit
  4. mod_substitute
  5. mod_sed

This represents a genuine progression. The first is limited by the apache 1.x architecture, which means it can’t in general be used with dynamic or proxied contents. The second is not thus limited, but incurs a big performance penalty. The third and fourth are similar, and support general-purpose search and replace in apache’s output (mod_line_edit is designed primarily for use in a proxy but works anywhere; mod_substitute has no such preference).

The fifth, mod_sed, is new, and appears to be another big advance on any of its predecessors. Whereas mod_line_edit and mod_substitute are described as sed-like, mod_sed is the real thing: sed itself embedded in an apache filter. mod_sed developer Basant Kukreja (who I am privileged to have as a colleague at Sun) has updated the original sed code to be thread-safe and reentrant and to use APR pools, and has hooked it into a filter. That alone means it can run much more complex operations than basic search-and-replace. But it has yet more to offer: mod_sed, unlike its predecessors, can filter input as well as output.

So what’s the cost of this extra power? Well, it’s bleeding-edge, with all that implies. And it’s bigger than its more limited competitors. But in terms of performance it appears to hold its own comfortably against any competition. Neither is it any more complex to configure. Well, I’m suitably impressed!
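To give a flavour of the configuration, here is a rough sketch of a similar substitution in mod_substitute and in mod_sed, plus mod_sed on the input side. The directive names are those documented for the versions of these modules later bundled with Apache httpd, so treat them as an assumption as far as the release discussed here goes; the directories and patterns are made up for illustration.

```
# Illustrative only: paths and patterns are hypothetical.

# mod_substitute: sed-like search and replace on output
<Directory "/var/www/docs/substitute">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s/monday/MON/i"
</Directory>

# mod_sed: real sed commands applied to output
<Directory "/var/www/docs/sed">
    AddOutputFilter Sed html
    OutputSed "s/monday/MON/g"
    OutputSed "s/sunday/SUN/g"
</Directory>

# mod_sed on request bodies, which the earlier modules cannot do
<Directory "/var/www/docs/sed-input">
    AddInputFilter Sed php
    InputSed "s/monday/MON/g"
</Directory>
```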

The question is, what next? I think for general-purpose filtering, mod_sed may be as good as it gets[1]. I’m wondering if this can now move in new directions:

  • Can we come up with a framework to plug syntax modules into mod_sed (as we can, for example, make vim syntax-aware)? And if so, could it move into the space of markup-aware modules like mod_proxy_html and mod_publisher, or indeed mod_highlighter?
  • Can we usefully apply mod_security-like rulesets with mod_sed to make a powerful untainting and information-disclosure filter? If so, the fact that it streams I/O will offer major performance advantages over scanning request and response bodies with mod_security, for situations where filtering is considered sufficient. This is something I’ve had at the back of my mind for years, but mod_sed offers a more powerful starting point than has hitherto been available.

[1] Yeah, right, you could probably do the same thing with perl and have truly the ultimate text processor, at the cost of much more bloat. But mod_perl does no such thing. Ditto other scripting modules, as far as I know.

Sanitising user-contributed markup

April 20, 2008 at 9:33 pm | In apache, html, security | No Comments

At ApacheCon, I once again encountered the argument that sanitising markup is difficult, with an explanation of how easy it is to evade pattern-matching filters with tricks like reordering, whitespace, and embedded comments. I protested that this kind of difficulty comes from using the wrong tools, and the problem largely goes away if you use markup-aware tools.

On April 10th I promised a note on this (though that promise came from a separate conversation at ApacheCon, and in a different context to the security issue). Today I’ve just delivered on that promise, with a brief technical note. I expect to use it in future when the subject arises.

Royalties and Translations!

April 14, 2008 at 4:26 pm | In apache, books | 2 Comments

I arrived home today to my second royalty cheque for the book.  This one, which is for the period July-December 2007, is sadly smaller than the previous one: presumably that must’ve been boosted by a burst of initial sales where there was a previously-unmet demand.

One interesting item was “Subsidiary Rights” from Grupo Anaya, S.A., who appear to have published a Spanish translation.  This is the first I’ve heard of it, though I had been told by a fellow author (probably DrBacchus, though I can’t recollect for certain) that translations might happen without my hearing of them.  As with the Chinese translation, I’m thrilled to hear that it’s available in another of the world’s most important languages.

In money terms, the Spanish royalty accounts for nearly as much as the English edition!  There’s no mention of the Chinese translation, but then that won’t have hit the shelves during this royalty period.  I wonder if that’ll be a worthwhile item next time?

Apache 3.0

April 9, 2008 at 3:00 am | In apache, apachecon | No Comments

OK, since last week was the time for mischievous fun, let’s start by making it perfectly clear that Apache 3.0 is vapourware!

Roy’s keynote closing ApacheCon on Friday will, we’re led to believe, talk about Apache 3.  I expect any hint at concrete revelations may be pretty incidental to it, but there was a bit of dialogue last week along the lines of Tell me what you decide and I’ll announce it …  We broke it!

Tuesday was a day to talk about visions for the next generation webserver, with probably the greatest number ever of today’s core devs gathered around the same table.  Chief visionary for radical updates was Justin, who not only introduced some potentially-interesting ideas, but was prepared to defend them in the face of questioning from natural sceptics like Yours Truly.  Which is not, of course, to say that he answered every question.

Without going into technical detail, key issues discussed included core architecture, performance, configuration, and compatibility.  And I think we’re closer to agreement on some of the issues than we were before, having made progress in reconciling at least some of the conflicting concerns in the melting pot.

Of course, there were no takers as yet when it came to who’s actually going to get hacking on it :-)

Flexible configuration for Apache

April 4, 2008 at 12:39 am | In apache | No Comments

How best to improve Apache’s configuration syntax is a hot topic just now, with a camp wanting to bring in Lua as a solution to all problems. In the absence of a clear vision of where they’re heading, I’ll remain sceptical but open to argument about that.

As a baseline, I strongly believe any major change should not force a new learning curve on our existing communities, including users, sysops, and module developers. Giving them new options is fine; forcing new things on them isn’t. At the same time, there are certainly things that could do with an overhaul: for example, virtual host configuration confuses the heck out of many users. What we could really do with is a hook to enable modules to take charge of the whole business of configuration, and take this argument (along with the current core config code) out of the server core.

Meanwhile, I’m working on incremental improvements that are fully back-compatible, requiring no changes to existing configurations or modules, but giving a useful new tool to server admins. This morning I committed a patch to implement an <If> block:

<If "expression">
#arbitrary directives here
</If>

The expression is evaluated for each request, and the configuration contained applies if and only if the expression is true. This is intended primarily to offer a much simpler and easier (not to mention more rational) alternative to some of the ugly hacks people implement with mod_rewrite & friends.
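As a concrete illustration, here is a minimal sketch of the kind of per-request condition this makes possible. It uses the expression syntax documented for the later Apache 2.4 release, which may well differ from the syntax accepted by the patch as first committed; the hostnames are placeholders.

```
# Assumes mod_alias is loaded for Redirect.
# Hostnames are placeholders, not taken from any real configuration.
<If "%{HTTP_HOST} == 'example.com'">
    # Applies only to requests whose Host header is example.com
    Redirect permanent "/" "http://www.example.com/"
</If>
```

The equivalent with mod_rewrite would typically need a RewriteCond and RewriteRule pair; here the condition and the directives it governs read as a single block.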

It’s also the third application of the apache expression parser.

i18n and filters

April 2, 2008 at 1:29 pm | In apache | No Comments

mod_proxy_html 3.0 introduced charset conversion using apr_xlate (iconv). Later I decided to extract that out into its own module, mod_xml2enc, rather than re-implement the same i18n again in other filters such as mod_publisher.

In developing mod_xml2enc, I encountered and fixed some problems I hadn’t seen with mod_proxy_html 3. And today, someone emailed me with a bug report that looked very familiar: he’s encountering the problem I fixed for mod_xml2enc.

For the future, mod_proxy_html 3.1 will have the apr_xlate stuff stripped out, and replaced by hooks for mod_xml2enc instead. Meanwhile, anyone affected by transcoding problems should take the advice I gave today’s correspondent: use the xml2enc filter in front of the proxy-html one, so the latter’s iconv transcoding isn’t required.
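A minimal sketch of that workaround, assuming both modules are loaded and using the filter names mentioned above. The proxied path is hypothetical, and any other proxy or ProxyHTML directives you already have are left out.

```
# Hypothetical proxied path; adjust to your own setup.
<Location "/app/">
    # Run xml2enc ahead of proxy-html, so the charset has already
    # been dealt with by the time proxy-html sees the data.
    SetOutputFilter xml2enc;proxy-html
</Location>
```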

Porting apache to Java platform

April 1, 2008 at 6:37 am | In apache, sun | No Comments

The Apache webserver runs on a wide range of platforms, from the mainstream Unix-family (including Linux, Mac, etc) and Windows through to minority platforms like Netware, OS/2 and BeOS.  But not the Java VM, which is of course the majority platform for ASF projects, as well as an important flagship for Sun and other corporations with an interest in the server.

Of course it’s long supported Java apps, through an appserver such as Tomcat or Glassfish, and a connector such as mod_jk or mod_proxy_ajp.  But that’s not the same as running on the platform!
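For reference, the connector route looks something like the sketch below with mod_proxy_ajp. The path, and the assumption that the appserver listens for AJP on localhost port 8009, are illustrative rather than taken from any particular setup.

```
# Assumes mod_proxy and mod_proxy_ajp are loaded, and an appserver
# with an AJP connector on localhost:8009 (assumed, check yours).
# The /app path is hypothetical.
ProxyPass "/app" "ajp://localhost:8009/app"
```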

Now at Sun it’s no secret we’d like to bring the best of both Sun’s and Apache’s webservers to our users.  How better to do that than to take the proven performance and scalability of Sun’s async connection/request management and scheduling, and plug it into Apache as a new high-performance MPM?

The starting point for this is Java bindings in APR, after which we can plug in a Java MPM based on Sun code, and run HTTPD on the Java platform.  In due course we’ll have a server which users can run equally well either natively (as now) or on the Java VM, and which supports both Apache modules (the current C API) and the NSAPI for Java modules.

Eventually, a server where users can mix-and-match the LAMP stack with Glassfish in a single application platform!

Expression parser for Apache

March 31, 2008 at 8:33 pm | In apache | 4 Comments

Over the past few days I’ve got a long-awaited round tuit, and given Apache a general-purpose expression parser, ap_expr. That’s in trunk/2.3, and is unlikely to be stable enough for a 2.2.x (release) version for a while.

What I haven’t done this time is write a new expression parser. I’ve done that before, e.g. when I hacked up an ESI parser in 2003, but it’s necessarily an exercise in reinventing the wheel. So this time I’ve just adapted an existing parser: the one used by mod_include. That basically meant removing mod_include-specific stuff from its expression parser, and generalising somewhat. The first stage was complete when mod_include itself was adapted to use ap_expr, and passed all tests with it.

The second module I’ve just now adapted is mod_filter, where the expression parser replaces the ad-hoc dispatch criteria in the FilterProvider directive. The advantage of this is that the updated mod_filter can dispatch on multiple criteria. For example, a user wants a filter to apply if and only if the Content-Type is text/html AND the response is not compressed. That’s now easy, and no longer requires hackish workarounds:

FilterProvider myname myprovider "($resp{Content-Encoding} != gzip) && ($Content-Type == /text\/html/i)"

In the medium term, this could be used to enhance a number of different apache functions, and provide a more consistent expression syntax across different modules. Notable potential benefits include a far simpler and more logical configuration syntax for some of the more complex tasks undertaken with mod_rewrite.
