mod_proxy_html 3.0

mod_proxy_html 2.x has been more-or-less stable for about two and a half years. But as its popularity grows, a few frequently-requested enhancements have become apparent.

I started hacking a while back, but that work has just been sitting on my desktop box. I’ve just checked in a bit of work-in-progress to SVN, together with a TODO.

Planned Enhancements:

  • Enable an admin to support proprietary HTML attributes (e.g. background on TABLE or TD). That means taking the knowledge of HTML out of mod_proxy_html and specifying it in the config instead.
  • Improve I18N support. Enable an admin to:
    1. Support an otherwise unsupported charset by aliasing it to a supported one on input.
    2. Generate output in a different charset, or preserve the input charset.
  • Use a bucket brigade rather than copying data when buffering is required.
  • Support stripping out bogus markup, using mod_publisher’s DTD support.
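
To make the first item concrete, here is a rough sketch of the idea in Python. It is not the module's C code: the table format, the names LINK_ATTRS, URL_MAP and LinkRewriter, and the backend.internal host are all made up for illustration. The point is only that the parser carries no built-in list of URL-bearing attributes; the admin supplies it.

```python
import html
from html.parser import HTMLParser

# Hypothetical admin-supplied table: element name -> attributes holding URLs.
LINK_ATTRS = {
    "a": {"href"},
    "img": {"src"},
    "table": {"background"},  # proprietary attribute, enabled by config alone
    "td": {"background"},
}

# Hypothetical mapping from a backend prefix to its public equivalent.
URL_MAP = [("http://backend.internal/", "/app/")]


def map_url(url):
    """Rewrite the first matching prefix; leave other URLs untouched."""
    for old, new in URL_MAP:
        if url.startswith(old):
            return new + url[len(old):]
    return url


class LinkRewriter(HTMLParser):
    """Re-emit markup, rewriting only the attributes named in the table."""

    def __init__(self, link_attrs):
        super().__init__(convert_charrefs=False)
        self.link_attrs = link_attrs
        self.out = []

    def handle_starttag(self, tag, attrs):
        url_attrs = self.link_attrs.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "checked"
                parts.append(name)
                continue
            if name in url_attrs:
                value = map_url(value)
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def rewrite(markup, link_attrs=LINK_ATTRS):
    parser = LinkRewriter(link_attrs)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(rewrite('<td background="http://backend.internal/bg.png">x</td>'))
```

With the table above, the TD background URL comes out as /app/bg.png. Remove the "td" entry and the same markup passes through unchanged. The sketch drops comments and doctypes for brevity, which a real filter would not do.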

Other changes? Please submit your patches, or add your comments below!

Note: none of the above should be taken to imply a new release is imminent 🙂

Posted on November 7, 2006, in apache. 7 Comments.

  1. Gustavo Noronha (kov)

    Hello, I needed to proxy an ill-behaved application server through Apache, and used mod_proxy_html to replace incorrect link and image addresses. The app server is ill-behaved in the sense that it does not specify the Content-Type header when sending its HTML pages. I had to hack mod_proxy_html to allow that:

    http://200.152.41.7/~kov/assume_html.diff

    I believe this option would be a good addition for people who need to force mod_proxy_html to handle these bad application servers.

  2. I’ve tried mod_proxy_html but it doesn’t cope too well with broken web-pages, due to the SAX interpreter.

    A “simpler” option just to replace the obvious links might be an idea, or perhaps to use the SAX interpreter to generate substitutions in the original (broken) html (so e.g. “&copy” is left alone rather than becoming “&copy”!)

    Chris

  3. A flag for converting hrefs to lower case would be a huge win for those trying to make urls more search engine friendly.

    As things now stand, matching can be done in a case-insensitive way — converting the capture groups to lower case is the missing piece.

  4. Tony, it’s had that capability since version 1. See ProxyHTMLFixups.

  5. Hey niq,
    Can I make mod_proxy_html peek into JSON data and rewrite some http://my_urls flowing from the backend server? If so, how do I do it?

    If not, should we have a mod_proxy_json specifically for JSON data? In that case we would need a JSON parser in C, I suppose.

    Regards,
    vikasap

  6. Where script is embedded in HTML, mod_proxy_html will fix it for you. Otherwise, use a general-purpose filter. See my recent blog post “sed in apache”.

