Category Archives: apache
Three days of ApacheCon@home – not far short of a full regular ApacheCon. A comparable number of presentations, far more attendees, but missing some fun elements like lightning talks. I felt too knackered to blog on Wednesday or Thursday evenings, but a few more thoughts bear recording.
In fact it wasn’t just the evenings I felt knackered: I felt diminishing returns on the contents, both presentations and social (not helped by more glitches with the technology). If a change is as good as a rest, the change/rest associated with a conference venue and atmosphere is perhaps essential to the familiar experience. Which leads me to the thought: this would have worked better as three separate one-day events. Shorter events could perhaps be themed further for different timezones.
And of course, the primary reasons for making it a multi-day event – to get a decent return on the overhead of travel and accommodation, and to make the most of the in-person event – are gone. It seems to me that more and shorter events make a lot of sense in the online space!
Regarding the substantial contents, I attended several talks in the Geospatial track. Of particular note was one by Lucian Plesea, whose interests seem to have a huge amount in common with my own. His work on accessing and visualising huge datasets looks a lot like what I was aiming for with the HyperDAAC, a couple of generations on in terms of both computing and Big Data. Dr Plesea is working at ESRI, so hopefully his work shouldn’t languish unnoticed as HyperDAAC did!
Furthermore, at the core of his implementation (and talk) is a set of Apache modules, practising what I preach in terms of making Apache his application server. I was gratified by his reference to my book in the BoF at the end of the day (after his talk). His modules are open source at github, look interesting, and are perhaps deserving of packaging for a wider audience.
This year, for obvious reasons (covid), ApacheCon is taking place entirely online. Today was the first day. So what was it like?
Well, obviously it’s not a Jolly in a nice hotel in an interesting location, as the best events have been. Does that detract from it? Well of course those are part of the magic of the best ApacheCons – above all Budapest, where both the hotel and the city were fantastic. On the other hand, the money saved could buy quite a decent week’s holiday somewhere of my choosing! Better to focus on what did or didn’t work well in terms of presentations, communication, networking.
The economics of online worked nicely for lots of people, bringing in several thousand attendees compared to a few hundred at a “normal” event. A poll suggests that eighty-something percent of those thousands are attending their first ApacheCon: evidently a lot of people find it a lower hurdle (as I do). So we’re embracing a much bigger community, which is fantastic – so long as we don’t disappoint.
The times also worked nicely for us in European (and indeed African) timezones. While Americans, Asians and Antipodeans had it taking much of their nights at one end or t’other, here it opened at 09:30, and ended with BoF sessions at 21:00, much like a regular ApacheCon. The benefits of being in the middle of the world’s inhabited timezones!
After one or two initial glitches – a very short learning curve – the technology worked well. Presentations were clear, with the presentation window split to show the presenter, his/her screen, and a text chat window in separate panes: pretty much ideal. “Corridor” action was, I thought, less successful, but then I’ve long found text chat easier to work with than face-to-face, which is why I don’t even have a webcam and audio system on my desktop ‘puter. Text chat there was a-plenty on every possible topic, but then we don’t need an organised event to benefit from that.
In terms of contents, the programme was easily as good as any I can remember. I enjoyed and was inspired by a number of talks, including some not merely on subjects but on projects with which I had no previous familiarity. In fact I think it worked rather better than sitting in a conference room, and I found it easy to stay alert and focussed, even in that after-lunch siesta slot when it can be hard to stay awake.
A major theme this year is the rapidly-growing Chinese community at Apache. In recent years it’s moved on from a handful of individual developers contributing to Apache projects, to quite a number of major projects originating in China and with Chinese core teams coming to Apache. Sheng Wu – a name I’ve hitherto known as a leading light of the Incubator and also lead on one of those projects – gave a keynote on the subject.
I don’t recollect when we had the main discussion of the language issues of Chinese communities coming to Apache, but some of these are now fully bi-lingual, with English being ultimately the official language but Mandarin also widely used. Mandarin was also used in a few of the morning’s talks – morning being of course the eastern-timezone-friendly time of day (and there was also a Hindi track). One Chinese speaker whose English-language talk I started to listen to proved hard to follow, and I found sneaking out unobtrusively a minor benefit of the online format!
The success of Chinese projects coming to Apache was demonstrated by two of today’s most interesting talks – by Western speakers (one American, one German) who don’t speak Chinese, but have become members of the respective core developer communities by virtue of participating. One was about the project itself, but Julian Feinauer’s talk was specifically focussed on the community: how a bi-lingual community works in practice (a question on which I’ve mused before, for example with reference to translations of my book, and regarding nginx). Answer: it’s working very well, with both languages, with machine translation to help “get the gist”, and with bilingual members of the community. And there are gotchas, when an insufficiently-comprehensive translation leads to confusion.
Summarising the Chinese theme, I think perhaps Sheng Wu’s keynote marks the point I dreamed of when I wrote the preface to the Chinese translation of my book.
Congratulations to Rich Bowen and his team on adapting to the circumstances and bringing us a fantastic event! More to come, about which I may or may not blog.
Apache APR is a stable project. Development activity tends to be incremental, and low-volume.
Today we have what is probably our biggest change for years: a new apr_json module to parse and produce JSON. This was developed as a third-party project by Moriyoshi Koizumi, who has now formally donated it to the APR project. Thanks to Moriyoshi and to Graham Leggett from the APR core team for bringing it to Apache.
With a bit of luck, this might motivate us to work towards a new APR-1.7 release in the next few months. I shall endeavour to get my own fat arse into gear and backport my XML (libxml2 and build) work of some time ago from trunk, as well as do my bit in working towards a release.
 For values of “few” that tend to grow.
Some months ago, Apache PR (aka Sally) launched a monthly series under the generic title “Success at Apache”, and solicited volunteers to write articles on topics of relevance to the Apache Way and how things work. I was one of many to reply, and she put me down for this month’s piece. A few days ago it went live, here.
The original proposal was to discuss the Just Do It and Scratch Your Own Itch aspects of Apache projects and how, with the checks and balances provided by the meritocratic and democratic elements of project governance, that Just Works. Some (linguistically) very ugly words for this have been floating around, so I’ve made an attempt to improve on them with a new coinage to avoid muddling English and Greek. Pratocracy: the Rule of the Makers.
Sometime before I started writing, a question came up on the Apache Members list about any guidelines for companies looking to get involved with an Apache project. It appears most of what’s been written is on the negative side: things not to do! This seems to be a question that dovetails well with my original plan, so I decided to try and tackle it in my article. This became the longest section of the article, and may hopefully prove useful to someone out there!
Sadly I was recovering from a nasty lurgy at the time I was writing it, and I can’t help feeling that the prose falls short of my most inspired efforts. I’ve avoided repeating Apache Way orthodoxy that’s been spoken and written before by many of my colleagues, but in doing so I may have left too much unsaid for a more general readership. At times I may have done the opposite and blathered on about the perfectly obvious. Ho, hum.
Folks who know me will know that I’ve been taking an interest for some time in the problems of online identity and trust:
- Passwords (as we know them today) are a sick joke.
- Monolithic certificate authorities (and browser trust lists) are a serious weakness in web trust.
- PGP and the Web of Trust remain the preserve of geekdom.
- People distrust and even fear centralised databases. At issue are both the motivations of those who run them, and security against intruders.
- Complexity and poor practice opens doors for phishing and identity theft.
- Establishing identity and trust can be a nightmare, to the extent that a competent fraudster might find it easier than the real person to establish an identity.
I’m not a cryptographer. But as a mathematician, software developer, and old cynic, I have the essential ingredients. I can see that things are wrong and could so easily be a whole lot better at many levels. It’s not even a hard problem: merely a more rational deployment of existing technology! Some time back I thought about setting myself up in the business of making it happen, but was put off by the ghost of what happened last time I tried (and failed) to launch an innovative startup.
Recently – starting this summer – I’ve embarked on another mission towards improving the status quo. Instead of trying to run my own business, I’ve sought out an existing business doing good work in the field, to which I can hope to make a significant contribution. So the project’s fortunes tap into my strengths as techie rather than my weaknesses as a Suit.
I should add that the project does rather more than just improve the deployment of existing technology, as it significantly advances the underlying cryptographic framework. Most importantly it introduces a Distributed Trust Authority model, as an alternative to the flawed monolithic Certificate Authority and its single point of failure. The distributed model also makes it particularly well-suited to “cloud” applications and to securing the “Internet of Things”.
And it turns out, I arrived at an opportune moment. The project has been single-company open source for some time and generated some interest at github. Now it’s expanding beyond that: a second corporate team is joining development and I understand there are further prospects. So it could really use a higher-level development model than github: one that will actively foster the community and offer mutual assurance and protection to all participants. So we’ve put it forward as a candidate for incubation at Apache. The proposal is here.
If all goes well, this could be the core of my work for some time to come. Here’s hoping for a big success and a better, safer online world.
I haven’t blogged much on software of late. Well, I don’t seem to have blogged so much at all, but my techie contents have been woefully sparse even within a meagre whole.
Well, I’ve just added a new stream editor in to Apache Trafficserver. It’s been on my to-do list for a long time to produce a similar functionality to sed and sed-like modules in Apache HTTPD. Now I’ve hacked it up, and dropped it into the main repo at /plugins/experimental/stream-editor/. I expect it’ll stay in /experimental/ until and unless it gets sufficient real-world usage to prove itself and sufficient demand to be promoted.
The starting point for this was to duplicate the functionality of mod_line_edit or mod_substitute, but with the capability (offered by mod_sed but not by the others) to rewrite incoming as well as outgoing data. Trafficserver gives me that for free, as the same code will filter both input and output. Some of the more advanced features, such as HTTPD’s environment variables, are not supported.
There were two main problems to deal with. Firstly, the configuration needs to be designed and implemented from scratch: that’s currently documented in the source code. It’s a bit idiosyncratic (I’ll append it below): suggestions welcome. Secondly, the trafficserver API lacks a set of utility classes as provided by APR for Apache HTTPD. To deal with the latter, I hacked it in C++ and used STL containers, in a manner that should hopefully annoy purists in either C (if they exist) or C++ (where they certainly do).
In figuring it out I was able to make some further improvements: in particular, it deals much better than mod_line_edit or mod_substitute with the case where different rules produce conflicting edits, allowing different rules to be assigned different precedences in configuration to resolve conflicts. And it applies all rules in a single pass, avoiding the overhead of reconstituting the data or parsing ever-more-fragmented buffers – though it does have to splice buffers to avoid the risk of losing matches that span input chunks. It parses each chunk of data into an ordered (STL) set before actually applying the edits and dispatching the edited data.
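To illustrate that single-pass scheme, here’s a simplified C++ sketch of my own – not the plugin’s actual code – showing the idea: matched edits go into an ordered std::set, conflicts are resolved in favour of the lower prio: value, and the survivors are applied left-to-right in one pass over the chunk.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// One matched edit: its position in the chunk, the priority of the
// rule that produced it, and the replacement text.
struct Edit {
    size_t start, end;   // half-open range [start, end) in the input
    int prio;            // lower value prevails on conflict
    std::string to;
    bool operator<(const Edit& o) const {
        if (start != o.start) return start < o.start;
        return prio < o.prio;
    }
};

static bool overlaps(const Edit& a, const Edit& b) {
    return a.start < b.end && b.start < a.end;
}

// Resolve conflicts by walking the edits in priority order, keeping an
// edit only if it doesn't overlap one already kept; then apply the
// survivors in position order in a single pass over the chunk.
std::string apply_edits(const std::string& chunk, const std::set<Edit>& edits) {
    std::vector<Edit> byprio(edits.begin(), edits.end());
    std::stable_sort(byprio.begin(), byprio.end(),
                     [](const Edit& a, const Edit& b) { return a.prio < b.prio; });
    std::set<Edit> kept;
    for (const Edit& e : byprio) {
        bool clash = false;
        for (const Edit& k : kept)
            if (overlaps(e, k)) { clash = true; break; }
        if (!clash) kept.insert(e);
    }
    std::string out;
    size_t pos = 0;
    for (const Edit& e : kept) {          // std::set iterates in position order
        out.append(chunk, pos, e.start - pos);
        out += e.to;
        pos = e.end;
    }
    out.append(chunk, pos, std::string::npos);
    return out;
}
```

The real plugin has to do rather more (regexp memory, the continuity buffer for matches spanning chunks), but the conflict-resolution logic is of this shape.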
/* stream-editor: apply string and/or regexp search-and-replace to
 * HTTP request and response bodies.
 *
 * Load from plugin.config, with one or more filenames as args.
 * These are config files, and all config files are equal.
 *
 * Each line in a config file conforming to config syntax specifies a
 * rule for rewriting input or output.
 *
 * A line starting with [out] is an output rule.
 * One starting with [in] is an input rule.
 * Any other line is ignored, so blank lines and comments are fine.
 *
 * Each line must have a from: field and a to: field specifying what it
 * rewrites from and to. Other fields are optional. The full list:
 *     from:flags:value
 *     to:value
 *     scope:flags:value
 *     prio:value
 *     len:value
 *
 * Fields are separated by whitespace. from: and to: fields may contain
 * whitespace if they are quoted. Quoting may use any non-alphanumeric
 * matched-pair delimiter, though the delimiter may not then appear
 * (even escaped) within the value string.
 *
 * Flags are:
 *     i - case-independent matching
 *     r - regexp match
 *     u (applies only to scope) - apply scope match to full URI
 *         starting with "http://" (the default is to match the path
 *         only, as in for example a <Location> in HTTPD).
 *
 * A from: value is a string or a regexp, according to flags.
 * A to: string is a replacement, and may reference regexp memory $1 - $9.
 *
 * A scope: value is likewise a string or (memory-less) regexp and
 * determines the scope of URLs over which the rule applies.
 *
 * A prio: value is a single digit, and determines the priority of the
 * rule. That is to say, if two or more rules generate overlapping
 * matches, the priority value will determine which rule prevails.
 * A lower priority value prevails over a higher one.
 *
 * A len: value is an integer, and applies only to a regexp from:
 * It should be an estimate of the largest match size expected from
 * the from: pattern. It is used internally to determine the size of
 * a continuity buffer, that avoids missing a match that spans more
 * than one incoming data chunk arriving at the stream-editor filter.
 * The default is 20.
 *
 * Performance tips:
 * - A high len: value on any rule can severely impact on performance,
 *   especially if mixed with short matches that match frequently.
 * - Specify high-precedence rules (low prio: values) first in your
 *   configuration to avoid reshuffling edits while processing data.
 *
 * Example: a trivial ruleset to escape text in HTML:
 *     [out] scope::/html-escape/ from::"&" to:"&amp;"
 *     [out] scope::/html-escape/ from::< to:&lt;
 *     [out] scope::/html-escape/ from::> to:&gt;
 *     [out] scope::/html-escape/ from::/"/ to:/&quot;/
 * Note, the first & has to be quoted, as the two ampersands in the line
 * would otherwise be mis-parsed as a matching pair of delimiters.
 * Quoting the &, and the " line with //, are optional (and quoting
 * is not applicable to the scope: field).
 * The double-colons delimit flags, of which none are used in this example.
 */
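To make the loading model concrete: Trafficserver plugins are named in plugin.config with their arguments, which here are the rule files. A minimal sketch – the stream-editor.so name and the file paths are my illustrative guesses, not taken from the post:

```
# plugin.config — load the plugin with one or more rule files as args
stream-editor.so /usr/local/etc/trafficserver/edit-rules.conf

# edit-rules.conf — one rule per line, per the syntax documented above
[out] scope::/rewrite-me/ from:i:example.com to:example.org
[in] from::foo to:bar prio:5
```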
I’ve already posted from ApacheCon about my favourable first impression. I’m happy to say my comments about the fantastic city and hotel have survived the week intact: I was as impressed at the end of the week as at the start. Even the weather improved through the week, so in the second half – when the conference schedule was less intense – I could go out without getting wet.
The main conference sessions were Monday to Wednesday, with all-day schedules and social events in the evening. Thursday was all-day BarCamp, though I skipped the morning in favour of a bit of touristing in the best weather of the week. Thursday and Friday were also the related Cloudstack event. I’m not going to give a detailed account of my week. I attended a mix of talks: a couple on familiar subjects to support and heckle speakers, new and unfamiliar material to educate myself on topics of interest, and – not least – inspirational talks from Apache’s gurus such as Bertrand.
Socially it had a very good feel: as ever I’ve renewed acquaintance with old friends, met new friends, and put faces to names hitherto seen only online. The social scene was no doubt helped not just by the three social evenings laid on, but also by the fact that all meals were provided encouraging us to stay around the hotel, and that the weather discouraged going elsewhere for the first half of the week. The one thing missing was a keysigning party. Note to self: organise it myself for future conferences if no one else gets there first!
I’ve returned home much refreshed and with some ideas relevant to my work, and an intention to revitalise my Apache work – where I need to cut my involvement down to my three core projects and then give those the time and effort they deserve but which have been sadly lacking of late. Also grossly overfed and bloated. Now I just have to sustain that high, against the adversity of the darkest time of year and temperatures that encourage staying in bed. 😮
Huge thanks to DrBacchus and the team for making it all happen!
It’s lunchtime on the first day of Apachecon. Too soon to assess the event as a whole, but I’ve formed a view on the venue.
Of all the ApacheCon venues I’ve been to, I think this week’s seems the best. The Corinthia Hotel is about as good as any I’ve encountered, and we’re in a nice area of the great historic city of Budapest. Amsterdam is the only past-Apachecon city that can really rival Budapest, but that was let down by a bad conference hotel. And conversely, where I’ve encountered decent hotels, they’ve been in some altogether less pleasant or interesting locations. At worst we’ve had poor hotels in poor locations.
Come to think of it, that’s not just Apachecon, it’s conferences of any kind, even stretching back to my days in academia.
Of course, my perception may be coloured by individual circumstances too. I’m not doing anything stressful like giving a talk or tutorial this time. And I may have been fortunate to have been allocated an ideal hotel room, overlooking a quiet quadrangle where I can open the window wide for fresh air without being disturbed either by outside traffic or hotel noise.
Just a couple of flies in the ointment. The weather in bleak November isn’t entirely conducive to getting the most from Budapest. And there are not sufficient power outlets to wield the laptop everywhere around the conference. Even if that’s (arguably) a good thing when in a presentation, the shortage of power points applies even to the designated hacker area, which is itself not a strong point of the event.
OK, time to get back to conferring!
I spent two days last week at the trafficserver summit.
Or rather, two evenings. The summit was held in Silicon Valley (hosted by linkedin), while I remained at home in Blighty with a conferencing link, making me one of several remote attendees. With an 8 hour time difference, each day started at 5pm and went on into the wee hours. On the first day (Tuesday) this followed a day of regular work. On the Wednesday I took a more sensible approach and the only work I did before the summit was a bit of gardening. Despite that I felt more tired on the Wednesday.
The conferencing link was a decent enough instance of its kind, with regular video alongside screen sharing and text (though IRC does a better job with text). The video was pointed at the speakers as they presented, and the screen sharing was used to share their presentations. That was good enough to follow the presentations pretty well: indeed, sometimes better than being there, as I could read all the intricate slides and screens that would’ve been just a blur if I’d been present in the room.
Unfortunately most of the presentations involved discussion around the room, and that was much harder, sometimes impossible, to follow. Also, speaking was not a good experience: I heard my voice some time after I’d spoken, and it sounded ghastly and indistinct, so I muted my microphone. That was using just the builtin mike in the macbook. I tried later with a proper headset when I had something to contribute, but alas it seems by then I (and I think all remote attendees, after the initial difficulties) was muted by the system. So I had something approximating to read-only access. And of course missed out on the social aspects of the event away from the presentations.
In terms of the mechanics of running an event like this, I think in retrospect we could make some modest improvements. We had good two-way communication over IRC, and that might be better-harnessed. Maybe rather than ad-hoc intervention, someone present (a session chair?) could act as designated proxy for remote attendees, and keep an eye on IRC for anyone looking to contribute to discussion. Having such a person would probably have prompted me into action on a few occasions when I had a comment, question or suggestion. Or perhaps better, IRC could be projected onto a second screen in the room, alongside the presenter’s materials.
The speakers and contents were well worth the limitations and antisocial hours of attending. I found a high proportion of the material interesting, informative, and well-presented. Alan, who probably knows more than anyone about Trafficserver internals, spoke at length on a range of topics. The duo of Brian and Bryan (no, not a comedy act) talked about debugging and led discussion on test frameworks.
Other speakers addressed applications and APIs, and deployments, ops and tools. A session I found unexpectedly interesting was Susan on the subject of how, in integrating sophisticated SSL capabilities in a module, she’s been working with Alan to extend the API to meet her needs. It’s an approach from which I might just benefit, and I also need to take a look at whether Ironbee adequately captures all potentially-useful information available from SSL.
At the end I also made (via IRC) one suggestion for a session for the next summit: API review. There’s a lot that’s implemented in Trafficserver core and utils that could usefully be made available to plugins via the API, even just by installing existing header files to a public includes directory. Obviously that requires some control over what is intended to be public, and a stability deal over exported APIs. I have some thoughts over how to deal with those, but I think that’s a subject for the wiki rather than a blog post. One little plea for now: let’s not get hung up on what’s in C vs C++. Accept that exported headers might be either, and let application developers deal with it. If anyone then feels compelled to write a ‘clean’ wrapper, welcome their contribution!
I started writing a longer post about the so-called shell shock, with analysis of what makes a web server vulnerable or secure. Or, strictly speaking, not the web server itself, but a platform an attacker might access through a web server. But I’m not sure when I’ll find time to do justice to that, so here’s the short announcement:
I’ve updated mod_taint to offer an ultra-simple defence against the risk of shell shock attacks coming through Apache HTTPD, versions 2.2 or later. A new simplified configuration option is provided specifically for this problem:
LoadModule taint_module modules/mod_taint.so
Untaint shellshock
Here’s some detail from what I posted earlier to the Apache mailinglists:
Untaint works in a directory context, so can be selectively enabled for potentially-vulnerable apps such as those involving CGI, SSI, ExtFilter, or (other) scripts.
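To show how that directory scoping might look in practice, here’s a sketch of an httpd configuration. Only the LoadModule line and the Untaint directive come from the post; the <Directory> path is illustrative.

```
LoadModule taint_module modules/mod_taint.so

# Confine the check to a potentially-vulnerable area, such as CGI
# scripts; the path here is purely illustrative.
<Directory "/var/www/cgi-bin">
    Untaint shellshock
</Directory>
```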
This goes through all Request headers, any PATH_INFO and QUERY_STRING, and (just to be paranoid) any other subprocess environment variables. It untaints them against a regexp that checks for “()” at the beginning of a variable, and returns an HTTP 400 error (Bad Request) if found.
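mod_taint itself is C against the httpd API, but the check it performs is simple enough to sketch standalone. A hedged C++ illustration – function names are mine, not mod_taint’s – of the test described above: flag any variable whose value begins with “()”, and treat any match as grounds for a 400.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// True if a header/variable value looks like a shellshock payload:
// "()" at the very start of the value, the signature bash scans for
// when importing a function definition from the environment.
bool is_shellshock(const std::string& value) {
    static const std::regex probe("^\\(\\)");
    return std::regex_search(value, probe);
}

// Scan name/value pairs (request headers, PATH_INFO, QUERY_STRING,
// other subprocess variables); any match yields HTTP 400 Bad Request.
int check_request_vars(const std::vector<std::pair<std::string, std::string>>& vars) {
    for (const auto& v : vars)
        if (is_shellshock(v.second))
            return 400;   // Bad Request
    return 0;             // nothing suspicious
}
```

The real module works with the request’s header and environment tables rather than a vector, but the regexp test is the heart of it.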
Feedback welcome, indeed solicited. I believe this is a simple but sensible approach to protecting potentially-vulnerable systems, but I’m open to contrary views. The exact details, including the shellshock regexp itself, could probably use some refinement. And of course, bug reports!