Category Archives: apache
Source and non-source repos
Some people engage in Holy Wars over what source control system to use. For my part I really can’t get too worked up over a choice of tools, but I am concerned about another question. What files do you keep in a source control repository?
I’d like to say source files. Program source files, inputs for your choice of build system, legal stuff like licenses and acknowledgements, matters of record, documentation. The key point is, files that are rightfully under the direct control of project members. Not files that are generated by software, or managed by third-parties.
In practice, this principle is all-too-often lost. One example is Apache HTTPD, whose source repos contain extensive HTML documentation that is not written by developers but generated from XML source. There’s a clue in the headers of each of these files:
<!--
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
This file is generated from xml source: DO NOT EDIT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-->
So these files are not source, and should really be generated in the build (or made a configuration option) rather than kept under source control. But apart from raising the overhead of using the repos, they’re harmless.
I’ve recently come upon an altogether more problematic case. It manifested itself after I’d installed all the prerequisites for a configure to succeed, but found my build fell down in compiling something. Scrolling up through reams of error messages, I find at the top:
#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers. Please update
#error your headers.
OK, that’s simple enough: the version of google protobuf I installed with aptitude is too old. Go to google and download the latest (cursing google for failing to sign it). And hack protobuf.m4 to detect this error from configure rather than fall over in the build.
But hang on! It’s not as simple as that. This isn’t the usual dependency on a minimum version: it’s a requirement for an exact version of protobuf. If I install a version that’s too new I get another error:
#error This file was generated by an older version of protoc which is
#error incompatible with your Protocol Buffer headers. Please
#error regenerate this file with a newer version of protoc.
Altogether more problematic. It’s a nightmare if I have more than one app, each requiring a different protobuf version. And this is a library I’m building: it could end up linked into just such an app. Ouch!
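To see how both errors can come out of the same file, here is a minimal sketch of a two-sided version guard. The macro and function names are hypothetical stand-ins rather than protobuf’s own, and the version numbers are made up for illustration. The generated file records which headers it needs, the headers record which generators they accept, and a mismatch in either direction fails.

```c
#include <assert.h>

/* Hypothetical values that the installed library headers would define. */
#define LIB_VERSION            2004001L  /* version of the installed headers */
#define LIB_MIN_PROTOC_VERSION 2004000L  /* oldest generator they accept */

/* Hypothetical values the generator would bake into the generated file. */
#define GEN_NEEDS_LIB_AT_LEAST 2004000L
#define GEN_PROTOC_VERSION     2004001L

/* Compile-time form: fail the build on a mismatch in either direction. */
#if LIB_VERSION < GEN_NEEDS_LIB_AT_LEAST
#error generated by a newer generator than the installed headers support
#endif
#if GEN_PROTOC_VERSION < LIB_MIN_PROTOC_VERSION
#error generated by an older generator than the installed headers accept
#endif

enum compat { COMPAT_OK, HEADERS_TOO_OLD, HEADERS_TOO_NEW };

/* Run-time mirror of the same two tests, so the logic can be exercised. */
enum compat check_compat(long lib_version, long lib_min_protoc,
                         long gen_needs_lib, long gen_protoc)
{
    /* Headers cannot demand a generator newer than themselves. */
    assert(lib_min_protoc <= lib_version);

    if (lib_version < gen_needs_lib)
        return HEADERS_TOO_OLD;   /* the "newer version of protoc" error */
    if (gen_protoc < lib_min_protoc)
        return HEADERS_TOO_NEW;   /* the "older version of protoc" error */
    return COMPAT_OK;
}
```

If the headers set their minimum accepted generator equal to their own version, the two tests together leave only an exact match, which is the behaviour described above.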
The clue is at the top of the file that generates the errors:
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: [filename].proto
This C++ is not source, it’s an intermediate file generated by protoc, which is part of the protobuf package. Its source is the .proto file, which is also there in the repo but not used for the build. It follows that hacking protobuf.m4 to test the version was the wrong solution: instead the build should be updated to generate the intermediate files from the .proto source.
Ouch.
Apache 2.4
Just in case it’s news to anyone, yesterday saw the first major new release of the world’s most popular web server in just over six years. Apache HTTPD 2.4 is now released!
The last major release was version 2.2 in December 2005, while version 2.0 was first released as stable in April 2002. Why such a slow release cycle?
I guess one answer is that we have no marketing department breathing down our necks looking for press releases and big headline releases every six months. So no great pressure to keep putting out half-baked releases. This is indeed common amongst the best open-source projects: just look at how long the Linux kernel remained at 2.x, despite the constant very real need to update as hardware evolved under it! Apache’s most credible rivals in the web server space show no inclination to inflate versions, either (though application servers do: perhaps that’s why HTTPD remains a minority platform in that space).
The other reason for so few releases is that they’re rarely necessary. Apache’s modular framework means that substantial new features can be introduced without requiring any new release. The only absolute rule of minor delta releases (like an upgrade from 2.4.1 to 2.4.2, or from 2.2.0 to 2.2.20) is that they preserve full back-compatibility, so your existing modules still work even if you don’t have the source code. The ten years of Versions 2.0 and 2.2 saw many advances without fanfare.
I just dug up this text, from the preface to my book. I was right when I wrote:
The current Apache release — version 2.2 — is the primary focus of this book. Version 2.2.0 was released in December 2005, and given Apache’s development cycle, is likely to remain current for some time (the previous stable version 2.0 was released in April 2002). This book is also very relevant to developers still working with 2.0 (the architecture and API is substantially the same across all 2.x versions) and is expected to remain valid for the foreseeable future.
Modules move home
When I first released some Apache modules, I was not yet part of the core development team. I released modules based at my own site, for whomsoever was interested. More recently, most new modules I’ve developed have gone straight into the core distribution from apache.org. I’ve discussed the issue of in or out in this blog before, and this post could be considered a case in point.
One of those earlier modules, mod_proxy_html, turned out to be the solution to a big latent need, and rapidly became my most popular single module. Since first release in 2003 it’s seen a number of significant improvements, including one for which I had direct sponsorship. More recently, the advanced internationalisation support that had developed over the years was separated out into a new module mod_xml2enc, so that the same code could be shared with other markup-processing modules without having to duplicate it and maintain multiple copies.
These modules were released as open-source, but without the infrastructure for substantial collaborative development. At first there wasn’t even a change control repository, though that was introduced fairly early. There was no bugs database, no general developer forum. Anyone wanting to participate had the choice of mailing me (which various people have done – sometimes with valuable contributions) or ignoring me and forking their own project (as in mod_proxy_content).
That’s imperfect. In ideological terms it falls short of an open development model: someone wanting to make more than a minor contribution would have to work with me taking a lead role (hire me? dream on) or fork. A bug report or enhancement request would usually but not necessarily get my attention, and if it related to a scenario I couldn’t reproduce, that could present difficulties. Whoops! Bottom line: it’s a fine model for a one-man project and somewhat workable as it grows, but it lacks the infrastructure to support the kind of community that drives successful major open projects like Apache’s.
Announcement
I can now announce that I’ve donated mod_xml2enc and mod_proxy_html to Apache. They will feature as standard in webserver releases from the forthcoming 2.4.0.
This gives them a platform to grow and flourish, even if I take a back seat – as inevitably happens from time to time when interest has passed a certain point. It also has some further implications for developers and users:
- Both modules are now relicensed under the Apache License. They continue to exist under the GPL (or, in the case of mod_xml2enc, dual-licensed) at webthing, so third-party developers and distributors have a choice.
- However, there is no guarantee, nor even expectation, that the two versions will remain in step. It is likely now that the version at Apache will be the more up-to-date in future. That’s where it’ll get the tender loving care of a broad developer community. My own further work may happen in both places, but is more certain to happen at Apache than WebThing (except in the unlikely event that a paying client dictates otherwise).
Libxml2 Dependency
This may be of particular interest to packagers. Most obviously it relieves them of the need to distribute mod_proxy_html as a separate package, but with one proviso. If these modules are packaged in a standard Apache/HTTPD distribution then libxml2 becomes a dependency of that.
Not a big deal for anything-mainstream (though in the distant past it was considered a reason not to accept mod_proxy_html into the core product), but it invites another change. If you switch from expat to libxml2 for APR’s parser (as described here) you can eliminate expat, and standardise on libxml2 for all markup parsing needs. One might consider this a good move in any case, as libxml2 is not just more powerful, but also has the more active development community of the two. The downside then is that you’ve introduced a bigger dependency for any APR users who have no use for HTTPD or libxml2.
That leaves the expat-based module mod_xmlns somewhat orphaned. I’ll probably get around to switching that one to use libxml2: it’s pretty-much a drop-in replacement. Or maybe I’ll drop it altogether in favour of Joachim Zobel’s mod_xml2, which was (I understand) originally inspired by mod_xmlns but offers an alternative and probably superior platform for XML applications.
OpenOffice at Apache?
Today’s buzz: talk of OpenOffice being donated to the Apache Software Foundation.
Wow! That’s a Very Big Catch, isn’t it? Perhaps the biggest since Hadoop? Or???
Well, maybe. As of now it’s a long way from a done deal, and it’s by no means clear that it will happen. To become an Apache project, OpenOffice will have to be accepted into the incubator where it will have to demonstrate suitability before it can graduate to an Apache project. Apache media guru Sally Khudairi has written about the incubation process here in anticipation of a wave of interest.
The first question is whether OpenOffice will enter the incubator in the first place. Before the LibreOffice split there’s little doubt it would’ve been warmly welcomed, but now there’s a question mark over why Oracle should prefer the ASF to TDF, and whether we Apache folks want to make ourselves party to a legacy of that split. But if this reaction from the LibreOffice folks represents a consensus then I for one will be happy to accept OpenOffice.
Intellectual Property should be straightforward (because Oracle owns all the rights, inherited from Sun), so the question then becomes how the community will fare. How much room is there for both projects to thrive? Who will give their loyalty to ASF in preference to TDF, or equal loyalty to both? Could separate competing projects become a Good Thing and foster innovation, or will it just add duplication and confusion to no real purpose?
There is a likely driver for an Apache version: contributors who prefer the Apache License over the GPL. That could drive interest particularly from companies like IBM who maintain their own derivative products. Whether that will give rise to a thriving community, and perhaps a development focus distinct from that of LibreOffice, remains to be seen: that’s part of what incubation will tell us.
Anyway, if OpenOffice enters incubation at Apache, I’d expect that to be make or break for it. If it thrives then we could see “Apache OpenOffice” at some future date. If not, then it pretty clearly cedes the future to LibreOffice. If only they could find a better name …
Errata
A reader has pointed out a second serious error in my book. Unlike the first, this one is obscure: no one in real life would use Digest Authentication for mod_authnz_day where there are no secrets to protect! But my reader evidently used the code as a template for something and discovered the error.
The error is on Page 195, where apr_md5 is used to compute an MD5 hash. apr_md5 in fact computes a binary digest, which then has to be encoded to the hash we need (as in htdigest). This is very simply accomplished by using ap_md5 in place of apr_md5 in our code. I have added it to the book pages errata section, and corrected the code downloadable from there.
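To illustrate the difference between the two forms, here is a minimal standalone sketch (not the book’s code, and using no APR calls). The helper name `digest_to_hex` is hypothetical. It takes the 16 raw bytes of an MD5 digest, which is the form apr_md5 produces, and encodes them as the 32-character hex string used in an htdigest file.

```c
#include <assert.h>
#include <stddef.h>

#define MD5_DIGEST_SIZE 16  /* an MD5 digest is always 16 raw bytes */

/* Hypothetical helper: encode a raw digest as lowercase hex.
 * `out` must have room for 2 * MD5_DIGEST_SIZE characters plus a NUL. */
void digest_to_hex(const unsigned char digest[MD5_DIGEST_SIZE],
                   char out[2 * MD5_DIGEST_SIZE + 1])
{
    static const char hexdigits[] = "0123456789abcdef";
    size_t i;

    assert(digest != NULL && out != NULL);
    for (i = 0; i < MD5_DIGEST_SIZE; i++) {
        out[2 * i]     = hexdigits[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = hexdigits[digest[i] & 0x0f];  /* low nibble */
    }
    out[2 * MD5_DIGEST_SIZE] = '\0';
}
```

Treating the raw bytes as if they were already a string is the kind of mistake at issue: they may contain NUL bytes and are half the length of the hex form.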
Fortunately my correspondent was extremely complimentary about the book in general: evidently it is achieving its purpose of helping a programmer surmount the learning curve to working productively with Apache HTTPD.
He also wondered whether I have any plans for a second edition: a question I have contemplated but not acted on as we approach the release of a new stable 2.4 branch. Since 2.4 doesn’t actually obsolete 2.2 (or indeed 2.0) programming skills, my feeling is that the book remains valid, and my time would be better spent writing some supplementary standalone articles to deal with what’s changed. But then, if I do that then it’s a relatively small step to a second edition with additional chapters. Hmmm ….
Thanks to Brad Goodman for alerting me to the error, and for being so nice about it!
XML Support in APR and Apache
Recently the subject of bundling or non-bundling of expat within APR and Apache HTTPD (the web server) re-emerged on the dev list. I’ve always been against bundling: it’s a third-party library and should be a dependency. We’ve moved gradually towards that, but current practice includes bundling it in an optional dependencies package.
APR’s use of expat is in practice pretty limited and straightforward: the core does nothing very demanding with XML. And in practice, when applications such as Apache Modules need to work with XML, expat is often too limiting. So modules need to introduce an alternative XML library. The most usual choice is libxml2 as in, for example, mod_proxy_html, mod_transform, and mod_security.
Libxml2 is not just a much bigger and more powerful library than expat, it’s also very nearly a drop-in replacement. In particular, it provides a compatible SAX API. So if we could use it in place of expat in APR we have a win-win for web servers (and other applications) involving libxml2: replace expat in APR, and load just the one XML library instead of two. At the same time, we don’t want to impose libxml2 as a dependency on APR applications that have no need for it.
So this week I’ve finally got around to rewriting APR’s XML module to decouple the parser and use either expat or libxml2. The choice of XML parser is now available at compile time. While libxml2 support should be considered experimental for the time being, it should become the preferred option for users of applications requiring it, potentially simplifying your configuration and reducing your footprint.
For the time being, anyone interested will need to download APR from trunk.
TrafficServer on ARM
I’ve returned to my slightly unusual hacking activity to build Apache TrafficServer on the Maemo (ARM linux) platform.
It’s a slow process, partly because of the slow platform (NFS-mounted disc for everything, just the processor from the phone) and the very large C++ code (which g++ builds slowly on any platform), but also crucially because of the number of fixes that only become apparent when the build trips up. The worst cases have been where a header file has to be fixed: there are no explicit dependencies in the Makefiles, so it implies a huge amount of largely-unnecessary recompilation.
When I started on this, zwoop told me it was never going to be much of a performer on ARM, because it relies on 64-bit Intel-only atomics that have to be emulated at a heavy cost on ARM. I found the atomics were not emulated, and their absence caused a link error. Emulation code appears to be written, but evidently the build scripts incorrectly detected maemo as having the Intel atomics. Oops!
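For a sense of what emulation costs, here is a minimal sketch of a 64-bit fetch-and-add emulated with a spinlock, using only C11 `<stdatomic.h>`. It is not TrafficServer’s code, and the name `emulated_fetch_add64` is hypothetical. The point is that every operation has to take and release a lock, where native hardware support would need none.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* A single global spinlock guards every emulated operation. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

/* Hypothetical emulation: add `delta` to `*target` and return the value
 * it held before. Only atomic with respect to other callers that go
 * through this same lock. */
int64_t emulated_fetch_add64(int64_t *target, int64_t delta)
{
    int64_t previous;

    assert(target != NULL);

    /* Spin until we own the lock. */
    while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire)) {
        /* busy-wait */
    }

    previous = *target;
    *target = previous + delta;

    atomic_flag_clear_explicit(&emu_lock, memory_order_release);
    return previous;
}
```

Under contention every thread queues on the one lock, which is where the heavy cost comes from.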
Other fixes that have proved necessary:
- char on ARM appears to be unsigned, so I had to substitute bitwise tests for sign tests, and typecast values like EOF in comparisons.
- The build scripts failed to fix up my library path when I used --with-foo configure options. Specifically a --with-tcl option, as maemo’s Tcl package is missing tclconfig (required by trafficserver) so I had to build a separate Tcl from source.
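The signedness fix can be sketched as follows. The function names are hypothetical and this is not the TrafficServer patch itself: it only shows a bit test standing in for a sign test, and EOF being cast to char before it is compared with a char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Count bytes with the top bit set. A sign test (*s < 0) would do this
 * only where plain char is signed; where char is unsigned it is never
 * true. Testing the bit itself works either way. */
size_t count_high_bit_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s & 0x80)
            n++;
    }
    return n;
}

/* EOF is a negative int, so `c == EOF` can never be true when c is an
 * unsigned char. Casting EOF to char compares like with like. */
int char_matches_eof(char c)
{
    return c == (char)EOF;
}
```

Note that the cast makes a genuine 0xFF byte indistinguishable from EOF; where the code allows it, keeping the result of getc() in an int avoids the problem altogether.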
Not too much really: the atomics are the most complex hack. But I fear this exercise is of purely academic interest, if the atomics prove to be a show-stopper for decent performance on ARM.
Bug me if I drop the ball on documenting this exercise in three bug reports!
Debian breaking Apache
Uh-oh.
Debian has a history of going its own way and confusing its users. So when someone @debian writes this, I fear what might follow, and I want to suggest they at least read and consider this before breaking expectations and invalidating the documentation/manuals all over again. But there’s no comment facility.
Anyone at Debian reading this? I know at least some of you appreciate the issue.
The webserver on a ‘phone
Yesterday I successfully built Apache APR and HTTPD (the webserver) on my pocket-’puter, a Nokia N900, also known as a smartphone.
The prerequisite to that was to install a development environment. I wanted something native running on maemo, and while the tools don’t run entirely smoothly (apt-get fails to find many of the packages), a bit of googling found me ossguy’s page leading to the necessary packages and repositories to set up a working GNU toolchain.
Having installed gcc and a couple of other packages from the repos, building APR, APR-util and HTTPD went mostly smoothly. For the record (and I may update this post when I’ve figured out more about it):
- Configure’s detection of grep and egrep fails, despite both utilities being available in standard places and working fine. This may be an artifact of the GNU-toolchain-derived configure script’s syntax failing with the “ash” shell. As a workaround I removed the broken detection and replaced it in each configure script with:
GREP=grep
EGREP=egrep
- There were a number of assembler warnings. To be investigated.
- APR-UTIL failed the xlate test. This may mean that iconv is not available but APR_HAVE_ICONV is incorrectly detected/set. I recollect a similar issue with OpenSolaris, and I suspect a bug in the configure/build scripts.
- A few extra prerequisites were required, like zlib-dev for mod_deflate.
All in all, remarkably straightforward, and I was much surprised to see only the one failure from the test suite. The webserver is up and running, and in future I expect to treat it the same as any other dev-platform server.
Furthermore, if it’s that easy on Maemo, I’d expect it to be similarly straightforward on other ARM/Linux platforms such as Android.
One million commits
By the time you read this, the commit count in Apache’s main repository will have ticked over the million. I won’t be doing the deed, but someone from Apache’s couple of hundred projects (including podlings) and thousands of active developers will.
The fact that it’s all running smoothly and not bothered in the least by so much activity is a testament both to subversion (a mature project but a recent incomer to the Apache fold) and to the ASF’s infrastructure folks. Congratulations to them all!