Category Archives: open source
How well can open source transcend cultural and language barriers?
A few days ago I posted to the ironbee-devel list about the experimental nginx module for Ironbee. It cannot be incorporated into Ironbee’s normal build/test processes in the manner of the Apache HTTPD and Traffic Server modules, because nginx doesn’t support loadable modules, so the libraries have to be installed so they can be linked when the nginx module is built. This, along with a much more limited API, is presumably one of the design decisions the nginx team made when they focussed firmly on performance over extensibility.
In response to my post, someone drew my attention to an nginx fork called tengine. The key point of tengine is that it addresses precisely the issue of loadable modules. And not just that: it supports input filters, opening up the possibility of overcoming another shortcoming of the nginx module – the need to read and buffer an entire request body before scanning it. Interesting.
I’ve now downloaded tengine, and tried building the nginx-ironbee module for it. It appears to be fully API-compatible, and the only source change needed arose from their only having forked nginx 1.2.6 (the stable version), whereas I had developed the ironbee module using nginx 1.3.x. All I need to add is a preprocessor directive to detect nginx version and work around the missing API, and the two are (or appear to be) fully interchangeable (well, until I take advantage of input filtering to improve it further). This is seriously useful!
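For illustration, here is a minimal sketch of that kind of version guard. It assumes the `nginx_version` macro from nginx's `nginx.h`, which encodes 1.2.6 as 1002006. The 1.3.0 cutoff and the `IB_HAVE_NEWER_API` name are placeholders of mine, not the values used in the actual module, and the fallback `#define` exists only so the fragment compiles outside an nginx source tree.

```c
/* Sketch only: in a real module, nginx_version comes from <nginx.h>.
 * The fallback below exists purely so this fragment compiles standalone. */
#ifndef nginx_version
#define nginx_version 1002006   /* 1.2.6, the release tengine forked */
#endif

/* Hypothetical cutoff: assume the API in question arrived in 1.3.0. */
#if nginx_version >= 1003000
#define IB_HAVE_NEWER_API 1
#else
#define IB_HAVE_NEWER_API 0
#endif

/* Returns 1 when the newer API can be called directly, 0 when the
 * module has to fall back to a workaround. */
int ib_have_newer_api(void)
{
    return IB_HAVE_NEWER_API;
}
```

Keeping the decision in one macro means the rest of the module can test `IB_HAVE_NEWER_API` instead of repeating the version arithmetic.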
Tengine has been a collaborative open source effort for two years now (that’s almost as long as TrafficServer), yet this is the first I’d heard of it! Perhaps one reason for that is that Tengine is made-in-China. Just as TrafficServer originated from a single major site (Yahoo) before being open-sourced, so Tengine originates with Taobao and a Chinese developer community. They have English-language resources including a decent-enough website. But as a developer I want the mailing list: there is an English-language list, but just looking at archive sizes tells me all the traffic takes place on the corresponding Chinese-language list.
How much of a barrier is language? I’ve written about that before, and now it’s my turn to find myself on the wrong side of a language barrier. Actually that applies to nginx too: the Russian-born web server has a core community whose language I don’t speak. Developing the nginx-ironbee module gave me an opportunity to test a barrier from the outside, and I’m happy to report I got some helpful responses and productive technical discussion on nginx’s English-language developer list. A welcoming community and no language barrier to what I was doing.
Like other major open source projects, nginx has achieved a critical mass of interest that makes it not merely possible but inevitable that it crosses language barriers. Not all nginx’s Russian core team participate in English-language lists (nor should they!), but all it takes is one or two insiders with fluent English as points of contact to bridge the divide. I’ve no idea if I’ll get a good experience on tengine’s English-language list, but I expect I’ll find out now that I’ve heard of tengine and find it meets a need.
Corollary: there is still a language barrier. Of course! With Apache I started out developing applications (some of them modules) before making the transition to the core developer team. With nginx or tengine I know I can’t make that transition – at least not fully. And because I know that, I’m unlikely to let my work take me in that direction. The same kind of consideration may or may not have led the tengine team to fork rather than try to work directly with nginx.
Some people engage in Holy Wars over what source control system to use. For my part I really can’t get too worked up over a choice of tools, but I am concerned about another question. What files do you keep in a source control repository?
I’d like to say source files. Program source files, inputs for your choice of build system, legal stuff like licenses and acknowledgements, matters of record, documentation. The key point is, files that are rightfully under the direct control of project members. Not files that are generated by software, or managed by third-parties.
In practice, this principle is all-too-often lost. One example is Apache HTTPD, whose source repos contain extensive HTML documentation that is not written by developers but generated from XML source. There’s a clue in the headers of each of these files:
<!--
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
This file is generated from xml source: DO NOT EDIT
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-->
So these files are not source, and should really be generated in the build (or made a configuration option) rather than kept under source control. But apart from raising the overhead of using the repos, they’re harmless.
I’ve recently come upon an altogether more problematic case. It manifested itself after I’d installed all the prerequisites for a configure to succeed, but found my build fell down in compiling something. Scrolling up through reams of error messages, I find at the top:
#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers. Please update
#error your headers.
OK, that’s simple enough: the version of Google protobuf I installed with aptitude is too old. Go to Google and download the latest (cursing Google for failing to sign it). And hack protobuf.m4 to detect this error from configure rather than fall over in the build.
But hang on! It’s not as simple as that. This isn’t the usual dependency on a minimum version: it’s a requirement for an exact version of protobuf. If I install a version that’s too new I get another error:
#error This file was generated by an older version of protoc which is
#error incompatible with your Protocol Buffer headers. Please
#error regenerate this file with a newer version of protoc.
Altogether more problematic. Nightmare if I have more than one app each requiring different protobuf versions. And this is a library I’m building: it could be linked with somesuch. Ouch!
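The two errors come from a pair of version guards that protoc writes into the generated header. The sketch below models that pair as an ordinary C function so the logic is easier to see. It is paraphrased from my recollection of generated headers rather than copied from this build, the function and enum names are hypothetical, and the version numbers in any call are illustrative (written in a major*1000000 + minor*1000 + patch style).

```c
/* Illustrative model of the two version guards in a protoc-generated
 * header. All names here are hypothetical; versions are plain integers
 * in the major*1000000 + minor*1000 + patch style. */

enum pb_check { PB_OK = 0, PB_HEADERS_TOO_OLD, PB_HEADERS_TOO_NEW };

enum pb_check pb_check_versions(int generated_by, /* protoc that wrote the file */
                                int installed,    /* installed protobuf headers */
                                int min_protoc)   /* oldest protoc those headers accept */
{
    if (installed < generated_by)
        return PB_HEADERS_TOO_OLD;  /* "generated by a newer version of protoc" */
    if (generated_by < min_protoc)
        return PB_HEADERS_TOO_NEW;  /* "generated by an older version of protoc" */
    return PB_OK;
}
```

If each protobuf release sets the oldest protoc it accepts to its own version, the only combination that passes both checks is an exact match, which is the behaviour described above.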
The clue is at the top of the file that generates the errors:
// Generated by the protocol buffer compiler. DO NOT EDIT!
// source: [filename].proto
This C++ is not source, it’s an intermediate file generated by protoc, which is part of the protobuf package. Its source is the .proto file, which is also there in the repo but not used for the build. It follows that hacking protobuf.m4 to test the version was the wrong solution: instead the build should be updated to generate the intermediate files from the .proto source.
I just tried to report a bug to Ubuntu. Nothing major, just a missing package dependency: aptitude installed libnids-dev for me without installing libpcap-dev. My configure script then insists that nids.h was not found, whereas it is in fact clearly visible in /usr/include/nids.h. Turns out the test program fails because nids.h #includes pcap.h, which is not installed. Whoops!
OK, let’s do the Right Thing for a change: don’t just ignore it, report it. How do I report an Ubuntu bug? Aha, it’s at launchpad.net. Search for nids: nope, none of the 16 bugs listed is this one. OK, time to report a new bug.
This is where the problems go from straightforward to too difficult. To report a bug, I need to log in to launchpad. To log in, I need to create an account (it waffles on about OpenID, but it won’t accept my WordPress OpenID as a login). And to create an account, I need to solve a captcha. That is, one of those nasty eyesight tests.
I can’t do it. This one is nastier than ever.
Cycle the thing a few times, they’re all as bad. Try the audio version, but it’s silent (this is on an Ubuntu machine). Looks like I can’t report a bug!
I look on freenode, find #ubuntu-devel. Try asking there:
Just trying to report a bug (missing packaging dependency), but I can’t because I can’t even guess the launchpad captcha
The bug is, libnids-dev requires pcap-dev as a dependency
After a few minutes’ silence, start to blog this. But a few minutes more and someone replies:
first, I think you meant “libpcap-dev” instead of “pcap-dev”;
second, both these packages come unchanged from Debian, so it’s better to report this bug to bugs.debian.org
Ok, that looks like someone who knows what he’s talking about. Try a bug report to Debian. This fortunately turns out to be a much simpler process: their bug reporting site mentions a “reportbug” tool I can install with apt, and which appears to work nicely.
Ubuntu must be effectively in a bubble isolated from the big bad world!
When I first released some Apache modules, I was not yet part of the core development team. I released modules based at my own site, for whomsoever was interested. More recently, most new modules I’ve developed have gone straight into the core distribution from apache.org. I’ve discussed the issue of in or out in this blog before, and this post could be considered a case in point.
One of those earlier modules, mod_proxy_html, turned out to be the solution to a big latent need, and rapidly became my most popular single module. Since first release in 2003 it’s seen a number of significant improvements, including one for which I had direct sponsorship. More recently, the advanced internationalisation support that had developed over the years was separated out into a new module mod_xml2enc, so that the same code could be shared with other markup-processing modules without having to duplicate it and maintain multiple copies.
These modules were released as open-source, but without the infrastructure for substantial collaborative development. At first there wasn’t even a change control repository, though that was introduced fairly early. There was no bugs database, no general developer forum. Anyone wanting to participate had the choice of mailing me (which various people have done – sometimes with valuable contributions) or ignoring me and forking their own project (as in mod_proxy_content).
That’s imperfect. In ideological terms it falls short of an open development model: someone wanting to make more than a minor contribution would have to work with me taking a lead role (hire me? dream on) or fork. A bug report or enhancement request would usually but not necessarily get my attention, and if it related to a scenario I couldn’t reproduce, that could present difficulties. Whoops! Bottom line: it’s a fine model for a one-man project and somewhat workable as it grows, but lacks infrastructure support for the community that drives major open projects like Apache’s successes.
I can now announce that I’ve donated mod_xml2enc and mod_proxy_html to Apache. They will feature as standard in webserver releases from the forthcoming 2.4.0.
This gives them a platform to grow and flourish, even if I take a back seat – as inevitably happens from time to time when interest has passed a certain point. It also has some further implications for developers and users:
- Both modules are now relicensed under the Apache License. They continue to exist under the GPL (or, in the case of mod_xml2enc, dual-licensed) at webthing, so third-party developers and distributors have a choice.
- However, there is no guarantee, nor even expectation, that the two versions will remain in step. It is likely now that the version at Apache will be the more up-to-date in future. That’s where it’ll get the tender loving care of a broad developer community. My own further work may happen in both places, but is more certain to happen at Apache than WebThing (except in the unlikely event that a paying Client dictates otherwise).
This may be of particular interest to packagers. Most obviously it relieves them of the need to distribute mod_proxy_html as a separate package, but with one proviso. If these modules are packaged in a standard Apache/HTTPD distribution then libxml2 becomes a dependency of that.
Not a big deal for anything-mainstream (though in the distant past it was considered a reason not to accept mod_proxy_html into the core product), but it invites another change. If you switch from expat to libxml2 for APR’s parser (as described here) you can eliminate expat, and standardise on libxml2 for all markup parsing needs. One might consider this a good move in any case, as libxml2 is not just more powerful, but also has the more active development community of the two. The downside then is that you’ve introduced a bigger dependency for any APR users who have no use for HTTPD or libxml2.
That leaves the expat-based module mod_xmlns somewhat orphaned. I’ll probably get around to switching that one to use libxml2: it’s pretty-much a drop-in replacement. Or maybe I’ll drop it altogether in favour of Joachim Zobel’s mod_xml2, which was (I understand) originally inspired by mod_xmlns but offers an alternative and probably superior platform for XML applications.
Today’s buzz: talk of OpenOffice being donated to the Apache Software Foundation.
Wow! That’s a Very Big Catch, isn’t it? Perhaps the biggest since Hadoop? Or???
Well, maybe. As of now it’s a long way from a done deal, and it’s by no means clear that it will happen. To become an Apache project, OpenOffice will have to be accepted into the incubator where it will have to demonstrate suitability before it can graduate to an Apache project. Apache media guru Sally Khudairi has written about the incubation process here in anticipation of a wave of interest.
The first question is whether OpenOffice will enter the incubator in the first place. Before the LibreOffice split there’s little doubt it would’ve been warmly welcomed, but now there’s a question mark over why Oracle should prefer the ASF to TDF, and whether we Apache folks want to make ourselves party to a legacy of that split. But if this reaction from the LibreOffice folks represents a consensus then I for one will be happy to accept OpenOffice.
Intellectual Property should be straightforward (because Oracle owns all the rights, inherited from Sun), so the question then becomes how the community will fare. How much room is there for both projects to thrive? Who will give their loyalty to ASF in preference to TDF, or equal loyalty to both? Could separate competing projects become a Good Thing and foster innovation, or will it just add duplication and confusion to no real purpose?
There is a likely driver for an Apache version: contributors who prefer the Apache License over the GPL. That could drive interest particularly from companies like IBM who maintain their own derivative products. Whether that will give rise to a thriving community, and perhaps a development focus distinct from that of LibreOffice, remains to be seen: that’s part of what incubation will tell us.
Anyway, if OpenOffice enters incubation at Apache, I’d expect that to be make or break for it. If it thrives then we could see “Apache OpenOffice” at some future date. If not, then it pretty clearly cedes the future to LibreOffice. If only they could find a better name …
An entertaining talk at FOSDEM was Michael Meeks, on the fork from OpenOffice to LibreOffice. At the same time as delivering the now-popular message of community and open development, he was taking some quite partisan potshots at other FOSS models that unambiguously share those very values. Hmmm … good entertainment, but perhaps unduly provocative. Interestingly OpenOffice and LibreOffice both had stalls at FOSDEM, separated by only one independent exhibitor!
From an outsider’s viewpoint, there was one thing I found reassuring. Namely, the tensions that led to the split had existed during Sun’s time, before the Oracle takeover. Thus whatever mistakes may have happened are not new. I like to think Oracle is building on what Sun did right and drawing a line under what was wrong. It would’ve been sad to hear that Oracle had damaged something Sun was doing right, and Meeks’s talk reassures me that hasn’t happened in this case.
The open-source-but-owned-and-controlled development model such as (most famously) that of MySQL can work, but seems to have fallen comprehensively out of favour with FOSS communities. It’s at its best where third-parties are minor contributors, but is likely to lead to a fork if outside developers are taking a major interest. And it’s never good to send mixed messages to the community: they’ll remember the big claims when you back-pedal.
 How is anyone supposed to promote a program the pronunciation of whose very name is a stumbling-block? Shot in the foot there, methinks. Is that the laughter of Redmond I hear?
 I’m a user of OpenOffice but have never contributed to its development, nor am I familiar with its community.
The Free Software Foundation (FSF) has gone public with a statement on the Oracle vs Google litigation. The FSF is of course free to do so, and since it’s also a campaigning organisation we should not be surprised when they do. But does the statement itself stand up to scrutiny?
Before going any further, I should make it clear: this is a comment on the FSF’s position statement. No matter where this appears aggregated, I don’t represent anyone or anything other than myself. Any views I may have on the FSF itself, on Oracle or Google, on Java implementations, Android/Dalvik, on patents (software or otherwise) or on anyone/anything else, fall outside the scope of this posting. Nor should this be taken as comment on the FSF beyond this single document: as it happens, I am in general terms an admirer of the FSF.
The introduction is clear enough:
As you likely heard on any number of news sites, Oracle has filed suit against Google, claiming that Android infringes some of its Java-related copyrights and patents. Too little information is available about the copyright infringement claim to say much about it yet; we expect we’ll learn more as the case proceeds. But nobody deserves to be the victim of software patent aggression, and Oracle is wrong to use its patents to attack Android.
That’s fair: the FSF’s position against software patents is rational and consistent. Oracle vs Google is one of many patent cases currently in the courts throughout the rapidly-growing mobile devices space: some other household names that spring to mind include Apple, Nokia, HTC, and of course the victim of the biggest injustice, Blackberry-maker RIM. But it’s also fair to say Oracle vs Google may have more far-reaching repercussions than the others, insofar as it may affect Free Software in the Android ecosystem.
The second paragraph is more problematic:
Though it took longer than we would’ve liked, Sun Microsystems ultimately did the right thing by the free software community when it released Java under the GPL in 2006. [...]
That’s fair as far as it goes, but it’s becoming a partisan statement within FOSS when you implicitly dismiss the ongoing controversy over licensing a TCK. The third paragraph goes on to say:
Now Oracle’s lawsuit threatens to undo all the good will that has been built up in the years since. Programmers will justifiably steer clear of Java when they stand to be sued if they use it in some way that Oracle doesn’t like. [...]
Hang on! How is that new? The entire TCK issue is about field-of-use restrictions that are problematic for free software! At the same time, let’s not forget that Java was hugely popular among Free Software developers even before 2006: these controversies matter only to an activist minority.
If the above is nitpicking, paragraph 4 is altogether more suspect. Let’s quote it in full:
Unfortunately, Google didn’t seem particularly concerned about this problem until after the suit was filed. The company still has not taken any clear position or action against software patents. And they could have avoided all this by building Android on top of IcedTea, a GPL-covered Java implementation based on Sun’s original code, instead of an independent implementation under the Apache License. The GPL is designed to protect everyone’s freedom—from each individual user up to the largest corporations—and it could’ve provided a strong defense against Oracle’s attacks. It’s sad to see that Google apparently shunned those protections in order to make proprietary software development easier on Android.
Erm, this really is an attack on Apache! How would IcedTea have helped here? The only valid argument that it might have done is that rights were granted with Sun’s original code. I don’t think it’s clear to anyone outside the Oracle and Google legal teams whether and to what extent such ‘grandfather’ rights might affect the litigation. As far as licenses are concerned, the Apache License is a lot stronger on protection against patent litigation than the GPLv2 under which IcedTea is licensed. Indeed, in separate news, Mozilla (another major player in Free Software) is updating its MPL license, and says of its update:
The highlight of this release is new patent language, modeled on Apache’s. We believe that this language should give better protection to MPL-using communities, make it possible for MPL-licensed projects to use Apache code, and be simpler to understand.
Well, Mozilla is coming from a starting point closer to the GPL than Apache. It seems I’m not alone in supposing the Apache license offers the better patent protection, contrary to the FSF’s implication!
Finally the tone of the FSF statement, as expressed for example in the final paragraph, makes me uneasy:
Oracle once claimed that it only sought software patents for defensive purposes. Now it is using them to proactively attack free software.
Hmmm, attacking Android/Dalvik is proactively attacking free software? While it’s a supportable position it’s also (to say the least) ambiguous, and you haven’t made a case to convince a sceptic. Or a judge.
 Not to mention the grammar, up on which some readers of this blog will undoubtedly pick.
Remember SCO? The world’s saddest, most ludicrous software company? Well, if not, Groklaw has a rich and colourful (not to mention opinionated) archive on the subject.
The ghost of SCO has long since joined that of Jarndyce & Jarndyce, the perpetual litigants. But this week, an actual decision by a Utah jury: Novell owns the Unix copyrights.
Some believe SCO’s litigation was inherently doomed: there’s nothing to be had from Unix IP. Yes, there’s value, but that’s long-since been opened to the world, and of course independently re-engineered elsewhere, most importantly in GNU/Linux.
Others take a different view: there’s gold beyond the dreams of avarice in that Unix IP. SCO had a great idea; they just made a hash of executing it. After all, in the real world, pirates have taken such major companies as Blackberry-maker RIM and even Microsoft to the cleaners over IP that is, by any standards, a drop in the ocean set against UNIX.
So when a hedge fund bids for Novell, I expect they’re in the latter camp. They’re not an Oracle, a huge and powerful software company getting Sun, a crown-jewel complementary company on the cheap. They’re a pure money-machine. They have no business to fit Novell’s. So it seems likely they want the crown jewels of Novell’s IP.
That was before the jury declared Novell owner of such an important part of the IP! It must be worth more now, to a cash-rich wannabe-pirate.
Novell under current management has shown itself benign, and hero of the SCO story. Under other management, all bets would be off. The fact that they rejected one bid (or did they?) doesn’t necessarily mean they’ll always be able to do so – that’s up to the shareholders.
How much is it worth to lay that spectre to rest? Are you a shareholder, and if not, why not?
Yes, I’m planning to be at FOSDEM next month. Traveling by Eurostar Friday and Monday for a full weekend in Brussels.
I’ve booked the Renaissance Hotel, which is the same place I stayed last year. I can recommend it to anyone who isn’t afraid of a bit of a walk: it’s a nice place, and quite a bit closer to the FOSDEM venue than a city-centre hotel. And at winter weekend rates, the room price is vastly more reasonable than is usual in European conference cities! But I don’t know if there’s a bus/tram route for non-walkers.
Anyone keen to meet up, drop me a line (if we’re not already talking about it). Also, don’t forget to sign up for the PGP keysigning.
Anyone who works in or with software knows the danger of a product/project being orphaned: left unsupported, and its users in limbo, facing forced migration to something else. It is a strong argument in favour of open source: if you have the source, then if the worst happens and your supplier/support organisation disappears, or is bought up by someone hostile to it, you can hire someone else to maintain it.
My Apache colleague Gianugo Rabellino (one of the most interesting thinkers and inspiring speakers anywhere in the FOSS world) has argued for years that open source alone is necessary but not really sufficient, and for a product, you need open development. This evening he’s one of the many bloggers to comment on the Oracle acquisition of Sun, and argues there is now a danger of MySQL being orphaned and its users left in limbo despite MySQL being open source (GPL)! His thesis (here) is that if Oracle wants to stifle MySQL, they can make it very unrewarding for anyone else to pick up development.
I don’t think his point completely stands. If enough of the original/current MySQL team were to leave Oracle en masse, they could pick up development, and make a support business of it on the basis of their reputation, in spite of not owning the IP. But that’s not a nice scenario, compared to MySQL as an independent or within Sun. Or of course within a supportive Oracle.
On the subject of MySQL itself, I’m more optimistic (albeit through the perspective of benefit of the doubt – I want this to be good). While acknowledging the danger, I’m sure Oracle can see the business case for maintaining a healthy MySQL product and community. LAMP and other FOSS users are not short of credible alternatives: obvious candidates include PostgreSQL for serious applications or SQLite for lightweight php-ish stuff, and if MySQL loses its bloom, they’ll migrate. Surely better for Oracle to keep them on-side, make tiny margins on LAMP business and support, but gain a serious market from those who grow big and might be sold a smooth upgrade to a top-end platform where Solaris and Oracle replace Linux and MySQL.
 What’s MySQL’s current market share? Is it more than all other SQL databases combined?