Source and non-source repos

Some people engage in Holy Wars over what source control system to use.  For my part I really can’t get too worked up over a choice of tools, but I am concerned about another question.  What files do you keep in a source control repository?

I’d like to say source files.  Program source files, inputs for your choice of build system, legal stuff like licenses and acknowledgements, matters of record, documentation.  The key point is, files that are rightfully under the direct control of project members.  Not files that are generated by software, or managed by third parties.

In practice, this principle is all too often lost.  One example is Apache HTTPD, whose source repos contain extensive HTML documentation that is not written by developers but generated from XML source.  There’s a clue in the headers of each of these files:

<!--
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              This file is generated from xml source: DO NOT EDIT
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      -->

So these files are not source, and should really be generated in the build (or made a configuration option) rather than kept under source control.  But apart from raising the overhead of using the repos, they’re harmless.

I’ve recently come upon an altogether more problematic case.  It manifested itself after I’d installed all the prerequisites for a configure to succeed, but found my build fell over while compiling something.  Scrolling up through reams of error messages, I found at the top:

#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please update
#error your headers.

OK, that’s simple enough: the version of Google protobuf I installed with aptitude is too old.  Go to Google and download the latest (cursing Google for failing to sign it).  And hack protobuf.m4 to detect this error from configure rather than fall over in the build.

But hang on!  It’s not as simple as that.  This isn’t the usual dependency on a minimum version: it’s a requirement for an exact version of protobuf.  If I install a version that’s too new I get another error:

#error This file was generated by an older version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please
#error regenerate this file with a newer version of protoc.

Altogether more problematic.  It’s a nightmare if I have more than one app, each requiring a different protobuf version.  And this is a library I’m building: it could end up linked into just such an app.  Ouch!
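One way to read the two messages is as a two-sided check: the generated file insists on headers that are new enough, and the headers insist on a generator that is new enough.  The sketch below models that logic only.  The function name, parameter names and version numbers are all made up for illustration; it is not protobuf's actual header code.

```python
from typing import Optional


def check_generated_file(headers_version: int,
                         headers_min_protoc: int,
                         file_min_headers: int,
                         file_protoc_version: int) -> Optional[str]:
    """Model of the two-sided check (hypothetical names, not protobuf's API).

    headers_version      -- version of the installed protobuf headers
    headers_min_protoc   -- oldest protoc whose output those headers accept
    file_min_headers     -- oldest headers the checked-in generated file accepts
    file_protoc_version  -- version of the protoc that generated the file

    Returns the complaint the build would raise, or None if compatible.
    """
    if headers_version < file_min_headers:
        # Installed headers are too old for the checked-in file.
        return "generated by a newer version of protoc: update your headers"
    if file_protoc_version < headers_min_protoc:
        # Installed headers are too new for the checked-in file.
        return "generated by an older version of protoc: regenerate this file"
    return None


# Hypothetical versions, encoded as major * 1_000_000 + minor * 1_000 + patch.
CHECKED_IN = dict(file_min_headers=2_005_000, file_protoc_version=2_005_000)

too_old = check_generated_file(2_004_001, 2_004_000, **CHECKED_IN)
too_new = check_generated_file(2_006_000, 2_006_000, **CHECKED_IN)
matching = check_generated_file(2_005_000, 2_005_000, **CHECKED_IN)
```

With these invented numbers, only the matching install passes.  Anyone whose packaged protobuf falls on either side of the checked-in file is stuck, which is the situation described above.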

The clue is at the top of the file that generates the errors:

// Generated by the protocol buffer compiler.  DO NOT EDIT!
// source: [filename].proto

This C++ is not source: it’s an intermediate file generated by protoc, which is part of the protobuf package.  Its source is the .proto file, which is also in the repo but is not used by the build.  It follows that hacking protobuf.m4 to test the version was the wrong solution: instead, the build should be updated to generate the intermediate files from the .proto source.
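A minimal sketch of what such a build step might look like, written as a standalone Python helper rather than as a patch to the project's real build.  The paths are hypothetical and the project's actual build system is not shown here.  The only protoc features assumed are the standard `--proto_path` and `--cpp_out` options and the usual `.pb.cc` / `.pb.h` output names.

```python
import subprocess
from pathlib import Path
from typing import List


def protoc_command(proto: Path, out_dir: Path, protoc: str = "protoc") -> List[str]:
    """Build the protoc command line that generates C++ from one .proto file."""
    return [
        protoc,
        f"--proto_path={proto.parent}",  # directory in which imports are resolved
        f"--cpp_out={out_dir}",          # where the generated C++ is written
        str(proto),
    ]


def generated_outputs(proto: Path, out_dir: Path) -> List[Path]:
    """The C++ files protoc writes for this .proto."""
    return [out_dir / f"{proto.stem}.pb.cc", out_dir / f"{proto.stem}.pb.h"]


def needs_regeneration(proto: Path, out_dir: Path) -> bool:
    """True if an output is missing or older than the .proto source."""
    outputs = generated_outputs(proto, out_dir)
    if not all(p.exists() for p in outputs):
        return True
    source_mtime = proto.stat().st_mtime
    return any(p.stat().st_mtime < source_mtime for p in outputs)


def regenerate(proto: Path, out_dir: Path, protoc: str = "protoc") -> None:
    """Run whichever protoc is installed, so the output matches the local headers."""
    if needs_regeneration(proto, out_dir):
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(protoc_command(proto, out_dir, protoc), check=True)
```

A build would call something like `regenerate(Path("src/message.proto"), Path("build/gen"))` before compiling, with both paths standing in for the real ones.  Because the generated files are no longer checked in, a fresh checkout always produces them with the locally installed protoc, so they cannot disagree with the local headers.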

Ouch.

Posted on September 2, 2012, in apache, google, open source, source control. 1 Comment.

  1. Oddly enough, I understood this complaint perfectly. And I agree, in principle.

    In practice, however, the line between ‘source’ and ‘non-source’ is sometimes hopelessly blurred. In particular, people who use high-level programming languages or tools may not even be able to separate “source”. (Consider Word, for instance – one of the highest-level PLs I know. I use a tool called ‘Author-it’, which stores the source independently and only generates the Word doc for outputting purposes; in theory I could track the content of AIT in a separate source-control system, but I don’t think it’s feasible. And without something like AIT – which is a heavyweight tool in its own right – there’s simply no way to separate anything that could be called ‘source’ from a Word doc.)

    Or a wiki. Source is stored within a database, but it’s not easy to align it with a separate source-control system – at least I’ve never seen it done. (By ‘align’, I mean that you should be able to pull up a previous version of your code and automatically view the documents that relate to that version.)
