Category Archives: google

Source and non-source repos

Some people engage in Holy Wars over what source control system to use.  For my part I really can’t get too worked up over a choice of tools, but I am concerned about another question.  What files do you keep in a source control repository?

I’d like to say source files.  Program source files, inputs for your choice of build system, legal stuff like licenses and acknowledgements, matters of record, documentation.  The key point is, files that are rightfully under the direct control of project members.  Not files that are generated by software, or managed by third parties.

In practice, this principle is all too often lost.  One example is Apache HTTPD, whose source repos contain extensive HTML documentation that is not written by developers but generated from XML source.  There’s a clue in the headers of each of these files:

<!--
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              This file is generated from xml source: DO NOT EDIT
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      -->

So these files are not source, and should really be generated in the build (or made a configuration option) rather than kept under source control.  But apart from raising the overhead of using the repos, they’re harmless.
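A quick way to see how much generated material a repo carries is to scan for files that announce themselves as generated.  This is only a minimal sketch, assuming such files carry a "generated ... DO NOT EDIT" notice on one of their first few lines, like the header quoted above; the function name and the ten-line window are my own choices for illustration.

```python
import os
import re

# Assumption: generated files announce themselves with a
# "generated ... DO NOT EDIT" notice near the top of the file.
GENERATED_MARKER = re.compile(r"generated.*do not edit", re.IGNORECASE)


def find_generated_files(root, head_lines=10):
    """Return sorted paths (relative to root) of files whose first
    head_lines lines contain a generated-file notice."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    # Read only the head of the file; pad with "" if short.
                    head = [next(fh, "") for _ in range(head_lines)]
            except OSError:
                continue  # unreadable file: skip it
            if any(GENERATED_MARKER.search(line) for line in head):
                found.append(os.path.relpath(path, root))
    return sorted(found)
```

Run against a checkout, anything it lists is a candidate for generating in the build instead of keeping under source control.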

I’ve recently come upon an altogether more problematic case.  It manifested itself after I’d installed all the prerequisites for a configure to succeed, but found my build fell down in compiling something.  Scrolling up through reams of error messages, I find at the top:

#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please update
#error your headers.

OK, that’s simple enough: the version of google protobuf I installed with aptitude is too old.  Go to google and download the latest (cursing google for failing to sign it).  And hack protobuf.m4 to detect this error from configure rather than fall over in the build.

But hang on!  It’s not as simple as that.  This isn’t the usual dependency on a minimum version: it’s a requirement for an exact version of protobuf.  If I install a version that’s too new I get another error:

#error This file was generated by an older version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please
#error regenerate this file with a newer version of protoc.

Altogether more problematic.  Nightmare if I have more than one app each requiring different protobuf versions.  And this is a library I’m building: it could end up linked into just such an app.  Ouch!

The clue is at the top of the file that generates the errors:

// Generated by the protocol buffer compiler.  DO NOT EDIT!
// source: [filename].proto

This C++ is not source, it’s an intermediate file generated by protoc, which is part of the protobuf package.  Its source is the .proto file, which is also there in the repo but not used for the build.  It follows that hacking protobuf.m4 to test the version was the wrong solution: instead the build should be updated to generate the intermediate files from the .proto source.
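What that build step amounts to can be sketched in a few lines.  This is an illustration only, not the project's actual build (which is autoconf-based, so the real fix belongs in a make rule); the helper names are hypothetical.  It assumes the standard protoc options `--proto_path` and `--cpp_out`, and protoc's convention of writing `NAME.pb.h` and `NAME.pb.cc` for `NAME.proto`.

```python
import os


def generated_outputs(proto_path, out_dir):
    """Names of the C++ files protoc writes for one .proto file."""
    stem = os.path.splitext(os.path.basename(proto_path))[0]
    return [os.path.join(out_dir, stem + ext) for ext in (".pb.h", ".pb.cc")]


def needs_regeneration(proto_path, out_dir):
    """True if any output is missing or older than the .proto source."""
    src_mtime = os.path.getmtime(proto_path)
    for out in generated_outputs(proto_path, out_dir):
        if not os.path.exists(out) or os.path.getmtime(out) < src_mtime:
            return True
    return False


def protoc_command(proto_path, out_dir, protoc="protoc"):
    """Command line that regenerates the C++ from the .proto source,
    using whichever protoc is installed alongside the headers."""
    proto_dir = os.path.dirname(proto_path) or "."
    return [protoc, "--proto_path=" + proto_dir, "--cpp_out=" + out_dir, proto_path]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`.  Because the intermediate C++ is then produced by the same protoc that ships with the installed headers, the two can no longer disagree about versions.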

Ouch.

Calling home: fatal?

I was asked this evening if I could help solve a proxying problem.  My provisional diagnosis raises a couple of issues of interest, and it would be good to confirm whether my diagnosis makes sense.  Any iPhone or Android users out there should be able to say whether it’s plausible.

It started with a request: did I have an iPhone or iPad, or possibly a Mac (the latter in case it was something Apple-specific)?  Users have been unable to view pages through the proxy, but we have no detailed explanation beyond “doesn’t work”.  Yes I have a Mac, but it’s not here: is this a problem I can go away and look at?  Or, why don’t I fire up Konqueror, the KDE browser whose KHTML engine is the ancestor of Apple’s WebKit?  What URL should I try to see if I can reproduce the error?

This is where it gets interesting.  The purpose of the project is to run a reverse proxy, but to test it I had to configure it as a forward proxy for Konq and navigate to a test URL.  It all worked fine, but the forward proxy is a test-only setup and blocks all but a selected whitelist of sites.

OK, next tack, can I see what’s happening if I have ssh access to the proxy itself?  Trafficserver’s logs are in squid format (with which I am unfamiliar) and show ERR_CONNECT_FAIL when the errors occur.  Looking that message up, I find it should just mean Trafficserver was unable to contact the origin server.  By about this time it’s also been established that Android clients have the same problem.

Reading the log, I’m guessing the clients having trouble are trying to “phone home”, so to test this I generate a couple of requests using Lynx through the proxy: one to a proxied site, the other to Google.  This confirms my suspicion: the Google request (which is blocked) generates precisely the log entry associated with failed requests.  It also helps clarify my reading of the squid-format logs, and confirms that the iPhone and Android clients’ failed requests are in fact to Google URLs.
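For anyone else unfamiliar with squid-format logs, the check I did by eye can be sketched as below.  It assumes the usual squid native field order (timestamp, elapsed time, client address, result code/status, bytes, method, URL, ...); the function name is my own, and the sample lines in the usage note are made up for illustration, not taken from the real logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def failed_hosts(log_lines, result_code="ERR_CONNECT_FAIL"):
    """Count, per destination host, the requests logged with result_code.

    Assumes squid native format: field 4 is "RESULT/status" and
    field 7 is the requested URL.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete log entry
        code = fields[3].split("/", 1)[0]
        if code != result_code:
            continue
        # Fall back to the raw field if it does not parse as a URL.
        host = urlsplit(fields[6]).hostname or fields[6]
        counts[host] += 1
    return counts
```

Fed two hypothetical entries, `1281000000.123 45 192.0.2.10 TCP_MISS/200 5120 GET http://proxied.example.com/ - DIRECT/proxied.example.com text/html` and `1281000001.456 30 192.0.2.11 ERR_CONNECT_FAIL/502 0 GET http://www.google.com/ - NONE/- -`, it counts one failure, against `www.google.com`.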

So my question to iPhone and Android users: would a failed call-home request (to Google) throw an error that would prove fatal to a regular page loading from elsewhere?  That seems rather bizarre, though not really more so than Google maps/satnav refusing to work at all without a live data connection.

If that is indeed the problem, it still doesn’t explain why the problem should arise in normal use, when it’s a reverse proxy and the google connection is direct.  Looks like either the problem is in fact on someone else’s network (combined with dumb browser design), or the messages seen in Trafficserver’s logs are a complete red herring and unrelated to the problem.  Hmmm.

Furthering the interests of Free Software?

Or not.

The Free Software Foundation (FSF) has gone public with a statement on the Oracle vs Google litigation.  The FSF is of course free to do so, and since it’s also a campaigning organisation we should not be surprised when they do.  But does the statement itself stand up to scrutiny?

Before going any further, I should make it clear: this is a comment on the FSF’s position statement.  No matter where this appears aggregated, I don’t represent anyone or anything other than myself.  Any views I may have on the FSF itself, on Oracle or Google, on Java implementations, Android/Dalvik, on patents (software or otherwise) or on anyone/anything else, fall outside the scope of this posting.  Nor should this be taken as comment on the FSF beyond this single document: as it happens, I am in general terms an admirer of the FSF.

The introduction is clear enough:

As you likely heard on any number of news sites, Oracle has filed suit against Google, claiming that Android infringes some of its Java-related copyrights and patents. Too little information is available about the copyright infringement claim to say much about it yet; we expect we’ll learn more as the case proceeds. But nobody deserves to be the victim of software patent aggression, and Oracle is wrong to use its patents to attack Android.

That’s fair: the FSF’s position against software patents is rational and consistent.  Oracle vs Google is one of many patent cases currently in the courts throughout the rapidly-growing mobile devices space: some other household names that spring to mind include Apple, Nokia, HTC, and of course the victim of the biggest injustice, Blackberry-maker RIM.  But it’s also fair to say Oracle vs Google may have more far-reaching repercussions than the others, insofar as it may affect Free Software in the Android ecosystem.

The second paragraph is more problematic:

Though it took longer than we would’ve liked, Sun Microsystems ultimately did the right thing by the free software community when it released Java under the GPL in 2006. [...]

That’s fair as far as it goes, but it’s becoming a partisan statement within FOSS when you implicitly dismiss the ongoing controversy over licensing a TCK.  The third paragraph goes on to say:

Now Oracle’s lawsuit threatens to undo all the good will that has been built up in the years since. Programmers will justifiably steer clear of Java when they stand to be sued if they use it in some way that Oracle doesn’t like. [...]

Hang on!  How is that new?  The entire TCK issue is about field-of-use restrictions that are problematic for free software!  At the same time, let’s not forget that Java was hugely popular among Free Software developers even before 2006: these controversies matter only to an activist minority.

If the above is nitpicking, paragraph 4 is altogether more suspect.  Let’s quote it in full:

Unfortunately, Google didn’t seem particularly concerned about this problem until after the suit was filed. The company still has not taken any clear position or action against software patents. And they could have avoided all this by building Android on top of IcedTea, a GPL-covered Java implementation based on Sun’s original code, instead of an independent implementation under the Apache License. The GPL is designed to protect everyone’s freedom—from each individual user up to the largest corporations—and it could’ve provided a strong defense against Oracle’s attacks. It’s sad to see that Google apparently shunned those protections in order to make proprietary software development easier on Android.

Erm, this really is an attack on Apache!  How would IcedTea have helped here?  The only valid argument that it might have helped is that rights were granted with Sun’s original code.  I don’t think it’s clear to anyone outside the Oracle and Google legal teams whether and to what extent such ‘grandfather’ rights might affect the litigation.  As far as licenses are concerned, the Apache License is a lot stronger on protection against patent litigation than the GPLv2 under which IcedTea is licensed.  Indeed, in separate news, Mozilla (another major player in Free Software) is updating its MPL license, and says of its update:

The highlight of this release is new patent language, modeled on Apache’s. We believe that this language should give better protection to MPL-using communities, make it possible for MPL-licensed projects to use Apache code, and be simpler to understand.

Well, Mozilla is coming from a starting point closer to the GPL than Apache.  It seems I’m not alone in supposing the Apache license offers the better patent protection, contrary to the FSF’s implication!

Finally the tone[1] of the FSF statement, as expressed for example in the final paragraph, makes me uneasy:

Oracle once claimed that it only sought software patents for defensive purposes. Now it is using them to proactively attack free software.

Hmmm, attacking Android/Dalvik is proactively attacking free software?  While it’s a supportable position it’s also (to say the least) ambiguous, and you haven’t made a case to convince a sceptic.  Or a judge.

[1] Not to mention the grammar, up on which some readers of this blog will undoubtedly pick.

Google vs Google

I’m not a regular user of Google Chrome, but I have it installed, and turn to it from time to time.

Recently I tried to use it with Google Maps.  It failed outright to load the page, complaining of errors in it.  Meanwhile other browsers[1] are fine with it.

Oh dear!

[1] at least Firefox, MicroB and Opera.  Not interested enough to turn it into a survey, but I recollect using at least those with Google Maps.

Mobile Maps

I’ve been using Nokia’s maps and GPS on my ‘phone for some time.  It works well on the road, but has basically no information other than roads (and while the roads data are good, other data such as rivers and railways are often inaccurate).  An annoying quirk is that the software assumes you’re on a road, and tends to “correct” the computed fix if you’re not.  This leads to an illusion of greater accuracy, but ensures poorer reliability[1].

Recently I tried Google’s maps app.  It’s very pretty, and contains rather more information than Nokia’s, though it’s also much slower e.g. to zoom/pan.  From home it could see two GPS satellites, and computed a poor fix, nearly 200m away from me (I presume it combined the two GPS satellites with non-GPS info – maybe it knows individual mobile phone masts or something).  Surprisingly, the fix was consistent: it gave me the same incorrect position the next day.  But since that was from indoors, I gave it the benefit of the doubt: surely it’ll do better in the open.

Then I tried it while out walking.  No use: it insists on a data connection (does it need to ‘phone home?).  Unlike Nokia’s map, which asks for a connection on startup but works fine without one if I hit “cancel”, Google’s refuses to proceed without it.  Bah, Humbug.

This morning I tried another variant: I fired up Google maps at home, then kept it running as I went out.  No use: a short way down the road, it lost my Wi-Fi and insisted on a new connection.

So, back to Nokia maps.

[1] A subject I do know about, both in theory (as a mathematician) and in practice (as someone who has done quite a lot of work in the field).

Google Book Search

A couple of months ago, I received a slightly suspect email about the Google book search settlement.

Now I’ve got it on paper, from someone called Rust Consulting, and referencing www.googlebooksettlement.com.  This one makes sense, and looks credible, though I’ll still have to google these folks to check for any suspicion of a scam.

Seems I have the choice to accept the settlement and possibly become eligible for some google-money in return for waiving any right to sue them over copyright, or opt out and retain my rights.  Well, the latter would obviously be nonsense for me as an individual, even if I wanted to sue.  My publisher (an $8-billion company) would have the resources to sue, but that’s not going to happen, nor would I want to get involved if it did.

So I guess that just leaves the question: do I get some google-gold?  The settlement provides for Google to make money from books, and pay 63% of that to rightsholders.  But this raises the question: how is that allocated between the author and the publisher?  Our contract obviously doesn’t cover revenues from Google, and I’m not sure what general/catchall clauses might apply here.  Other things being equal, it would probably be best if Google pays the publisher, and my share gets added to my royalty cheques, to avoid the high cost of cashing a separate dollar-cheque from Google.

Whether the same reaction would apply to a professional author – one for whom writing was their main occupation – is a different question.  I’ll leave that to them.

What’s so great about the Goog?

As we all know, Google is the best search engine and the most useful single site on the ‘net.  Not that it’s perfect, and outside of its core web-search function, it has some manure-grade junk out there too.

Now there seems to be a bit of a meeja fuss over some goog-killer-wannabe called cuil.  I can only attribute this to cuil having an effective bullshit PR department, to get so much attention.  The report on El Reg (NSFW) shows a snapshot of results that leaves me no wish to see more: for the first time in over 20 years on the ‘net, I’ve been exposed to … ahem … dirty pics (I won’t say porno, because I don’t see how two men masturbating is supposed to titillate).  Ahem, not what I want in my search results.

And don’t forget, El Reg is a regular Goog-basher who you’d expect to welcome – albeit not uncritically – a real alternative.

But what’s really so good about the Goog?

It is indeed a million times better than the other media darling, but then Yahoo’s indexing always was a sick joke, and I really can’t see how anything other than mindshare in the mainstream meeja ever sustained it.  But the same cannot be said for all the alternatives.  When Google was launched, Altavista was out there doing a great job – albeit with less hype than the big Y – and the Goog was no more than a comparably competent alternative.

I think the comparison to Altavista at a time when the Goog had yet to develop its pagerank (automated peer review) to give it a real lead in terms of results quality, can reveal what’s really so much better about the Goog.  It’s not that Google gave us what we wanted (which Altavista also did): it’s that Goog spared us what we didn’t want! No deezyner page with pretty graphics.  No crap.  Just the information we were looking for.  And – later – text-based ads that still don’t seriously detract from the useful results.

And that’s comparing Goog to the best of the pre-goog bunch: a good site with a logo that was at least pretty (wikipedia has it still), and incomparably less obnoxious crap than Yahoo inflicts on you.  Ironic that it now seems to have been swallowed by Yahoo and gone minimalist – a decade on from the opportunity it lost and Google won.

Things that come around

Several years ago, I tried proposing to Google that they should incorporate accessibility analysis into their search rankings. Their (eventual) reply was, not interested.

I’ve just heard the BBC’s In Touch program, which deals with issues affecting blind and partially-sighted people. Today we had a lengthy interview with a blind Indian engineer working at Google on exactly that problem. He explained that the accessibility-enhanced search will, as its first priority, select the best/most relevant pages by Google’s standard closely-guarded-secret algorithms, but then order those results to ensure that the highest-placed results are accessible.

He even gave some technical details of how the accessibility assessment works. The perennial subject of alt attributes was mentioned (without details on how they assess them), but more interestingly, he referred to well-structured pages, and clearly uses HTML heading markup as a criterion.
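To make the two-stage idea concrete, here is a toy sketch of it: pick the most relevant pages first, then reorder just those so the more accessible ones come top. It is emphatically not Google's method, whose details weren't given. The scoring heuristic (share of images with an alt attribute, plus whether the page uses any heading markup), the equal weighting, and every name in it are my own assumptions for illustration.

```python
from html.parser import HTMLParser


class AccessibilityProbe(HTMLParser):
    """Crude heuristic: counts images lacking alt and notes headings."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.images_without_alt = 0
        self.headings = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
            if dict(attrs).get("alt") is None:
                self.images_without_alt += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headings += 1


def accessibility_score(html):
    """Score in [0, 1]: half for alt coverage, half for using headings."""
    probe = AccessibilityProbe()
    probe.feed(html)
    if probe.images:
        alt_ok = 1 - probe.images_without_alt / probe.images
    else:
        alt_ok = 1.0  # no images, so nothing to describe
    structure_ok = 1.0 if probe.headings else 0.0
    return (alt_ok + structure_ok) / 2


def rerank(results, top_n=10):
    """results is a list of (url, relevance, html) tuples.

    Stage 1: keep the top_n results by relevance.
    Stage 2: order those by accessibility, relevance breaking ties.
    """
    best = sorted(results, key=lambda r: r[1], reverse=True)[:top_n]
    best.sort(key=lambda r: (accessibility_score(r[2]), r[1]), reverse=True)
    return [url for url, _relevance, _html in best]
```

Note that an inaccessible page never displaces a relevant one from the selection; it only moves down within it, which matches the priority order described in the interview.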

It’s all happening very quietly, but it’s gratifying to those of us who have been banging on about this for years. Of course, it would’ve been far better if they’d used Site Valet (customised as necessary to integrate with their systems) for this analysis.

YouTube + Royalties = Spam

El Reg reports that YouTube has struck its first deal with a performing rights society, presumably involving royalties. So every YouTube entry becomes a numbers game with potential money.

I expect that means we’ll see a great new wave of spam involving YouTube URLs, making another one for the spam filters. The only hint of good news is that google may be well-equipped to penalise spammers, if its deal(s) allow that. But from an end-user point of view, that doesn’t mean less spam, just another long battle.

Bah, Humbug.

Narcissism with google

My dad ‘phoned this morning, as he usually does on a Sunday. Somehow, the conversation got on to namesakes. I know there’s another Nick Kew on the web: he shows up on a google images search (which will also give you a lot of false hits for me).

Anyway, I switched the search to look for namesakes of my dad, and found a few. Then returned to google’s normal web search and found more. That’s a definite contrast: a google search on my own name turns up a page of just me.

So I wondered: how far can one dominate google’s results for one’s own name? Well, I just tried it, and the first 73 pages are either by me, or third-parties who reference me by name. The first competition is at position #74, where the reference is “… take Nick to Kew Gardens …”. That’s not me, and I’ve never been to the gardens.

Of course, it’s of no commercial value: anyone using the search term knows exactly what they want, and they’ll find it whether or not it’s me. But I wonder how much the “come top in …” con-artists would charge for that:-)
