Category Archives: google

I blame Google

When Google comes under attack, I’m usually one of the voices in the peanut gallery defending them.  That’s because most of the attacks on them, particularly the anti-trust stuff involving regulators, are grossly ill-informed and follow an agenda that seeks to subvert Google’s central purpose of supplying the best possible search results for the person searching.

Now I’m going to attack.  It may be true (as I’ve argued here before) that there’s a certain historic inevitability to the Enclosure of the Commons.  But that doesn’t excuse Google’s crucial role, particularly in the demise of the Usenet commons.

The suicide and resurrection of an online community in which I participate has reminded me of that.  It started on November 3rd, with an announcement that a set of discussion boards was to close on Nov 17th.  Just two weeks’ notice, for quite a large number of boards and a thriving community.  The reason given was problems with old/unmaintainable software (which had indeed left a lot to be desired), but we suspect that the more fundamental reason was that the website (which has, in other areas, a number of paid staff) was losing money.

Why they didn’t try to sell the boards – with community intact – to whoever thought they could make a go of it eludes me.  But that’s now water under the bridge.  And it may be a long-term blessing, if a highest bidder might’ve been under financial pressure themselves and perhaps trashed the site with intrusive levels of advertising.

Of course, discussion turned to ideas for how it might be replaced.  My own preferred option of a decentralised solution – individual blogs with an aggregator to focus the community – was a non-starter on that timescale, even if it could in principle have gained traction in the absence of time pressure.   But someone else had a practical solution: they set up an alternative site at a new domain with a well-chosen name, and phpbb driving a replacement set of boards.  They announced it within hours of the closure notice, and rapidly gained traction.  The community has been rapidly migrating to the new site, which now also has tremendous goodwill.  Early days, but it seems we have a level of continuity, albeit with archives about to be relegated to what may be found in dusty attics.

So what has this little tale got to do with Google or Usenet?  Well, the old boards originated in January 1998.  The second half of the ’90s was precisely when lots of websites were making a land-grab for online discussion fora, and a rising non-techie user base would follow the best-advertised route oblivious to inherent limitations like private (often quixotic) control and single points of congestion and failure.  As soon as a community moves from the Usenet commons to the private gardens – walled or otherwise – of a website, it becomes vulnerable to all kinds of things, like a rug being pulled.

Google’s role comes in their own land-grab, and in what they did to Dejanews.  Actually, come to think of it, the first time I ever heard the name Google was in that context: they were a company that had bought Dejanews.  So the folks who run the fantastic Usenet search engine now also have web search, and … it turns out to be rather good, returning results more-or-less as good as Altavista but without all the clutter and crap that had made Altavista a pain to use.  Nice!

But it turned out to be part of a much more sinister agenda.  Google Groups started life as a WWW gateway to Usenet: all good.  But the waves of new users coming through Google weren’t being told that: they saw web fora, with thriving communities.  If memory serves, it was the whole of Usenet (less some of the wilds of alt.*) that had been hijacked in an audacious land grab.  Old-timers found ourselves fighting a losing battle against the impression that the whole thing was Google’s territory.  Google were far from the only people doing that (and public mailing lists got similar gateways), but they were unique in owning Dejanews.

But Dejanews itself disappeared.  Or rather, became just a tab in an integrated Google search frontend.  Then the tab wasn’t even labelled “news”, which took on the obvious meaning it still has today.  Then the “groups” tab vanished: after all, the content was Google Groups, and that’s just Web content like any other, right?  Over the following decade or so, Usenet content simply vanished, increasingly much of it literally so.

The community mindshare had been grabbed, except for old-timers.  Search had been lost gradually and the community, like a boiling frog, had failed to react to incremental changes and create an alternative.  In the face of such trends, the will to put much effort into other things like newsreader development and combating the rise of spam, also waned.  The land grab has happened, the commons are lost, we live in a world of private gardens.  Worse still, many including the biggest (Facebook) are walled off against us: access is limited to their registered users!  And it’s very largely all Google’s fault.

If I can be arsed I may post a followup to this, proposing a new alternative.  It won’t be Usenet: that ship has sailed.  It will be based on aggregation and syndication of distributed content, under the control of individuals.  Damn, am I fighting the same battle I pooh-poohed Moglen for?

Public wifi menace

A couple of days ago, I was looking up a bus timetable from my ‘phone.  All perfectly mundane.

The address I thought I wanted failed: I don’t have it bookmarked and I’ve probably misremembered.  So I googled.

Google failed too, with a message about an invalid certificate.  WTF?  Google annoyingly[1] use https, hence the certificate check.  Who is sitting in the middle?  Surely they can’t really be eavesdropping: with browsers issuing strong warnings, they’re never going to catch anything sensitive.  Must be just a hopelessly misconfigured network.

I don’t care if someone watches as I look up a bus time, I just want to get on with it!  But it’s not obvious with android how I can override that warning and access google.  Or even an imposter: if they don’t give me the link I wanted from google, nothing lost!

So has my mobile network screwed up horribly?  Cursing at the hassle, I go into settings and see it’s picked up a wifi network.  BT’s public stuff: OpenZone, or something like that (from memory).  This is BT, or someone on their network, playing sillybuggers.  Just turn wifi off and all works well again as the phone reverts to my network.

Except, now I have to remember to re-enable wifi before doing anything a bit data-intensive, like letting the ‘phone update itself, or joining a video conference.  All too easy to forget.

Hmm, come to think of it, that broken network is probably also what got between me and the bus timetable in the first place.  That wasn’t https.

[1] There are good reasons to encrypt, but search is rarely one of them.  Good that google enables it (at least if you trust google more than $random-shady-bod), but it’s a pain that they enforce it.

Bleeding Heart

The fallout from heartbleed seems to be manifesting itself in a range of ways.  I’ve been required to set new passwords for a small number of online services, and expect I may encounter others as and when I next access them.

The main contrast seems to be between admins who tell you what’s happening, vs services that just stop working.  Contrast Apache and Google:

Apache: email arrives from the infrastructure folks: all system passwords will have to be reset.  Then a second email: if you haven’t already, you’ll have to set a new password via the “forgot my password” mechanism (which sends you PGP-encrypted email instructions).  All very smooth and maximally secure – unless some glitch has yet to manifest itself.

Google: @employer email address, which is hosted on gmail, just stopped working without explanation.  But this is the weekend, and similar things have happened before at weekends, so I ignore it.  But when it’s still not back on Monday, I try logging in with my web browser.  It allows me that, and insists I set a new password, whereupon normal imap access is also restored.  Hmmm … In the first place, no explanation or warning.  In the second place, if the password had been compromised then anyone who had it could trivially have reset it.  Bottom of the class both for insecurity and for the user experience.

There is also secondary fallout: worried users of products that link OpenSSL asking or wondering what they have to upgrade: for example, here.  For most, the answer is that you just upgrade your OpenSSL installation and then restart any services that link it (or reboot the whole system if you favour the sledgehammer approach).  Exceptions to that will be cases where you have custom builds with statically linked OpenSSL, or multiple OpenSSL installations (as might reasonably be the case on a developer’s machine).  If in doubt, restart your services and check for the OpenSSL version appearing in their startup messages: for example, with Apache HTTPD you’ll see it in the error log at startup.
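
For anything scripted, another quick sanity check is to ask the runtime which OpenSSL it actually loaded.  Here’s a minimal Python sketch along those lines.  The helper name heartbleed_affected is my own, and it assumes the commonly cited affected range (1.0.1 through 1.0.1f, fixed in 1.0.1g, plus 1.0.2-beta1).  It only looks at the version string, so a distro package that backports the fix without bumping the version will show up as a false positive.

```python
import re
import ssl

# Assumption: affected releases are 1.0.1 through 1.0.1f (fixed in 1.0.1g),
# plus 1.0.2-beta1.  Anything else is treated as unaffected.
_VERSION_RE = re.compile(r"OpenSSL (\d+\.\d+\.\d+)([a-z]*)(-beta\d+)?")


def heartbleed_affected(version_string):
    """Guess from a version string whether an OpenSSL build predates the fix.

    Returns None if the string is not recognisably an OpenSSL version.
    """
    match = _VERSION_RE.search(version_string)
    if match is None:
        return None
    base, letter, beta = match.groups()
    if base == "1.0.1":
        # "" is plain 1.0.1; letters a..f are the affected patch releases.
        return letter < "g"
    if base == "1.0.2" and beta == "-beta1":
        return True
    return False


if __name__ == "__main__":
    # ssl.OPENSSL_VERSION is the library this Python process actually loaded,
    # which may differ from what other services on the machine link against.
    print(ssl.OPENSSL_VERSION, "->", heartbleed_affected(ssl.OPENSSL_VERSION))
```

Note that this only tells you about the one process you ran it in: a statically linked daemon, or a second OpenSSL installation elsewhere on the box, still has to be checked separately.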

Source and non-source repos

Some people engage in Holy Wars over what source control system to use.  For my part I really can’t get too worked up over a choice of tools, but I am concerned about another question.  What files do you keep in a source control repository?

I’d like to say source files.  Program source files, inputs for your choice of build system, legal stuff like licenses and acknowledgements, matters of record, documentation.  The key point is, files that are rightfully under the direct control of project members.  Not files that are generated by software, or managed by third parties.

In practice, this principle is all-too-often lost.  One example is Apache HTTPD, whose source repos contain extensive HTML documentation that is not written by developers but generated from XML source.  There’s a clue in the headers of each of these files:

<!--
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              This file is generated from xml source: DO NOT EDIT
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      -->

So these files are not source, and should really be generated in the build (or made a configuration option) rather than kept under source control.  But apart from raising the overhead of using the repos, they’re harmless.

I’ve recently come upon an altogether more problematic case.  It manifested itself after I’d installed all the prerequisites for a configure to succeed, but found my build fell down in compiling something.  Scrolling up through reams of error messages, I find at the top:

#error This file was generated by a newer version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please update
#error your headers.

OK, that’s simple enough: the version of google protobuf I installed with aptitude is too old.  Go to google and download the latest (cursing google for failing to sign it).  And hack protobuf.m4 to detect this error from configure rather than fall over in the build.

But hang on!  It’s not as simple as that.  This isn’t the usual dependency on a minimum version: it’s a requirement for an exact version of protobuf.  If I install a version that’s too new I get another error:

#error This file was generated by an older version of protoc which is
#error incompatible with your Protocol Buffer headers.  Please
#error regenerate this file with a newer version of protoc.

Altogether more problematic.  Nightmare if I have more than one app each requiring different protobuf versions.  And this is a library I’m building: it could end up linked into just such an app.  Ouch!

The clue is at the top of the file that generates the errors:

// Generated by the protocol buffer compiler.  DO NOT EDIT!
// source: [filename].proto

This C++ is not source, it’s an intermediate file generated by protoc, which is part of the protobuf package.  Its source is the .proto file, which is also there in the repo but not used for the build.  It follows that hacking protobuf.m4 to test the version was the wrong solution: instead the build should be updated to generate the intermediate files from the .proto source.

Ouch.
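
For what it’s worth, the regenerate-at-build-time step is small.  Here is a rough Python sketch of what the build would need to do, assuming protoc is on the PATH.  The function names, and the file and directory names in the note below, are placeholders of mine and not anything from the project in question; --proto_path and --cpp_out are protoc’s standard options for the import root and the C++ output directory.

```python
import subprocess
from pathlib import Path


def protoc_cpp_command(proto_file, out_dir, protoc="protoc"):
    """Build the protoc command line that regenerates C++ from a .proto file."""
    proto_file = Path(proto_file)
    return [
        protoc,
        f"--proto_path={proto_file.parent}",  # directory imports are resolved against
        f"--cpp_out={out_dir}",               # where the .pb.h/.pb.cc pair is written
        str(proto_file),
    ]


def regenerate(proto_file, out_dir, protoc="protoc"):
    """Run protoc so the generated code matches the installed protobuf."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True makes a protoc failure stop the build immediately,
    # rather than surfacing later as a wall of compiler errors.
    subprocess.run(protoc_cpp_command(proto_file, out_dir, protoc), check=True)
```

Calling something like regenerate("proto/messages.proto", "generated") before the compile step, with the generated directory kept out of the repo, would mean the intermediate C++ always comes from whatever protoc is installed alongside the headers.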

Calling home: fatal?

Was asked if I could help solve a proxying problem this evening.  My provisional diagnosis raises a couple of issues of interest, and it would be good to confirm whether my diagnosis makes sense.  Any iPhone or Android users out there should be able to say whether it’s plausible.

It started with a request: did I have an iPhone or iPad, or possibly a Mac (the latter in case it was something Apple-specific)?  Users have been unable to view pages through the proxy, but we have no detailed explanation beyond “doesn’t work”.  Yes I have a Mac, but it’s not here: is this a problem I can go away and look at?  Or, why don’t I fire up Konqueror, the KDE browser whose khtml engine is the ancestor of Apple’s?  What URL should I try to see if I can reproduce the error?

This is where it gets interesting.  The purpose of the project is to run a reverse proxy, but to test it I had to configure it as a forward proxy for Konq and navigate to a test URL.  It all worked fine, but the forward proxy is a test-only setup and blocks all but a selected whitelist of sites.

OK, next tack, can I see what’s happening if I have ssh access to the proxy itself?  Trafficserver’s logs are in squid format (with which I am unfamiliar) and show ERR_CONNECT_FAIL when the errors occur.  Looking that message up, I find it should just mean Trafficserver was unable to contact the origin server.  By about this time it’s also been established that Android clients have the same problem.

Reading the log, I’m guessing the clients having trouble are trying to “phone home”, so to test this I generate a couple of requests using Lynx through the proxy: one to a proxied site, the other to Google.  This confirms my suspicion: the google request (which is blocked) generates precisely the log entry associated with failed requests.  It also helps clarify my reading of the squid-format logs, and confirms that the iphone and android clients’ failed requests are in fact to Google URLs.
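
Rather than eyeballing the log, the same check can be scripted.  This is a minimal sketch that assumes the classic Squid native access-log layout (timestamp, elapsed time, client, result code/status, bytes, method, URL, and so on), which is how I read these logs; the function name is mine, and the sample lines at the bottom are invented for illustration, not copied from the real logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def failed_hosts(log_lines, result_code="ERR_CONNECT_FAIL"):
    """Count origin hosts for log entries whose result code matches."""
    hosts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete squid-format entry
        # Field 4 is "RESULT_CODE/http_status", field 7 is the requested URL.
        code = fields[3].split("/", 1)[0]
        if code != result_code:
            continue
        url = fields[6]
        # CONNECT requests log "host:port" rather than a full URL.
        host = urlsplit(url).hostname if "://" in url else url.rsplit(":", 1)[0]
        hosts[host or url] += 1
    return hosts


if __name__ == "__main__":
    # Made-up sample entries in the assumed layout.
    sample = [
        "1300000000.123 45 192.0.2.10 TCP_MISS/200 5120 GET "
        "http://proxied.example/ - DIRECT/proxied.example text/html",
        "1300000001.456 30001 192.0.2.10 ERR_CONNECT_FAIL/502 0 GET "
        "http://www.google.com/search?q=x - DIRECT/www.google.com -",
    ]
    for host, count in failed_hosts(sample).most_common():
        print(count, host)
```

If every failing entry turns out to name the same handful of hosts, that is at least consistent with the phone-home theory, though it says nothing about why the client then gives up on the page.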

So my question to iphone and android users: would a failed call-home request (to Google) throw an error that would prove fatal to a regular page loading from elsewhere?  That seems rather bizarre, though not really more so than Google maps/satnav refusing to work at all without a live data connection.

If that is indeed the problem, it still doesn’t explain why the problem should arise in normal use, when it’s a reverse proxy and the google connection is direct.  Looks like either the problem is in fact on someone else’s network (combined with dumb browser design), or the messages seen in Trafficserver’s logs are a complete red herring and unrelated to the problem.  Hmmm.

Furthering the interests of Free Software?

Or not.

The Free Software Foundation (FSF) has gone public with a statement on the Oracle vs Google litigation.  The FSF is of course free to do so, and since it’s also a campaigning organisation we should not be surprised when they do.  But does the statement itself stand up to scrutiny?

Before going any further, I should make it clear: this is a comment on the FSF’s position statement.  No matter where this appears aggregated, I don’t represent anyone or anything other than myself.  Any views I may have on the FSF itself, on Oracle or Google, on Java implementations, Android/Dalvik, on patents (software or otherwise) or on anyone/anything else, fall outside the scope of this posting.  Nor should this be taken as comment on the FSF beyond this single document: as it happens, I am in general terms an admirer of the FSF.

The introduction is clear enough:

As you likely heard on any number of news sites, Oracle has filed suit against Google, claiming that Android infringes some of its Java-related copyrights and patents. Too little information is available about the copyright infringement claim to say much about it yet; we expect we’ll learn more as the case proceeds. But nobody deserves to be the victim of software patent aggression, and Oracle is wrong to use its patents to attack Android.

That’s fair: the FSF’s position against software patents is rational and consistent.  Oracle vs Google is one of many patent cases currently in the courts throughout the rapidly-growing mobile devices space: some other household names that spring to mind include Apple, Nokia, HTC, and of course the victim of the biggest injustice, Blackberry-maker RIM.  But it’s also fair to say Oracle vs Google may have more far-reaching repercussions than the others, insofar as it may affect Free Software in the Android ecosystem.

The second paragraph is more problematic:

Though it took longer than we would’ve liked, Sun Microsystems ultimately did the right thing by the free software community when it released Java under the GPL in 2006. […]

That’s fair as far as it goes, but it’s becoming a partisan statement within FOSS when you implicitly dismiss the ongoing controversy over licensing a TCK.  The third paragraph goes on to say:

Now Oracle’s lawsuit threatens to undo all the good will that has been built up in the years since. Programmers will justifiably steer clear of Java when they stand to be sued if they use it in some way that Oracle doesn’t like. […]

Hang on!  How is that new?  The entire TCK issue is about field-of-use restrictions that are problematic for free software!  At the same time, let’s not forget that Java was hugely popular among Free Software developers even before 2006: these controversies matter only to an activist minority.

If the above is nitpicking, paragraph 4 is altogether more suspect.  Let’s quote it in full:

Unfortunately, Google didn’t seem particularly concerned about this problem until after the suit was filed. The company still has not taken any clear position or action against software patents. And they could have avoided all this by building Android on top of IcedTea, a GPL-covered Java implementation based on Sun’s original code, instead of an independent implementation under the Apache License. The GPL is designed to protect everyone’s freedom—from each individual user up to the largest corporations—and it could’ve provided a strong defense against Oracle’s attacks. It’s sad to see that Google apparently shunned those protections in order to make proprietary software development easier on Android.

Erm, this really is an attack on Apache!  How would IcedTea have helped here?  The only valid argument that it might have done is that rights were granted with Sun’s original code.  I don’t think it’s clear to anyone outside the Oracle and Google legal teams whether and to what extent such ‘grandfather’ rights might affect the litigation.  As far as licenses are concerned, the Apache License is a lot stronger on protection against patent litigation than the GPLv2 under which IcedTea is licensed.  Indeed, in separate news, Mozilla (another major player in Free Software) is updating its MPL license, and says of its update:

The highlight of this release is new patent language, modeled on Apache’s. We believe that this language should give better protection to MPL-using communities, make it possible for MPL-licensed projects to use Apache code, and be simpler to understand.

Well, Mozilla is coming from a starting point closer to the GPL than Apache.  It seems I’m not alone in supposing the Apache license offers the better patent protection, contrary to the FSF’s implication!

Finally the tone[1] of the FSF statement, as expressed for example in the final paragraph, makes me uneasy:

Oracle once claimed that it only sought software patents for defensive purposes. Now it is using them to proactively attack free software.

Hmmm, attacking Android/Dalvik is proactively attacking free software?  While it’s a supportable position it’s also (to say the least) ambiguous, and you haven’t made a case to convince a sceptic.  Or a judge.

[1] Not to mention the grammar, up on which some readers of this blog will undoubtedly pick.

Google vs Google

I’m not a regular user of Google Chrome, but I have it installed, and turn to it from time to time.

Recently I tried to use it with Google Maps.  It failed outright to load the page, complaining of errors in it.  Meanwhile other browsers[1] are fine with it.

Oh dear!

[1] at least Firefox, MicroB and Opera.  Not interested enough to turn it into a survey, but I recollect using at least those with Google Maps.

Mobile Maps

I’ve been using Nokia’s maps and GPS on my ‘phone for some time.  It works well on the road, but has basically no information other than roads (and while the roads data are good, other data such as rivers and railways are often inaccurate).  An annoying artifact is that the software assumes you’re on a road, and tends to “correct” the computed fix if you’re not.  This leads to an illusion of greater accuracy, but ensures poorer reliability[1].
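
To make the complaint concrete, here is a minimal sketch of snap-to-road in flat x/y coordinates (think metres).  I don’t know how Nokia’s software actually does it: this is just the obvious nearest-point projection, and all the names are mine.

```python
import math


def snap_to_segment(point, seg_start, seg_end):
    """Project a point onto a road segment, clamping to the segment's ends."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return seg_start  # degenerate segment: both ends coincide
    # Fraction of the way along the segment of the closest point, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_roads(point, segments):
    """Return the nearest on-road position and how far the fix was moved."""
    best = min(
        (snap_to_segment(point, a, b) for a, b in segments),
        key=lambda q: math.dist(point, q),
    )
    return best, math.dist(point, best)
```

With a single road running from (0, 0) to (100, 0), a walker whose fix is (30, 40) gets displayed at (30, 0): the 40 units of displacement are hidden from the user, not removed, and if the fix itself was good the displayed position is now the wrong one.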

Recently I tried Google’s maps app.  It’s very pretty, and contains rather more information than Nokia’s, though it’s also much slower e.g. to zoom/pan.  From home it could see two GPS satellites, and computed a poor fix, nearly 200m away from me (I presume it combined the two GPS satellites with non-GPS info – maybe it knows individual mobile phone masts or something).  Surprisingly, the fix was consistent: it gave me the same incorrect position the next day.  But since that was from indoors, I gave it the benefit of the doubt: surely it’ll do better in the open.

Then I tried it while out walking.  No use: it insists on a data connection (does it need to ‘phone home?).  Unlike Nokia’s map, which asks for a connection on startup but works fine without one if I hit “cancel”, google’s refuses to proceed without it.  Bah, Humbug.

This morning I tried another variant: I fired up google maps at home, then kept it running as I went out.  No use: a short way down the road, it lost my wifi and insisted on a new connection.

So, back to Nokia maps.

[1] A subject I do know about, both in theory (as a mathematician) and in practice (as someone who has done quite a lot of work in the field).

Google Book Search

A couple of months ago, I received slightly-suspect email about the google book search settlement.

Now I’ve got it on paper, from someone called Rust Consulting, and referencing www.googlebooksettlement.com.  This one makes sense, and looks credible, though I’ll still have to google these folks to check for any suspicion of a scam.

Seems I have the choice to accept the settlement and possibly become eligible for some google-money in return for waiving any right to sue them over copyright, or opt out and retain my rights.  Well, the latter would obviously be nonsense for me as an individual, even if I wanted to sue.  My publisher (an $8-billion company) would have the resources to sue, but that’s not going to happen, nor would I want to get involved if it did.

So I guess that just leaves the question: do I get some google-gold?  The settlement provides for google to make money from books, and pay 63% of that to rightsholders.  But this raises the question: how is that allocated between the author and the publisher?  Our contract obviously doesn’t cover revenues from google, and I’m not sure what general/catchall clauses might apply here.  Other things being equal, it would probably be best if google pays the publisher, and my share gets added to my royalty cheques, to avoid the high cost of cashing a separate dollar-cheque from google.

Whether the same reaction would apply to a professional author – one for whom writing was their main occupation – is a different question.  I’ll leave that to them.

What’s so great about the Goog?

As we all know, Google is the best search engine and the most useful single site on the ‘net.  Not that it’s perfect, and outside of its core web-search function, it has some manure-grade junk out there too.

Now there seems to be a bit of a meeja fuss over some goog-killer-wannabe called cuil.  I can only attribute this to cuil having an effective bullshit PR department, to get so much attention.  The report on El Reg (NSFW) shows a snapshot of results that leaves me no wish to see more: for the first time in over 20 years on the ‘net, I’ve been exposed to … ahem … dirty pics (I won’t say porno, because I don’t see how two men masturbating is supposed to titillate).  Ahem, not what I want in my search results.

And don’t forget, El Reg is a regular Goog-basher who you’d expect to welcome – albeit not uncritically – a real alternative.

But what’s really so good about the Goog?

It is indeed a million times better than the other media darling, but then Yahoo’s indexing always was a sick joke, and I really can’t see how anything other than mindshare in the mainstream meeja ever sustained it.  But the same cannot be said for all the alternatives.  When Google was launched, Altavista was out there doing a great job – albeit with less hype than the big Y – and the Goog was no more than a comparably competent alternative.

I think the comparison to Altavista at a time when the Goog had yet to develop its pagerank (automated peer review) to give it a real lead in terms of results quality, can reveal what’s really so much better about the Goog.  It’s not that Google gave us what we wanted (which Altavista also did): it’s that Goog spared us what we didn’t want! No deezyner page with pretty graphics.  No crap.  Just the information we were looking for.  And – later – text-based ads that still don’t seriously detract from the useful results.

And that’s comparing Goog to the best of the pre-goog bunch: a good site with a logo that was at least pretty (wikipedia has it still), and incomparably less obnoxious crap than Yahoo inflicts on you.  Ironic that it now seems to have been swallowed by Yahoo and gone minimalist – a decade on from the opportunity it lost and Google won.