Category Archives: http

Whither FTP?

I recently installed an update of a software package running on an Amazon EC2 host.

In the configure step I found an unsatisfied dependency: it wanted ossp-uuid, which was not available on the system.  Nor was yum able to find it: there was an alternative uuid package, but no hint of anything from OSSP.  That exercise also turned up some problems with yum itself (a security-update process that had been hung for weeks, and a corrupted database), but that’s another story.  Checking my box at home, I found the reason I hadn’t stumbled on the dependency before: ossp-uuid is installed as a standard package here.  A case of different distros carrying different packages in their standard repos.

In the absence of a package, installing from source seemed the obvious thing to do.  So I made my way to ossp.org, from where navigation to an ossp-uuid source download is easy.  Reassuringly I see Ralf Engelschall is in charge (whois lists him too), but worryingly none of the packages are signed.  A summary look at the source package reassures me it looks fine, though I don’t have time for exhaustive review.  In the unlikely event of a trojan package having found its way to the site, I expect some reader of my blog will alert me to the story!

Anyway, that’s getting ahead of myself.  The unexpected problem I faced was actually downloading the package, which is available only over FTP.  Firefox from home timed out; lynx and perl’s GET from the EC2 machine returned an unhelpful error.  It looked like a firewall standing in the way of FTP setting up its data connection.  Installing an old-fashioned command-line ftp client, I found that neither active nor passive mode would work, meaning neither the client nor the server could open the data connection.
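For the record, here’s a minimal sketch of the kind of check I ended up doing, using Python’s standard ftplib; the host name and package path below are placeholders of mine, not the real OSSP layout.  When both passive and active transfers fail like this, neither end is managing to open the data connection.

    # Sketch: attempt an FTP download in passive and then active mode.
    # HOST and PATH are hypothetical placeholders, not the real OSSP layout.
    from ftplib import FTP

    HOST = "ftp.example.org"
    PATH = "pub/uuid/uuid.tar.gz"

    def try_download(passive):
        try:
            with FTP(HOST, timeout=30) as ftp:
                ftp.login()               # anonymous login
                ftp.set_pasv(passive)     # True = passive, False = active
                with open("uuid.tar.gz", "wb") as out:
                    ftp.retrbinary("RETR " + PATH, out.write)
            return True
        except Exception as e:            # timeouts, refused data connections, etc.
            print("passive" if passive else "active", "mode failed:", e)
            return False

    if not (try_download(True) or try_download(False)):
        print("FTP download failed in both passive and active mode")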

Before going into an exhaustive investigation of the firewall components over which I have control (my home router being the #1 suspect), I decided to try other routes.  The problem was resolved once I found I could reach the FTP server from my own (webthing) web server: fetch the package there, then make it available over HTTP to the EC2 box.
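The second hop of that workaround is trivial to reproduce: something like the following (the directory and port are arbitrary choices of mine) is enough to expose the fetched tarball over HTTP, after which wget or curl on the EC2 box can pick it up.  On a box already running a web server, dropping the file under the document root does the same job.

    # Serve the directory holding the fetched tarball over plain HTTP.
    # /tmp/downloads and port 8080 are arbitrary, illustrative choices.
    import os
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    os.chdir("/tmp/downloads")
    HTTPServer(("", 8080), SimpleHTTPRequestHandler).serve_forever()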

In the Good Old Days before the coming of web browsers and BitTorrent, FTP was THE protocol for transferring files.  In 1990s web browsers it enjoyed equal status with HTTP and the rest, and even into this century it was widely seen as superior to HTTP for transferring data, particularly bigger files.

Now, by contrast, the widespread use of blind firewalls forces me to jump through hoops just to use the protocol.  What I once ranted about in my everything-over-HTTP post is coming to pass, and it is not a good thing.

Everything-over-HTTP

There are some forces that just cannot be stopped. Block the path, and they’ll just go round some other way.

The classic 7-layer network model has served us well, and continues to do so. Each network interface has an IP address and a lot of ports, some of them allocated by policy to specific network services, others unused. The point of having lots of different ports is that different services, with different architectures and semantics for different applications, can each use their own. That way they run independently of each other, without interference, on the same network. Simple, secure and elegant.
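By way of a toy sketch (the two handlers and the port numbers below are my own invention, standing in for any real services), two unrelated services can happily share a host on separate ports without ever touching each other:

    # Two unrelated TCP services on the same host, bound to different ports.
    # Ports 7001/7002 and the handlers are invented for illustration.
    import socketserver
    import threading
    import time

    class EchoHandler(socketserver.StreamRequestHandler):
        def handle(self):
            self.wfile.write(self.rfile.readline())      # echo one line back

    class TimeHandler(socketserver.StreamRequestHandler):
        def handle(self):
            self.wfile.write(time.ctime().encode() + b"\n")

    for port, handler in ((7001, EchoHandler), (7002, TimeHandler)):
        server = socketserver.ThreadingTCPServer(("", port), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()

    threading.Event().wait()    # keep both services running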

But nowadays we have firewalls. Packet-filtering firewalls, very simple to understand and deploy. Just shut off every port you don’t explicitly need. You can configure that on consumer devices such as ADSL routers, and protect the unwashed masses from a significant part of their own ignorance. This is a Good Thing.

But like many good things, it has a downside. It’s so simple and useful that it gets into company policies, where it is misused. A typical policy goes something like “This is a webserver. We open ports 80 and 443, and firewall everything else”. This is a good policy up to a point, but no further: the point in question is where you have a legitimate need to run a service for which HTTP is not well suited.

Like, for example, symmetric, stateful two-way communication over a persistent connection. OK, it can be done over HTTP. There are many ways to deal with state, at the cost of modest complexity. Server-driven communication goes back to Netscape’s Server Push in 1995, and lives on in pseudo-protocols[1] such as “Comet” today. But it runs completely contrary to HTTP’s request-response model, and implementing it on top of HTTP means significant extra complexity compared to running a dedicated service on a different port.
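To make the contortion concrete, here is a rough long-polling sketch of the kind of thing Comet-style techniques do; the port, the response handling and the demo event source are all my own invention, not any particular toolkit. Each GET is held open until the server has something to say, and the client is expected to reconnect immediately, so that server-driven messaging is faked on top of request-response.

    # Rough long-polling ("Comet"-style) sketch: hold each GET open until an
    # event arrives, then answer it; the client reconnects straight away.
    # Port 8000 and the demo event generator are invented for illustration.
    import queue
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    events = queue.Queue()

    class LongPollHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                body = events.get(timeout=30).encode()   # block until an event
            except queue.Empty:
                body = b"timeout"                        # client reconnects anyway
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def demo_event_source():
        n = 0
        while True:                     # stand-in for real server-side events
            time.sleep(5)
            n += 1
            events.put("event %d" % n)

    threading.Thread(target=demo_event_source, daemon=True).start()
    ThreadingHTTPServer(("", 8000), LongPollHandler).serve_forever()

A client just loops: request, read the event, request again. All of which is machinery a plain persistent connection on its own port simply wouldn’t need.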

A Comet application is a Heath-Robinson construction to drive a non-HTTP network application over HTTP. In a sensible world it would run over its own port, independent of the HTTP server. But security policies stand in the way of that: getting authorisation to open a port for it is more trouble than it’s worth. So the world routes around the firewall using Comet instead. And in doing so, introduces more complexity, more scope for bugs and security vulnerabilities.

This is a Bad Thing. And there’s a whole culture of it: the demand is such that we’re getting generic tools and a name. How long will it be before there are off-the-shelf applications that only support Comet, so that even a company with a pragmatic and informed firewall policy is driven to use it? From a browser point of view it’s just another potentially-useful capability they’ll want to support in an AJAX world.

Arguably more bizarre still is the clutch of XML-over-HTTP protocols in and around “web services”. The simpler ones, such as XMTP, XML-RPC or SPARQL, are straightforward wrappers around some kind of exchange, putting everything into an HTTP payload at a cost in complexity and performance. Once again, their raison d’être is to work around a firewall to provide services that could more simply run on their own port (and sometimes do).
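For a flavour of the wrapping involved, Python’s standard xmlrpc.client shows the pattern; the endpoint and method name here are made up for illustration. A method call becomes an XML document POSTed as the body of an HTTP request, and the result comes back the same way.

    # What an XML-RPC call wraps around a simple remote invocation.
    # The endpoint and method name are invented for illustration.
    import xmlrpc.client

    # The XML <methodCall> document that would be sent as an HTTP POST body:
    print(xmlrpc.client.dumps(("London",), methodname="getTemperature"))

    # The corresponding client-side call against a hypothetical endpoint:
    proxy = xmlrpc.client.ServerProxy("http://rpc.example.com/RPC2")
    # result = proxy.getTemperature("London")   # would POST the XML above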

In the case of web services, routing through HTTP does serve a useful purpose. The webserver becomes an application firewall (e.g. with mod_security), and can become part of the application (e.g. with mod_publisher). Additionally, it can be used to enforce things like access control policies and bandwidth management. In short, the application gets the benefit of Apache’s modular framework, or whatever benefits another server may offer.

Still, the bottom line is that when a traditional path gets closed, the world will route around it. On balance, it’s hard to call this a Good Thing or a Bad Thing: it’s just inevitable.  Dangerous, because any vulnerabilities in your applications won’t go away just because access is tunneled over HTTP.  But anyone who tries to make a distinction between good and bad uses of the breach in the firewall becomes a Cassandra.

[1] What do you call something that looks like an embryonic protocol but lacks things like an RFC or other published spec?