
Virgin hadoop/hdfs/C++

I’ve just been playing with the C++ API to Hadoop’s HDFS.  All on my newly-installed virgin Linux box, so no baggage!  I encountered a few problems, all of which proved straightforward to fix, but which may highlight issues of possible interest to their documentation folks.  Recording here while it’s relatively fresh in the mind.

1:  I’ve downloaded Hadoop, now how do I install it?  The docs and README tell me nothing; there’s no INSTALL.  I played with the quick start in the download directory, but obviously that’s not something you want to keep on doing!  Fortunately someone on the wiki tells me: just move the whole caboodle to /usr/local and set up the paths.  A dedicated hadoop user, as suggested there, also makes sense.
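For the record, it boiled down to something like this (a sketch from memory, for the 0.20.1 tarball; adjust names to taste):

# unpack the release and park the whole caboodle under /usr/local
tar -xzf hadoop-0.20.1.tar.gz
sudo mv hadoop-0.20.1 /usr/local/hadoop

# dedicated hadoop user, as the wiki suggests, and hand it the tree
sudo adduser hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop

# put the scripts on the PATH (this export reappears below)
export PATH=/usr/local/hadoop/bin:$PATH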

2: Now “hadoop” works and emits a usage message, but as soon as I try to do anything real it fails.  OK, my virgin Linux box doesn’t have a JVM installed; just go ahead and install one.  The fact that “hadoop” had produced the usage message had led me to suppose a JVM was already there: misleading, until “file hadoop” revealed hadoop itself to be just a shell script!
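For the archives: on a Debian/Ubuntu-style box (which the JVM paths later in this post suggest mine is) it’s a one-line install, plus pointing Hadoop at the result in conf/hadoop-env.sh.  The package name is an assumption on my part; substitute whatever your distribution calls Sun’s Java 6:

# install a JVM
sudo apt-get install sun-java6-jdk

# then uncomment and set JAVA_HOME in /usr/local/hadoop/conf/hadoop-env.sh:
# export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.15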

3: How do I tell hadoop where to keep its filesystem?  Quickstart tells me

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

and the wiki is similar.  But neither of them tells me where in my filesystem it’ll start writing!  If I have to RTFM for that without a clue where in TFM to start, it rather defeats the purpose of a quick start!  OK, run it as my newly-minted hadoop user, so that if it does start writing to lots of places I don’t want, filesystem permissions confine it to things I can wipe and start again.

Turns out it created stuff in /tmp (which is fine for now, though I think some of what it created is supposed to be persistent).  Also lots of log files, in Hadoop’s logs dir – which is also fine, just so long as I know where they are!  It takes a bit more browsing to find out how to configure the location, and the answer turns out to be at Yahoo’s tutorial pages rather than the wiki itself!
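For anyone else hunting: on this release the knob is hadoop.tmp.dir (plus fs.default.name for the filesystem URI) in conf/core-site.xml.  Something along these lines should move the data out of /tmp when the time comes; the datastore path is an arbitrary choice of mine, purely for illustration:

# somewhere the hadoop user can write; the path is just my own choice
sudo mkdir /usr/local/hadoop-datastore
sudo chown hadoop:hadoop /usr/local/hadoop-datastore

# then point hadoop.tmp.dir at it in conf/core-site.xml
cat > /usr/local/hadoop/conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-datastore</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# and re-run "bin/hadoop namenode -format" so the new location gets formatted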

4: Where are all the files?  Lots of find and locate required, ‘cos they’re not simply under Hadoop’s src and lib directories, and there isn’t an include directory at all!  The C++ API has its own directory as an apparent afterthought.
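In case it saves someone else the hunt, the two lines of find below are roughly what it took; the header lives under src/c++/libhdfs and the prebuilt library under c++/Linux-i386-32/lib (on a 32-bit x86 box, at least):

# where's the header?
find /usr/local/hadoop -name hdfs.h
#   /usr/local/hadoop/src/c++/libhdfs/hdfs.h

# and the prebuilt library?
find /usr/local/hadoop -name 'libhdfs*'
#   /usr/local/hadoop/c++/Linux-i386-32/lib/libhdfs.so  (among others)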

5: It took a bit of trial and error to compile the HelloWorld C sample program.  I ended up with the following makefile to record the paths.  Not a problem, but perhaps the docs page could use it:

CFLAGS=         -g -O0 -Wall -c
INCLUDES=       -I /usr/local/hadoop/src/c++/libhdfs \
                -I /usr/lib/jvm/java-6-sun-1.6.0.15/include/ \
                -I /usr/lib/jvm/java-6-sun-1.6.0.15/include/linux/
LDPATH=         -L /usr/local/hadoop/c++/Linux-i386-32/lib/ \
                -L /usr/lib/jvm/java-6-sun-1.6.0.15/jre/lib/i386/client
LIBS=           -lhdfs -ljvm

sample:         sample.o
                $(CC) -o sample sample.o $(LDPATH) $(LIBS)

sample.o:       sample.c
                $(CC) $(CFLAGS) $(INCLUDES) sample.c

6: Finally, I needed to set the library path and the CLASSPATH.  Throwing the kitchen sink at the latter, as the scanty docs recommend, I ended up with the ugly but functional:

export PATH=/usr/local/hadoop/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/hadoop/c++/Linux-i386-32/lib/:/usr/lib/jvm/java-6-sun-1.6.0.15/jre/lib/i386/client:$LD_LIBRARY_PATH
export CLASSPATH=/usr/local/hadoop/hadoop-0.20.1-ant.jar:/usr/local/hadoop/hadoop-0.20.1-core.jar:/usr/local/hadoop/hadoop-0.20.1-examples.jar:/usr/local/hadoop/hadoop-0.20.1-test.jar:/usr/local/hadoop/hadoop-0.20.1-tools.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar:/usr/local/hadoop/lib/commons-codec-1.3.jar:/usr/local/hadoop/lib/commons-el-1.0.jar:/usr/local/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/lib/commons-logging-1.0.4.jar:/usr/local/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/local/hadoop/lib/commons-net-1.4.1.jar:/usr/local/hadoop/lib/core-3.1.1.jar:/usr/local/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/local/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/local/hadoop/lib/jets3t-0.6.1.jar:/usr/local/hadoop/lib/jetty-6.1.14.jar:/usr/local/hadoop/lib/jetty-util-6.1.14.jar:/usr/local/hadoop/lib/junit-3.8.1.jar:/usr/local/hadoop/lib/kfs-0.2.2.jar:/usr/local/hadoop/lib/log4j-1.2.15.jar:/usr/local/hadoop/lib/oro-2.0.8.jar:/usr/local/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/local/hadoop/lib/slf4j-api-1.4.3.jar:/usr/local/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/local/hadoop/lib/xmlenc-0.52.jar
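If maintaining that jar list by hand offends (it offends me), a couple of lines of shell will slurp up every jar in the distribution instead; a sketch, only tried on my own box:

# build CLASSPATH from every jar shipped with the release
CLASSPATH=
for jar in /usr/local/hadoop/*.jar /usr/local/hadoop/lib/*.jar; do
    CLASSPATH=$CLASSPATH:$jar
done
export CLASSPATH=${CLASSPATH#:}    # trim the leading colon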
