Virgin Hadoop/HDFS/C++

I’ve just been playing with the C++ API to Hadoop’s HDFS, all on my newly installed virgin Linux box, so no baggage!  I encountered a few problems, all of which proved straightforward to fix, but which may highlight issues of interest to the project’s documentation folks.  I’m recording them here while they’re still relatively fresh in my mind.

1:  I’ve downloaded Hadoop; now how do I install it?  The docs and README tell me nothing, and there’s no INSTALL file.  I played with the Quick Start in the download directory, but obviously that’s not something you want to keep doing!  Fortunately someone on the wiki tells me: just move the whole caboodle to /usr/local and set up the paths.  A dedicated hadoop user, as suggested there, also makes sense.
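“Set up the paths” mostly means getting /usr/local/hadoop/bin onto PATH, which is what the first export in step 6 does.  If a program wants to check that before relying on it, a minimal C sketch follows; dir_on_path is my own hypothetical helper name, and it does exact string matching only (no resolving of symlinks or trailing slashes).

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `dir` appears as one whole entry in the colon-separated
 * list `path` (e.g. the value of getenv("PATH")). A NULL or empty list
 * simply yields false. */
bool dir_on_path(const char *path, const char *dir)
{
    size_t dlen = strlen(dir);
    const char *p = path;

    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);

        /* Compare whole entries only, so ".../bin2" doesn't match ".../bin". */
        if (len == dlen && strncmp(p, dir, dlen) == 0)
            return true;
        p = colon ? colon + 1 : NULL;
    }
    return false;
}
```

For example, dir_on_path(getenv("PATH"), "/usr/local/hadoop/bin"); if PATH is unset, getenv returns NULL and the answer is just false.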

2: Now “hadoop” works and emits a usage message, but as soon as I try to do anything with it, it fails.  OK, my virgin Linux box doesn’t have a JVM installed; just go ahead and install one.  The fact that “hadoop” had produced a usage message had led me to suppose a JVM was already there: misleading, until “file hadoop” revealed it to be a shell script!
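The part of what “file hadoop” told me that matters here can be approximated in a few lines of C: a script begins with a “#!” interpreter line, and a script can happily print its usage message without the JVM it would later launch ever existing.  This is only a sketch; file(1) consults a whole database of magic numbers, and looks_like_script is a hypothetical name of my own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return true if the file at `path` starts with the "#!" interpreter
 * line that marks a script; false if it doesn't or can't be read. */
bool looks_like_script(const char *path)
{
    char magic[2];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return false;

    size_t n = fread(magic, 1, sizeof magic, f);
    fclose(f);
    return n == 2 && magic[0] == '#' && magic[1] == '!';
}
```

Called on /usr/local/hadoop/bin/hadoop it should report true, whereas a compiled ELF binary starts with 0x7f "ELF" and reports false.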

3: How do I tell Hadoop where to keep its filesystem?  The Quick Start tells me:

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

and the wiki is similar.  But neither of them tells me where in my filesystem it’ll start writing!  If I have to RTFM for that without a clue where in TFM to start, it rather defeats the purpose of a quick start!  OK, I’ll run it as my newly minted hadoop user, so that filesystem permissions confine it to places I can wipe and start again if it turns out to be writing to lots of places I don’t want.

It turns out it created stuff in /tmp (which is fine for now, though I think some of what it created is supposed to be persistent).  There are also lots of log files in Hadoop’s logs directory, which is also fine just so long as I know where they are!  It takes a bit more browsing from the wiki to find out how to configure all this, and the answer turns out to be in Yahoo’s tutorial pages!
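As far as I can tell from the 0.20 default configuration files, the /tmp location comes from hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}; the namenode’s metadata (dfs.name.dir) and the datanode’s blocks (dfs.data.dir) default to dfs/name and dfs/data underneath it.  That would explain the persistent-looking stuff in /tmp.  The sketch below just spells those defaults out; default_hadoop_dir is a hypothetical helper, and other Hadoop versions may use different defaults.

```c
#include <stdio.h>
#include <string.h>

/* Hadoop 0.20 defaults (core-default.xml / hdfs-default.xml):
 *   hadoop.tmp.dir = /tmp/hadoop-${user.name}
 *   dfs.name.dir   = ${hadoop.tmp.dir}/dfs/name   (namenode metadata)
 *   dfs.data.dir   = ${hadoop.tmp.dir}/dfs/data   (datanode blocks)
 * Write the default directory for `sub` ("" for hadoop.tmp.dir itself,
 * else e.g. "dfs/name") into buf. Returns 0 on success, -1 for an empty
 * or slash-containing user name, or if the path doesn't fit in buf. */
int default_hadoop_dir(char *buf, size_t cap, const char *user, const char *sub)
{
    if (user[0] == '\0' || strchr(user, '/') != NULL)
        return -1;

    int n = (sub[0] == '\0')
        ? snprintf(buf, cap, "/tmp/hadoop-%s", user)
        : snprintf(buf, cap, "/tmp/hadoop-%s/%s", user, sub);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

On a real box the user name could come from getpwuid(getuid())->pw_name.  Since /tmp is often cleared at boot, dfs/name is the part worth moving, by overriding hadoop.tmp.dir in conf/core-site.xml or dfs.name.dir and dfs.data.dir in conf/hdfs-site.xml.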

4: Where are all the files?  It took lots of find and locate, ’cos they’re not under Hadoop’s /src and /lib directories, and there isn’t an /include!  The C++ API has its own directory, apparently as an afterthought.
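The find-and-locate hunt amounts to a recursive directory walk.  Here is a bare-bones C version of “find DIR -name NAME”, offered as a sketch: find_file is a hypothetical name, it stops at the first match, and it reports symbolic links that match but never descends through them.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Walk the tree under `dir` looking for a non-directory entry called
 * `name`. On success copy its full path into out and return 1; otherwise
 * return 0. lstat() is used so symlinks are never followed, which also
 * keeps link loops from trapping the walk. */
int find_file(const char *dir, const char *name, char *out, size_t cap)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;                /* unreadable or not a directory */

    int found = 0;
    struct dirent *e;
    while (!found && (e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;            /* path too long: skip it */

        struct stat st;
        if (lstat(path, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            found = find_file(path, name, out, cap);   /* recurse */
        } else if (strcmp(e->d_name, name) == 0) {
            snprintf(out, cap, "%s", path);
            found = 1;
        }
    }
    closedir(d);
    return found;
}
```

For example, find_file("/usr/local/hadoop", "hdfs.h", out, sizeof out) should turn up the header under src/c++/libhdfs.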

5: It took trial and error to compile the HelloWorld C sample program.  I ended up with the following makefile to record the paths.  Not a problem, but perhaps the docs page could use it:

CFLAGS=         -g -O0 -Wall -c
INCLUDES=       -I /usr/local/hadoop/src/c++/libhdfs \
                -I /usr/lib/jvm/java-6-sun- \
                -I /usr/lib/jvm/java-6-sun-
LDPATH=         -L /usr/local/hadoop/c++/Linux-i386-32/lib/ \
                -L /usr/lib/jvm/java-6-sun-
LIBS=           -lhdfs -ljvm

sample:         sample.o
	$(CC) -o sample sample.o $(LDPATH) $(LIBS)

sample.o:       sample.c
	$(CC) $(CFLAGS) $(INCLUDES) sample.c
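The two JDK -I entries in the makefile are cut off in this copy, so I won’t guess their exact values.  In general, though, a JNI build needs the JDK’s include/ directory (for jni.h) plus its platform subdirectory, include/linux/ on Linux (for jni_md.h).  A sketch that derives both flags from a JDK root and checks that jni.h really is there; jni_include_flags is a hypothetical name, and the JDK root is whatever you pass in.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Build "-I <jdk>/include -I <jdk>/include/linux" into buf, after checking
 * that <jdk>/include/jni.h exists. Returns 0 on success, -1 if jni.h is
 * missing or the flags don't fit in buf. */
int jni_include_flags(const char *jdk, char *buf, size_t cap)
{
    /* Ignore trailing slashes so "/opt/jdk/" and "/opt/jdk" agree. */
    size_t len = strlen(jdk);
    while (len > 1 && jdk[len - 1] == '/')
        len--;
    int w = (int)len;

    char hdr[4096];
    int n = snprintf(hdr, sizeof hdr, "%.*s/include/jni.h", w, jdk);
    if (n < 0 || (size_t)n >= sizeof hdr)
        return -1;

    struct stat st;              /* probe: is jni.h really there? */
    if (stat(hdr, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;

    n = snprintf(buf, cap, "-I %.*s/include -I %.*s/include/linux",
                 w, jdk, w, jdk);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

The result could go into INCLUDES alongside -I /usr/local/hadoop/src/c++/libhdfs, which is where hdfs.h lives.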

6: Finally, I needed to set the library path and CLASSPATH.  Throwing the kitchen sink at the latter, as recommended in the scanty docs, I end up with the ugly but functional:

export PATH=/usr/local/hadoop/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/hadoop/c++/Linux-i386-32/lib/:/usr/lib/jvm/java-6-sun-$LD_LIBRARY_PATH
export CLASSPATH=/usr/local/hadoop/hadoop-0.20.1-ant.jar:/usr/local/hadoop/hadoop-0.20.1-core.jar:/usr/local/hadoop/hadoop-0.20.1-examples.jar:/usr/local/hadoop/hadoop-0.20.1-test.jar:/usr/local/hadoop/hadoop-0.20.1-tools.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar:/usr/local/hadoop/lib/commons-codec-1.3.jar:/usr/local/hadoop/lib/commons-el-1.0.jar:/usr/local/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/lib/commons-logging-1.0.4.jar:/usr/local/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/local/hadoop/lib/commons-net-1.4.1.jar:/usr/local/hadoop/lib/core-3.1.1.jar:/usr/local/hadoop/lib/hsqldb-
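Rather than typing the kitchen-sink CLASSPATH by hand, a program can gather the jars itself.  As far as I know, libhdfs builds the embedded JVM’s class path straight from the CLASSPATH environment variable, and wildcards such as lib/* are not expanded there (that is the java launcher’s job), which is why every jar has to be named.  A minimal C sketch; append_jars is a hypothetical helper, and readdir() order is unspecified, so the jars won’t necessarily come out in the order of the export line.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 1 if `name` ends in ".jar" (and has something before it). */
static int is_jar(const char *name)
{
    size_t n = strlen(name);
    return n > 4 && strcmp(name + n - 4, ".jar") == 0;
}

/* Append the path of every regular *.jar file in `dir` to the
 * colon-separated list held in buf (which must already contain a string,
 * possibly ""). Returns the number of jars added, or -1 if dir can't be
 * opened or buf runs out of room (buf is then left truncated). */
int append_jars(char *buf, size_t cap, const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!is_jar(e->d_name))
            continue;

        char path[4096];
        struct stat st;
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path ||
            stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;            /* e.g. a directory that happens to be named x.jar */

        size_t used = strlen(buf);
        n = snprintf(buf + used, cap - used, "%s%s",
                     used > 0 ? ":" : "", path);
        if (n < 0 || (size_t)n >= cap - used) {
            count = -1;          /* out of room */
            break;
        }
        count++;
    }
    closedir(d);
    return count;
}
```

For example, call it for /usr/local/hadoop and then /usr/local/hadoop/lib, then setenv("CLASSPATH", cp, 1) before the first hdfsConnect(), which I believe is when libhdfs starts the JVM.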

Posted on January 3, 2010, in apache. 2 Comments.

  1. Nice feedback, thanks. You might find the following 5 minute Hadoop quick start guide useful:

  2. Virgin Linux box? Didn’t know that Big Beardy dug Free Software…
