Distributing JARs for Map/Reduce jobs via HDFS

Posted by chetan on December 29, 2010 in development

Hadoop has a built-in feature for easily distributing JARs to your worker nodes via HDFS but, unfortunately, it’s broken. There are a couple of tickets open with patches against 0.18 and 0.21 (trunk), but for some reason they still haven’t been committed. We’re currently running 0.20, so the patches do me no good anyway. So here’s my simple solution:

I essentially copied the technique that ToolRunner uses when you pass a “-libjars” argument on the command line. You simply pass the function the HDFS paths of the JAR files you want included, and it takes care of the rest.

Example usage:

public int run(String[] args) throws Exception {
 
    JobConf job = new JobConf(getConf());
 
    // ... job setup ...
 
    NerfUtils.addJarsToJobClasspath(job, 
        new String[] { 
            "/libraries/java/solr-commons-csv-1.4.1.jar" });
 
    // ... more job setup ...
 
    return JobClient.runJob(job).getJobState();
}
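The post doesn’t show the helper itself, so here’s a minimal sketch of what `NerfUtils.addJarsToJobClasspath` might look like. The class and method names come from the usage example above; the implementation is an assumption on my part, built on `DistributedCache.addFileToClassPath`, which registers a file in the distributed cache and appends it to the task classpath (the same mechanism the “-libjars” handling relies on):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NerfUtils {

    /**
     * Adds JARs that already live on HDFS to the classpath of every task
     * in the job. Sketch only -- mirrors what ToolRunner/-libjars does.
     */
    public static void addJarsToJobClasspath(Configuration conf, String[] jarPaths)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        for (String jar : jarPaths) {
            Path path = new Path(jar);
            // Fail fast if the JAR isn't actually on HDFS.
            if (!fs.exists(path)) {
                throw new IOException("JAR not found on HDFS: " + jar);
            }
            // Registers the file in the distributed cache and appends it to
            // mapred.job.classpath.files, so each task JVM picks it up.
            DistributedCache.addFileToClassPath(path, conf);
        }
    }
}
```

Since `JobConf` extends `Configuration`, the `job` object from the example above can be passed straight in.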

It might not be the prettiest or best solution, but it works for me!