An easy way to browse HDFS clusters

Posted December 30, 2010

After spending the better part of a day trying to get HDFS to mount on my Mac, I finally gave up. Luckily, I found muCommander, a cross-platform Java port of the Norton Commander of old, and its latest version supports HDFS! Very handy for quickly browsing your HDFS clusters if you can’t mount them or don’t have the Hadoop toolset installed.

Distributing JARs for Map/Reduce jobs via HDFS

Posted December 29, 2010

Hadoop has a built-in feature for easily distributing JARs to your worker nodes via HDFS but, unfortunately, it’s broken. There are a couple of tickets open with patches against 0.18 and 0.21 (trunk), but for some reason they still haven’t been committed. We’re currently running 0.20, so the patches do me no good anyway. So here’s my simple solution:

I essentially copied the technique used by ToolRunner when you pass a “libjars” argument on the command line. You simply pass the function the HDFS paths to the JAR files you want included and it’ll take care of the rest.

Example usage:

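public int run(String[] args) throws Exception {
    JobConf job = new JobConf(getConf());
    // ... job setup ...
    // NOTE: the helper's real name is missing from this listing;
    // "addHdfsJars" is only a placeholder for the function described above.
    addHdfsJars(job,
        new String[] { 
            "/libraries/java/solr-commons-csv-1.4.1.jar" });
    // ... more job setup ...
    return JobClient.runJob(job).getJobState();
}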

Might not be the prettiest or best solution but it works for me!
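
For the curious, here's a rough sketch of the kind of bookkeeping such a helper has to do. It is not the function itself. As far as I can tell, the "libjars" handling boils down to a comma-separated list of fully qualified JAR URIs stored in the job configuration, so the snippet below models only that string-building step using the standard library. The names LibJarsSketch and toJarList, and the hdfs://localhost:9000 default filesystem, are made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models only the path-to-list step, with no Hadoop classes.
final class LibJarsSketch {

    // Qualify each JAR path against the default filesystem and join with commas.
    static String toJarList(String defaultFs, String... jarPaths) {
        URI base = URI.create(defaultFs);
        List<String> qualified = new ArrayList<>();
        for (String path : jarPaths) {
            URI uri = URI.create(path);
            // Assumption: a path without a scheme lives on the default filesystem
            if (uri.getScheme() == null) {
                uri = base.resolve(uri);
            }
            qualified.add(uri.toString());
        }
        return String.join(",", qualified);
    }

    public static void main(String[] args) {
        System.out.println(toJarList("hdfs://localhost:9000",
                "/libraries/java/solr-commons-csv-1.4.1.jar"));
    }
}
```

In a real job the resulting list (or the individual paths) would still have to be handed to Hadoop; that part is deliberately left out here.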

Using Hadoop’s DistributedCache

Posted December 28, 2010

Using Hadoop’s DistributedCache mechanism is fairly straightforward, but as I’m finding is common with everything-Hadoop, not very well documented.

Adding files

When setting up your Job configuration:

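import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;

// Create symlinks in the job's working directory using the link name 
// provided below
DistributedCache.createSymlink(conf);
// Add a file to the cache. It must already exist on HDFS. The text
// after the hash is the link name.
DistributedCache.addCacheFile(
    new URI("hdfs://localhost:9000/foo/bar/baz.txt#baz.txt"), conf);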
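
If the hash syntax looks odd, it's just the fragment part of a URI. Here's a quick standard-library sketch (no Hadoop needed) showing how the HDFS path and the link name split apart. The class and method names are mine, purely for illustration.

```java
import java.net.URI;

// Illustration only: splits a cache URI into its HDFS path and link name.
final class CacheUriSketch {

    // The path component is the file's location on HDFS.
    static String hdfsPath(String cacheUri) {
        return URI.create(cacheUri).getPath();
    }

    // The fragment (text after '#') is the link name; null if there is none.
    static String linkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    public static void main(String[] args) {
        String uri = "hdfs://localhost:9000/foo/bar/baz.txt#baz.txt";
        System.out.println("HDFS path: " + hdfsPath(uri));
        System.out.println("Link name: " + linkName(uri));
    }
}
```

Nothing requires the link name to match the file name; it only happens to in this example.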

Accessing files

Now that we’ve cached our file, let’s access it:

// Direct access by name
File baz = new File("baz.txt");
// prints "true" since the file was found in the working directory
System.out.println(baz.exists());
// We can also get a list of all cached files
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
for (int i = 0; i < cached.length; i++) {
    Path path = cached[i];
    String filename = path.toString();
    // ... use filename ...
}

Backing up your Android apps to the cloud

Posted December 22, 2010

Android 2.0 introduced a feature that lets you back up your apps and their settings to the “cloud.” Normally this feature is accessible by going to Settings > Privacy and checking Back up my settings.

Unfortunately, it seems that many HTC phones hide this setting. If “Privacy” doesn’t appear in your settings then use this trick to get there.

  • Settings > Search > Searchable items
  • Enable “Settings”
  • Go to the “Home” screen
  • Search for “Privacy”
  • Check “Back up my settings”

It’s just too bad there’s no easy way to back up your apps manually, without resorting to an app on the market…

Social Distortion @ Roseland Ballroom / Nov 4, 2010

Posted November 18, 2010

Bad Religion @ Irving Plaza / Oct 20, 2010

Posted November 02, 2010

Running mlocate on Mac OS X

Posted June 25, 2010

After suffering through yet another morning of ‘find’ pegging my CPU and grinding my sad little laptop hard drive for a couple of hours, I finally decided to rectify the situation. Most Linux distributions these days ship mlocate by default, and so I decided to give it a go on OS X. Unfortunately, it doesn’t quite support OS X out of the box yet. I’ll spare you all the miserable details (it was an epic struggle getting this compiled), but I finally managed to get it working.

Here’s how:

# first we get the source and patch it
$ hg clone
$ cd mlocate
$ wget '' -O mlocate-mountlist-hg.2.diff
$ hg import mlocate-mountlist-hg.2.diff
# prep
$ cd ..
$ git clone git://
$ cd mlocate
$ ../gnulib/gnulib-tool --import
$ mv gnulib/lib/stat-time.h~ gnulib/lib/stat-time.h
$ cp ../gnulib/lib/canonicalize.h gnulib/lib/
$ autoreconf --install --force
# install
$ ./configure
$ make
$ sudo make install

Ok, now we have the binaries in place; we’re almost there! One last thing to do, and that’s create a new user account and group for mlocate. After running updatedb it will try to chown the locate db as the user ‘mlocate’, so we need to make sure this user exists. Unfortunately, there’s no adduser command in OS X and it’s a little bit of a pain, so I wrote this little script to take care of it for you. Grab it here.

Simply run it like so:

$ adduser 
username: mlocate
home dir [default=/var/empty]: 
group id (default=700):
user id (default=700):
real name (default=daemon): 
shell: (default=/usr/bin/false):

All done! Now let’s try updating the db:

$ sudo /usr/local/bin/updatedb

And that should be it. I still ran into a couple of problems with permissions so your best bet is to fix them every time you run updatedb (just stick it in cron):

$ chmod 664 /usr/local/var/mlocate/mlocate.db

Now wasn’t that easy?

PS. If you don’t have Mercurial (hg) or Git installed, get them here:

Updated (July 23, 2010): You may need newer versions of automake and autoconf as well. I had problems with 1.5 and 2.63 respectively but upgrading to 1.11 and 2.65 (with fink) worked fine.


Posted March 30, 2010

Today I’m launching my latest personal project, srchmvn, to help Java developers find Maven artifacts. It’s also the first project I’ve finished* and released in a very long time.

The Problem

While Maven is, at its core, a build system, one of the most valuable features it offers is its centralized repository and transitive dependency management for your projects. You can simply include an artifact definition and Maven will, at build time, download and provide not only the selected artifact, but all its dependencies as well.

The problem is actually finding definitions for your artifacts, specifically when you already know the name of the library or project. You would think that a popular project like Spring would include this information somewhere on its download page, but for some odd reason it doesn’t, and neither do most other open source projects.

Existing attempts

As it turns out, I’m not the only one with this problem; I found at least 7 different sites that try to solve it. All of them appear to have taken a similar approach and yet offer wildly different results.

As a test, try searching for “struts” on each of them and see if you can locate the correct artifact id for the latest version.

Were you able to find Struts? I wasn’t.

My Solution

Enter srchmvn, my attempt at building a search engine for Maven artifacts. With srchmvn I offer two key improvements over existing solutions: a clean, speedy interface and an improved search algorithm. Specifically, rather than relying solely on a simple keyword search, I also take into account a given artifact’s “popularity” in much the same way that Google determines PageRank. In this case, I’m using Maven’s own dependency system to determine which artifacts are referenced more often than others.
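
To give a feel for the popularity idea, here's a toy sketch in Java. It is not the code behind the site. It simply counts how often each artifact shows up as someone else's dependency and sorts keyword matches by that count. The artifact ids, the class name, and the plain reference count standing in for a PageRank-style score are all simplifications of mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: keyword search ranked by how often an artifact is depended upon.
final class PopularitySearchSketch {

    // Count how many artifacts list each artifact as a dependency.
    static Map<String, Integer> referenceCounts(Map<String, List<String>> dependencies) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> deps : dependencies.values()) {
            for (String dep : deps) {
                counts.merge(dep, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Return artifacts whose id contains the keyword, most-referenced first.
    // Ties are broken alphabetically so the order is deterministic.
    static List<String> search(String keyword, Map<String, List<String>> dependencies) {
        Map<String, Integer> counts = referenceCounts(dependencies);
        List<String> hits = new ArrayList<>();
        for (String id : dependencies.keySet()) {
            if (id.contains(keyword)) {
                hits.add(id);
            }
        }
        hits.sort((a, b) -> {
            int byCount = Integer.compare(counts.getOrDefault(b, 0), counts.getOrDefault(a, 0));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return hits;
    }

    // Made-up dependency graph: artifact id -> ids it depends on.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("app-a", Arrays.asList("struts-core"));
        graph.put("app-b", Arrays.asList("struts-core", "struts-taglib"));
        graph.put("struts-core", Collections.<String>emptyList());
        graph.put("struts-taglib", Arrays.asList("struts-core"));
        graph.put("struts-legacy", Collections.<String>emptyList());
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(search("struts", sampleGraph()));
    }
}
```

A plain keyword search would have no reason to put struts-core ahead of struts-legacy; the reference count is what separates them.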

Now let’s try that search again, shall we?

Ah, much better. Exactly what I was looking for. The latest versions of both Struts 1.x and 2.x in the top 3 results, followed by several other popular artifacts.

Fun Bits

Other than scratching an itch I’ve had for the last couple of projects I’ve worked on, building srchmvn gave me a chance to play with things I generally haven’t had a chance to use at work, like Rails 3 beta, nginx, Capistrano, and jQuery.

I’ve had a love/hate relationship with Rails over the years and I was hoping to find out if any of the “hates” had been addressed in this latest version. I can’t say I’ve done much more than simply kick the tires on this thing, but so far I’m not blown away. The Rails team has done a remarkable job in refactoring the monolithic codebase that Rails had grown into, but my biggest complaint has always been the use of [generally] undocumented “magic” behaviors, and they’re still here, for better or worse.

Surprisingly, the area that gave me the most trouble, and where I spent the most time, was in trying to correctly parse the Maven project (POM) files. Of course, anyone who has ever had to write or maintain one will know exactly why that is… :-)

*Not Quite Finished

Ok, so it’s not quite finished. I have a few ideas on ways to improve various bits of the app, but more than anything, I’d appreciate some feedback. I’ve so far been able to quickly find every artifact I’ve thrown at it, but if you happen to find some cases where you get weird results, please let me know. I’ve got some ideas on improving the search as well, but unless I see a need to tinker with it, it probably won’t be changing for now.

Using MySQL with JRuby

Posted February 23, 2010

For some reason, I had a relatively hard time finding this info all in one spot, so here it is:

Using MySQL with JRuby is actually pretty easy (and no annoying arch issues on OS X! :-)

Install the gems (DBI, JDBC driver, DBI adapter):

$ jgem install dbi jdbc-mysql dbd-jdbc

Then use it!

require 'dbi'
require 'jdbc/mysql'
dbh = DBI.connect('dbi:jdbc:mysql://localhost:3306/test', 'root', '', 
                  { "driver" => "com.mysql.jdbc.Driver" } )

Hooking app exit in Firefox extensions

Posted February 10, 2010

I’ve spent the last few days since joining Better Advertising working on a new feature for a Firefox extension called Ghostery. We’ll be announcing the new feature soon, but until then I thought I’d share some of what I’ve learned so far.

I’ve never worked on an extension before but as it turns out, it’s really quite easy to pick up; some fairly simple XML (aka XUL) for composing the UI and JavaScript for the rest. One of the trickier bits has to do with scope. After doing some testing I figured out that the entry point into an extension is via the browser window; that is, your extension code will be executed each time you open a new window and that means that all your code is basically scoped to a single window.

In developing the new Ghostery feature I needed a way to run some code when the user quits Firefox. Luckily, the extension architecture is extremely flexible (if poorly documented at times) and I didn’t have to jump through any hoops to do it. Almost anything, it seems, can be either chained or hooked in some way. In this case, the nsIObserverService gives us access to the necessary shutdown event to which we can attach an observer using a simple interface.

The problem, then, was that since our code is run every time a new window is created, I needed a way to register the hook only once to avoid firing multiple times on exit. My first thought was to try to register the hook outside of the window scope (e.g. using a different chrome overlay) but that appeared to be a dead end. Using a globally scoped variable as a lock was also a dead end. In the end I settled on something I already knew how to use: preferences. Essentially, I created a simple lock around a preference variable which, while I don’t need to store it between sessions, is in fact a global storage area that can be accessed from different windows.

Check out the code below for an example implementation. I left out the actual preferences code since it’s not crucial to understanding the solution.
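
What follows is not the Ghostery code, just a small Java model of the same register-once lock, for anyone who wants to poke at the idea outside of Firefox. A shared map stands in for the preference store, each call to onWindowLoad stands in for a new browser window, and quit stands in for the shutdown event. The preference name and every class and method name here are hypothetical, and clearing the lock on quit is my own assumption about how to keep it from carrying over between sessions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Model of "register the quit hook only once, no matter how many windows open".
final class RegisterOnceSketch {

    // Hypothetical preference name used as the lock
    static final String LOCK_PREF = "extensions.example.quitHookRegistered";

    // Stand-in for the preference store shared by all windows
    private final Map<String, Boolean> prefs = new ConcurrentHashMap<>();
    // Stand-in for observers attached to the shutdown event
    private final List<Runnable> quitObservers = new ArrayList<>();

    // Runs once per window; returns true only for the window that registers the hook.
    boolean onWindowLoad(Runnable onQuit) {
        // putIfAbsent returns null only for the first caller, which takes the lock
        if (prefs.putIfAbsent(LOCK_PREF, Boolean.TRUE) != null) {
            return false;
        }
        quitObservers.add(onQuit);
        return true;
    }

    // Fires the registered observers, releases the lock, and reports how many ran.
    int quit() {
        int fired = 0;
        for (Runnable observer : quitObservers) {
            observer.run();
            fired++;
        }
        quitObservers.clear();
        prefs.remove(LOCK_PREF);
        return fired;
    }

    public static void main(String[] args) {
        RegisterOnceSketch browser = new RegisterOnceSketch();
        for (int window = 1; window <= 3; window++) {
            boolean registered = browser.onWindowLoad(
                    () -> System.out.println("running cleanup on quit"));
            System.out.println("window " + window + " registered hook: " + registered);
        }
        System.out.println("observers fired: " + browser.quit());
    }
}
```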