DevOps: It’s the culture, stupid!

Posted by chetan on October 26, 2013

Last week saw the return of DevOpsDays to New York, and like many who attended, I went into day one without a solid definition for, or real understanding of, what “DevOps” actually means. Does it mean wearing both dev and ops hats? Is it a new team outside of the existing dev and ops teams? Is it a tool?

Continue reading…


Profiling Java programs on OS X

Posted by chetan on May 18, 2011

Sounds easy, doesn’t it? Well, it actually is quite simple, but the error messages along the way can really trip you up!

My first attempt at profiling a bit of code was to use the full-boat Eclipse stack: Eclipse Test & Performance Tools Platform Project! Well, what they don’t tell you anywhere on the project page is that it’s only supported on Windows and Linux. A Mac port was started sometime around 2004 and never completed. Yeah, it’s been that long!

And so this brings us to Apple’s Shark: an extremely barebones, no-frills profiler, but, what the heck, it’s free. For the basics on using this tool with your Java app, this great post has all the details. There’s just one catch: 64-bit support. There is none. If you’re on a 64-bit stack and you try to run it, you’ll see something like the following:

$ java -agentlib:Shark -cp foo.jar com.foo.Bar
Error occurred during initialization of VM
Could not find agent library: libShark.jnilib (searched /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Libraries:/System/Library/Java/Extensions:/Library/Java/Extensions:.)

Er, what? Now let’s see here…

$ ls -al /System/Library/Java/Extensions/libShark.jnilib 
-rwxr-xr-x 1 root wheel 50352 Oct 24  2010 /System/Library/Java/Extensions/libShark.jnilib

Well, that’s odd. It said it looked there, right?! As it turns out, the Shark JNI library only supports 32-bit JVMs, so the fix is to force a 32-bit JVM with the -d32 flag:

$ java -d32 -agentlib:Shark -cp foo.jar com.foo.Bar
2011-05-18 23:05:20.756 java[84473:20f] Shark for Java enabled...

Great success! And now, back to work, just a little less sane…


An easy way to browse HDFS clusters

Posted by chetan on December 30, 2010

After spending the better part of a day trying to get HDFS to mount on my Mac, I finally gave up. Fortunately, I found MuCommander, a cross-platform Java clone of the Norton Commander of old, and as luck would have it, it supports HDFS in the latest version! Very handy for quickly browsing your HDFS clusters if you can’t mount them or don’t have the Hadoop toolset installed.


Distributing JARs for Map/Reduce jobs via HDFS

Posted by chetan on December 29, 2010

Hadoop has a built-in feature for easily distributing JARs to your worker nodes via HDFS but, unfortunately, it’s broken. There are a couple of tickets open with patches against 0.18 and 0.21 (trunk), but for some reason they still haven’t been committed. We’re currently running 0.20, so the patches do me no good anyway. So here’s my simple solution:

I essentially copied the technique ToolRunner uses when you pass a “libjars” argument on the command line: you simply pass the function the HDFS paths of the JAR files you want included, and it takes care of the rest. A sketch of the helper itself follows the usage example.

Example usage:

public int run(String[] args) throws Exception {
 
    JobConf job = new JobConf(getConf());
 
    // ... job setup ...
 
    NerfUtils.addJarsToJobClasspath(job, 
        new String[] { 
            "/libraries/java/solr-commons-csv-1.4.1.jar" });
 
    // ... more job setup ...
 
    return JobClient.runJob(job).getJobState();
}
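For reference, here’s a minimal sketch of what a helper like addJarsToJobClasspath might look like, assuming it mirrors the DistributedCache.addFileToClassPath() call that ToolRunner makes for each “libjars” entry. Everything inside the method is my illustration of that idea, not the actual NerfUtils code:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class NerfUtils {

    // Sketch: add each JAR (which must already exist on HDFS) to the
    // job's classpath, the same way ToolRunner handles -libjars
    public static void addJarsToJobClasspath(JobConf job, String[] jars)
            throws IOException {
        FileSystem fs = FileSystem.get(job);
        for (String jar : jars) {
            // Fully qualify the path so the workers resolve it against HDFS
            Path qualified = fs.makeQualified(new Path(jar));
            DistributedCache.addFileToClassPath(qualified, job);
        }
    }
}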

Might not be the prettiest or best solution, but it works for me!


Using Hadoop’s DistributedCache

Posted by chetan on December 28, 2010

Using Hadoop’s DistributedCache mechanism is fairly straightforward but, as I’m finding is common with all things Hadoop, not very well documented.

Adding files

When setting up your Job configuration:

// Create symlinks in the job's working directory using the link name 
// provided below
DistributedCache.createSymlink(conf);
 
// Add a file to the cache. It must already exist on HDFS. The text
// after the hash is the link name.
DistributedCache.addCacheFile(
    new URI("hdfs://localhost:9000/foo/bar/baz.txt#baz.txt"), conf);

Accessing files

Now that we’ve cached our file, let’s access it:

// Direct access by name
File baz = new File("baz.txt");
// prints "true" since the file was found in the working directory
System.out.println(baz.exists()); 
 
 
// We can also get a list of all the cached files
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
for (Path path : cached) {
    // do something with each localized path, e.g. log it
    System.out.println(path.toString());
}
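For context, here’s a minimal sketch of where this access code typically lives: in the configure() method of a mapper (old mapred API), so the cached file is read once per task. The class name, field, and matching logic are made up for illustration:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BazMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final Set<String> terms = new HashSet<String>();

    @Override
    public void configure(JobConf conf) {
        try {
            // The symlink created earlier appears in the task's working
            // directory under its link name, so a relative path works
            BufferedReader reader = new BufferedReader(new FileReader("baz.txt"));
            String line;
            while ((line = reader.readLine()) != null) {
                terms.add(line.trim());
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException("Could not read cached file", e);
        }
    }

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Example use of the cached data: count lines that match a term
        if (terms.contains(value.toString().trim())) {
            output.collect(value, new IntWritable(1));
        }
    }
}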

Using MySQL with JRuby

Posted by chetan on February 23, 2010

For some reason, I had a relatively hard time finding this info all in one spot, so here it is:

Using MySQL with JRuby is actually pretty easy (and no annoying arch issues on OS X! :-)

Install the gems (DBI, JDBC driver, DBI adapter):

$ jgem install dbi jdbc-mysql dbd-jdbc

Then use it!

require 'dbi'
require 'jdbc/mysql'
dbh = DBI.connect('dbi:jdbc:mysql://localhost:3306/test', 'root', '', 
                  { "driver" => "com.mysql.jdbc.Driver" } )
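And a quick query to confirm everything works (the “widgets” table here is made up; substitute one of your own):

# 'widgets' is a hypothetical table; swap in a real one
rows = dbh.select_all('SELECT id, name FROM widgets')
rows.each { |row| puts "#{row[0]}: #{row[1]}" }
dbh.disconnect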

Hooking app exit in Firefox extensions

Posted by chetan on February 10, 2010

I’ve spent the last few days since joining Better Advertising working on a new feature for a Firefox extension called Ghostery. We’ll be announcing the new feature soon, but until then I thought I’d share some of what I’ve learned so far.

I’ve never worked on an extension before but, as it turns out, it’s really quite easy to pick up: some fairly simple XML (aka XUL) for composing the UI and JavaScript for the rest. One of the trickier bits has to do with scope. After doing some testing, I figured out that the entry point into an extension is the browser window; that is, your extension code will be executed each time you open a new window, and that means all of your code is basically scoped to a single window.

In developing the new Ghostery feature, I needed a way to run some code when the user quits Firefox. Luckily, the extension architecture is extremely flexible (if poorly documented at times) and I didn’t have to jump through any hoops to do it. Almost anything, it seems, can be either chained or hooked in some way. In this case, the nsIObserverService gives us access to the necessary shutdown event, to which we can attach an observer using a simple interface.

The problem, then, was that since our code runs every time a new window is created, I needed a way to register the hook only once to avoid it firing multiple times on exit. My first thought was to register the hook outside of the window scope (e.g. using a different chrome overlay), but that appeared to be a dead end. Using a globally scoped variable as a lock was also a dead end. In the end, I settled on something I already knew how to use: preferences. Essentially, I created a simple lock around a preference variable: while I don’t need to store it between sessions, the preferences system is a global storage area that can be accessed from different windows.

Check out the code below for an example implementation. I left out the actual preferences code since it’s not crucial to understanding the solution.
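Here’s a minimal sketch of the approach; every name in it is illustrative rather than the actual Ghostery code, and prefs.get/prefs.set stand in for the omitted preferences wrappers:

// Observer that fires when Firefox broadcasts the shutdown event
var shutdownObserver = {
    observe: function(subject, topic, data) {
        if (topic == "quit-application") {
            // ... run the on-exit feature code here ...

            // Clear the lock so the hook can be registered next session
            prefs.set("extensions.example.exitHookLock", false);
        }
    }
};

// Runs in every new browser window; the preference acts as a
// cross-window lock so the observer is only registered once
function registerShutdownHook() {
    if (prefs.get("extensions.example.exitHookLock")) {
        return; // another window already registered the hook
    }
    prefs.set("extensions.example.exitHookLock", true);

    var observerService = Components.classes["@mozilla.org/observer-service;1"]
        .getService(Components.interfaces.nsIObserverService);
    observerService.addObserver(shutdownObserver, "quit-application", false);
}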


Pointless rewrite? Probably.

Posted by chetan on August 06, 2008

Del.icio.us (sorry, it’s just plain old “Delicious” now) 2.0 finally launched a few days ago, and the response so far has been mixed. But now that the dust has settled some, it’s time to think about just how we got here and whether it was really worth all the trouble.

According to the official blog post, the new and improved Delicious brings us speed, usability, and oh-so-good looks, among other features, and it was a long time in the making. The Yahoo! acquisition was announced on December 9, 2005, and the new site finally went live a little over two and a half years later, on July 31, 2008. So why did it take them so long?

A key change as a result of the Yahoo! acquisition was their decision to rewrite the whole thing in PHP using the Symfony framework, for no other reason than that it’s the current corporate standard at Yahoo!. Oh, and, coincidentally, Yahoo! Bookmarks was also built on PHP+Symfony.

So now it starts to make a bit of sense: you take a system actively used by millions of users around the world, you start over from scratch with the goal of building it bigger and better, you toss in a couple of hot buzzwords to meet Web 2.0 compliance guidelines, and before you know it, more than two years have gone by.

I find it very hard to believe that, with all the talent and the thousands of combined man-years of software development experience over there, no one understands the pros and cons of rewriting versus refactoring a code base, especially given the enormous success of the service and its relatively trouble-free history compared to, say, Twitter.

At the same time, I understand it all too well. From where I sit, having been involved in a similar situation in the past as well as at my current employer, the decision to move to PHP was clearly not based on a cost/benefit analysis of maintaining the current system. In fact, I wonder if they even understood what the real problems with the existing system were, if any, before deciding not just to rewrite it, but to rewrite it in another language.

Moving to another language is a pretty drastic step to take and will rarely solve your problem.


SimpleDB: MapReduce for the masses?

Posted by chetan on December 16, 2007

On Thursday, Amazon announced SimpleDB, “a web service for running queries on structured data in real time.” As many others have noted, this more or less completes the cloud computing stack that Amazon has been steadily building since launching the Simple Storage Service (S3) early last year.

Where their earlier releases (S3, Elastic Compute Cloud [EC2], Flexible Payments, Mechanical Turk) commoditized much of the infrastructure required for building scalable applications, SimpleDB (SDB) and the earlier Simple Queue Service (SQS) bring cutting-edge technologies and design patterns to the masses. First they made it cheap and easy to have a cluster; now they’ve made it cheap and easy to use a cluster! Amazing.

What’s even more startling is just how much Amazon gets it, and just how far off base Salesforce was earlier this year when they announced Force.com as a “platform as a service.”

Then again, maybe they’re not even competing services at all.

Amazon is clearly providing services targeted at developers and entrepreneurs, with the goal of enabling them to explore new and innovative ideas by lowering the cost of entry. They’re providing the basic building blocks for developers to do exactly what Amazon has done (and spent the last 10 years building), and they’re providing it at a very competitive price.

The Force.com proposition is different: they market to the same target audience, but their selling point is not their tools (they offer a run-of-the-mill Java-based platform) so much as the market they can deliver: a built-in customer base of salesforce.com [enterprise] users and a marketplace for connecting those users with the applications they want.

But I digress. Amazon’s SimpleDB is an important, if small, step towards moving the web to column-oriented databases like Google’s BigTable or the relatively unknown, open-source Hadoop project, now largely sponsored by Yahoo!.

What sticks out to me most, however, is the choice of name. It’s called “SimpleDB” and yet it’s neither a “database,” as most would understand it, nor is the concept of a column-oriented database “simple”; it requires an orthogonal way of thinking. What is clear is that the name was a very deliberate move by Amazon to market this technology to the masses. It will take some time for developers to come around and see the light, but when they do, we’re in for another huge advance in dynamic web applications.


You call that content management?

Posted by chetan on February 07, 2007

A large part of my job here at ICAR has been wrestling with various so-called content management systems (CMSes). In an effort to build various applications, I’ve been evaluating many popular open-source CMS projects, and I’ve run into the same basic problem with just about all of them: I don’t want a blog, I want content management. They all claim to be flexible systems with all the latest doodads, but in the end they’re just glorified blogs. Case in point: almost every system sets itself up as a blog out of the box, and, in general, that’s the most complete part of the system. Other areas are sorely lacking.

Continue reading…