Monday, September 30, 2013

Private Git Repos on Dropbox

GitHub is great, but if you want to keep your code private they try to charge you $7 a month. I think that capability is great for organizations or companies wanting private code repos on the Internet. But I'm already subscribed to enough $10ish-a-month services, and I didn't want another one draining my bank account for no good reason. Bitbucket allows unlimited private repos and looks pretty awesome (thanks Phil). But after some playing around I accomplished the same thing using just git and Dropbox. This is nice because the repo is just a folder of files: Dropbox has me hooked on file synchronization, but I've been planning to migrate off of it to rsync anyway if they break their service model or I just get motivated one of these weekends.

Here's how you can do the same. I'm assuming you have basic familiarity with git and Dropbox.

Go to your Dropbox folder and init a bare repository:
cd ~/Dropbox/code/
git init --bare myproject.git

Now go to your local workspace directory and git init your code:
cd ~/workspace/myproject/
git init .
git add .
git commit -m "first commit"

Now add the bare repo in your Dropbox as the origin remote and push to it:
git remote add origin ~/Dropbox/code/myproject.git
git push origin master

That's it, you're done. Now you have all the functionality of a code repo synced across all of your machines, for working on code that you're not ready to give to the world yet. If you're compiling or generating lots of data, don't forget to use your .gitignore file to keep those files from being copied to your Dropbox and using up your quota. When you move to a different computer for the first time, run git clone ~/Dropbox/code/myproject.git. From then on, just remember to pull when you start working on the code and push when you finish.

If you like this tip, share my blog URL.

Sunday, September 15, 2013

Circle of shop broom life

Four or five years ago I needed a dowel and cut it off the end of this broom. Today, I had an extra dowel, a short broom, and the skills to finger join them together. Circle of broom life complete.

Wednesday, September 11, 2013

Map is Map is Map

If you've been involved in or following developments in data science over the last few years, you've heard the term "map reduce". It's one of the new primitives that's part of a few of the big data frameworks. When learning how a technology works, it helps to notice when two concepts are closely related. If you're still trying to wrap your brain around map reduce, here's a light-bulb moment that helped me: map functions are found in a few programming languages, like Perl and Python. Initially, I had thought of them as completely different from the map in map reduce. However, at some point it clicked for me and I realized that in principle they are the same operation: apply one function to every element of a collection, independently of the others. All the tricks and idioms that I used with those map functions translate almost directly to things I can do with the map reduce primitives in big data frameworks. Map is map is map. In hindsight it's almost embarrassing that I didn't immediately connect these dots in my mind, but that's how these things sometimes go. Sometimes learning new technology is really as easy as making the mental connection with a concept you've already mastered.

Perl map
Python map
Map Reduce
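
To make the connection concrete, here's a minimal sketch in plain Python. It isn't taken from any particular framework: the names map_phase, shuffle and reduce_phase are just ones I made up for illustration, and the whole thing runs on one machine. The point is only that the built-in map is doing the "map" half of a map reduce word count.

```python
from collections import defaultdict
from functools import reduce

lines = ["map is map", "is map"]

# Ordinary Python map: apply one function to every element independently.
lengths = list(map(len, lines))

def map_phase(line):
    # Emit a (key, value) pair for every word in one input record.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key; a real framework does this step across machines.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Fold all the values for one key down to a single result.
    return key, reduce(lambda a, b: a + b, values)

# The same built-in map drives the map phase of the word count.
mapped = [pair for pairs in map(map_phase, lines) for pair in pairs]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(lengths)  # [10, 6]
print(counts)   # {'map': 3, 'is': 2}
```

Because each call to map_phase only looks at its own line, the calls could be spread across as many machines as you like, which is roughly the trick the big data frameworks rely on.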

Wednesday, September 4, 2013

Three t-shirts this week

Woke up this Sunday morning (today) and had a choice of a few new t-shirts to wear while running errands and playing with my kid.

Last Saturday I ran and finished a Super Spartan in Virginia. I received two t-shirts there: one was a team shirt the 8 guys I ran it with had made, the other was a shirt for finishing. I was far from dominating, but I did finish it. I was not ready for the 8 miles of obstacle race to be up and down ski slopes. Next time somebody talks me into one of these I'm going to train harder.

Wednesday of the next week I flew to UC Berkeley to attend their AMP Lab's Big Data Bootcamp, AMP Camp. I didn't have to climb any obstacles to get a t-shirt there, but the course material was excellent. Although I've been reading about and playing with components of their big data stack, BDAS, for a few months, I learned a great deal and was introduced to a few new pieces of software: BlinkDB and MLbase. I was more excited about the latter, since making ML easier is more exciting to me than tuning query response time/accuracy at the DB level, but they both have potential. Spark/Shark I already consider successful, and I'm even more of a fan now. A lot of the code for all the software covered was written in the last few months. I expect the newer projects to change a lot, even more than the rest of the big data software has changed in the last year or so. They outlined a roadmap, but it's a long way from here to there. I thought the best presentation was on Mesos. I'm excited to explore Chronos more, which is written in Scala on top of Mesos. But since Chronos wasn't an AMP Lab piece of software, it was only mentioned and not covered in detail.

They provided a 5-node EC2 cluster for each attendee in the class, for which they emailed us private keys. I thought this was a great way to step through the exercises. In previous training I've taken on this topic, I had to bring a beefy laptop and spin up enough virtual machines to emulate a real cluster. The EC2 approach does require considerably more capital, though. On EC2, 5 large nodes cost about $2.88 per hour, which is more expensive than free but well worth the money to crunch 20 GB+ of data as part of the training exercises.

In the airport on the way back home I managed to duplicate most of the cluster provided during the class by stepping through this article on Amazon: http://aws.amazon.com/articles/4926593393724923
One hiccup was that the elastic-mapreduce program is not compatible with the latest version of Ruby. To make it more challenging, apt-get on Ubuntu 13.04 doesn't work for installing the Ruby Version Manager, rvm. So I had to install rvm from binary, then run rvm use 1.8.7 or something like it, and then the Amazon map-reduce script worked great. Kind of. The first cluster I tried bringing up failed when Hadoop didn't start. On my next try, it worked. This is all still alpha software... Impressive nonetheless, since I was able to get the functionality we covered in class on my own with only a bit of mucking around. That's kind of like a Spartan race obstacle...

I'm wearing my AMP Camp t-shirt today...