Not Just Hadoop: NoSQL in the Enterprise at Strata NYC 2012

At the NYC Strata & Hadoop World conference I presented on ‘Not Just Hadoop: NoSQL in the Enterprise’. Robert Lancaster from Orbitz joined me on stage for the final presentation of the Bridge to Big Data track. Mark Madsen did a great job moderating the session and kept the energy high the entire day. Robert shared how Orbitz uses MongoDB with Apache Hadoop to provide real time rates. This is my second time presenting at Strata’s Big Data conference.

Backup, Replication and Disaster Recovery

One of the most common concerns people have is how to ensure that their application is safe, secure and available in the event of an emergency. I have often found that people mistakenly believe they are protected when in fact they have ignored potential scenarios. The principles explained apply equally well to RDBMSs, MongoDB and other databases. Potential scenarios to protect against: drive failure, machine failure, switch failure, power circuit failure, data center failure, intrusion, fat fingers and programmer error. RAID: To prevent drive failure, use multiple drives in a single machine for high availability.

Where have all the good databases gone?

Perhaps you’ll recognize these words: “About five years ago I started to notice an odd thing. The products that the database vendors were building had less and less to do with what the customers wanted. … So, what is this growing disconnect?” Those words were written in 2004 by Adam Bosworth, a veteran of Microsoft, Google and BEA. In the 7 years since, things have only gotten worse. Open source products came to maturity (if you can call it that), but none improved on any of the challenges Bosworth outlines.

Easy bash scripting with shflags

One of the most frustrating things about bash scripts is how challenging it is to create Unix-style executables. You know, the ones where you can pass in -h or --help and see the set of options for the program. Up until now this has been a very manual process in bash, but no longer. Enter the shflags project from Kate Ward, a bash library that takes care of all the nasty work and provides an elegant way to add option (or argument) support to your scripts.
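For contrast, here is a minimal sketch of the manual approach that the excerpt describes, using only the shell's built-in getopts. It is not the shflags API: the function name greet and its -n option are hypothetical, chosen purely for illustration.

```shell
# Hypothetical example function: greet [-h|--help] [-n NAME]
# Manual option handling with the getopts builtin (this is NOT shflags).
greet() {
  local name opt
  name="world"
  OPTIND=1                       # reset so the function can be called repeatedly

  # getopts only understands short options, so --help is mapped by hand.
  if [ "${1:-}" = "--help" ]; then
    set -- -h
  fi

  # The leading ':' enables silent mode so we print our own error messages.
  while getopts ":hn:" opt; do
    case "$opt" in
      h) echo "usage: greet [-h|--help] [-n NAME]"; return 0 ;;
      n) name="$OPTARG" ;;
      :) echo "greet: option -$OPTARG requires an argument" >&2; return 2 ;;
      \?) echo "greet: unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done

  echo "hello, $name"
}
```

Even this small example needs hand-written usage text, error handling and a special case for the long option, which is the kind of repetitive work an option-parsing library is meant to remove.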

Unix Jobs Management

Every self-respecting Linux, Mac OS X or *nix user should have a solid handle on managing jobs in Unix. The following explains how to run tasks in the background, bring tasks to the foreground, background already running tasks and keep a task running while logged out. Run a task in the background: All you need to do is follow a command with the ‘&’ character. Pretty simple.
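The basics can be sketched in a few lines of bash. The sleep commands below are stand-ins for real work, and the variable names are illustrative only. Moving a running task between foreground and background (Ctrl-Z, then bg or fg) needs an interactive shell, so it appears only as a comment.

```shell
# Start a task in the background by ending the command with '&'.
sleep 1 &
bg_pid=$!                  # $! holds the PID of the most recent background task

# List the jobs this shell is currently tracking.
jobs

# Wait for the background task to finish and capture its exit status.
wait "$bg_pid"
bg_status=$?
echo "background task exited with status $bg_status"

# nohup makes the task ignore the hangup signal sent at logout,
# so it keeps running after the terminal closes.
nohup sleep 1 >/dev/null 2>&1 &
nohup_pid=$!
wait "$nohup_pid"
nohup_status=$?

# In an interactive shell: press Ctrl-Z to suspend the foreground task,
# then run 'bg' to resume it in the background or 'fg' to bring it back.
```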

Human readable du sorted by size

du is the *nix command for disk usage. It tells you how much space everything in the given directory is taking up. GNU du introduced a handy option, -h, which makes the output human readable by showing sizes in K, M and G rather than raw block counts. Unfortunately this makes it not sortable numerically. Here’s how to sort du by size and keep it human readable. Insert the following function into your .
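The original function is cut off in this excerpt, so what follows is not the author's version. It is one possible sketch of the same idea: ask du for plain kilobyte counts, sort those numerically, and only then convert them to human readable units. The function name duh is hypothetical.

```shell
# Hypothetical helper (not the author's original function):
# list the entries of a directory sorted by size, smallest first,
# with sizes shown in K, M or G.
duh() {
  # du -sk prints "<kilobytes><TAB><path>", which sort -n can order.
  # awk then rewrites the numeric column into a human readable size.
  du -sk -- "${1:-.}"/* 2>/dev/null | sort -n | awk -F'\t' '
    {
      size = $1; unit = "K"
      if (size >= 1048576)   { size /= 1048576; unit = "G" }
      else if (size >= 1024) { size /= 1024;    unit = "M" }
      printf "%.1f%s\t%s\n", size, unit, $2
    }'
}
```

Call it as duh /some/dir, or with no argument for the current directory. On systems with a recent GNU coreutils, du -sh piped into sort -h gives a similar result, since sort -h understands the K, M and G suffixes.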

Benchmarking Cloudfront (and S3)

Amazon has done it again, bringing another computing service to the masses. This time it’s the Content Delivery Network, or CDN. Cloudfront is a direct competitor to other popular CDNs such as Akamai. While Akamai requires a fairly substantial amount of traffic to become a customer, Cloudfront doesn’t. It follows Amazon’s pay-for-what-you-use mentality. This means that everyone can benefit from incorporating Cloudfront into their blog, site, store, etc.

Using the right keys

Today I was visiting a friend’s office and, like many offices in NYC, they have a shared bathroom in the hall for the entire floor. In this building it had five buttons on the door that, when pressed in the correct order, unlocked the door. A simple password. In our office we have a similarly shared bathroom, but instead of a password, we have a physical key required to unlock the door.

REST vs SOAP, the difference between soap and rest

Someone asked me a question today “Why would anyone choose SOAP (Simple Object Access Protocol) instead of REST (Representational State Transfer)?” My response: “The general rule of thumb I’ve always heard is ‘Unless you have a definitive reason to use SOAP use REST’”. He asked “what’s one reason?” I thought about it for a minute and honestly answered that I haven’t ever come across a reason. My background is building great internet companies.

7 security practices you need to follow

Some of this may seem like a broken record, yet every single time you hear about a bank losing millions of customer records, or a company having a security breach, it turns out they failed to implement and enforce the most basic security practices. Here are 7 simple security practices that you cannot afford to ignore. 1. Secure pass phrases: Throw away the notion of a password. Pass phrases consisting of multiple words and symbols are considerably more secure and easy to remember.

Secure Automated, Key Based SSH

SSH is great and secure… unless you need to automate it. Then it sucks, because your only options are to create a passwordless key, or to log in, add your key to ssh-agent and stay logged in forever. Here’s a quick guide to having the best of both worlds: a secure SSH connection that can be used in automated scripts (with the single catch that upon reboot you need to re-enter your key’s password). Create and Distribute your Key

Installing Git on a Shared Host

Git is a fantastic tool and is very useful for deployment. If you can’t install git system-wide, or don’t want to mess with installing it on the entire system, here is an easy way to install it for a single user. This also works well on Mac OS X, where installing git is more challenging than necessary. Script included: I used this script to install git on 1and1.

Mastering the Command Line

If you use *nix, no doubt you’ve spent some time on the command line. Here are a few of the most helpful tricks you can use in the bash shell to really optimize your time, impress your friends, and make everyone else feel inferior… not to mention become more productive. People familiar with the command line can usually work considerably faster (for most tasks) than they can through a GUI.

Windows 7 launch notes

I was fortunate to be able to attend the Microsoft Launch Developer Preview meeting in NYC. Microsoft is holding these all over the country to prepare IT and developers for the upcoming launch. Overall it was a good meeting and Microsoft is delivering a great product. More importantly, they have a really good chance of overcoming the bad taste of Vista and emerging as an innovation leader. These are my notes taken during the meeting with some of my insights.

Using Nginx as a Load Balancer

Nginx is a relatively new web server that has a light footprint and relatively easy configuration. The following configuration demonstrates how to properly use nginx as a load balancer in front of two web servers.

pid /var/run/;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 75;
    proxy_buffering off;
    log_not_found off;
    error_log /dev/null;
    access_log off;
    proxy_connect_timeout 20;
    client_header_timeout 60;
    client_body_timeout 60;
    send_timeout 60;
    server {
        listen 127.

Choosing a Hosting Partner

No question about it, choosing a good hosting partner is one of the most important decisions a CTO / CIO can make, especially in a .com company. I recently had to choose a hosting partner for the new Since the space changes so rapidly, the last provider you used may no longer be the best fit for you now. Here are 10 criteria you need to evaluate when analyzing a hosting partner.

Setting up Subversion with multiple access methods

One thing that makes Subversion such a powerful revision system is its ability to permit multiple methods of access: HTTPS, WebDAV, SSH and svnserve. In spite of svn’s ability to support multiple access methods, doing so simultaneously can be quite challenging. Typically one will run into permission issues, as all http(s) access is written to the filesystem as the user running the web server, while all SSH access is written to the filesystem under each user’s own account.

Scaling Web Sites (LAMP) : Top Resources

Luckily it’s 2009 and there have been a bunch of successful websites that have had to deal with large scalability challenges. Many have been kind enough to share their knowledge with the world. Here is a list of the best books, articles, presentations and practices from the likes of Twitter, Facebook, Flickr and more. Books: Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications, by Cal Henderson

Backup Your Files

One of the worst experiences you can have as a computer operator is to realize you (or something else) just did something and wiped out your files. The purpose of this article is to show you how to back up your files automatically and often. I use this setup to back up my documents every hour (I save more often than that). This gives me hourly versions of all the files I am working on.
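The excerpt does not show the author's actual setup, so the following is only a sketch of one way to get timestamped versions: a small function that archives a directory with tar and names the archive after the current date and time. The function name backup_dir, the archive naming scheme and the paths in the cron comment are all assumptions.

```shell
# Hypothetical helper: archive SRC into DEST as backup-YYYYmmdd-HHMMSS.tar.gz
# and print the path of the archive that was created.
backup_dir() {
  local src dest stamp archive
  src="$1"
  dest="$2"
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="$dest/backup-$stamp.tar.gz"

  mkdir -p "$dest" || return 1
  # -C changes into the source directory so the archive holds relative paths.
  tar -czf "$archive" -C "$src" . || return 1
  echo "$archive"
}

# To run it every hour, a crontab entry along these lines could call a script
# that wraps the function (the script path is a placeholder):
#   0 * * * * /path/to/backup-script.sh
```

Each run produces a new archive, so older versions stay available until they are pruned. Keeping the destination on a different drive or machine from the source is what makes this useful when the original disk fails.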

Backing up MySQL

I don’t know very many people who haven’t been devastated by the loss of data… Yet I am baffled that millions of professional IT workers still ignore backing up their data. Since computers are great at doing repetitive things like backups, why not spend 20 minutes setting up your machine to back up your files for you? This guide is specific to MySQL and creates a local copy of the backup.

Using SVK to Increase Productivity

SVK is a client for SVN built using Perl. It makes a number of improvements over the standard svn client, while retaining much of the same feel. It works with the standard Subversion server and works perfectly in an environment with some users using svn and some using svk on the client side. It provides a number of sizable advantages over the standard svn client and is a must-have for any development project.

Implementing a Corporate Wiki

It seems that all of a sudden, the two buzzwords in the corporate IT world are wiki and blog. Corporate wikis are emerging as cheap, intelligent, flexible systems for shared-document collaboration and content management. Because they are browser based, wikis are quite easy to implement and deploy. The wiki works well in the corporate world as it addresses two problematic areas: the need for internal collaboration and document management.

Be more productive using GNU Screen

Despite living in the age of multicore processors, GUI everything and mountains of RAM, I continually find myself more productive with a terminal open. Especially when that terminal is running GNU Screen. About GNU Screen: GNU Screen is a free terminal multiplexer developed by the GNU Project. It allows a user to access multiple separate terminal sessions inside a single terminal window or remote terminal session.