At the NYC Strata & Hadoop World conference I presented on ‘Not Just Hadoop: NoSQL in the Enterprise’. Robert Lancaster from Orbitz joined me on stage for the final presentation of the Bridge to Big Data track. Mark Madsen did a great job moderating the session and kept the energy high the entire day. Robert shared how Orbitz uses MongoDB with Apache Hadoop to provide real-time rates. This is my second time presenting at Strata’s Big Data conference. There are few things I enjoy more in my work than presenting to an engaged audience full of good questions, which is exactly what I found at Strata.
While Hadoop is the most well-known technology in big data, it’s not
always the most approachable or appropriate solution for data storage
and processing. In this session you’ll learn about enterprise NoSQL
architectures, with examples drawn from real-world deployments, as well
as how to apply big data regardless of the size of your own enterprise.
Slides: Big Data for the Rest of Us, from Steve Francia
As the talk was going on, the following tweets captured some of the highlights:
.@spf13: “Moore’s Law applies to more than just CPUs. It also applies to data” #structureconf
— Matt Asay (@mjasay) October 23, 2012
.@spf13: “For 10+ years, Big Data = ‘custom sw w/ big hw.’” In the past few years open source has made Big Data accessible to the rest of us
— Matt Asay (@mjasay) October 23, 2012
Learning from @spf13 of @10gen how MongoDB enables #BigData (for the rest of us) #strataconf
— Tamara Dull (@tamaradull) October 23, 2012
“What is BIG? What is big today is normal tomorrow.” @spf13 #strataconf
— Tamara Dull (@tamaradull) October 23, 2012
https://twitter.com/markmadsen/status/260844790348918784
Presentation Transcript
- Not Just Hadoop, NoSQL in the Enterprise
- Talking about: What is BIG Data, BIG Data & you, real-world examples, the future of Big Data
- @spf13 AKA Steve Francia. 16+ years building the internet. Father, husband, skateboarder. Chief Evangelist @ 10gen, responsible for drivers, integrations, web & writing
- What is BIG data?
- 2000: Google Inc. today announced it has released the largest search engine on the Internet. Google’s new index, comprising more than 1 billion URLs
- 2008: Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual webpages out there is growing by several billion pages per day).
- An unprecedented amount of data is being created and is
accessible
- Data Growth
- Truly exponential growth is hard for people to grasp. As a BBC reporter recently put it: “Your current PC is more powerful than the computer they had on board the first flight to the moon”.
- Moore’s Law applies to more than just CPUs. Boiled down, it says that things double at regular intervals. That is exponential growth, and it applies to big data.
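To make the doubling concrete, here is a minimal Python sketch; the 18-month doubling period is the classic Moore’s Law figure and is used purely for illustration:

```python
# Anything that doubles every `period_years` grows by 2 ** (years / period_years).
def growth_factor(years: float, period_years: float = 1.5) -> float:
    """Growth after `years` when a quantity doubles every `period_years`.

    The 18-month (1.5-year) default is the classic Moore's Law figure,
    used here purely for illustration."""
    return 2 ** (years / period_years)

# A quantity doubling every 18 months grows ~100x in a decade.
print(f"{growth_factor(10):.0f}x over 10 years")  # prints "102x over 10 years"
```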
- How BIG is it?
- How BIG is it? 2008
- How BIG is it? 2001 2002 2003 2004 2005 2006 2007 2008
- We’ve had BIG Data needs for a long time. In 1998 Google won the
search race through custom software & infrastructure
- We’ve had BIG Data needs for a long time. In 2002 Amazon again
wrote custom & proprietary software to handle their BIG Data needs
- We’ve had BIG Data needs for a long time. In 2006 Facebook started with off-the-shelf software, but quickly turned to developing their own custom-built solutions
- The ability to handle big data is one of the largest factors in determining winners vs. losers.
- For over a decade BIG Data = custom software
- Why all this talk about BIG Data now?
- In the past few years open source software emerged enabling ‘us’ to handle BIG Data
- The Big Data Story
- Is actually two stories
- Doers & Tellers talking about different
things http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
- Tellers
- Doers
- Doers talk a lot more about actual solutions
- They know it’s a two sided story: Storage & Processing
- Takeaways: MongoDB and Hadoop. MongoDB for storage & operations; Hadoop for processing & analytics
- How MongoDB enables big data: • Flexible schema • Horizontal scale built in & free • Operates at near the speed of memory • Optimized for modern apps
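As a minimal illustration of the flexible-schema point, a PyMongo sketch (the database, collection, and field names are hypothetical, not from the talk):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.demo.events  # hypothetical database and collection

# Flexible schema: documents in the same collection can have different
# shapes, so new fields can be added as the application evolves.
events.insert_one({"type": "search", "terms": ["hotel", "nyc"]})
events.insert_one({"type": "booking", "hotel_id": 42,
                   "rate": {"amount": 189.00, "currency": "USD"}})

# Secondary indexes keep ad-hoc queries on any field fast.
events.create_index("type")
print(events.count_documents({"type": "search"}))
```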
- MongoDB @ Orbitz Rob Lancaster October 23 | 2012
- Use Cases • Hotel Data Collection • Hotel Rate Feed: supplies hotel rates to Google for their Hotel Finder; uses MongoDB to maintain the state of data sent to Google and to identify changes in rates as they occur; makes use of complex querying and secondary indexing • EasyLoader: a feature allowing suppliers to easily load inventory to Orbitz; uses MongoDB to persist all changes for auditing purposes
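A sketch of how the rate-feed state tracking could look in PyMongo; the collection, field names, and compare-then-upsert approach are my illustration, not Orbitz’s actual code:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
rates = client.demo.google_rates  # hypothetical collection of last-sent rates

# A compound secondary index supports the "complex querying" the slide
# mentions: look up the last rate sent to Google for a hotel and date.
rates.create_index([("hotel_id", 1), ("stay_date", 1)])

def rate_changed(hotel_id: int, stay_date: str, new_rate: float) -> bool:
    """Record the new rate and return True if it differs from what was
    last sent to Google for this hotel and stay date."""
    previous = rates.find_one({"hotel_id": hotel_id, "stay_date": stay_date})
    if previous is not None and previous["rate"] == new_rate:
        return False  # unchanged; nothing to send
    rates.update_one(
        {"hotel_id": hotel_id, "stay_date": stay_date},
        {"$set": {"rate": new_rate}},
        upsert=True,
    )
    return True
```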
- Hotel Data Collection • Goals: better understand performance of our hotel path; optimize our hotel rate cache • Methods: capture every action performed by our hotel search engine; persist this data for long periods • Challenges: need high-performance capture; scalable, inexpensive storage
- Requirements. Collection: high write throughput (500 servers, >100 million documents/day), flexibility (complex extendable documents, no forced schema), scalability. Storage & Processing: high data volume (~500 GB/day, 7 TB/month compressed), scalable, inexpensive, proximity with other data, simplicity.
- The Solution • Utilize MongoDB as a collector: ~500 clients; unsafe writes for high throughput; heterogeneous documents; a new collection for each hour • HDFS for storage & processing: data moved via an M/R job (one job per collection, one mapper per MongoDB instance); additional processing and analysis by other jobs
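A minimal sketch of this collector pattern in PyMongo, assuming “unsafe writes” means unacknowledged (w=0) writes; the hourly naming scheme and event fields are illustrative guesses:

```python
from datetime import datetime, timezone
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client.demo  # hypothetical database

def log_event(event: dict) -> None:
    """Fire-and-forget insert into an hourly collection.

    w=0 (unacknowledged writes) trades durability guarantees for the
    high write throughput the slide describes."""
    hour = datetime.now(timezone.utc).strftime("events_%Y%m%d_%H")
    collection = db.get_collection(hour, write_concern=WriteConcern(w=0))
    collection.insert_one(event)

# Heterogeneous documents: each event carries whatever fields it needs.
log_event({"action": "rate_lookup", "hotel_id": 42, "latency_ms": 12})
log_event({"action": "cache_miss", "key": "nyc/2012-10-23"})
```

Writing each hour into its own collection keeps the downstream M/R jobs simple: one job per collection, and completed hours are immutable.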
- Challenges & Conclusions • Challenges? None really. • Achieved a robust and simple solution • MongoDB has been entirely worry-free • Very high write throughput • Reads (well, full collection dumps across the wire) are slower
- The Future of BIG data
- What is BIG? BIG today is normal tomorrow
- Data Growth [chart: millions of URLs indexed, 2000 through 2011, growing from 1 to roughly 9,000]
- How BIG is it?
- How BIG is it? 2012
- How BIG is it? 2005 2006 2007 2008 2009 2010 2011 2012
- 2012: generating over 250 million tweets per day
- MongoDB enables us to scale with the redefinition of BIG. Tools like Hadoop are enabling us to process the new BIG.
- MongoDB is committed to working with the best data tools including Hadoop, Storm, Disco, Spark & more