This tutorial gives a detailed introduction to big data and its history. It also covers big data technologies such as Hadoop, Apache Spark, and Apache Flink, along with real-life use cases of big data.
2. Big Data – Introduction
Every day the internet generates about 2.5 quintillion bytes of data (2.5 × 10^18 bytes, or 2.5 exabytes). According to the statistics, 90% of all data has been generated in the last two years. This data comes from many sources: climate information collected by sensors, posts on social media sites, digital images and videos, and purchase transaction records. This data is big data.
3. Big Data History
This section of the tutorial gives a clear picture of big data history:
- In Big Data native businesses, research and development work very closely together and stay very close to the research and open source community.
- Each paper on cost-efficient, innovative information processing techniques has been accompanied by open source adoption within an ever-growing ecosystem called Hadoop.
Two major milestones in the development of Hadoop also added confidence in the power of open source and Big Data technologies. In 2008, only two years after its first release, Hadoop won the terabyte sort benchmark. This was the first time that either a Java or an open source program had won. In 2010, Facebook claimed to have the largest Hadoop cluster in the world, with 21 PB of storage for their social messaging platform.
4. Facts and Figures
- 91% of marketing leaders believe successful brands use customer data to drive business decisions.
- 90% of the world’s total data has been created within the past two years.
- 87% of companies agree that capturing and sharing the right data is important to effectively measure ROI in their own company.
- 500 million call records are analyzed daily by IBM to predict customer churn.
- 350 billion annual meter readings are converted by IBM through Big Data to better predict power consumption.
- On Facebook, users share 30 billion pieces of content each month.
5. Big Data Technologies
While the topic of Big Data is broad and encompasses many trends and new technology developments, the top emerging technologies listed below help users handle Big Data in a cost-effective manner.
5.1. Apache Hadoop
Apache Hadoop is often described as the backbone of Big Data solutions. It is anticipated that 75% of the world’s data will be stored in Hadoop by 2017.
5.2. Apache Spark
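Apache Spark is an open source engine for large-scale data processing, described as a next-generation Big Data tool.
5.3. Apache Flink
Apache Flink is called the 4G of Big Data. It is an open source framework that can handle streaming as well as batch data.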
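The difference between batch and streaming data can be illustrated with a minimal sketch in plain Python. This is not Flink's API; the function names `batch_word_count` and `streaming_word_count` and the sample input are hypothetical and only show the two processing styles side by side.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch style: the whole bounded dataset is processed, then one result is returned."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: records arrive one at a time and an updated result is emitted after each."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
        yield Counter(counts)  # snapshot copy, so later updates do not change it


if __name__ == "__main__":
    events = ["big data", "data streams", "big streams of data"]  # hypothetical input

    # Batch: a single answer once all input has been read.
    print(batch_word_count(events))

    # Streaming: a running answer that is refined as each record arrives.
    for snapshot in streaming_word_count(iter(events)):
        print(snapshot)
```

Once the stream has been fully consumed, the last streaming snapshot matches the batch result, which is one way to see why a single framework can treat batch data as a bounded stream.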
6. Figures Of Big Brands
With more than 950 million users, Facebook collects a huge amount of data. Every time you click a notification, visit a page, upload a photo, or check out a friend’s link, you generate data for the company to track. Users share 2.5 billion content items daily (status updates + wall posts + photos + videos + comments) and upload 300 million photos per day. Every 30 minutes, 105 terabytes of data are scanned via Hive, the Hadoop query layer developed at Facebook. 70,000 queries are executed on these databases per day, and 500+ terabytes of new data are ingested into the databases every day.
Twitter, the second-biggest social network, generates less social data than the dating app Tinder. Tinder users generate 290,278 matches per minute. That is about 17.4 million matches per hour, or potentially 35 million lovers per hour, a figure that appears to count two people per match. Twitter users, on the other hand, generate 347,222 Tweets each minute, or about 21 million Tweets per hour.
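The per-hour figures above can be checked with a little arithmetic. The sketch below assumes that the 35 million figure counts two people per match, which is an interpretation and not something the source states; the function and variable names are illustrative.

```python
MINUTES_PER_HOUR = 60


def per_hour(per_minute: int) -> int:
    """Convert a per-minute rate into a per-hour rate."""
    return per_minute * MINUTES_PER_HOUR


# Per-minute figures quoted in the text.
tinder_matches_per_minute = 290_278
tweets_per_minute = 347_222

tinder_matches_per_hour = per_hour(tinder_matches_per_minute)
# Assumption: each match involves two people, which yields the "35 million" figure.
tinder_people_per_hour = tinder_matches_per_hour * 2
tweets_per_hour = per_hour(tweets_per_minute)

print(f"Tinder matches per hour: {tinder_matches_per_hour:,}")  # 17,416,680
print(f"Tinder people per hour:  {tinder_people_per_hour:,}")   # 34,833,360
print(f"Tweets per hour:         {tweets_per_hour:,}")          # 20,833,320
```

Under this reading, Tinder exceeds Twitter only when people are counted instead of matches, since the raw per-minute match count is lower than the per-minute Tweet count.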
Video is a big part of our everyday lives on the internet. Facebook is trying hard to compete and is succeeding, with over 3 billion video views per day, but YouTube is still the king: every minute, users upload over 300 hours of new video to YouTube.
7. Big Data Use-cases
- 360-degree view of the customer
- Risk and fraud monitoring and management
- Real-time transaction tracking and analytics
- Disease diagnosis analysis
- Medical record text analysis
- Genomic analytics
The disease diagnosis, medical record, and genomic use cases above come from healthcare.
- Real-time call detail record (CDR) processing and analysis
- Customer profile monetization and analysis
- Real-time network element monitoring
- Real-time network fault analysis
- IVR call analytics
- Real-time ad matching, analysis, and targeting
- Website analytics and conversion tracking
- Cross-channel marketing
- Customer Clustering and Segmentation
- Click-stream analysis
- Market Basket Analytics
- Real-time Recommendation
- Sentiment Analysis
Several of the use cases above, such as market basket analytics and real-time recommendation, come from the retail industry.
- Multimodal surveillance
- Real-time Cybersecurity detection
Energy and utilities:
- Smart meter analytics
- Asset management