Data Science Tutorial – Introduction to Data Science for Python
1. Data Science Tutorial – Objective
This Data Science tutorial aims to guide you into the world of data science and get you started with the basics: what data science is, its history, and its methodologies. We will also cover the applications of data science and the difference between Business Intelligence and Data Science. Along with this, we will discuss the life-cycle of data science and the Python libraries used for it.
So, let’s begin the Data Science Tutorial.
2. What is Data Science?
Before we start the Data Science Tutorial, we should find out what data science really is.
Data science is a way to discover hidden patterns in raw data. To achieve this goal, it makes use of algorithms, machine learning (ML) principles, and scientific methods. The data it draws insights from can be structured or unstructured. In that sense, it resembles data mining. Data science encompasses data analysis, statistics, and machine learning. As more practices get labelled as data science, the term risks becoming diluted beyond usefulness. This also leads to variation in the curricula of introductory data science courses worldwide.
3. Data Science Tutorial – History
Despite the recent hype around data science, the term has been around for over thirty years. What we could once use as a synonym for practices like business analytics, business intelligence, or predictive modeling now refers broadly to working with data to find relationships within it. A rough timeline looks like this:
a. Before 2000
- 1960- Peter Naur uses the term as a substitute for computer science.
- 1974- Peter Naur publishes Concise Survey of Computer Methods, which uses the term in a survey of contemporary data processing methods.
- 1996- Biennial conference in Kobe; members of the IFCS (International Federation of Classification Societies) include the term in the conference title.
- 1997- November- Professor C.F. Jeff Wu delivers his inaugural lecture on the topic “Statistics = Data Science?”.
b. 2000 Onwards
- 2001- William S. Cleveland introduces data science as an independent discipline in the article Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.
- 2002- April- The ICSU (International Council for Science): Committee on Data for Science and Technology (CODATA) starts Data Science Journal- this publication is to focus on issues pertaining to data systems- description, publication, application, and also legal issues.
- 2003- January- Columbia University begins publishing The Journal of Data Science- a platform that allows data workers to exchange ideas.
- 2005- National Science Board publishes Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century- this provides a new definition to the term “data scientists”.
- 2007- Jim Gray, Turing awardee, envisions data-driven science as the fourth paradigm of science.
- 2012- Harvard Business Review article attributes coinage of the term to DJ Patil and Jeff Hammerbacher in 2008.
- 2013- IEEE launches a task force on Data Science and Advanced Analytics; the first European Conference on Data Analysis (ECDA) is organized in Luxembourg; the European Association for Data Science (EuADS) comes into existence.
- 2014- IEEE launches its first International Conference on Data Science and Advanced Analytics; General Assembly launches a student-paid bootcamp; The Data Incubator launches a free data science fellowship.
- 2015- Springer launches the International Journal of Data Science and Analytics.
4. Data Science Tutorial – Methodologies
In this Data Science Tutorial, we will cover the following methodologies in data science:
a. Machine Learning for Pattern Discovery
This is where clustering comes into play. Clustering is an unsupervised technique for discovering patterns. When you don’t have predefined labels on which to base predictions, clustering lets you find hidden groupings within a dataset.
One such use-case is a telephone company using clustering to determine tower locations for optimum signal strength.
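To make the idea concrete, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. The subscriber coordinates, the starting centroids, and the function name `k_means` are all made up for illustration; a real project would more likely use a library implementation.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Repeatedly assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical subscriber locations (x, y) that form two groups.
subscribers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
towers, groups = k_means(subscribers, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(towers)   # one suggested tower position per group
```

Each returned centroid sits at the center of one group of subscribers, which is the sense in which clustering can suggest tower locations without any labelled data.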
b. Machine Learning for Making Predictions
When we have the labelled data we need to train a model, we can use supervised learning, for example on transactional data. Using machine learning algorithms, we can build a model and predict what trends the future is likely to show.
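As a small illustration of supervised learning, the sketch below fits a straight line to past values with ordinary least squares and uses it to forecast the next one. The monthly sales figures and the helper name `fit_line` are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: month number and sales in that month.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept   # predicted sales for month 6
print(slope, intercept, forecast)
```

The model here is deliberately simple; the point is the workflow of training on known examples and then predicting an unseen case.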
c. Predictive Causal Analytics
Causal analytics lets us make predictions based on a cause. It tells us how probable an event is to occur in the future. One use-case is to perform such analytics on the payment histories of a bank’s customers. This tells us how likely customers are to repay their loans.
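A very rough way to picture the bank use-case is to estimate, from past records, how often customers with a given payment history went on to repay. The sketch below does only that: it computes simple frequencies per group, which is a starting point rather than a full causal model. The records and the function name `repayment_rates` are hypothetical.

```python
from collections import defaultdict

def repayment_rates(records):
    """Return, for each payment-history group, the fraction of loans repaid."""
    totals = defaultdict(int)
    repaid = defaultdict(int)
    for group, did_repay in records:
        totals[group] += 1
        if did_repay:
            repaid[group] += 1
    return {group: repaid[group] / totals[group] for group in totals}

# Hypothetical history: (payment-history group, whether the loan was repaid).
history = [
    ("no late payments", True), ("no late payments", True),
    ("no late payments", True), ("no late payments", False),
    ("late payments", True), ("late payments", False),
    ("late payments", False), ("late payments", False),
]

rates = repayment_rates(history)
print(rates)
```

A new applicant could then be scored with the rate of the group they fall into.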
d. Prescriptive Analytics
Prescriptive analytics prescribes actions and the outcomes associated with them. This intelligence lets a system take decisions and modify them using dynamic parameters. As a use-case, consider Google’s self-driving car. With the algorithms in place, it can decide when to speed up or slow down, when to turn, and which road to take.
5. Data Science Applications
Let’s see some applications in this Data Science Tutorial:
a. Image Recognition
Using the face recognition algorithms of data science, we can get a lot done. Did Facebook ever suggest people to tag in your pictures? Have you tried the search-by-image feature from Google? Do you remember scanning a QR code with your smartphone to log in to WhatsApp Web?
b. Speech Recognition
Siri, Alexa, Cortana, and Google Voice all make use of speech recognition to understand your commands. Owing to issues like different accents and ambient noise, this isn’t always completely accurate, though it works most of the time. It gives you conveniences like dictating the content of a text message, or asking your virtual assistant to set an alarm, play music, check the weather, or make a call.
c. Internet Search
Search engines like Google, DuckDuckGo, Yahoo, and Bing make good use of data science to make fast, real-time searching possible.
d. Digital Advertisements
Data science algorithms let us understand customer behaviour. Using this information, we can put up relevant advertisements curated for each user. This also applies to advertisements as banners on websites and digital billboards at airports.
e. Recommender Systems
Names like Amazon and YouTube show suggestions for similar products or videos beside or below the one you are browsing. This enriches the UX (user experience) and helps retain customers and users. These suggestions also take into account the user’s search history and wishlist.
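One simple way such suggestions can be produced is item-to-item similarity: two items are considered similar when the same users rate them alike. The sketch below uses cosine similarity on a tiny, made-up ratings table. It is only an illustration and not how any particular site works; the names `cosine` and `most_similar` are hypothetical helpers.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical ratings: one row per item, one column per user (0 = not rated).
ratings = {
    "laptop": [5, 4, 0, 1],
    "mouse":  [4, 5, 0, 1],
    "novel":  [0, 1, 5, 4],
}

def most_similar(item, ratings):
    """Return the other item whose ratings are closest to those of `item`."""
    others = (name for name in ratings if name != item)
    return max(others, key=lambda name: cosine(ratings[item], ratings[name]))

suggestion = most_similar("laptop", ratings)
print(suggestion)
```

A user viewing the laptop would be shown the item that users with similar tastes also rated highly.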
f. Price Comparison Websites
Websites like Junglee and PriceDekho let us compare prices for the same products across different platforms. This facility lets you make sure you grab the best deal. These websites work in the domains of technology, apparel, and policy among many others, and use APIs and RSS feeds to fetch data.
g. Gaming
As a player levels up, a machine learning algorithm can improve or upgrade itself. It is also possible for the opponent to analyze the player’s moves and add an element of difficulty to the game. Companies like Sony and Nintendo make use of this.
h. Delivery Logistics
Freight giants like UPS, FedEx, and DHL use practices of data science to discover optimal routes, delivery times, and transport modes among many others. A plus with logistics is the data obtained from the GPS devices installed.
i. Fraud and Risk Detection
Practices like customer profiling and the analysis of past expenditures let us estimate how likely a customer is to default. This lets banks avoid bad debts and losses.
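As a toy illustration of using past expenditures, the sketch below flags a transaction as suspicious when it lies far from a customer’s usual spending, measured in standard deviations (a z-score). The amounts, the threshold of 3, and the function name `is_suspicious` are assumptions chosen for this example.

```python
from statistics import mean, pstdev

def is_suspicious(past_amounts, new_amount, threshold=3.0):
    """Flag new_amount if it is more than `threshold` standard
    deviations away from the mean of past_amounts."""
    mu = mean(past_amounts)
    sigma = pstdev(past_amounts)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold

# Hypothetical past card expenditures for one customer.
past = [42.0, 38.0, 45.0, 40.0, 41.0, 39.0]

print(is_suspicious(past, 43.0))    # close to the usual range
print(is_suspicious(past, 400.0))   # far outside the usual range
```

Real fraud systems combine many more signals, but the underlying idea of comparing new behaviour against a customer profile is the same.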
6. Business Intelligence vs Data Science
Here, in this part of Data Science Tutorial, we discuss Data Science Vs BI. Business intelligence and data science aren’t exactly the same thing.
- BI works on structured data; data science works on both- structured and unstructured data.
- Where BI focuses on the past and the present, data science considers the present and the future.
- The approach to BI is statistics and visualization; that to data science is statistics, machine learning, graph analysis, and NLP.
- Some tools for BI are Pentaho, Microsoft BI, and R; those for data science are RapidMiner, BigML, and R.
7. Data Science Tutorial – Life-Cycle
The journey with data science goes through six phases-
a. Discovery
Before anything else, you should understand what the project requires. Also consider the specifications, the budget needed, and priorities. This is the phase where you frame the business problem and form initial hypotheses.
b. Data Preparation
In the preparation phase, you will need an analytical sandbox in which to perform analytics for the entire duration of the project. You will also extract, transform, load, and transform (ETLT) data into the sandbox.
c. Model Planning
In the third phase, you choose the methods you want to work with to find out how the variables relate to each other. This includes carrying out Exploratory Data Analysis (EDA), making use of statistical formulae and visualization tools.
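A first pass at exploring how two variables relate might look like the sketch below, which computes a few summary statistics and the Pearson correlation coefficient using only the standard library. The advertising and revenue figures are invented for illustration, and `pearson` is a hand-written helper.

```python
from math import sqrt
from statistics import mean, median

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations: advertising spend and resulting revenue.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.0, 4.0, 5.0, 4.0, 5.0]

summary = {"mean": mean(revenue), "median": median(revenue)}
r = pearson(ad_spend, revenue)
print(summary, round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth modelling, while a value near 0 suggests looking at other variables or non-linear methods.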
d. Model Building
This phase includes developing datasets for training and testing. It also means you will have to analyze techniques like classification and clustering and determine whether the current infrastructure will do.
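Developing datasets for training and testing usually starts with a random split of the available rows. The sketch below shows one way to do that in plain Python. The function name `split_rows`, the 25% test fraction, and the seed are assumptions for this example, and this is not scikit-learn’s own splitting function.

```python
import random

def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 row identifiers.
rows = list(range(20))
train, test = split_rows(rows)
print(len(train), len(test))
```

The model is then fitted on `train` only, and `test` is held back to check how well it generalizes.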
e. Communicate results
This is the second-to-last phase in the cycle. You must determine whether your goals have been met. Document your findings, communicate them to stakeholders, and label the project a success or a failure.
f. Operationalize
In the last phase, you must craft final reports, technical documents, and briefings.
This Data Science Tutorial is dedicated to Python. So, let’s start Data Science for Python.
8. Data Science Tutorial – Why Python?
So, now you know what data science is all about. But why is Python the best choice for it? Here are a few reasons-
- Open-source and free.
- Easy to learn; intuitive.
- Fewer lines of code.
- Better productivity.
- Demand and popularity.
- Excellent online presence/ community.
- Support for many packages usable in analytics projects, including packages that can call code written in other languages.
- Competitive speed for many tasks compared with similar tools like R and MATLAB.
- Amazing memory management abilities.
9. Python 2.x or 3.x- Which should you go for?
Among a lot of other factors, official support for Python 2 ended on January 1st, 2020, so the future belongs to Python 3. Also, 95% of the libraries for data science have already been migrated from Python 2 to Python 3. Apart from that, Python 3 is cleaner and faster.
Well, then what about Python 2? It has its own perks- it is rich with a large online community and plenty of third-party libraries, and some features are backwards-compatible and work with both versions.
With the perks of each version listed, make your choices.
10. Data Science Tutorial – Python Libraries
For carrying out data analysis and other scientific computation, you can use any of the following libraries:
Pandas helps us with munging and preparing data; it is great for operating on and maintaining structured data.
SciPy (Scientific Python) stands on top of NumPy. With this library, we can carry out functionality like Linear Algebra, Fourier Transform, Optimization, and many others.
NumPy (Numerical Python) is another library that lets us deal with features like linear algebra, Fourier transforms, and advanced random number capabilities. One very important feature of NumPy is the n-dimensional array.
Matplotlib will let you plot different kinds of graphs. These include pie charts, bar graphs, histograms, and even heat maps.
Scikit-learn is great for machine learning. It will let you statistically model and implement machine learning. The tools for these include clustering, regression, classification, and dimensionality reduction.
Seaborn is good with statistical data visualization. Making use of it, we can create useful and attractive graphics.
Scrapy will let you crawl the web. It begins on a home page and gets deeper within a website for information.
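To give a feel for the n-dimensional array mentioned for NumPy above, here is a short sketch. It assumes NumPy is installed and imported under its conventional alias `np`; the variable names are made up for illustration.

```python
import numpy as np

# Build a 2-D array (2 rows, 3 columns) holding the values 0..5.
grid = np.arange(6).reshape(2, 3)

print(grid.ndim)    # number of dimensions
print(grid.shape)   # size along each dimension

# Operations apply to whole arrays at once, without explicit loops.
doubled = grid * 2
col_sums = grid.sum(axis=0)   # sum down each column
print(doubled)
print(col_sums)
```

Most of the other libraries listed here, including Pandas, SciPy, and Scikit-learn, build on this array type.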
11. Learning in Data Science Tutorial
Before you begin with the data science tutorials, we suggest you brush up on the following:
- Variables in Python
- Operators in Python
- Dictionaries in Python
- Strings in Python
- Python Lists
- Python Tuples
So, this was all about the Data Science Tutorial. We hope you liked our explanation.
With this, we complete the Data Science Tutorial, in which we learned what data science is, its history, and its methodologies. In addition, we covered the applications of data science and BI vs Data Science. Finally, we discussed the life-cycle of data science and Python libraries. This will get you started with data science in Python.
Got something else to add in this Data Science Tutorial? Drop it in the comments below.