# 18 Popular Data Science Interview Questions and Answers

## 1. Data Science Interview Questions and Answers

Not yet prepared for your data science interview, or worried about the questions you might face? This new part of our series, 18 Popular Data Science Interview Questions and Answers, will help you get ready for them.

## 2. Frequently Asked Data Science Interview Questions and Answers

Below we discuss the 18 most frequently asked data science interview questions and their answers.

**Q.1. Explain the types of clustering algorithms.**

*Let's look at some of them in detail:*

**a. Distribution models**

These models are based on how probable it is that all data points in a cluster belong to the same distribution (for example, a Gaussian). They often suffer from over-fitting.

**For Example:**

**Model-based clustering:** Instead of a heuristic approach to constructing clusters, it assumes a probabilistic model of the data. We can then apply the EM (expectation-maximization) algorithm to find the most likely model components, and compare models with different numbers of components to choose the number of clusters.
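To make the EM step concrete, here is a minimal sketch in plain Python (chosen over R only so the example has no dependencies). It assumes one-dimensional data, a Gaussian mixture with a fixed number of components k, and caller-supplied starting means; the function names `em_gmm_1d` and `assign` are hypothetical and this is not how any particular R package implements model-based clustering.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM; k is fixed by len(means)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k          # assumed starting variances
    weights = [1.0 / k] * k        # equal mixing weights to start
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing variance
    return weights, means, variances

def assign(data, weights, means, variances):
    """Hard-assign each point to its most probable component."""
    return [
        max(range(len(means)), key=lambda j: weights[j] * normal_pdf(x, means[j], variances[j]))
        for x in data
    ]

data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, means=[0.0, 6.0])
labels = assign(data, weights, means, variances)
print([round(m, 2) for m in means])  # → [1.0, 5.0]
print(labels)                        # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Choosing the number of clusters would mean running this for several values of k and comparing the fitted models, for example with an information criterion such as BIC.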

**b. Connectivity models**

These models are based on the notion that data points closer together in data space are more similar to each other than points farther apart. They can follow two approaches:

- **Agglomerative (bottom-up):** start by placing every data point in its own cluster, then merge clusters as the distance between them decreases.
- **Divisive (top-down):** start with all data points in a single cluster, then partition it as the distance increases.

The choice of distance function is subjective. These models are easy to interpret but lack the scalability needed to handle big data sets.

**For Example:**

**Hierarchical clustering:** It builds a hierarchy of clusters and presents it as a dendrogram. This method does not require the number of clusters to be specified at the beginning.

**c. Density Models**

These models search the data space for regions with different densities of data points. They isolate the different density regions and assign the data points within each dense region to the same cluster.

**For Example:**

**Density-based clustering in R:** It creates clusters based on density. A cluster is a region with a higher density of points than the rest of the data set, where density is measured by the number of data points in a neighborhood of data space.

**d. Centroid models**

These are iterative clustering algorithms in which similarity is derived from the closeness of a data point to the centroid of a cluster.

**For Example:**

**K-means clustering:** Also referred to as flat clustering, it requires the number of clusters as input. It is generally faster than hierarchical clustering. The measure is the distance of each observation from the mean (centroid) of its cluster.
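The centroid idea can be sketched with Lloyd's algorithm, the classic way k-means alternates between assigning points and moving centroids. This is a minimal plain-Python sketch, assuming points are tuples of numbers and that initial centroids are k randomly chosen data points; `kmeans` and its parameters are hypothetical names, not the interface of R's `kmeans()`.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The cluster ids themselves depend on the random initialization; only the grouping is meaningful. Because k must be given up front, k-means is typically rerun for several k values when the number of clusters is unknown.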

**Q.2. Explain the DBSCAN algorithm in R.**

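DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is controlled by two parameters:

- e (eps): the radius of the neighborhood around a data point p.
- minPts: the minimum number of data points required in a neighborhood to define a cluster.

A point with at least minPts points within distance e is a core point. A cluster grows from a core point by adding every point within e of it and repeating this for each newly added core point. Points reached this way that are not core points themselves become border points, and points that belong to no cluster are labeled noise.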
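The procedure above can be sketched in plain Python as follows. This is an illustrative sketch rather than the R `dbscan` package's implementation: the names `dbscan`, `eps`, and `min_pts` are hypothetical, distances are Euclidean, neighborhoods are found by brute force, and a point is assumed to count toward its own neighborhood.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1        # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue                # already assigned, do not expand
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5, 5), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (10, 0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force `region_query` makes this O(n²); practical implementations use spatial indexes to find neighbors faster.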

**Q.3. Explain the advantages of density-based clustering in R.**

- Unlike K-means, it does not need the user to specify the number of clusters in advance.
- Clusters can be of any shape, including non-spherical (non-circular) ones.
- It can identify noise data (outliers).

**Q.4. Explain the disadvantages of density-based clustering in R.**

- If there are no drops in density between clusters, density-based clustering fails to separate them.
- It is difficult to detect noise points when density varies across the data set.
- It is sensitive to its parameters, i.e., it is hard to determine the correct set of parameters.
- The quality of DBSCAN depends on the distance measure.
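Because the parameters are hard to pick, one common heuristic (not part of the answer above) is to sort every point's distance to its k-th nearest neighbor and look for a sharp jump: values below the jump come from points inside clusters, values above it from sparse points, so e is usually chosen near the jump. The R `dbscan` package offers `kNNdistplot()` for this; the plain-Python sketch below is a hypothetical helper, `k_distances`, using brute-force Euclidean distances.

```python
import math

def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest neighbor (excluding itself)."""
    dists = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(others[k - 1])
    return sorted(dists)

points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5, 5), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (10, 0)]
kd = k_distances(points, k=2)
print([round(d, 2) for d in kd])
# Every value is small except the last (the isolated point near 7.07),
# so an e between the two, such as 0.5, separates dense points from noise.
```

This only narrows the choice of e; the right k still has to be tied to minPts, and data with clusters of very different densities may show no single clear jump.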

**Q.5. What are the limitations of DBSCAN?**

- It is sensitive to the choice of e, particularly when clusters have different densities:
  - If e is too small, sparser clusters will be labeled as noise.
  - If e is too large, denser clusters may be merged together.

**Q.6. What is meant by hierarchical clustering in R?**

It is an algorithm that builds a hierarchy of clusters. It starts with every data point assigned to a cluster of its own. Then, at each step, the two nearest clusters are merged into one. The algorithm terminates when only a single cluster is left.
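The merge loop just described can be sketched in plain Python. This sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; R's `hclust()` supports other linkage methods, and the names `agglomerative` and `single_linkage` here are hypothetical. The brute-force search is far slower than real implementations and is meant only to show the steps.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Merge the two nearest clusters until one is left; return the merge history."""
    clusters = [[p] for p in points]  # start: every point in its own cluster
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], single_linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        del clusters[j]        # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerative(points)
print([round(d, 2) for _, _, d in merges])  # → [1.0, 1.0, 6.4, 20.52]
```

The merge heights are what a dendrogram plots; cutting the tree at a chosen height (for example, between 6.4 and 20.52 here) yields a flat clustering, which is why no number of clusters is needed up front.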

**Data Science Interview Questions and Answers for freshers: Q. 1, 3, 4, 5**

**Data Science Interview Questions and Answers for experienced: Q. 2, 6**


**Q.7. What are the characteristics of hierarchical clustering in R?**

- Multilevel decomposition: the data is decomposed into a hierarchy of nested clusters.
- Merges or splits cannot be rolled back, so an error caused by a merge cannot be corrected later in the algorithm.
- It can be combined with other methods in hybrid algorithms.

**Q.8. Why are data science and data scientists needed?**

- To develop enabling technology.
- To meet customers' rising expectations.

**Q.9. What are data science and analytics?**

Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge from data.

**Q.10. What do you do as a data scientist?**

A data scientist must know how to extract meaning from data and interpret it, which requires tools and methods from both statistics and **machine learning**. A data scientist spends a lot of time collecting, cleaning, and munging data, because data is never clean.

**Q.11. What are the roles and responsibilities of a data scientist?**

Like the data analyst role, the data scientist role is vast. Its basic knowledge includes languages such as **R**, **Python**, and SQL, and both roles require a broad skill set that combines technical and analytical knowledge.

**Q.12. What is the job of a data scientist?**

Data scientists are big data wranglers. They take an enormous mass of messy data points and use their formidable skills in math, statistics, and programming to clean, massage, and organize them.

**Data Science Interview Questions and Answers for freshers: Q. 7, 8, 9, 10, 12**

**Data Science Interview Questions and Answers for experienced: Q. 11**

**Q.13. What is the job description for a data scientist?**

A data scientist is someone who makes value out of data. … The duties include creating various machine learning-based tools and processes within the company, such as recommendation engines. People in this role also need to perform statistical analysis.

**Q.14. What is the salary for a data scientist?**

The median annual salary was $122,747 as of September 27, 2017, usually ranging between $106,949 and $137,575.

**Q.15. What is the job of analytics?**

If you come from a business background, for example as a product manager, a project manager, or an MBA, consider a business analytics job.

If you have experience in statistics, a predictive analytics job may suit you.

**Q.16. What do I need to study to become a data analyst?**

- A bachelor’s degree is needed for most entry-level jobs.
- A master’s degree will be needed for many upper-level jobs.
- Most analysts have degrees in fields such as math, statistics, or computer science, or in a subject related to their industry. Strong math and analysis skills are needed.

**Q.17. What skills do you need to be a data analyst?**

**Technical Skills:**

- Knowledge ranging from basic statistics to a sound understanding of **Machine Learning**.
- Useful computer skills:
  - A querying language (SQL, **Hive**, Pig);
  - A scripting language (**Python**, MATLAB);
  - A statistical language (**R**, SAS, SPSS);
  - A spreadsheet (Excel).

**Q.18. Explain some examples of data science.**

**a. Marketing: Predictive Lifetime Value (LTV)**

**What for:** Predicting the characteristics of customers supports customer segmentation and other marketing initiatives.

**Usage:** It can be used both in an online algorithm and in a static report that shows the characteristics of high-LTV customers.

**b. Logistics**

**What for:** Demand forecasting, i.e., how many of what thing do we need, and where will we need them?

**Revenue impact:** It supports growth and also militates against revenue leakage.

**Usage:** It is used both in an online algorithm and in a static report.
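As a deliberately simple illustration of the "how many, and where" question, the sketch below forecasts next period's demand at each location as the average of the last few periods. A moving average is only a baseline assumed here for clarity; real demand forecasting systems typically use richer models. The location names, the numbers, and the `moving_average_forecast` helper are all hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Forecast next period's demand as the mean of the last `window` periods."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical weekly unit demand per location
demand = {
    "warehouse_a": [120, 130, 125, 140],
    "warehouse_b": [80, 90, 100, 110],
}
forecasts = {loc: moving_average_forecast(h) for loc, h in demand.items()}
print(forecasts)
```

A plain average lags behind trends (warehouse_b keeps growing, yet its forecast of 100 is below the latest week), which is one reason production systems move on to models that capture trend and seasonality.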

**Data Science Interview Questions and Answers for freshers: Q. 13, 18**

**Data Science Interview Questions and Answers for experienced: Q. 12, 14, 15, 16, 17**

## 3. Conclusion

In conclusion, we hope these data science interview questions and answers will help you in your data science interview. If you still have any queries about them, feel free to ask in the comment section.