{"id":5401,"date":"2018-01-30T12:30:46","date_gmt":"2018-01-30T12:30:46","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=5401"},"modified":"2025-08-04T21:44:30","modified_gmt":"2025-08-04T16:14:30","slug":"data-science-interview-questions","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/","title":{"rendered":"130 Data Science Interview Questions and Answers [Latest]"},"content":{"rendered":"<p>DataFlair has published a series of best Data Science Interview Questions which consists of more than 130 data science interview questions and answers. Bookmark the links now and thank us later &#8211;<\/p>\n<ul>\n<li><em><strong>Data Science Interview Questions for Freshers<\/strong><\/em><\/li>\n<li><a href=\"https:\/\/data-flair.training\/blogs\/data-science-interview-questions-and-answers\/\"><em><strong>Data Science Interview Questions for Intermediate Level<\/strong><\/em><\/a><\/li>\n<li><a href=\"https:\/\/data-flair.training\/blogs\/r-data-science-interview-questions\/\"><em><strong>Data Science Interview Questions for Experienced<\/strong><\/em><\/a><\/li>\n<\/ul>\n<p>So, let&#8217;s start with the first part &#8211; top Data Science Interview Questions for Freshers.<\/p>\n<p><span style=\"font-weight: 400\">We bring to you a variety of challenging, insightful data science interview questions curated by top data scientists, industry experts, and experienced professionals widely asked in the industry. This will surely help you to get your desired data science job. 
This blog consists of the following types of questions &#8211;\u00a0<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400\"><strong>Scenario-based<\/strong> data science interview questions to help build critical thinking and improve performance under pressure.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Project-based<\/strong> data science interview questions based on the projects you worked on.<\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Technical <\/strong>data science interview questions related to different programming languages like<em> R, SQL, Python.<\/em><\/span><\/li>\n<li><span style=\"font-weight: 400\"><strong>Non-technical <\/strong>data science interview questions based on your problem-solving\u00a0<em>ability, analytical thinking, and skills.<\/em><\/span><\/li>\n<li><span style=\"font-weight: 400\">And finally <strong>open-ended and behavior-based<\/strong> data science interview questions.\u00a0<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Not only this, all the below data science interview questions cover the <a href=\"https:\/\/data-flair.training\/blogs\/data-science-tutorials-home\/\"><em><strong>important concepts of data science<\/strong><\/em><\/a>, machine learning, statistics, and probability.\u00a0<\/span><\/p>\n<p><strong>Q.1\u00a0What do you understand by the term Normal Distribution?<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-55697\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01.jpg\" alt=\"Normal Distribution in data science\" width=\"325\" height=\"259\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01.jpg 527w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01-150x120.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01-300x239.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/Normal-Distribution-01-520x414.jpg 520w\" sizes=\"auto, (max-width: 325px) 100vw, 325px\" \/><\/a><\/p>\n<p><em><strong><a href=\"https:\/\/data-flair.training\/blogs\/normal-distribution-in-r\/\">Normal Distribution<\/a><\/strong><\/em> is also known as Gaussian Distribution. It is a type of probability distribution that is symmetric about the mean. It shows that values close to the mean occur more frequently than values far from the mean, which produces the familiar bell-shaped curve.<\/p>\n<p><strong>Q.2 How will you explain linear regression to a non-tech person?<\/strong><\/p>\n<p>Linear Regression is a statistical technique for measuring the linear relationship between two variables. By linear relationship, we mean that a change in one variable is accompanied by a proportional change in the other: in a positive relationship both rise or fall together, and in a negative relationship one falls as the other rises. Based on this relationship, we build a model that predicts the value of one variable from the value of the other.<\/p>\n<p><strong>Q.3 How will you handle missing values in data?<\/strong><\/p>\n<p>There are several ways to handle missing values in the given data &#8211;<\/p>\n<ul>\n<li>Dropping the values<\/li>\n<li>Deleting the observation (not always recommended).<\/li>\n<li>Replacing the value with the mean, median or mode of the variable.<\/li>\n<li>Predicting the value with regression<\/li>\n<li>Finding an appropriate value with clustering<\/li>\n<\/ul>\n<p><strong>Q.4 How will you verify if the items present in list A are present in series B?<\/strong><\/p>\n<p>We will use the isin() function. 
For this, we create two series s1 and s2 &#8211;<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import pandas as pd\r\ns1 = pd.Series([1, 2, 3, 4, 5])\r\ns2 = pd.Series([4, 5, 6, 7, 8])\r\n# keep only the items of s1 that also appear in s2\r\ns1[s1.isin(s2)]<\/pre>\n<p><strong>Q.5 How to find the positions of numbers that are multiples of 4 from a series?<\/strong><\/p>\n<p>To find the positions of the multiples of 4, we will use the argwhere() function from NumPy. First, we will create a series of 10 numbers &#8211;<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">import numpy as np\r\nimport pandas as pd\r\ns1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\r\nnp.argwhere((s1 % 4 == 0).to_numpy())<\/pre>\n<p><strong>Output &gt; [3], [7]<\/strong><\/p>\n<p><strong>Q.6 How are KNN and K-means clustering different?<\/strong><\/p>\n<p>Firstly, KNN is a supervised learning algorithm, so we require labeled data to train it. K-means is an unsupervised learning algorithm that looks for patterns that are intrinsic to the data. The K in KNN is the number of nearest neighbors considered when classifying a data point. By contrast, the K in K-means specifies the number of centroids, that is, the number of clusters.<\/p>\n<p><em><strong>Read our latest article on<a href=\"https:\/\/data-flair.training\/blogs\/k-means-clustering-tutorial\/\"> K-means clustering<\/a> and learn everything about it.\u00a0<\/strong><\/em><\/p>\n<p><strong>Q.7 Can you stack two series horizontally? 
If so, how?<\/strong><\/p>\n<p>Yes, we can stack the two series horizontally using the concat() function and setting axis = 1.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">df = pd.concat([s1, s2], axis=1)<\/pre>\n<p><strong>Q.8 How can you convert date-strings to timeseries in a series?<\/strong><\/p>\n<p><strong>Input:<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">s = pd.Series(['02 Feb 2011', '02-02-2013', '20160104', '2011\/01\/04', '2014-12-05', '2010-06-06T12:05'])<\/pre>\n<p>To solve this, we will use the to_datetime() function. Because the strings above use different formats, pandas 2.0 and later may need the extra argument format=\"mixed\" so that each element is parsed individually.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">pd.to_datetime(s)<\/pre>\n<p><strong>Q.9 Python or R \u2013 Which one would you prefer for text analytics?<\/strong><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-56859\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1.jpg\" alt=\"R vs Python for data science\" width=\"616\" height=\"323\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1.jpg 802w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/05\/R-vs-Python-for-data-science1-520x272.jpg 520w\" sizes=\"auto, (max-width: 616px) 100vw, 616px\" \/><\/a><\/p>\n<p>Both Python and R provide robust functionalities for working with text data. 
R provides extensive text analytics libraries but its data mining libraries are still in a nascent stage. Python is well suited to enterprise-level applications and to improving software productivity. For handling unstructured data, R provides a vast variety of support packages. Python is better at handling very large data, while R has memory constraints and responds more slowly on large data. Therefore, the preference for using Python or R depends on the area of functionality and usage.<\/p>\n<p><em><strong>Revise\u00a0<a href=\"https:\/\/data-flair.training\/blogs\/r-vs-python-for-data-science\/\">Python vs R<\/a> to frame the answer of this data science interview question<\/strong><\/em><\/p>\n<p><strong>Q.10 Explain ROC curve.<\/strong><\/p>\n<p>The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds. We calculate the True Positive Rate as TPR = TP\/(TP + FN) and the False Positive Rate as FPR = FP\/(FP + TN), where TP = true positive, TN = true negative, FP = false positive, FN = false negative.<\/p>\n<p><strong>Q.11 How is AUC different from ROC?<\/strong><\/p>\n<p>AUC stands for Area Under the Curve, and here the curve is the ROC curve. The ROC curve plots the True Positive Rate against the False Positive Rate across thresholds, while AUC condenses that curve into a single number between 0 and 1: a value of 1 indicates a perfect classifier and 0.5 is no better than random guessing. A precision-recall curve is a different plot, based on Precision = TP\/(TP + FP) and Recall = TP\/(TP + FN).<\/p>\n<p><strong>Q.12 Why is Naive Bayes referred to as Naive?<\/strong><\/p>\n<p>Ans. <em><strong>Naive Bayes<\/strong><\/em> computes its probabilities under the assumption that the features are independent of each other given the class. 
It is the assumption of feature independence that makes Naive Bayes \u201cNaive\u201d.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/bayes-theorem-data-science-interview-questions.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-63826\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/bayes-theorem-data-science-interview-questions.png\" alt=\"Data science interview questions - bayes theorem\" width=\"405\" height=\"420\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/bayes-theorem-data-science-interview-questions.png 405w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/bayes-theorem-data-science-interview-questions-145x150.png 145w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/bayes-theorem-data-science-interview-questions-289x300.png 289w\" sizes=\"auto, (max-width: 405px) 100vw, 405px\" \/><\/a><\/p>\n<p><strong>Q.13 How will you create a series from a given list in Pandas?<\/strong><\/p>\n<p>We will pass the list to the Series() function.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">ser1 = pd.Series(mylist)<\/pre>\n<p><strong>Q.14 Explain bias, variance tradeoff.<\/strong><\/p>\n<p>Bias is the error introduced by oversimplifying the model, and it leads to a phenomenon called underfitting. Variance, by contrast, arises from excessive complexity in the machine learning algorithm: a high-variance model also learns noise and other distortions in the training data, which hurts its overall performance on new data. If you increase the complexity of your model, the error will initially go down due to the reduction in bias. However, after a certain point, the error will increase again because the added complexity makes the model fit noise. This is known as the bias-variance tradeoff. 
A good machine learning algorithm should possess low bias and low variance.<\/p>\n<p><strong>Q.15 What is a confusion matrix?<\/strong><\/p>\n<p>A confusion matrix is a table that delineates the performance of a supervised learning algorithm. It provides a summary of prediction results on a classification problem. With the help of confusion matrix, you can not only find the errors made by the predictor but also the type of errors.<\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-63789\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science.jpg\" alt=\"Data Science Interview Questions - Type I and II Errors\" width=\"700\" height=\"500\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science.jpg 700w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science-150x107.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science-300x214.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/Type-I-and-II-erorrs-interview-questions-in-data-science-520x371.jpg 520w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/a><\/p>\n<p><strong>Q.16 What is SVM? Can you name some kernels used in SVM?<\/strong><\/p>\n<p><em><strong>SVM stands for support vector machine<\/strong><\/em>. They are used for classification and prediction tasks. SVM consists of a separating plane that discriminates between the two classes of variables. This separating plane is known as hyperplane. 
Some of the kernels used in SVM are &#8211;<\/p>\n<ul>\n<li>Polynomial Kernel<\/li>\n<li>Gaussian Kernel<\/li>\n<li>Laplace RBF Kernel<\/li>\n<li>Sigmoid Kernel<\/li>\n<li>Hyperbolic Kernel<\/li>\n<\/ul>\n<p><em><strong><a href=\"https:\/\/data-flair.training\/blogs\/svm-support-vector-machine-tutorial\/\">Support Vector Machine<\/a> &#8211; Important topic for data science interview<\/strong><\/em><\/p>\n<p><strong>Q.17 How is Deep Learning different from Machine Learning?<\/strong><\/p>\n<p>Deep Learning is an extension of Machine Learning. It is a special area within ML that is about developing algorithms that simulate human nervous system. Deep Learning involves neural networks which are trained over large datasets to understand the patterns and then perform classification and prediction.<em><strong> Check out the detailed comparison of<a href=\"https:\/\/data-flair.training\/blogs\/deep-learning-vs-machine-learning\/\"> Deep Learning vs Machine Learning<\/a> in easy steps<\/strong><\/em><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-55007\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning.jpg\" alt=\"data science interview question - deep learning vs machine learning\" width=\"803\" height=\"421\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning.jpg 803w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning-768x403.jpg 768w, 
https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/04\/deep-learning-vs-machine-learning-520x273.jpg 520w\" sizes=\"auto, (max-width: 803px) 100vw, 803px\" \/><\/a><\/p>\n<p><strong>Q.18 How can you compute significance using p-value?<\/strong><\/p>\n<p>After a hypothesis test is conducted, we compute the significance of the results. The <a href=\"https:\/\/en.wikipedia.org\/wiki\/P-value\">p-value<\/a> lies between 0 and 1. If the p-value is less than the chosen significance level (commonly 0.05), then we reject the null hypothesis. However, if it is greater than the significance level, then we fail to reject the null hypothesis.<\/p>\n<p><strong>Q.19 Why don\u2019t gradient descent methods always converge to the same point?<\/strong><\/p>\n<p>This is because, in some cases, they settle in a local minimum or another local optimum. The methods don\u2019t always reach the global minimum. Where they end up also depends on the data, the learning rate (step size) and the starting point of the descent.<\/p>\n<p><strong>Q.20 Explain A\/B testing.<\/strong><\/p>\n<p>A\/B testing is hypothesis testing applied to a randomized experiment with two variants, A and B. It is commonly used to optimize web-pages based on user preferences: a version of the page with small changes is delivered to a sample of users, and their reaction is compared with the reaction of the rest of the audience to the original page to decide whether the change makes a statistically significant difference.<\/p>\n<p><strong>Q.21 What is box cox transformation?<\/strong><\/p>\n<p>A Box Cox Transformation transforms the response variable so that the data meets the assumptions required by a model. With the help of this technique, we can transform a non-normal dependent variable into an approximately normal shape. This lets us apply a broader range of statistical tests.<\/p>\n<p><strong>Q.22 What is meant by \u2018curse of dimensionality\u2019? 
How can we solve it?<\/strong><\/p>\n<p>While analyzing a dataset, there are instances where the number of variables or columns is excessive. As the number of features grows, the data becomes sparse in the feature space and a model needs far more observations to generalize, yet usually only a handful of the features are significant. For example, a dataset may have a thousand features of which only a few matter. The problems caused by having this many dimensions are called the \u2018curse of dimensionality\u2019.<\/p>\n<p>We can address it with dimensionality reduction algorithms such as PCA (Principal Component Analysis).<\/p>\n<p><strong>Q.23 What is the difference between recall and precision?<\/strong><\/p>\n<p>Recall is the fraction of actual positive instances that the model correctly identifies: Recall = TP\/(TP + FN). Precision, by contrast, is the fraction of instances predicted as positive that are actually positive: Precision = TP\/(TP + FP). High recall means that few positives are missed, while high precision means that few of the positive predictions are false alarms.<\/p>\n<p><strong>Q.24 What is pickle module in Python?<\/strong><\/p>\n<p>We use the pickle module to serialize and de-serialize objects in Python. Pickling converts an object structure into a byte stream, which lets us save the object to disk and load it back later.<\/p>\n<p><em><strong>Learn everything about <a href=\"https:\/\/data-flair.training\/blogs\/python-pickle\/\">Pickle module in Python<\/a><\/strong><\/em><\/p>\n<p><strong>Q.25 What are the different forms of joins in a table?<\/strong><\/p>\n<p>Some of the different joins in a table are &#8211;<\/p>\n<ul>\n<li>Inner Join<\/li>\n<li>Left Join<\/li>\n<li>Right Join<\/li>\n<li>Full (Outer) Join<\/li>\n<li>Self Join<\/li>\n<li>Cartesian (Cross) Join<\/li>\n<\/ul>\n<p><strong>Q.26 List differences between DELETE and TRUNCATE commands.<\/strong><\/p>\n<p>The DELETE command is used, usually with a WHERE clause, to delete some rows from a table. 
This action can be rolled back.<\/p>\n<p>However, TRUNCATE is used to delete all the rows of a table, and in many database systems (for example MySQL and Oracle) this action cannot be rolled back.<\/p>\n<p><strong>Q.27 Can you tell some clauses used in SQL?<\/strong><\/p>\n<p>Some of the commonly used <a href=\"https:\/\/data-flair.training\/blogs\/clause-in-sql\/\"><em><strong>clauses in SQL<\/strong> <\/em><\/a>are &#8211;<\/p>\n<ul>\n<li>WHERE<\/li>\n<li>GROUP BY<\/li>\n<li>ORDER BY<\/li>\n<li>USING<\/li>\n<\/ul>\n<p><strong>Q.28 How will you get second highest salary of an employee emp from employee_table?<\/strong><\/p>\n<p>In order to get the second highest salary of an employee, we will use the following query, written with the TOP syntax of SQL Server &#8211;<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"null\">-- outer query: the lower of the two highest distinct salaries\r\nSELECT TOP 1 salary\r\nFROM(\r\n-- inner query: the two highest distinct salaries\r\nSELECT DISTINCT TOP 2 salary\r\nFROM employee_table\r\nORDER BY salary DESC) AS emp\r\nORDER BY salary ASC;<\/pre>\n<p><em>According to many data scientists, this is among the most frequently asked data science interview questions.\u00a0<\/em><\/p>\n<p><strong>Q.29 What is a foreign key?<\/strong><\/p>\n<p>A foreign key is a column (or set of columns) in one table that refers to the primary key of another table. In order to create a relationship between the two tables, we make the foreign key reference the primary key of the other table.<\/p>\n<p><strong>Q.30 What do you mean by Data Integrity?<\/strong><\/p>\n<p>Data integrity refers to the accuracy as well as the consistency of the data. This integrity is to be ensured over the entire life-cycle of the data.<\/p>\n<p><strong>Q.31 How is SQL different from NoSQL?<\/strong><\/p>\n<p>SQL deals with <a href=\"https:\/\/data-flair.training\/blogs\/sql-rdbms\/\"><em><strong>Relational Database Management Systems<\/strong><\/em><\/a> or RDBMS. This type of database stores structured data that is organized in rows and columns, that is, in a table. NoSQL, by contrast, is not a query language but a family of Non-Relational Database Management Systems. 
The data present here is unstructured. Structured data is mostly generated from services, gadgets and software systems. However, unstructured data, which is increasing day by day, is generated from users directly.<\/p>\n<p><strong>Q.32 Can you tell me about some NoSQL databases?<\/strong><\/p>\n<p>Some of the popular NoSQL databases are Redis, MongoDB, Cassandra, HBase, Neo4j etc.<\/p>\n<p><strong>Q.33 How is Hadoop used in Data Science?<\/strong><\/p>\n<p>Hadoop provides the data scientists the ability to deal with large scale unstructured data. Furthermore, various new extensions of Hadoop like Mahout and PIG provide various features to analyze and implement machine learning algorithms on large scale data. This makes Hadoop a comprehensive system that is capable of handling all forms of data, making it an ideal suite for data scientists.<\/p>\n<p><em><strong><a href=\"https:\/\/data-flair.training\/blogs\/hadoop-tutorials-home\/\">Improve your Hadoop skills<\/a> and become the next data scientist<\/strong><\/em><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/big-data-and-data-science-interview-questions.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-63790\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/big-data-and-data-science-interview-questions.png\" alt=\"data science interview questions - hadoop in data science\" width=\"252\" height=\"252\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/big-data-and-data-science-interview-questions.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/big-data-and-data-science-interview-questions-150x150.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2019\/07\/big-data-and-data-science-interview-questions-160x160.png 160w\" sizes=\"auto, (max-width: 252px) 100vw, 252px\" \/><\/a><\/p>\n<p><strong>Q.34 
How can you select an ideal value of K for K-means clustering?<\/strong><\/p>\n<p>There are several methods, such as the elbow method and the silhouette method, to find a suitable number of centroids for a given dataset. However, to get an approximate number of centroids quickly, we can also take the square root of half the number of data points, that is, K = sqrt(n\/2) for n data points. This rule of thumb is not very accurate, but it is fast compared to the previously mentioned techniques.<\/p>\n<p><em><strong>It is the right time to practice your data science learning through Project &#8211; <a href=\"https:\/\/data-flair.training\/blogs\/r-data-science-project-uber-data-analysis\/\">Uber Data Analysis Project in R<\/a>\u00a0<\/strong><\/em><\/p>\n<p><strong>Q.35 Define underfitting and overfitting.<\/strong><\/p>\n<p>Most statistics and ML projects need to fit a model on training data to be able to create predictions. There can be two problems while fitting a model &#8211; overfitting and underfitting.<\/p>\n<ul>\n<li>Overfitting is when a model fits the random error\/noise in the training data rather than the underlying relationship. It can occur when a model has a large number of parameters or is too complex. This leads to bad performance on new data because minor changes to the training data greatly change the model&#8217;s result.<\/li>\n<li>Underfitting is when a model is not able to capture the trends in the data. This can happen if you try to fit a linear model to non-linear data. This also results in bad performance.<\/li>\n<\/ul>\n<p><strong>Q.36\u00a0What are univariate, bivariate and multivariate analysis?<\/strong><\/p>\n<p>These three types of analysis are distinguished by how many variables are involved.<\/p>\n<ul>\n<li>Univariate analysis covers descriptive statistical techniques that look at a single variable at a time. A pie chart of one variable is an example.<\/li>\n<li>Bivariate analysis examines the relationship between two variables at one time. 
An example is analyzing sales volume against spending volume using a scatterplot.<\/li>\n<li>Multivariate analysis involves more than two variables and explains the effects of the variables on the responses.<\/li>\n<\/ul>\n<h3>Best Data Science Interview Questions<\/h3>\n<p>Below I am sharing top data science interview questions and this time I am not providing the answers. Now it is your turn to answer. Try to answer them and then share your answers through comments. Trust me, this is the best practice for any interview preparation. So, here are the questions &#8211;<\/p>\n<p><strong>Q.1\u00a0<\/strong>Tell us about your favorite machine learning algorithm and why you like it.<\/p>\n<p><strong>Q.2<\/strong>\u00a0If you are a data scientist, how will you collect the data? What will be your data acquisition and retention strategy?<\/p>\n<p><strong>Q.3<\/strong>\u00a0Which uncommon skills can you add to your data science team?<\/p>\n<p><strong>Q.4\u00a0<\/strong>How did you upgrade your analytical skills? Tell us your practices.<\/p>\n<p><strong>Q.5\u00a0<\/strong>If I give you a data set, what will you do with it to know whether it suits your business needs or not?<\/p>\n<p><strong>Q.6<\/strong>\u00a0Tell us how to effectively represent data using 5 dimensions.<\/p>\n<p><strong>Q.7<\/strong>\u00a0What do you know about an exact test?<\/p>\n<p><strong>Q.8<\/strong>\u00a0What makes a good data scientist?<\/p>\n<p><strong>Q.9<\/strong>\u00a0Which tools will help you to succeed in your role as a data scientist?<\/p>\n<p><strong>Q.10<\/strong>\u00a0How would you resolve a dispute with a colleague?<\/p>\n<p><strong>Q.11\u00a0<\/strong>Have you ever changed someone&#8217;s opinion at work?<\/p>\n<p><strong>Q.12<\/strong> According to you, what makes data science so popular?<\/p>\n<p>These were some of the most asked data science interview questions. I hope you will try to frame the answers on your own and post them through comments. 
Let&#8217;s check how much you know about Data Science, Machine Learning, and R.<\/p>\n<h3>Summary<\/h3>\n<p>So, this is the end of our first part of data science interview questions. Hope you enjoyed it. If there is anything we missed or you have any suggestions comment below. It will help other students to crack the data science interview.<\/p>\n<p>If you want to practice top scenario or situation based data science interview questions then don&#8217;t forget to check the second part of\u00a0the <em><a href=\"https:\/\/data-flair.training\/blogs\/data-science-interview-questions-and-answers\/\"><strong>Data Science Interview Questions and Answers<\/strong><\/a><strong> Series<\/strong>.<\/em><\/p>\n<p><strong>All the best\ud83d\udc4d<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>DataFlair has published a series of best Data Science Interview Questions which consists of more than 130 data science interview questions and answers. 
Bookmark the links now and thank us later &#8211; Data Science&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":71016,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[16714,3430,20796,11209],"class_list":["post-5401","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","tag-data-scence-interview-questions","tag-data-science-interview-questions-and-answers","tag-prepare-for-data-science-interview","tag-r-interview-questions"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>130 Data Science Interview Questions and Answers [Latest] - DataFlair<\/title>\n<meta name=\"description\" content=\"Top data science interview questions &amp; answers.Prepare for data scientist interview -How will you handle missing values in data,Can you stack two series horizontally\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"130 Data Science Interview Questions and Answers [Latest] - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Top data science interview questions &amp; answers.Prepare for data scientist interview -How will you handle missing values in data,Can you stack two series horizontally\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta 
property=\"article:published_time\" content=\"2018-01-30T12:30:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-04T16:14:30+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"802\" \/>\n\t<meta property=\"og:image:height\" content=\"420\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"130 Data Science Interview Questions and Answers [Latest] - DataFlair","description":"Top data science interview questions & answers.Prepare for data scientist interview -How will you handle missing values in data,Can you stack two series horizontally","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/","og_locale":"en_US","og_type":"article","og_title":"130 Data Science Interview Questions and Answers [Latest] - DataFlair","og_description":"Top data science interview questions & answers.Prepare for data scientist interview -How will you handle missing values in data,Can you stack two series horizontally","og_url":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-01-30T12:30:46+00:00","article_modified_time":"2025-08-04T16:14:30+00:00","og_image":[{"width":802,"height":420,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"130 Data Science Interview Questions and Answers [Latest]","datePublished":"2018-01-30T12:30:46+00:00","dateModified":"2025-08-04T16:14:30+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/"},"wordCount":2658,"commentCount":5,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg","keywords":["Data Scence Interview Questions","data science interview questions and answers","prepare for data science interview","R Interview Questions"],"articleSection":["Data Science Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/","url":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/","name":"130 Data Science Interview Questions and Answers [Latest] - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg","datePublished":"2018-01-30T12:30:46+00:00","dateModified":"2025-08-04T16:14:30+00:00","description":"Top data science interview questions & answers.Prepare for data scientist interview -How will you handle missing values in data,Can you stack two series horizontally","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/01\/top-data-science-interview-questions-with-answers.jpg","width":802,"height":420,"caption":"list of data science interview questions for freshers"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/data-science-interview-questions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Data Science Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/data-science\/"},{"@type":"ListItem","position":3,"name":"130 Data Science Interview Questions and Answers 
[Latest]"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, Data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=5401"}],"version-history":[{"count":11,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5401\/revisions"}],"predecessor-version":[{"id":146550,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/5401\/revisions\/146550"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/71016"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=5401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=5401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=5401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}