{"id":23659,"date":"2018-08-05T04:00:00","date_gmt":"2018-08-05T04:00:00","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=23659"},"modified":"2026-04-28T11:58:25","modified_gmt":"2026-04-28T06:28:25","slug":"python-ml-data-preprocessing","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/","title":{"rendered":"Data Preprocessing, Analysis &amp; Visualization &#8211; Python Machine Learning"},"content":{"rendered":"<p>Today in this <a href=\"https:\/\/data-flair.training\/blogs\/python-machine-learning-tutorial\/\"><strong>Python Machine Learning Tutorial<\/strong><\/a>, we will discuss data preprocessing, analysis, and visualization. As part of data preprocessing for Python <a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-tutorial\/\"><strong>machine learning<\/strong><\/a>, we will look at rescaling, standardizing, normalizing, and binarizing the data. We will also walk through the steps of data analysis and visualization.<\/p>\n<p>Data preprocessing is one of the most important stages of the machine learning process because it improves a model\u2019s accuracy and effectiveness. It typically involves cleaning the data and arranging it in a form that suits the analysis.<\/p>\n<p>Preprocessing is also a prerequisite step: it simplifies the data so that a machine learning algorithm can ingest it. 
Cleaning the data entails checking for pertinent features that keep errors and biases away and making sure the data is correctly formatted for the model.<\/p>\n<p>So, let&#8217;s start machine learning with Python Data Preprocessing.<\/p>\n<div id=\"attachment_23703\" style=\"width: 1210px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23703\" class=\"wp-image-23703 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg\" alt=\"Data Preprocessing, Analysis &amp;amp; Visualization - Python Machine Learning\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><p id=\"caption-attachment-23703\" class=\"wp-caption-text\">Data Preprocessing, Analysis &amp; Visualization &#8211; Python Machine Learning<\/p><\/div>\n<h3><strong>Data Preprocessing in Python Machine Learning<\/strong><\/h3>\n<p><span style=\"font-weight: 400\"><a href=\"https:\/\/data-flair.training\/blogs\/machine-learning-algorithm\/\"><strong>Machine Learning algorithms<\/strong><\/a> don\u2019t work so well with processing raw data. 
Before we can feed such data to an ML algorithm, we must preprocess it. In other words, we must apply some transformations to it. With data preprocessing, we convert raw data into a clean data set.<\/span><\/p>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-23696 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01.jpg\" alt=\"Data Preprocessing in Python machine Learning\" width=\"1200\" height=\"628\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01.jpg 1200w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01-150x79.jpg 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01-300x157.jpg 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01-768x402.jpg 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Data-Preprocessing-in-Python-Machine-Learning-01-1024x536.jpg 1024w\" sizes=\"auto, (max-width: 1200px) 100vw, 1200px\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400\">Some <strong>ML models<\/strong> need information to be in a specified format. For instance, the Random Forest algorithm does not take null values. 
To preprocess data, we will use the scikit-learn library (imported as sklearn) in this tutorial.<\/span><\/p>\n<h3>Python Data Preprocessing Techniques<\/h3>\n<p><span style=\"font-weight: 400\">Let\u2019s talk about seven such techniques for Data Preprocessing in Python Machine Learning.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/data-structures-in-python-lists-tuples-sets-dictionaries\/\" target=\"_blank\" rel=\"noopener\">Let&#8217;s have a look at Data Structures in Python<\/a><\/strong><\/p>\n<h4><strong>1. Rescaling Data using Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">When the attributes of a dataset have varying scales, we can rescale them to a common scale. Rescaling attributes into the range 0 to 1 is often called min-max normalization. For this, we use the MinMaxScaler class from scikit-learn. Let\u2019s take an example.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; import pandas, scipy, numpy\r\n&gt;&gt;&gt; from sklearn.preprocessing import MinMaxScaler\r\n&gt;&gt;&gt; df=pandas.read_csv('http:\/\/archive.ics.uci.edu\/ml\/machine-learning-databases\/wine-quality\/winequality-red.csv',sep=';')\r\n&gt;&gt;&gt; array=df.values\r\n&gt;&gt;&gt; #Separating data into input and output components\r\n&gt;&gt;&gt; x=array[:,0:8]\r\n&gt;&gt;&gt; y=array[:,8]\r\n&gt;&gt;&gt; scaler=MinMaxScaler(feature_range=(0,1))\r\n&gt;&gt;&gt; rescaledX=scaler.fit_transform(x)\r\n&gt;&gt;&gt; numpy.set_printoptions(precision=3) #Setting precision for the output\r\n&gt;&gt;&gt; rescaledX[0:5,:]<\/pre>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/rescale.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-23663 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/rescale.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"536\" height=\"97\" 
srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/rescale.png 536w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/rescale-150x27.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/rescale-300x54.png 300w\" sizes=\"auto, (max-width: 536px) 100vw, 536px\" \/><\/a><\/p>\n<p>This gives us values between 0 and 1. Rescaling proves useful for neural networks and other models trained with optimization algorithms, for algorithms that use distance measures, such as k-nearest neighbors, and for algorithms that weight their inputs, such as regression.<\/p>\n<h4><strong>2. Standardizing Data in Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">Standardization rescales each attribute so that it has a mean of 0 and a standard deviation of 1. It suits attributes that follow a roughly Gaussian distribution but have different means and standard deviations. For this, we use the StandardScaler class. Let\u2019s take an example.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-statistics\/\" target=\"_blank\" rel=\"noopener\">Do you know about Python Statistics<\/a>?<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import StandardScaler\r\n&gt;&gt;&gt; scaler=StandardScaler().fit(x)\r\n&gt;&gt;&gt; rescaledX=scaler.transform(x)\r\n&gt;&gt;&gt; rescaledX[0:5,:]<\/pre>\n<p><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/standardize.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-23664 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/standardize.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"593\" height=\"97\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/standardize.png 593w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/standardize-150x25.png 
150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/standardize-300x49.png 300w\" sizes=\"auto, (max-width: 593px) 100vw, 593px\" \/><\/a><\/p>\n<h4>3. Normalizing Data using Python<\/h4>\n<p><span style=\"font-weight: 400\">In this task, we rescale each observation to a length of 1 (a unit norm). For this, we use the Normalizer class. Let\u2019s take an example.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import Normalizer\r\n&gt;&gt;&gt; scaler=Normalizer().fit(x)\r\n&gt;&gt;&gt; normalizedX=scaler.transform(x)\r\n&gt;&gt;&gt; normalizedX[0:5,:]<\/pre>\n<div id=\"attachment_23665\" style=\"width: 605px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/normalize.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23665\" class=\"wp-image-23665 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/normalize.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"595\" height=\"176\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/normalize.png 595w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/normalize-150x44.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/normalize-300x89.png 300w\" sizes=\"auto, (max-width: 595px) 100vw, 595px\" \/><\/a><p id=\"caption-attachment-23665\" class=\"wp-caption-text\">Normalizing Data in Data Preprocessing<\/p><\/div>\n<h4><strong>4. Binarizing Data using Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">Using a binary threshold, it is possible to transform our data by marking the values above it 1 and those equal to or below it, 0. For this purpose, we use the Binarizer class. 
Let\u2019s take an example.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-data-science-environment-setup\/\">Learn about Python Data Science Environment Setup<\/a><\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import Binarizer\r\n&gt;&gt;&gt; binarizer=Binarizer(threshold=0.0).fit(x)\r\n&gt;&gt;&gt; binaryX=binarizer.transform(x)\r\n&gt;&gt;&gt; binaryX[0:5,:]<\/pre>\n<p><span style=\"font-weight: 400\">This marks 0 over all values equal to or less than 0, and marks 1 over the rest. When you want to turn probabilities into crisp values, this functionality comes in handy.<\/span><\/p>\n<div id=\"attachment_23666\" style=\"width: 352px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/binarize.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23666\" class=\"wp-image-23666 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/binarize.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"342\" height=\"101\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/binarize.png 342w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/binarize-150x44.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/binarize-300x89.png 300w\" sizes=\"auto, (max-width: 342px) 100vw, 342px\" \/><\/a><p id=\"caption-attachment-23666\" class=\"wp-caption-text\">Binarizing Data in Data Preprocessing<\/p><\/div>\n<h4><strong>5. 
Mean Removal using Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">We can remove the mean from each feature to center it on zero. The scale() function does this and, by default, also scales each feature to unit variance.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import scale\r\n&gt;&gt;&gt; data_standardized=scale(df)\r\n&gt;&gt;&gt; data_standardized.mean(axis=0)<\/pre>\n<p><strong>array([ 3.555e-16, \u00a01.733e-16, -8.887e-17, -1.244e-16, \u00a03.910e-16,<\/strong><br \/>\n<strong> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-6.221e-17, \u00a04.444e-17, 2.364e-14, \u00a02.862e-15, 6.754e-16,<\/strong><br \/>\n<strong> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a01.066e-16, \u00a08.887e-17])<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; data_standardized.std(axis=0)<\/pre>\n<p><strong>array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])<\/strong><\/p>\n<h4><strong>6. One Hot Encoding using Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">When a feature takes only a few scattered numerical values, those values usually act as category codes rather than as quantities. In that case, we can perform one-hot encoding. For a feature with k distinct values, we transform it into a k-dimensional vector in which one element is 1 and the rest are 0.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import OneHotEncoder\r\n&gt;&gt;&gt; encoder=OneHotEncoder()\r\n&gt;&gt;&gt; encoder.fit([[0,1,6,2],\r\n[1,5,3,5],\r\n[2,4,2,7],\r\n[1,0,4,2]\r\n])<\/pre>\n<p><strong>OneHotEncoder(categorical_features=&#8217;all&#8217;, dtype=&lt;class &#8216;numpy.float64&#8217;&gt;,<\/strong><br \/>\n<strong> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0handle_unknown=&#8217;error&#8217;, n_values=&#8217;auto&#8217;, sparse=True)<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; encoder.transform([[2,4,3,4]]).toarray()<\/pre>\n<p><strong>array([[0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.]])<\/strong><\/p>\n<h4><strong>7. 
Label Encoding using Python<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">Labels can be words or numbers. Usually, training data is labelled with words to make it readable. Label encoding converts word labels into numbers so that algorithms can work on them. Let\u2019s take an example.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-packages\/\">Let&#8217;s discuss Python Packages<\/a><\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; from sklearn.preprocessing import LabelEncoder\r\n&gt;&gt;&gt; label_encoder=LabelEncoder()\r\n&gt;&gt;&gt; input_classes=['Havells','Philips','Syska','Eveready','Lloyd']\r\n&gt;&gt;&gt; label_encoder.fit(input_classes)<\/pre>\n<p><span style=\"font-weight: 400\"><strong>LabelEncoder()<\/strong><\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; for i,item in enumerate(label_encoder.classes_):\r\n    print(item,'--&gt;',i)<\/pre>\n<p><strong>Eveready --&gt; 0<\/strong><br \/>\n<strong>Havells --&gt; 1<\/strong><br \/>\n<strong>Lloyd --&gt; 2<\/strong><br \/>\n<strong>Philips --&gt; 3<\/strong><br \/>\n<strong>Syska --&gt; 4<\/strong><br \/>\n<span style=\"font-weight: 400\">This gives us a set of numeric labels that map to these words. Let\u2019s confirm this:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; labels=['Lloyd','Syska','Philips']\r\n&gt;&gt;&gt; label_encoder.transform(labels)<\/pre>\n<p><strong>array([2, 4, 3], dtype=int32)<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; label_encoder.inverse_transform(label_encoder.transform(labels))<\/pre>\n<p><strong>array(['Lloyd', 'Syska', 'Philips'], dtype='&lt;U8')<\/strong><\/p>\n<h3><strong>
Analyzing Data in Python Machine Learning<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">Assuming you have loaded your dataset using pandas (if you haven\u2019t, refer to the <\/span><strong><a href=\"https:\/\/data-flair.training\/blogs\/pandas-tutorial\/\" target=\"_blank\" rel=\"noopener\">Python Pandas Tutorial<\/a><\/strong><span style=\"font-weight: 400\"> to learn how), let\u2019s find out more about our data.<\/span><\/p>\n<h4><strong>a. Describing the dataset<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">Using the describe() method, we can get summary statistics such as count, mean, std, min, max, and the quartiles for each numeric column.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.describe()<\/pre>\n<div id=\"attachment_23667\" style=\"width: 637px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/describe-2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23667\" class=\"wp-image-23667 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/describe-2.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"627\" height=\"192\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/describe-2.png 627w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/describe-2-150x46.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/describe-2-300x92.png 300w\" sizes=\"auto, (max-width: 627px) 100vw, 627px\" \/><\/a><p id=\"caption-attachment-23667\" class=\"wp-caption-text\">Analyzing Data in Python Machine Learning<\/p><\/div>\n<h4><strong>b. 
Shape of the dataset<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">The shape attribute is a tuple that gives the dimensions of the dataset as (rows, columns).<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.shape<\/pre>\n<p><strong>(1599, 12)<\/strong><\/p>\n<h4><span style=\"font-weight: 400\">c. Extracting data from the dataset<\/span><\/h4>\n<p><span style=\"font-weight: 400\">Now, if we want only the first ten rows of the dataset, we can call the head() method on it and pass it the argument 10.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.head(10)<\/pre>\n<div id=\"attachment_23668\" style=\"width: 522px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/extract.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23668\" class=\"wp-image-23668 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/extract.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"512\" height=\"228\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/extract.png 512w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/extract-150x67.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/extract-300x134.png 300w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><\/a><p id=\"caption-attachment-23668\" class=\"wp-caption-text\">Extracting data from the dataset &#8211; Data Analyzing<\/p><\/div>\n<h4>d. Performing operations around a variable<\/h4>\n<p><span style=\"font-weight: 400\">We can also perform operations on a single variable. For instance, here we group the data by a variable. 
For this, we use the groupby() method; chaining size() onto it counts the rows in each group.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-numpy-tutorial\/\" target=\"_blank\" rel=\"noopener\">Have a look at Python NumPy<\/a><\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.groupby('quality').size()<\/pre>\n<div id=\"attachment_23669\" style=\"width: 277px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/operations.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23669\" class=\"wp-image-23669 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/operations.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"267\" height=\"146\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/operations.png 267w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/operations-150x82.png 150w\" sizes=\"auto, (max-width: 267px) 100vw, 267px\" \/><\/a><p id=\"caption-attachment-23669\" class=\"wp-caption-text\">Performing operations around a variable in data Analyzing<\/p><\/div>\n<h3><strong>Visualizing Data: Univariate Plots in Python Machine Learning<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">Finally, when we want to visualize data as plots and charts to learn more about it, we can use pandas with Matplotlib. We will discuss two kinds of plots: univariate and multivariate.<\/span><br \/>\n<span style=\"font-weight: 400\">A univariate plot examines only one variable at a time.<\/span><\/p>\n<h4><strong>a. Histograms<\/strong><\/h4>\n<p><span style=\"font-weight: 400\"><strong><a href=\"https:\/\/data-flair.training\/blogs\/python-histogram-python-bar\/\">Histograms<\/a><\/strong> group data into bins and show how many observations each bin holds, which makes them a good way to visualize data for ML. 
The shapes of the bins tell us whether an attribute is Gaussian, skewed, or has an exponential distribution. It also hints at outliers.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; import matplotlib.pyplot as plt\r\n&gt;&gt;&gt; df.hist()\r\n&gt;&gt;&gt; plt.show()<\/pre>\n<div id=\"attachment_23670\" style=\"width: 1205px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23670\" class=\"wp-image-23670 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"1195\" height=\"673\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram.png 1195w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram-150x84.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram-300x169.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram-768x433.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/histogram-1024x577.png 1024w\" sizes=\"auto, (max-width: 1195px) 100vw, 1195px\" \/><\/a><p id=\"caption-attachment-23670\" class=\"wp-caption-text\">Histograms in Python Machine Learning<\/p><\/div>\n<p><span style=\"font-weight: 400\">Perhaps the attributes \u2018total sulfur dioxide\u2019, \u2018free sulfur dioxide\u2019, and \u2018residual sugar\u2019 have an exponential distribution. 
Attributes \u2018density\u2019, \u2018pH\u2019, \u2018fixed acidity\u2019, and \u2018volatile acidity\u2019 have Gaussian or nearly Gaussian distributions.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-descriptive-statistics\/\" target=\"_blank\" rel=\"noopener\">Let&#8217;s discuss Python Descriptive Statistics<\/a><\/strong><\/p>\n<h4><strong>b. Density Plots<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">A density plot is a smoothed version of a histogram. Instead of bars, it draws a smooth curve that follows the tops of the bins, which makes the shape of the distribution easier to read.<\/span><\/p>\n<p><strong>Features of Density Plots:<\/strong><\/p>\n<ul>\n<li><strong>Smooth curves:<\/strong> A density plot uses a continuous curve to show the shape of the data.<\/li>\n<li><strong>Y-axis meaning:<\/strong> The Y-axis represents the probability density. The whole area under the curve represents 100% of the data.<\/li>\n<li><strong>Bandwidth control:<\/strong> A small bandwidth produces a more detailed plot, whereas a large bandwidth produces a smoother one.<\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.plot(kind='density',subplots=True,sharex=False)\r\n&gt;&gt;&gt; plt.show()<\/pre>\n<div id=\"attachment_23671\" style=\"width: 1375px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23671\" class=\"wp-image-23671 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"1365\" height=\"483\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1.png 1365w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1-150x53.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1-300x106.png 
300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1-768x272.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density1-1024x362.png 1024w\" sizes=\"auto, (max-width: 1365px) 100vw, 1365px\" \/><\/a><p id=\"caption-attachment-23671\" class=\"wp-caption-text\">Density Plots in Python Machine Learning<\/p><\/div>\n<div id=\"attachment_23672\" style=\"width: 1193px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23672\" class=\"wp-image-23672 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"1183\" height=\"673\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2.png 1183w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2-150x85.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2-300x171.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2-768x437.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/density2-1024x583.png 1024w\" sizes=\"auto, (max-width: 1183px) 100vw, 1183px\" \/><\/a><p id=\"caption-attachment-23672\" class=\"wp-caption-text\">Density Plot in Python Machine Learning<\/p><\/div>\n<p><span style=\"font-weight: 400\">This gives us a clearer idea.<\/span><\/p>\n<h4><strong>c. Box and Whisker Plots<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">A box plot summarizes how each attribute is distributed. It also draws a line for the median and a box around the 25th and 75th percentiles. 
Whiskers tell us how the data is spread, and the dots outside the whiskers give candidate outlier values. Let\u2019s plot this for our dataset.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; df.plot(kind='box',subplots=True,sharex=False,sharey=False)<\/pre>\n<p><strong>fixed acidity \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.125,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>volatile acidity \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.190493,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>citric acid \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.255986,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>residual sugar \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.321479,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>chlorides \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.386972,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>free sulfur dioxide \u00a0\u00a0\u00a0\u00a0AxesSubplot(0.452465,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>total sulfur dioxide \u00a0\u00a0\u00a0AxesSubplot(0.517958,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>density \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.583451,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>pH \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.648944,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>sulphates \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.714437,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>alcohol \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.77993,0.11;0.0545775&#215;0.77)<\/strong><br 
\/>\n<strong>quality \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AxesSubplot(0.845423,0.11;0.0545775&#215;0.77)<\/strong><br \/>\n<strong>dtype: object<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; plt.show()<\/pre>\n<div id=\"attachment_23673\" style=\"width: 1375px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23673\" class=\"wp-image-23673 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"1365\" height=\"569\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box.png 1365w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box-150x63.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box-300x125.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box-768x320.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/box-1024x427.png 1024w\" sizes=\"auto, (max-width: 1365px) 100vw, 1365px\" \/><\/a><p id=\"caption-attachment-23673\" class=\"wp-caption-text\">Box and Whisker Plots in Python Machine Learning<\/p><\/div>\n<p><span style=\"font-weight: 400\">Here, attributes like \u2018total sulfur dioxide\u2019, \u2018sulphates\u2019, and \u2018residual sugar\u2019 appear skewed toward smaller values.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-machine-learning-techniques\/\" target=\"_blank\" rel=\"noopener\">Do you know about Python\u00a0Machine Learning Techniques<\/a><\/strong><\/p>\n<h3><strong>Visualizing Data: Multivariate Plots in Python Machine Learning<\/strong><\/h3>\n<p><span style=\"font-weight: 400\">A 
multivariate analysis examines more than two variables. For two variables, we call it bivariate.<\/span><\/p>\n<h4><strong>a. Correlation Matrix Plot<\/strong><\/h4>\n<p><span style=\"font-weight: 400\">Such a plot denotes how changes between two variables relate. Two variables that change in the same direction are positively correlated. A change in opposite directions implies a negative correlation. Let\u2019s plot a correlation matrix.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; correlations=df.corr()\r\n&gt;&gt;&gt; fig=plt.figure()\r\n&gt;&gt;&gt; ax=fig.add_subplot(111)\r\n&gt;&gt;&gt; cax=ax.matshow(correlations,vmin=-1,vmax=1)\r\n&gt;&gt;&gt; fig.colorbar(cax)<\/pre>\n<p><strong>&lt;matplotlib.colorbar.Colorbar object at 0x086D7A30&gt;<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; ticks=numpy.arange(0,9,1)\r\n&gt;&gt;&gt; ax.set_xticks(ticks)\r\n&gt;&gt;&gt; ax.set_yticks(ticks)\r\n&gt;&gt;&gt; plt.show()<\/pre>\n<div id=\"attachment_23675\" style=\"width: 538px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/correlation-matrix-plot.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23675\" class=\"wp-image-23675 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/correlation-matrix-plot.png\" alt=\"Data Preprocessing in Python Machine Learning\" width=\"528\" height=\"453\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/correlation-matrix-plot.png 528w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/correlation-matrix-plot-150x129.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/correlation-matrix-plot-300x257.png 300w\" sizes=\"auto, (max-width: 528px) 100vw, 528px\" \/><\/a><p id=\"caption-attachment-23675\" class=\"wp-caption-text\">Correlation Matrix Plot in 
Visualizing Data<\/p><\/div>\n<p><span style=\"font-weight: 400\">This matrix is symmetrical around the main diagonal, because the correlation of one attribute with another is the same in either order.<\/span><\/p>\n<h4><strong>b. Scatterplot Matrix<\/strong><\/h4>\n<p><span style=\"font-weight: 400\"><strong><a href=\"https:\/\/data-flair.training\/blogs\/python-scatter-plot\/\" target=\"_blank\" rel=\"noopener\">Scatterplots <\/a><\/strong>depict how two variables relate as dots in two dimensions. Plotting all scatterplots for a dataset together in one place results in a scatterplot matrix. These plots help spot structured relationships between variables. Let\u2019s take an example.<\/span><br \/>\n<strong><a href=\"https:\/\/data-flair.training\/blogs\/python-compilers\/\" target=\"_blank\" rel=\"noopener\">Let&#8217;s learn about Python Compilers<\/a><\/strong><\/p>\n<pre class=\"EnlighterJSRAW\">&gt;&gt;&gt; pandas.plotting.scatter_matrix(df)\r\n&gt;&gt;&gt; plt.show()<\/pre>\n<div id=\"attachment_23678\" style=\"width: 1367px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-23678\" class=\"wp-image-23678 size-full\" src=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1.png\" alt=\"Data Preprocessing in Python machine Learning\" width=\"1357\" height=\"663\" srcset=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1.png 1357w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1-150x73.png 150w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1-300x147.png 300w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1-768x375.png 768w, https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/scatterplot-matrix-1-1024x500.png 
1024w\" sizes=\"auto, (max-width: 1357px) 100vw, 1357px\" \/><\/a><p id=\"caption-attachment-23678\" class=\"wp-caption-text\">Scatterplot Matrix in Data Visualization<\/p><\/div>\n<p><span style=\"font-weight: 400\">This matrix is symmetrical too: each plot below the main diagonal mirrors one above it with the axes swapped. The main diagonal shows histograms of the attributes, because it doesn\u2019t make much sense to plot an attribute against itself.<\/span><\/p>\n<p>So, this was all about data preprocessing, analysis, and visualization in Python Machine Learning. We hope you liked our explanation.<\/p>\n<h3><strong>Conclusion<\/strong><\/h3>\n<p>Before we build any machine learning model, we must clean and prepare the data. This is called data preprocessing. In real life, data is messy. It may have missing values, wrong values, or extra information. Python helps clean this data easily using libraries like Pandas and NumPy. We can fill missing values, drop unwanted columns, and convert words into numbers.<\/p>\n<p><span style=\"font-weight: 400\">Hence, in this Python Machine Learning Tutorial, we discussed data preprocessing for machine learning with Python, along with data analysis and data visualization. We saw how to rescale, normalize, binarize, and standardize data. If you still have any doubts regarding data preprocessing, ask in the comments.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today in this Python Machine Learning Tutorial, we will discuss Data Preprocessing, Analysis &amp; Visualization. 
Moreover, in this Data Preprocessing in Python machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the&#46;&#46;&#46;<\/p>\n","protected":false},"author":5,"featured_media":23703,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36,46],"tags":[3279,3401,3501,8191,10454,16788,15687],"class_list":["post-23659","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-python","tag-data-analysis-for-python","tag-data-preprocessing-oin-python","tag-data-visualization-in-python","tag-learning-python-data-visualization","tag-python-data-analysis","tag-python-data-preprocessing-techniques","tag-what-is-data-preprocessing"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning - DataFlair<\/title>\n<meta name=\"description\" content=\"In this Data Preprocessing in machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the data. Let&#039;s learn\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning - DataFlair\" \/>\n<meta property=\"og:description\" content=\"In this Data Preprocessing in machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the data. 
Let&#039;s learn\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-05T04:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-28T06:28:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning - DataFlair","description":"In this Data Preprocessing in machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the data. 
Let's learn","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/","og_locale":"en_US","og_type":"article","og_title":"Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning - DataFlair","og_description":"In this Data Preprocessing in machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the data. Let's learn","og_url":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-08-05T04:00:00+00:00","article_modified_time":"2026-04-28T06:28:25+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823"},"headline":"Data Preprocessing, Analysis &amp; Visualization &#8211; Python Machine Learning","datePublished":"2018-08-05T04:00:00+00:00","dateModified":"2026-04-28T06:28:25+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/"},"wordCount":1663,"commentCount":4,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg","keywords":["data analysis for Python","Data Preprocessing oin Python","Data Visualization in Python","learning python data Visualization","Python Data Analysis","Python Data Preprocessing Techniques","what is Data Preprocessing"],"articleSection":["Machine Learning Tutorials","Python Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/","url":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/","name":"Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg","datePublished":"2018-08-05T04:00:00+00:00","dateModified":"2026-04-28T06:28:25+00:00","description":"In this Data Preprocessing in machine learning, we will look at rescaling, standardizing, normalizing, and binarizing the data. Let's learn","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/07\/Python-Machine-Learning-01.jpg","width":1200,"height":628,"caption":"Data Preprocessing, Analysis &amp; Visualization - Python Machine Learning"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/python-ml-data-preprocessing\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Machine Learning Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/machine-learning\/"},{"@type":"ListItem","position":3,"name":"Data Preprocessing, Analysis &amp; Visualization &#8211; Python Machine 
Learning"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/7f83c342f5d1632d6f7b4b0b0f447823","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4cf3a74600d131330b8c481d519afd1574093ed89f6d3396a95393ad223eb7cd?s=96&d=mm&r=g","caption":"DataFlair 
Team"},"description":"DataFlair Team creates expert-level guides on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our goal is to empower learners with easy-to-understand content. Explore our resources for career growth and practical learning.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam1\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/23659","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=23659"}],"version-history":[{"count":10,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/23659\/revisions"}],"predecessor-version":[{"id":147986,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/23659\/revisions\/147986"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/23703"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=23659"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=23659"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=23659"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}