{"id":7460,"date":"2018-02-08T04:11:00","date_gmt":"2018-02-08T04:11:00","guid":{"rendered":"https:\/\/data-flair.training\/blogs\/?p=7460"},"modified":"2021-05-28T14:31:23","modified_gmt":"2021-05-28T09:01:23","slug":"data-mining-terminologies","status":"publish","type":"post","link":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/","title":{"rendered":"Data Mining Terminologies and Predictive Analytics Terms"},"content":{"rendered":"<p>In this <strong>Data Mining Tutorial<\/strong>, we will study data mining terminology. We will cover the key terms from each area of data mining, including some predictive analytics terms.<\/p>\n<p>So, let&#8217;s start with data mining terminology.<\/p>\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Data Mining Terminologies<\/h3>\n<p>Let&#8217;s begin with the terms:<\/p>\n<p><strong>i. Data Mining<\/strong><\/p>\n<p>We use data mining to extract information from a huge set of data. We can use this information for any of the following applications:<\/p>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Market Analysis<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Fraud Detection<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Customer Retention<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Production Control<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Science Exploration<\/li>\n<\/ul>\n<p><strong>ii. Data Mining Engine<\/strong><\/p>\n<p>The data mining engine is essential to the data mining system. It consists of a set of functional modules.
These modules perform the following functions:<\/p>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Characterization<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Association and Correlation Analysis<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Classification<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Prediction<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Cluster analysis<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Outlier analysis<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Evolution analysis<\/li>\n<\/ul>\n<p><strong>iii. Knowledge Base<\/strong><\/p>\n<p>The knowledge base is the domain knowledge. We use it to guide the search.<\/p>\n<p>The knowledge discovery process consists of the following steps:<\/p>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Cleaning of data<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Data Integration<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Selection of data<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Transformation of data<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Data Mining<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Pattern Evaluation<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Knowledge Presentation<\/li>\n<\/ul>\n<p><strong>iv. 
User Interface<\/strong><\/p>\n<p>The user interface lets the user do the following:<\/p>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Interact with the system by specifying a data mining query task.<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Provide information to help focus the search.<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Mine further based on the intermediate data mining results.<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Browse database and data warehouse schemas or data structures.<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Evaluate mined patterns.<\/li>\n<\/ul>\n<ul>\n<li class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Visualize the patterns in different forms.<\/li>\n<\/ul>\n<p><strong>v. Data Integration<\/strong><\/p>\n<p>Data integration is a data pre-processing technique that merges data from multiple heterogeneous sources into a coherent data store. The merged data may contain inconsistencies and therefore needs data cleaning.<\/p>\n<p><strong>vi. Associations<\/strong><\/p>\n<p>An association algorithm creates rules that describe how often events have occurred together.<\/p>\n<p><strong>vii. Backpropagation<\/strong><\/p>\n<p>A training method used to calculate the weights in a neural net from the data.<\/p>\n<p><strong>viii. Binning<\/strong><\/p>\n<p>A data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For example, ages could be converted to bins such as 20 or under, 21 to 40, 41 to 65, and over 65.<\/p>\n<p><strong>ix. CART<\/strong><\/p>\n<p>CART stands for Classification And Regression Trees.
This method splits the independent variables into small groups and fits a constant function to each small data set. In classification trees, the constant function is one that takes on a finite small set of values. In regression trees, the mean value of the response is fit to small connected data sets.<\/p>\n<p><strong>x. Categorical data<\/strong><\/p>\n<p>Categorical data fits into a small number of discrete categories. It is either non-ordered (nominal), such as gender or city, or ordered (ordinal), such as high, medium, or low temperatures.<\/p>\n<p><strong>xi. CHAID<\/strong><\/p>\n<p>An algorithm for fitting categorical trees. It relies on the chi-squared statistic to split the data into small connected data sets.<\/p>\n<p><strong>xii. Chi-squared<\/strong><\/p>\n<p>A statistic that assesses how well a model fits the data. In data mining, we use it to find homogeneous subsets for fitting categorical trees, as in CHAID.<\/p>\n<p><strong>xiii. Classification<\/strong><\/p>\n<p>The data mining problem of predicting the category of categorical data by building a model based on some predictor variables.<\/p>\n<p><strong>xiv. Classification tree<\/strong><\/p>\n<p>A decision tree that places categorical variables into classes.<\/p>\n<p><strong>xv. Cleaning (cleansing)<\/strong><\/p>\n<p>The process of preparing data for a data mining activity. Obvious data errors are detected and corrected, and missing data is replaced.<\/p>\n<p><strong>xvi. Confusion matrix<\/strong><\/p>\n<p>We use this matrix to show the counts of the actual versus predicted class values.
It shows not only how well the model predicts, but also the details needed to see exactly where things may have gone wrong.<\/p>\n<p><strong>xvii. Consequent<\/strong><\/p>\n<p>When an association between two variables is defined, the second item is called the consequent.<\/p>\n<p><strong>xviii. Continuous<\/strong><\/p>\n<p>Continuous data can have any value in an interval of real numbers. That is, the value does not have to be an integer. Continuous is the opposite of discrete or categorical.<\/p>\n<p><strong>xix. Cross-validation<\/strong><\/p>\n<p>A method of estimating the accuracy of a classification or regression model. The data set is divided into several parts, with each part in turn used to test a model fitted to the remaining parts.<\/p>\n<p><strong>xx. Data<\/strong><\/p>\n<p>Data is defined as facts, transactions, and figures.<\/p>\n<p><strong>xxi. DBMS<\/strong><\/p>\n<p>Database Management System.<\/p>\n<p><strong>xxii. Data format<\/strong><\/p>\n<p>The form of the data in the database. Data items can exist in many formats such as text, integer, and floating-point decimal.<\/p>\n<p><strong>xxiii. Decision Tree<\/strong><\/p>\n<p>A decision tree represents a collection of hierarchical rules that lead to a class or value.<\/p>\n<p><strong>xxiv. Data Mining method<\/strong><\/p>\n<p>Procedures and algorithms designed to analyze the data in databases.<\/p>\n<p><strong>xxv. Deduction<\/strong><\/p>\n<p>Deduction infers information that is a logical consequence of the data.<\/p>\n<p><strong>xxvi. Degree of fit<\/strong><\/p>\n<p>A measure of how closely the model fits the training data. A common measure is R-squared.<\/p>\n<p><strong>xxvii. 
Dependent Variable<\/strong><\/p>\n<p>The dependent variables of a model are the variables predicted by the equation of the model using the independent variables.<\/p>\n<p><strong>xxviii. Deployment<\/strong><\/p>\n<p>Once the model is trained and validated, we use it to analyze new data and make predictions. This use of the model is called deployment.<\/p>\n<p><strong>xxix. Dimension<\/strong><\/p>\n<p>Each attribute of a case or occurrence in the data being mined, stored as a field in a flat file record or a column of a relational database table.<\/p>\n<p><strong>xxx. Discrete<\/strong><\/p>\n<p>A data item that has a finite set of values. Discrete is the opposite of continuous.<\/p>\n<p><strong>xxxi. Discriminant analysis<\/strong><\/p>\n<p>A statistical method based on maximum likelihood for determining boundaries that separate the data into categories.<\/p>\n<p><strong>xxxii. Entropy<\/strong><\/p>\n<p>A way to measure variability other than the variance statistic. Some decision trees split the data into groups based on minimum entropy.<\/p>\n<p><strong>xxxiii. Exploratory Analysis<\/strong><\/p>\n<p>Looking at data to discover relationships not previously detected. Exploratory analysis tools typically assist the user in creating tables and graphical displays.<\/p>\n<p><strong>xxxiv. External Data<\/strong><\/p>\n<p>Data that is not collected by the organization.
Examples include data available from a reference book or a government source.<\/p>\n<p><strong>xxxv. Feed-forward<\/strong><\/p>\n<p>A neural net in which the signals only flow in one direction, from the inputs to the outputs.<\/p>\n<p><strong>xxxvi. Fuzzy Logic<\/strong><\/p>\n<p>Fuzzy logic is applied to fuzzy sets, where membership in a fuzzy set is a matter of degree between 0 and 1, not necessarily exactly 0 or 1. Non-fuzzy logic manipulates outcomes that are either true or false. Fuzzy logic needs to be able to manipulate degrees of \u201cmaybe\u201d in addition to true and false.<\/p>\n<p><strong>xxxvii. Genetic Algorithms<\/strong><\/p>\n<p>A computer-based method of generating and testing combinations of possible input parameters to find the optimal output. It uses processes based on natural evolution concepts such as genetic combination, mutation, and natural selection.<\/p>\n<p><strong>xxxviii. GUI<\/strong><\/p>\n<p>Graphical User Interface.<\/p>\n<p><strong>xxxix. Independent variable<\/strong><\/p>\n<p>The independent variables of a model are the variables used in the equation of the model to predict the output (dependent) variable.<\/p>\n<p><strong>xl. Induction<\/strong><\/p>\n<p>A technique that infers generalizations from the information in the data.<\/p>\n<p><strong>xli. Interaction<\/strong><\/p>\n<p>Two independent variables interact when changes in the value of one change the effect on the dependent variable of the other.<\/p>\n<p><strong>xlii. Internal data<\/strong><\/p>\n<p>Data collected by an organization such as operating and customer data.<\/p>\n<p><strong>xliii. k-nearest neighbor<\/strong><\/p>\n<p>A classification method that classifies a point by calculating the distances between the point and the points in the training data set.
Then it assigns the point to the class that is most common among its k nearest neighbors (where k is an integer).<\/p>\n<p><strong>xliv. Kohonen Feature Map<\/strong><\/p>\n<p>A type of neural network that uses unsupervised learning to find patterns in data. In data mining, it is employed for cluster analysis.<\/p>\n<p><strong>xlv. Layer<\/strong><\/p>\n<p>Nodes in a neural net are usually grouped into layers, with each layer described as input, output, or hidden. There are as many input nodes as there are input variables and as many output nodes as there are output variables. Typically, there are one or two hidden layers.<\/p>\n<p><strong>xlvi. Leaf<\/strong><\/p>\n<p>A node not further split (the terminal grouping) in a classification or decision tree.<\/p>\n<p><strong>xlvii. Learning<\/strong><\/p>\n<p>Training models (estimating their parameters) based on existing data.<\/p>\n<p><strong>xlviii. Least Squares<\/strong><\/p>\n<p>The most common method of training the weights of a model. We choose the weights that minimize the sum of the squared deviations of the predicted values of the model from the observed values of the data.<\/p>\n<p><strong>xlix. MARS<\/strong><\/p>\n<p>Multivariate Adaptive Regression Splines. MARS is a generalization of a decision tree.<\/p>\n<p><strong>l. Maximum likelihood<\/strong><\/p>\n<p>Another training or estimation method. The maximum likelihood estimate of a parameter is the value of the parameter that maximizes the probability that the data came from the population defined by the parameter.<\/p>\n<p><strong>li. Mean<\/strong><\/p>\n<p>The arithmetic average value of a collection of numeric data.<\/p>\n<p><strong>lii. Median<\/strong><\/p>\n<p>The value in the middle of a collection of ordered data.
In other words, the value with the same number of items above and below it.<\/p>\n<p><strong>liii. Missing data<\/strong><\/p>\n<p>Data values can be missing because they were not measured, not answered, were unknown, or were lost. Data mining methods vary in the way they treat missing values.<\/p>\n<p><strong>liv. Mode<\/strong><\/p>\n<p>The most common value in a data set. If more than one value occurs the same maximum number of times, the data is multimodal.<\/p>\n<p><strong>lv. Node<\/strong><\/p>\n<p>A decision point in a classification tree. Also, a point in a neural net that combines input from other nodes and produces an output through the application of an activation function.<\/p>\n<p><strong>lvi. Noise<\/strong><\/p>\n<p>The random variation in data that a model cannot explain, seen as the difference between the predictions of the model and the observed values. Sometimes data is referred to as noisy when it contains errors such as many missing or incorrect values, or when there are extraneous columns.<\/p>\n<p><strong>lvii. Non-applicable Data<\/strong><\/p>\n<p>Missing values that would be logically impossible or are obviously not relevant.<\/p>\n<p><strong>lviii. Normalize<\/strong><\/p>\n<p>A collection of numeric data is normalized by subtracting the minimum value from all values and then dividing by the range of the data. This yields data with a similarly shaped histogram but with all values between 0 and 1. It is useful to do this for all inputs into neural nets and also for inputs into other regression models.<\/p>\n<p><strong>lix. OLAP<\/strong><\/p>\n<p>On-Line Analytical Processing tools give the user the capability to perform multi-dimensional analysis of the data.<\/p>\n<p><strong>lx. 
Optimization Criterion<\/strong><\/p>\n<p>A positive function of the difference between predictions and data. Model estimates are chosen so as to optimize this function or criterion. Least squares and maximum likelihood are examples.<\/p>\n<p><strong>lxi. Outliers<\/strong><\/p>\n<p>Outliers are data items that did not come from the assumed population of data.<\/p>\n<p><strong>lxii. Overfitting<\/strong><\/p>\n<p>The tendency of some modeling techniques to assign importance to random variations in the data by declaring them important patterns.<\/p>\n<p><strong>lxiii. Overlay<\/strong><\/p>\n<p>Data not collected by the organization, such as data from a proprietary database, that is combined with the organization\u2019s own data.<\/p>\n<p><strong>lxiv. Parallel processing<\/strong><\/p>\n<p>Several computers or CPUs linked together so that each can be computing simultaneously.<\/p>\n<p><strong>lxv. Prevalence<\/strong><\/p>\n<p>The measure of how often the collection of items in an association occur together, as a percentage of all the transactions.<\/p>\n<p><strong>lxvi. Pruning<\/strong><\/p>\n<p>Eliminating lower-level splits in a decision tree. We also use this term to describe algorithms that adjust the topology of a neural net by removing (i.e., pruning) hidden nodes.<\/p>\n<p><strong>lxvii. Range<\/strong><\/p>\n<p>The range of the data is the difference between the maximum value and the minimum value.
Alternatively, a range can include the minimum and maximum, as in \u201cThe value ranges from 2 to 8.\u201d<\/p>\n<p><strong>lxviii. RDBMS<\/strong><\/p>\n<p>Relational Database Management System.<\/p>\n<p><strong>lxix. Regression Tree<\/strong><\/p>\n<p>A decision tree that predicts values of continuous variables.<\/p>\n<p><strong>lxx. Resubstitution Error<\/strong><\/p>\n<p>The estimate of error based on the differences between the predicted values and the observed values in the training set.<\/p>\n<p><strong>lxxi. Right-hand side<\/strong><\/p>\n<p>When an association between two variables is defined, the second item is the right-hand side.<\/p>\n<p><strong>lxxii. R-squared<\/strong><\/p>\n<p>A number between 0 and 1 that measures how well a model fits its training data. One is a perfect fit; zero implies the model has no predictive ability. For a least-squares fit, we compute it as the square of the correlation between the predicted and observed values, that is, the square of their covariance divided by the product of their standard deviations.<\/p>\n<p><strong>lxxiii. Sampling<\/strong><\/p>\n<p>Creating a subset of data from the whole. Random sampling attempts to represent the whole by choosing the sample through a random mechanism.<\/p>\n<p><strong>lxxiv. Sensitivity Analysis<\/strong><\/p>\n<p>Varying the parameters of a model to assess the change in its output.<\/p>\n<p><strong>lxxv. Sequence Discovery<\/strong><\/p>\n<p>The same as an association, except that we also consider the time sequence of events. For example, \u201cTwenty percent of the people who buy a VCR buy a camcorder within four months.\u201d<\/p>\n<p><strong>lxxvi. 
SMP<\/strong><\/p>\n<p>Symmetric multi-processing is a computer configuration where many CPUs share a common operating system, main memory, and disks. They can work on different parts of a problem at the same time.<\/p>\n<p><strong>lxxvii. Supervised Learning<\/strong><\/p>\n<p>The collection of techniques where analysis uses a well-defined (known) dependent variable. All regression and classification techniques are supervised.<\/p>\n<p><strong>lxxviii. Support<\/strong><\/p>\n<p>The measure of how often the collection of items in an association occur together, as a percentage of all the transactions. For example, \u201cIn 2% of the purchases at the hardware store, both a pick and a shovel were bought.\u201d<\/p>\n<p><strong>lxxix. Test data<\/strong><\/p>\n<p>A data set independent of the training data set that we use to fine-tune the estimates of the model parameters (i.e., weights).<\/p>\n<p><strong>lxxx. Test Error<\/strong><\/p>\n<p>The estimate of error based on the difference between the predictions of a model on a test data set and the observed values in the test data set when the test data set was not used to train the model.<\/p>\n<p><strong>lxxxi. Time Series<\/strong><\/p>\n<p>A series of measurements taken at consecutive points in time.<\/p>\n<p><strong>lxxxii. Time Series Model<\/strong><\/p>\n<p>A model that forecasts future values of a time series based on past values.<\/p>\n<p><strong>lxxxiii. Topology<\/strong><\/p>\n<p>For a neural net, topology refers to the number of layers and the number of nodes in each layer.<\/p>\n<p><strong>lxxxiv. Training<\/strong><\/p>\n<p>Another term for estimating a model\u2019s parameters based on the data set at hand.<\/p>\n<p><strong>lxxxv. Training data<\/strong><\/p>\n<p>A data set used to estimate or train a model.<\/p>\n<p><strong>lxxxvi. 
Transformation<\/strong><\/p>\n<p>A re-expression of the data, such as aggregating it, normalizing it, or changing its unit of measure.<\/p>\n<p><strong>lxxxvii. Unsupervised Learning<\/strong><\/p>\n<p>A group of techniques in which groupings of the data are defined without the use of a dependent variable.<\/p>\n<p><strong>lxxxviii. Validation<\/strong><\/p>\n<p>The process of testing the models with a data set different from the training data set.<\/p>\n<p><strong>lxxxix. Variance<\/strong><\/p>\n<p>The most commonly used statistical measure of dispersion. The first step is to square the deviation of each data item from the average value. Then the average of the squared deviations is calculated to obtain an overall measure of variability.<\/p>\n<p><strong>xc. Visualization<\/strong><\/p>\n<p>Visualization tools graphically display data to facilitate a better understanding of its meaning. Graphical capabilities range from simple scatter plots to complex multi-dimensional representations.<\/p>\n<p><strong>xci. Windowing<\/strong><\/p>\n<p>Used when training a model with time series data. A window is the period of time used for each training case.<\/p>\n<p>For example:<\/p>\n<p>Suppose we have weekly stock price data covering fifty weeks, and we set the window to five weeks. The first training case uses weeks one through five and compares its prediction to week six. The second case uses weeks two through six to predict week seven, and so on.<\/p>\n<p>So, this was all about Data Mining Terminologies.
We hope you liked our explanation.<\/p>\n<h3 class=\"public-DraftStyleDefault-block public-DraftStyleDefault-ltr\">Conclusion<\/h3>\n<p>In this tutorial, we studied data mining terminology. These terms will help you understand the concepts related to data mining. If you have any questions, feel free to ask in the comments section.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Data Mining Tutorial, we will study Data Mining Terminologies. We will cover each and every Data Mining Terminologies related to every domain. Moreover, we will discuss some predictive analytics terms used in&#46;&#46;&#46;<\/p>\n","protected":false},"author":6,"featured_media":34252,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[3348,3352,3371,14616,15684],"class_list":["post-7460","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-mining","tag-data-mining-definition","tag-data-mining-glossary","tag-data-mining-terminologies","tag-terminologies-for-data-mining","tag-what-is-data-mining"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Mining Terminologies and Predictive Analytics Terms - DataFlair<\/title>\n<meta name=\"description\" content=\"Data Mining terminologies and Predictive Analytics Terms - Learn data mining keywords &amp; meaning like mean, median, mode, outlier, classification\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Mining Terminologies and 
Predictive Analytics Terms - DataFlair\" \/>\n<meta property=\"og:description\" content=\"Data Mining terminologies and Predictive Analytics Terms - Learn data mining keywords &amp; meaning like mean, median, mode, outlier, classification\" \/>\n<meta property=\"og:url\" content=\"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/\" \/>\n<meta property=\"og:site_name\" content=\"DataFlair\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DataFlairWS\/\" \/>\n<meta property=\"article:published_time\" content=\"2018-02-08T04:11:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-05-28T09:01:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"DataFlair Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:site\" content=\"@DataFlairWS\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"DataFlair Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Mining Terminologies and Predictive Analytics Terms - DataFlair","description":"Data Mining terminologies and Predictive Analytics Terms - Learn data mining keywords & meaning like mean, median, mode, outlier, classification","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/","og_locale":"en_US","og_type":"article","og_title":"Data Mining Terminologies and Predictive Analytics Terms - DataFlair","og_description":"Data Mining terminologies and Predictive Analytics Terms - Learn data mining keywords & meaning like mean, median, mode, outlier, classification","og_url":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/","og_site_name":"DataFlair","article_publisher":"https:\/\/www.facebook.com\/DataFlairWS\/","article_published_time":"2018-02-08T04:11:00+00:00","article_modified_time":"2021-05-28T09:01:23+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg","type":"image\/jpeg"}],"author":"DataFlair Team","twitter_card":"summary_large_image","twitter_creator":"@DataFlairWS","twitter_site":"@DataFlairWS","twitter_misc":{"Written by":"DataFlair Team","Est. 
reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#article","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/"},"author":{"name":"DataFlair Team","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89"},"headline":"Data Mining Terminologies and Predictive Analytics Terms","datePublished":"2018-02-08T04:11:00+00:00","dateModified":"2021-05-28T09:01:23+00:00","mainEntityOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/"},"wordCount":2663,"commentCount":0,"publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg","keywords":["data mining definition","data mining glossary","data mining terminologies","terminologies for data mining","what is data mining"],"articleSection":["Data Mining Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/","url":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/","name":"Data Mining Terminologies and Predictive Analytics Terms - 
DataFlair","isPartOf":{"@id":"https:\/\/data-flair.training\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#primaryimage"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#primaryimage"},"thumbnailUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg","datePublished":"2018-02-08T04:11:00+00:00","dateModified":"2021-05-28T09:01:23+00:00","description":"Data Mining terminologies and Predictive Analytics Terms - Learn data mining keywords & meaning like mean, median, mode, outlier, classification","breadcrumb":{"@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#primaryimage","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2018\/02\/Introduction-to-Data-Mining-Terminologies-01-1.jpg","width":1200,"height":628,"caption":"Data Mining Terminologies and Predictive Analytics Terms"},{"@type":"BreadcrumbList","@id":"https:\/\/data-flair.training\/blogs\/data-mining-terminologies\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Home","item":"https:\/\/data-flair.training\/blogs\/"},{"@type":"ListItem","position":2,"name":"Data Mining Tutorials","item":"https:\/\/data-flair.training\/blogs\/category\/data-mining\/"},{"@type":"ListItem","position":3,"name":"Data Mining Terminologies and Predictive Analytics 
Terms"}]},{"@type":"WebSite","@id":"https:\/\/data-flair.training\/blogs\/#website","url":"https:\/\/data-flair.training\/blogs\/","name":"DataFlair","description":"Learn Today. Lead Tomorrow.","publisher":{"@id":"https:\/\/data-flair.training\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/data-flair.training\/blogs\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/data-flair.training\/blogs\/#organization","name":"DataFlair","url":"https:\/\/data-flair.training\/blogs\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/","url":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","contentUrl":"https:\/\/data-flair.training\/blogs\/wp-content\/uploads\/sites\/2\/2016\/07\/Data-Flair.png","width":106,"height":48,"caption":"DataFlair"},"image":{"@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/DataFlairWS\/","https:\/\/x.com\/DataFlairWS","https:\/\/www.linkedin.com\/company\/dataflair-web-services-pvt-ltd\/","https:\/\/www.youtube.com\/user\/DataFlairWS"]},{"@type":"Person","@id":"https:\/\/data-flair.training\/blogs\/#\/schema\/person\/2c58ecb4f73a39f0ef993f1ddfcd7b89","name":"DataFlair Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1ce4a0e3e542444fc73bbebf83e89e8b73e2d95ccb1fcee64da9945f078b97c5?s=96&d=mm&r=g","caption":"DataFlair Team"},"description":"The 
DataFlair Team provides industry-driven content on programming, Java, Python, C++, DSA, AI, ML, data Science, Android, Flutter, MERN, Web Development, and technology. Our expert educators focus on delivering value-packed, easy-to-follow resources for tech enthusiasts and professionals.","url":"https:\/\/data-flair.training\/blogs\/author\/dfteam2\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/7460","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/comments?post=7460"}],"version-history":[{"count":8,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/7460\/revisions"}],"predecessor-version":[{"id":96234,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/posts\/7460\/revisions\/96234"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media\/34252"}],"wp:attachment":[{"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/media?parent=7460"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/categories?post=7460"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/data-flair.training\/blogs\/wp-json\/wp\/v2\/tags?post=7460"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}