Data Mining and Knowledge Discovery in Databases (KDD Process)
In this article, we will study data mining and knowledge discovery. We will also learn about knowledge discovery in databases (KDD) and the aspects of data mining. Further, we will cover issues in data mining, the elements of data mining and knowledge discovery, and the KDD process.
So, let's start with data mining and the KDD process.
Data Mining and Knowledge Discovery
What is Knowledge Discovery?
Some people do not differentiate data mining from knowledge discovery, while others view data mining as an essential step in the process of knowledge discovery. Here is the list of steps involved in the KDD process in data mining:
1. Data Cleaning − In this step, noise and inconsistent data are removed.
2. Data Integration − In this step, multiple data sources are combined.
3. Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
4. Data Transformation − In this step, data are transformed into forms appropriate for mining, for example by performing summary or aggregation operations.
5. Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
6. Pattern Evaluation − In this step, the extracted patterns are evaluated to identify the ones that are truly interesting.
7. Knowledge Presentation − In this step, the discovered knowledge is presented to the user.
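The seven steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two toy "stores", the field names, and the `min_support` threshold are all hypothetical, and each function is a deliberately simple stand-in for its step rather than a standard implementation.

```python
from collections import Counter

# Hypothetical toy data: two sources describing customer purchases.
store_a = [
    {"customer": "ann", "item": "bread", "amount": 2.5},
    {"customer": "bob", "item": "milk", "amount": None},  # missing value
    {"customer": "ann", "item": "milk", "amount": 1.2},
]
store_b = [
    {"customer": "cy", "item": "bread", "amount": 2.4},
    {"customer": "cy", "item": "milk", "amount": 1.3},
    {"customer": "dee", "item": "eggs", "amount": 3.0},
]

def clean(records):
    """Step 1: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 4: aggregate into one basket (set of items) per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Step 5: count how many baskets contain each item."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(counts, n_baskets, min_support=0.5):
    """Step 6: keep only items whose support meets the threshold."""
    return {i: c / n_baskets for i, c in counts.items() if c / n_baskets >= min_support}

def present(patterns):
    """Step 7: format the surviving patterns as readable lines."""
    return [f"{item}: support {s:.2f}" for item, s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records, ["customer", "item"]))
patterns = evaluate(mine(baskets), len(baskets))
report = present(patterns)
```

Here `report` lists bread and milk, each appearing in two of the three baskets, while eggs falls below the threshold and is filtered out in the evaluation step.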
Knowledge Discovery in Databases (KDD) in Data Mining
The process of finding and interpreting patterns from data involves the repeated application of the following steps:
Developing an understanding of:
- The application domain
- Relevant prior knowledge
- The goals of the end-user
Creating a target dataset:
Selecting a data set, or focusing on a subset of variables, or data samples, on which discovery is to be performed.
Data cleaning and preprocessing:
- Removal of noise or outliers.
- Collecting necessary information to model or account for noise.
- Strategies for handling missing data fields.
- Accounting for time sequence information and known changes.
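As a rough sketch of the cleaning step, the snippet below fills missing fields with the median and then drops outliers using the interquartile range (IQR) rule. Both choices are assumptions made for illustration; the article does not prescribe a particular strategy, and the `readings` list is made-up data.

```python
import statistics

def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sensor readings with one missing field and one outlier.
readings = [10.0, 11.0, None, 9.5, 10.5, 95.0, 10.2]
filled = fill_missing(readings)
cleaned = remove_outliers(filled)
```

Filling before filtering keeps the record count stable until the outlier rule runs; a real project might instead drop incomplete records or model the noise explicitly.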
Data reduction and projection:
- Finding useful features to represent the data depending on the goal of the task.
- Using dimensionality reduction methods to reduce the effective number of variables under consideration, or to find invariant representations for the data.
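One very simple way to reduce the number of variables is to drop columns that barely vary, since they carry little information for most tasks. The sketch below assumes that approach; the column names and the `threshold` value are hypothetical, and methods such as principal component analysis would be the usual choice for real projection.

```python
import statistics

def drop_low_variance(rows, names, threshold=1e-3):
    """Keep only the columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if statistics.pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, [names[i] for i in keep]

# Hypothetical dataset: the "bias" column is constant and adds nothing.
rows = [[1.0, 5.0, 0.2], [1.0, 7.0, 0.9], [1.0, 6.0, 0.4]]
names = ["bias", "height", "score"]
reduced, kept = drop_low_variance(rows, names)
```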
Choosing the data mining task:
- Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
Choosing the data mining algorithm(s):
- Selecting method(s) to be used for searching for patterns in the data.
- Deciding which models and parameters may be appropriate.
- Matching a particular data mining method with the criteria of the KDD process.
Data Mining:
- Searching for patterns of interest in a particular representational form, such as classification rules or trees, regression, clustering, and so forth.
- Interpreting mined patterns.
- Consolidating discovered knowledge.
Aspects Of Data Mining
a. Data Integration
First of all, the data is collected and integrated from all the different sources.
b. Data Selection
We may not need all the data we collected in the first step. So, in this step, we select only the data that we think is useful for data mining.
c. Data Cleaning
The data we have collected is usually not clean and may contain errors, missing values, and noisy or inconsistent data. Therefore, we need to apply different techniques to get rid of such anomalies.
d. Data Transformation
Even after cleaning, the data is not ready for mining, so we need to transform it into forms appropriate for mining. The techniques used to do this include smoothing, aggregation, and normalization.
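The three transformation techniques named above can each be sketched in a few lines. The function names and sample inputs are hypothetical, and these are minimal versions: min-max scaling for normalization, a moving average for smoothing, and a per-key sum for aggregation.

```python
def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def moving_average(values, window=3):
    """Smoothing: average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def aggregate_by_key(pairs):
    """Aggregation: sum the values that share the same key."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

normalized = min_max_normalize([10, 20, 30])
smoothed = moving_average([1, 2, 3, 4, 5])
monthly = aggregate_by_key([("jan", 5), ("feb", 3), ("jan", 2)])
```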
e. Data Mining
In this step, we are ready to apply data mining techniques to the data in order to discover interesting patterns. Clustering and association analysis are among the many different techniques used for data mining.
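To make association analysis concrete, here is a minimal sketch that finds rules of the form "if X is bought, Y is bought too" between pairs of items, using support and confidence. The transactions and both thresholds are assumptions chosen for illustration; production tools typically use algorithms such as Apriori that also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical market-basket data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(transactions)
```

With this data, every basket that contains butter also contains bread, so the rule butter → bread has confidence 1.0, while butter and milk appear together too rarely to pass the support threshold.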
f. Pattern Evaluation and Knowledge Presentation
This step includes visualizing and transforming the patterns we generated and removing the redundant ones.
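The article does not define what makes a pattern redundant. One common criterion, assumed here purely for illustration, is that an itemset is redundant when a larger itemset containing it occurs exactly as often, because the larger one already tells us everything the smaller one does.

```python
def remove_redundant(patterns):
    """Drop itemsets that have a proper superset with the same support count.

    `patterns` maps a frozenset of items to its support count.
    """
    kept = {}
    for items, support in patterns.items():
        redundant = any(
            items < other and support == other_support
            for other, other_support in patterns.items()
        )
        if not redundant:
            kept[items] = support
    return kept

# Hypothetical mined patterns with their support counts.
patterns = {
    frozenset({"bread"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"bread", "butter"}): 2,
}
compact = remove_redundant(patterns)
```

In this example the single item "butter" is dropped because "bread and butter" has the same count, while "bread" stays because it occurs more often on its own.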
g. Decisions / Use of Discovered Knowledge
In this step, we use the knowledge acquired to make better decisions.
Issues In Data Mining
Issues that need to be addressed by any serious data mining package are:
i. Uncertainty Handling
ii. Dealing with Missing Values
iii. Dealing with Noisy Data
iv. Efficiency of Algorithms
v. Constraining Discovered Knowledge to Only What Is Useful
vi. Incorporating Domain Knowledge
vii. Size and Complexity of Data
viii. Data Selection
ix. Understandability of Discovered Knowledge: Consistency between Data and Discovered Knowledge
Five Major Elements of Data Mining
i. Extract, transform, and load transaction data onto the data warehouse system.
ii. Store and manage the data in a multidimensional database system.
iii. Provide data access to business analysts and information technology professionals.
iv. Analyze the data with application software.
v. Present the data in a useful format, such as a graph or table.
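The five elements can be sketched end to end with Python's standard library. This is a simplified illustration: an in-memory SQLite database stands in for the data warehouse (it is relational, not multidimensional), and the CSV contents, table name, and column names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical transaction data as it might arrive from a source system.
RAW_CSV = """store,product,amount
north,bread,2.50
north,milk,1.20
south,bread,2.40
"""

def extract(text):
    """Extract: read the raw rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert amounts from text to numbers."""
    return [(r["store"], r["product"], float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load and store: write the rows into the database."""
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def analyze(conn):
    """Access and analyze: total sales per product."""
    query = "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    return conn.execute(query).fetchall()

def present(results):
    """Present: format the results as a small text table."""
    return [f"{product:<8}{total:>6.2f}" for product, total in results]

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
report = present(analyze(conn))
```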
Levels Of Analysis in Data Mining
Following are the levels of analysis in data mining and knowledge discovery in databases.
a. Artificial Neural Networks
These are non-linear predictive models that learn through training and resemble biological neural networks in structure.
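The idea of learning through training can be shown with the smallest possible building block: a single artificial neuron with a non-linear (sigmoid) activation. The sketch below trains one neuron to reproduce the logical OR function. The learning rate, epoch count, and task are assumptions chosen for illustration, and a real network would stack many such units in layers.

```python
import math

def sigmoid(z):
    """Non-linear activation that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Adjust two weights and a bias by repeated small error corrections."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            # Move each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Return 0 or 1 by rounding the neuron's output."""
    return round(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias))

# Training data for logical OR: output is 1 if either input is 1.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in OR_SAMPLES]
```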
b. Genetic Algorithms
These are optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
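A minimal genetic algorithm shows how selection, combination, and mutation work together. The sketch below evolves bit strings toward as many 1s as possible, a toy problem chosen only for illustration. The population size, mutation rate, and fixed random seed are all assumptions, and the best individual is copied unchanged into each new generation so the best score never gets worse.

```python
import random

def fitness(bits):
    """Score a candidate: the number of 1s it contains."""
    return sum(bits)

def evolve(length=12, pop_size=20, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Natural selection: the fitter half become parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [parents[0][:]]  # keep the current best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Genetic combination: splice the two parents at a random cut.
            cut = rng.randint(1, length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally flip a bit.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

best = evolve()
```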
c. Nearest Neighbor Method
This is a technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset.
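The nearest neighbor method is short enough to sketch directly. In this version, "most similar" is assumed to mean smallest Euclidean distance, and the "combination of classes" is a simple majority vote; the historical records and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Label `query` by majority vote among its k closest historical records."""
    neighbors = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class label).
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

label_near_origin = knn_classify(history, (1.1, 1.0))
label_far_away = knn_classify(history, (5.1, 5.0))
```

Choosing an odd k for two classes avoids tied votes; with more classes or differently scaled features, weighting votes by distance and normalizing the features first are common refinements.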
d. Rule Induction
This is the extraction of useful if-then rules from data based on statistical significance.
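A simple form of rule induction looks at each attribute value and asks which outcome it most often leads to. The sketch below keeps a rule only when it covers enough records and is right often enough. Coverage and confidence are used here as a rough stand-in for a proper statistical significance test, and the weather-style records and thresholds are hypothetical.

```python
from collections import Counter, defaultdict

def induce_rules(records, target, min_confidence=0.8, min_coverage=2):
    """Return rules (attribute, value, predicted_label, confidence)."""
    counts = defaultdict(Counter)
    for rec in records:
        for attr, value in rec.items():
            if attr != target:
                counts[(attr, value)][rec[target]] += 1
    rules = []
    for (attr, value), outcomes in counts.items():
        label, hits = outcomes.most_common(1)[0]
        coverage = sum(outcomes.values())  # records matching the "if" part
        confidence = hits / coverage       # share of those with the label
        if coverage >= min_coverage and confidence >= min_confidence:
            rules.append((attr, value, label, confidence))
    return sorted(rules)

# Hypothetical training records.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]
rules = induce_rules(records, target="play")
```

Each returned tuple reads as "if attribute = value then label", so this data yields "if outlook = rainy then play = no" and "if outlook = sunny then play = yes", while the windy attribute produces no rule because its outcomes are too mixed.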
e. Data Visualization
This is the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
Conclusion
We have studied data mining and knowledge discovery. We also learned about the aspects of data mining and knowledge discovery, issues in data mining, the elements of data mining and knowledge discovery, and the KDD process. All of this should help you understand knowledge discovery in data mining. Furthermore, if you have any query, feel free to ask in the comment section.