Data Mining and Knowledge Discovery in Databases (KDD Process)

In this article, we study data mining and knowledge discovery in databases (KDD). We cover the steps of the KDD process, the aspects of data mining, common issues in data mining, the major elements of data mining, and the levels of analysis.

So, let's start with data mining and the KDD process.

Data Mining and Knowledge Discovery

What is Knowledge Discovery?

Some people treat data mining and knowledge discovery as the same thing, while others view data mining as one essential step in the process of knowledge discovery. The KDD process in data mining involves the following steps:

1. Data Cleaning: Noise and inconsistent data are removed.

2. Data Integration: Multiple data sources are combined.

3. Data Selection: Data relevant to the analysis task are retrieved from the database.

4. Data Transformation: Data are transformed into forms appropriate for mining, for example by performing summary or aggregation operations.

5. Data Mining: Intelligent methods are applied in order to extract data patterns.

6. Pattern Evaluation: The extracted data patterns are evaluated to identify the ones that are actually interesting.

7. Knowledge Presentation: The mined knowledge is presented to the user.
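
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a definitive implementation: the two in-memory sources, their field names, and the helper functions (clean, integrate, select, transform, mine) are all hypothetical, and the "pattern" it finds is deliberately trivial.

```python
from collections import Counter

# Two hypothetical data sources (illustrative records only).
store_a = [
    {"customer": "c1", "item": "bread", "amount": 2.5},
    {"customer": "c2", "item": "milk", "amount": None},   # missing amount
    {"customer": "c3", "item": "bread", "amount": 3.0},
]
store_b = [
    {"customer": "c4", "item": "milk", "amount": 1.2},
    {"customer": "c5", "item": "bread", "amount": -1.0},  # inconsistent: negative amount
]

def clean(records):
    """Data cleaning: drop records with a missing or negative amount."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: aggregate the total amount per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine(totals):
    """Data mining (trivial stand-in): the item with the highest total."""
    return max(totals, key=totals.get)

records = integrate(clean(store_a), clean(store_b))
records = select(records, ["item", "amount"])
totals = transform(records)
top_item = mine(totals)

# Pattern evaluation and knowledge presentation, reduced here to a printout.
print(f"Totals per item: {totals}")
print(f"Top item: {top_item}")
```

In a real project each function would be far more involved, but the order of the calls mirrors the order of the steps listed above.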

Knowledge Discovery in Databases (KDD) in Data Mining

The process of finding and interpreting patterns from data involves the repeated application of the following steps:

Developing an understanding of:

  • The application domain
  • Relevant prior knowledge
  • The goals of the end-user

Creating a target dataset:

Selecting a data set, or focusing on a subset of variables, or data samples, on which discovery is to be performed.

Data cleaning and preprocessing:

  • Removal of noise or outliers.
  • Collecting necessary information to model or account for noise.
  • Strategies for handling missing data fields.
  • Accounting for time sequence information and known changes.
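
As a rough sketch of the cleaning strategies listed above, the following Python code masks outliers and then fills in missing values. The choices made here are assumptions for illustration only: outliers are detected with the common 1.5 x IQR rule, missing values are replaced with the mean of the remaining values, and the function names and sample readings are hypothetical.

```python
import statistics

def mask_outliers(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]

def impute_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

# Hypothetical sensor readings with one missing value and one obvious outlier.
readings = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]
masked = mask_outliers(readings)
cleaned = impute_missing(masked)
print(cleaned)
```

Masking outliers before imputing keeps the extreme value from distorting the mean that is used to fill the gaps. Other strategies, such as dropping incomplete records or imputing with the median, may fit a given dataset better.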

Data reduction and projection:

  • Finding useful features to represent the data depending on the goal of the task.
  • Using dimensionality reduction methods to reduce the effective number of variables under consideration, or to find invariant representations for the data.
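
One very simple way to reduce the number of variables is to drop features that never change, since they cannot help distinguish records. The sketch below uses a variance threshold, which is a basic feature-selection filter and not a projection method such as principal component analysis. The function name, the threshold, and the sample rows are assumptions made for illustration.

```python
import statistics

def drop_low_variance(rows, threshold=1e-9):
    """Keep only the columns whose population variance exceeds the threshold."""
    names = list(rows[0].keys())
    kept = [
        name for name in names
        if statistics.pvariance([row[name] for row in rows]) > threshold
    ]
    reduced = [{name: row[name] for name in kept} for row in rows]
    return kept, reduced

# Hypothetical records: country_code is constant and carries no information.
rows = [
    {"age": 25, "income": 40.0, "country_code": 1},
    {"age": 32, "income": 55.0, "country_code": 1},
    {"age": 47, "income": 61.0, "country_code": 1},
]
kept, reduced = drop_low_variance(rows)
print(kept)
```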

Choosing the data mining task:

  • Deciding whether the goal of the KDD process is classification, regression, clustering, etc.

Choosing the data mining algorithm(s):

  • Selecting method(s) to be used for searching for patterns in the data.
  • Deciding which models and parameters may be appropriate.
  • Matching a particular data mining method with the criteria of the KDD process.

Data Mining:

  • Searching for patterns of interest in a particular representational form, such as classification rules or trees, regression, clustering, and so forth.
  • Interpreting mined patterns.
  • Consolidating discovered knowledge. 

Aspects Of Data Mining

a. Data Integration

First, the data is collected and integrated from all the different sources.

b. Data Selection

We may not need all the data we collected in the first step. In this step, we select only the data that we think is useful for data mining.

c. Data Cleaning

The data we have collected is usually not clean and may contain errors, missing values, and noisy or inconsistent data. Therefore, we need to apply different techniques to get rid of such anomalies.

d. Data Transformation

Even after cleaning, the data is not ready for mining. We need to transform it into forms appropriate for mining. Techniques used for this include smoothing, aggregation, and normalization.
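
The three transformation techniques named above can each be sketched in a few lines of Python. These are minimal versions chosen for illustration: min-max scaling stands in for normalization, a simple moving average for smoothing, and a per-key sum for aggregation. The function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def moving_average(values, window=3):
    """Smoothing: average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def aggregate_sum(pairs):
    """Aggregation: sum the values that share the same key."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

normalized = min_max_normalize([10, 20, 30])
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
monthly = aggregate_sum([("jan", 5), ("jan", 7), ("feb", 3)])
print(normalized, smoothed, monthly)
```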

e. Data Mining

In this step, we are ready to apply data mining techniques to the data in order to discover interesting patterns. Clustering and association analysis are among the many different techniques used for data mining.
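
To make association analysis a little more concrete, the sketch below computes the two standard measures of an association rule, support and confidence, over a handful of hypothetical shopping baskets. It only evaluates a rule that is given by hand; a full algorithm would also search for candidate rules.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical market baskets, each represented as a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here the rule "bread implies milk" holds in two of the four baskets, and in two of the three baskets that contain bread.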

f. Pattern Evaluation and Knowledge Presentation

This step includes visualization, transformation, and the removal of redundant patterns from the patterns we generated.

g. Decisions / Use of Discovered Knowledge

In this final step, we use the knowledge acquired to make better decisions.

Issues In Data Mining

Issues that need to be addressed by any serious data mining package are:

i. Uncertainty Handling

ii. Dealing with Missing Values

iii. Dealing with Noisy Data

iv. Efficiency of Algorithms

v. Constraining Discovered Knowledge to Only What Is Useful

vi. Incorporating Domain Knowledge

vii. Size and Complexity of Data

viii. Data Selection

ix. Understandability of Discovered Knowledge

x. Consistency between Data and Discovered Knowledge

Five Major Elements of Data Mining

i. Extract, transform and load transaction data onto the data warehouse system.

ii. Store and manage the data in a multidimensional database system.

iii. Provide data access to business analysts and information technology professionals.

iv. Analyze the data with application software.

v. Present the data in a useful format, such as a graph or table.

Levels Of Analysis in Data Mining

The following are the levels of analysis in data mining and knowledge discovery in databases.

a. Artificial Neural Networks

These are non-linear predictive models that learn through training and resemble biological neural networks in structure.

b. Genetic Algorithms

These are optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
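
The three processes mentioned above (combination, mutation, and selection) can be seen in a toy genetic algorithm. The sketch below evolves bit strings toward as many 1s as possible, a textbook exercise often called OneMax. The population size, mutation rate, number of generations, and fixed random seed are arbitrary assumptions chosen for illustration.

```python
import random

def fitness(bits):
    """Fitness of an individual: the number of 1s it contains."""
    return sum(bits)

def crossover(a, b, rng):
    """Genetic combination: splice two parents at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(pop_size=20, length=16, generations=30, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(p) for p in population)
    for _ in range(generations):
        # Natural selection: only the fitter half survives and reproduces.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return initial_best, best

initial_best, best = evolve()
print(f"Best fitness: {initial_best} at the start, {fitness(best)} at the end")
```

Because the surviving parents are carried over unchanged, the best fitness in the population can never decrease from one generation to the next.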

c. Nearest Neighbor Method

This technique classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset.
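
A minimal sketch of this idea in Python follows. It assumes numeric features, Euclidean distance as the similarity measure, and a simple majority vote to combine the classes of the k nearest records. The historical points and their labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Return the majority class among the k points closest to query."""
    by_distance = sorted(
        labeled_points,
        key=lambda pair: math.dist(query, pair[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class) pairs.
history = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]

print(knn_classify((1.1, 1.0), history))
print(knn_classify((5.0, 4.8), history))
```

The choice of k and of the distance measure strongly affects the result, so both are usually tuned to the dataset at hand.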

d. Rule Induction

This is the extraction of useful if-then rules from data based on statistical significance.
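
As a rough illustration, the sketch below induces single-condition if-then rules from a small table. It is a simplification under stated assumptions: instead of a formal significance test, it keeps a rule only when it is backed by a minimum number of matching records and a minimum confidence. The thresholds, the function name, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def induce_rules(rows, target, min_support=2, min_confidence=0.8):
    """Return (attribute, value, predicted_label, confidence) tuples."""
    counts = defaultdict(Counter)  # (attribute, value) -> counts of target labels
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                counts[(attr, value)][row[target]] += 1
    rules = []
    for (attr, value), labels in counts.items():
        label, hits = labels.most_common(1)[0]
        total = sum(labels.values())
        if hits >= min_support and hits / total >= min_confidence:
            rules.append((attr, value, label, hits / total))
    return rules

# Hypothetical records: should we play outside today?
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

rules = induce_rules(rows, target="play")
for attr, value, label, conf in rules:
    print(f"IF {attr} = {value} THEN play = {label} (confidence {conf:.2f})")
```

On this data, only the outlook attribute yields rules that pass both thresholds; the windy attribute is too weakly related to the outcome.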

e. Data Visualization

This is the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.

Conclusion

In this article, we studied data mining and knowledge discovery. We also covered the KDD process, the aspects of data mining and knowledge discovery, issues in data mining, the elements of data mining, and the levels of analysis. All of this should help you understand knowledge discovery in data mining. If you have any questions, feel free to ask in the comment section.
