Exploratory Data Analysis in R – Use and Terminologies

In this blog, we will learn about exploratory data analysis (EDA) in R. We will also touch on basic statistical properties, look at exploratory graphs and their use, and finally discuss some important terminologies of EDA.

So, let’s start Exploratory Data Analysis in R.

Introduction to Exploratory Data Analysis in R

Exploratory data analysis (EDA) is an approach to summarizing the main characteristics of a data set, usually with the help of descriptive statistics and visual methods. It is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA, you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends.

Why do We Use Exploratory Graphs in Data Analysis?

  • To understand data properties
  • For finding patterns in data
  • To suggest modeling strategies
  • To “Debug” analyses
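The four uses above can be sketched with base R graphics. This is only a minimal illustration, assuming the built-in mtcars data set; the choice of variables (mpg and wt) and the straight-line fit are our own assumptions, not a prescribed workflow.

```r
# Built-in data set; no extra packages are needed.
data(mtcars)

# Understand a data property: the distribution of fuel efficiency.
h <- hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Find a pattern: plot fuel efficiency against weight.
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Suggest a modeling strategy: overlay a straight-line fit on the scatter plot.
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")

# "Debug" the analysis: count impossible values before trusting the fit.
n_bad <- sum(mtcars$mpg <= 0 | mtcars$wt <= 0)
n_bad
```

If the scatter plot had shown a curve instead of a roughly straight trend, that would be a hint to try a different model.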

Terminologies in EDA

Following are some important terminologies in Exploratory Data Analysis in R. Let’s discuss them in detail.

i. Variable

It is a quantity, quality, or property that you can measure.

Types of variables

a. Qualitative Variables

Qualitative variables take on values that are names or labels.
Ex. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier).
Types of Qualitative Variables:
1. Nominal: A categorical variable whose categories have no natural order, so all orderings are equally meaningful.
Ex. A student’s religion (Atheist, Christian, Muslim, Hindu, …) is nominal.
2. Ordinal: A categorical variable whose categories can be meaningfully ordered is called ordinal.
Ex. A student’s grade in an exam (A, B, C or Fail) is ordinal.
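In R, qualitative variables are usually stored as factors. The sketch below shows one way to represent the nominal and ordinal examples above; the sample values and the variable names color and grade are made up for illustration.

```r
# Nominal: labels with no natural order.
color <- factor(c("red", "green", "blue", "red"))

# Ordinal: labels with a meaningful order, lowest first.
grade <- factor(c("B", "A", "Fail", "C"),
                levels = c("Fail", "C", "B", "A"),
                ordered = TRUE)

is.ordered(color)   # FALSE
is.ordered(grade)   # TRUE

# Counting labels works for both kinds of factor.
table(color)

# Comparisons and max() are only meaningful for ordered factors.
above_c <- grade > "C"
best <- max(grade)
```

Asking for color > "red" would return NA with a warning, because R has no order to compare nominal levels by.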
b. Quantitative Variables 

Quantitative variables are measured on a numeric scale.
Ex. Age, or a count of anything.
Types of Quantitative Variables:
1. Discrete: A discrete variable cannot take on every value within its range; it is restricted to separate values such as whole numbers.
Ex. The number of children is a discrete numerical variable (a count). The variable cannot have the value 1.7.
2. Continuous: A continuous variable can take on any value between two specified values.
Ex. The age of a human: 25 years, 10 months, 2 days, 5 hours.
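As a rough sketch, discrete counts map naturally to R's integer type and continuous measurements to doubles. The values and the names children and age_years below are hypothetical.

```r
# Discrete: a count, stored as integers (the L suffix makes an integer literal).
children <- c(0L, 2L, 1L, 3L)

# Continuous: age in years can take any value within a range.
age_years <- c(25.84, 31.2, 47.05, 19.5)

is.integer(children)    # TRUE
is.double(age_years)    # TRUE

# A count never has a fractional part; a continuous measurement usually does.
counts_are_whole <- all(children %% 1 == 0)
ages_have_fractions <- any(age_years %% 1 != 0)
```

Note that the storage type is only a convention: R will happily store a count as a double, so the distinction is about what the variable means, not how it is stored.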

ii. Value

It is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

iii. Observation

It is a set of measurements made under similar conditions. An observation will contain several values, each associated with a different variable. An observation is sometimes also called a data point.

iv. Tabular data

It is a set of values, each associated with a variable and an observation. Tabular data is tidy if each value is placed in its own “cell”, each variable in its own column and each observation in its own row.
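A small data frame makes the terms value, observation and variable concrete. The students table below is hypothetical and exists only to illustrate the tidy layout.

```r
# Hypothetical measurements: one row per observation, one column per variable.
students <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  age   = c(21, 23, 22),
  grade = c("A", "C", "B")
)

# A value is the cell where one observation meets one variable.
second_age <- students[2, "age"]

# An observation is a whole row.
students[2, ]

# A variable is a whole column.
students$grade
```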

v. Dataset

Following are the components of a data set:
  • A data set is represented as a matrix
  • There is a row for each unit
  • There is a column for each variable
  • A unit is an object that we measure, such as a person or a thing
  • A variable is a characteristic of a unit, to which we assign a number or a category
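The row-per-unit, column-per-variable layout can be inspected directly in R. The sketch below assumes the built-in iris data set; in R such a table is typically a data frame rather than a strict matrix, because its columns may hold different types.

```r
data(iris)

# Number of rows (units) and columns (variables).
shape <- dim(iris)
shape              # 150 rows, 5 columns

# The variables and the type of each one.
names(iris)
col_types <- sapply(iris, class)
col_types

# The first few units.
head(iris, 3)
```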

a. Dimensionality of Data Sets

  • Univariate: Measurement made on one variable per subject
  • Bivariate: Measurement made on two variables per subject
  • Multivariate: Measurement made on many variables per subject
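To see the difference between the three cases, here is a short sketch on the built-in iris data set. The particular summaries (summary() and correlation) are just one possible choice for each case.

```r
data(iris)

# Univariate: look at one variable at a time.
uni <- summary(iris$Sepal.Length)

# Bivariate: look at the relationship between two variables.
bi <- cor(iris$Sepal.Length, iris$Petal.Length)

# Multivariate: look at many variables at once, here as a correlation matrix.
multi <- cor(iris[, 1:4])

uni
bi
multi
```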

Visualizations with R

Apart from descriptive statistics, EDA in R relies heavily on data visualization to gain insights from the data. Visualizations such as box plots, violin plots, pie charts and bar graphs show how a variable is distributed and how variables relate to each other. These plots help uncover patterns and trends that are hard to spot in a table of numbers.
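Three of the plot types mentioned above are available in base R without any extra packages. The following is a minimal sketch on the built-in mtcars data set; grouping by the cyl column is our own choice for illustration. Violin plots need an add-on package, so they are left out here.

```r
data(mtcars)

# Box plot: distribution of mpg within each cylinder group.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon")

# Bar graph: number of cars in each cylinder group.
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")

# Pie chart: the same counts shown as shares of the whole.
pie(cyl_counts, main = "Cars by cylinder count")
```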

EDA for Data Preprocessing and Feature Engineering

Exploratory Data Analysis also plays a big role in data preprocessing and feature engineering. Techniques such as missing value imputation, outlier analysis and feature transformation convert the data into a form that Machine Learning algorithms can handle better. This step can, in turn, improve the accuracy of those algorithms.
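The three techniques can be sketched in a few lines of base R. The income values below are hypothetical, and the specific choices (median imputation, the 1.5 × IQR rule, a log transform) are common defaults that we assume here for illustration, not the only options.

```r
# Hypothetical income values with one missing entry and one extreme value.
income <- c(32000, 41000, NA, 38000, 45000, 36000, 250000)

# Missing value imputation: replace NA with the median of the observed values.
income_filled <- ifelse(is.na(income), median(income, na.rm = TRUE), income)

# Outlier analysis: flag points more than 1.5 * IQR beyond the quartiles.
q <- unname(quantile(income_filled, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- income_filled < q[1] - fence | income_filled > q[2] + fence

# Feature transformation: a log transform compresses the long right tail.
log_income <- log(income_filled)

income_filled
which(is_outlier)
```

Whether a flagged point should be dropped, capped or kept depends on the data and the model, so the flag is only a starting point for a closer look.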

Numerical Summaries of Data

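Numerical summaries such as the mean, median, standard deviation and quartiles condense a variable into a few numbers, which is very useful in situations that require decision making and inference.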
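Base R provides functions for the most common numerical summaries. The sketch below applies them to the mpg column of the built-in mtcars data set, which we picked only as a convenient example.

```r
data(mtcars)
x <- mtcars$mpg

# Measures of center.
x_mean <- mean(x)
x_median <- median(x)

# Measures of spread.
x_sd <- sd(x)
x_range <- range(x)
x_quartiles <- quantile(x, c(0.25, 0.5, 0.75))

# summary() reports the minimum, quartiles, mean and maximum in one call.
summary(x)
```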
So, this was all about Exploratory Data Analysis in R. We hope you liked our explanation.

Conclusion

Hence, in this Exploratory Data Analysis in R tutorial, we have studied the concept of Exploratory Data Analysis (EDA). We also learned about basic statistics in R and discussed some important terminologies in EDA. If you have any queries, feel free to ask in the comment section.
