Exploratory Data Analysis in R – Use and Terminologies

1. Objective – R Exploratory Data Analysis

In this blog, we will learn about exploratory data analysis (EDA) in R. We will also discuss basic statistical properties, look at exploratory graphs and their uses, and finish with some important EDA terminologies.

So, let’s start Exploratory Data Analysis in R.


2. Introduction to Exploratory Data Analysis in R

Exploratory data analysis (EDA) is an approach for summarizing the main characteristics of a data set with the help of descriptive statistics and visual methods.
It is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA, you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends.

3. Why do We Use Exploratory Graphs in Data Analysis?

• To understand data properties
• For finding patterns in data
• To suggest modeling strategies
• To “Debug” analyses
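As a rough illustration of the uses listed above, here is a minimal sketch in base R. It assumes the built-in mtcars data set as a stand-in for your own data; the choice of columns and plots is only an example, not a fixed recipe.

```r
# Assumption: the built-in mtcars data set stands in for "your data".
data(mtcars)

# Understand data properties: distribution of a single variable
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Find patterns: relationship between two variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")

# Suggest modeling strategies: a smoothed trend hints at the shape of a model
lines(lowess(mtcars$wt, mtcars$mpg))

# "Debug" analyses: a boxplot makes unusual values easy to spot
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")
```

Each plot answers a different question, so it usually helps to draw several quick plots rather than polish a single one.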

4. Terminologies in EDA

Following are some important terminologies in Exploratory Data Analysis in R. Let’s discuss them in detail.

i. Variable

It is a quantity, quality, or property that you can measure.
Types of variables

a. Qualitative Variables

Qualitative variables take on values that are names or labels.
Ex. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier).
Types of Qualitative Variables
1. Nominal: A categorical variable whose categories have no natural order, so all orderings are equally meaningful.
Ex. a student’s religion (Atheist, Christian, Muslim, Hindu, …) is nominal.
2. Ordinal: A categorical variable whose categories can be meaningfully ordered is called ordinal.
Ex. a student’s grade in an exam (A, B, C or Fail) is ordinal.
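In R, qualitative variables are usually stored as factors. The sketch below shows one way to represent a nominal and an ordinal variable; the sample values are made up for illustration, and the "Fail" to "A" ordering is an assumption about how the grades rank.

```r
# Nominal: categories with no meaningful order
color <- factor(c("red", "green", "blue", "green"))
levels(color)        # sorted alphabetically by default; the order carries no meaning
is.ordered(color)    # FALSE

# Ordinal: categories with a meaningful order (assumed here: Fail < C < B < A)
grade <- factor(c("B", "A", "Fail", "C"),
                levels = c("Fail", "C", "B", "A"),
                ordered = TRUE)
is.ordered(grade)    # TRUE

# Comparisons only make sense for ordered factors
above_c <- grade > "C"
```

Declaring the order explicitly matters because R would otherwise sort the levels alphabetically, which rarely matches the real ranking.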

b. Quantitative Variables

Quantitative variables are measured on a numeric or quantitative scale.
Ex. Age, or a count of anything.
Types of Quantitative Variables:
1. Discrete: A discrete variable is one that cannot take on all values within the limits of the variable.
Ex. The number of children is a discrete numerical variable (a count). The variable cannot have the value 1.7.
2. Continuous: In this, the variable can take on any value between two specified values.
Ex. age of a human: 25 years, 10 months, 2 days, 5 hours
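A small sketch of how the two kinds of quantitative variables might look in R. The values are invented for illustration; storing counts as integers is a common convention rather than a requirement.

```r
# Discrete: a count, stored as integer (the L suffix creates integers)
children <- c(0L, 2L, 1L, 3L)
is.integer(children)

# A frequency table is a natural summary for a discrete variable
child_counts <- table(children)

# Continuous: age in years, which can take any value in a range
age_years <- c(25.84, 31.2, 47.05)
is.double(age_years)
```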

ii. Value

It is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

iii. Observation

It is a set of measurements made under similar conditions. An observation will contain several values, each associated with a different variable. An observation is sometimes also called a data point.

iv. Tabular data

It is a set of values, each associated with a variable and an observation. Tabular data is tidy if each value is placed in its own “cell”, each variable in its own column, and each observation in its own row.
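To tie variable, value, observation, and tidy tabular data together, here is a minimal sketch using a data frame. The students table and its contents are hypothetical.

```r
# Hypothetical tidy table: one row per student, one column per variable
students <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  age   = c(21, 23, 22),
  grade = c("A", "C", "B")
)

obs_2   <- students[2, ]        # an observation: one row
ages    <- students$age         # a variable: one column
value_2 <- students[2, "age"]   # a value: one cell
```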

v. Dataset

Following are the components of a dataset:
• A dataset is represented as a matrix (in R, usually a data frame)
• There is a row for each unit
• There is a column for each variable
• A unit is an object that we measure, such as a person or a thing
• A variable is a characteristic of a unit, to which we assign a number or a category

a. Dimensionality of Data Sets

• Univariate: Measurement made on one variable per subject
• Bivariate: Measurement made on two variables per subject
• Multivariate: Measurement made on many variables per subject
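The following sketch shows the row and column structure of a dataset and what a univariate, bivariate, and multivariate look at it might be. It assumes the built-in iris data set; the chosen columns are only examples.

```r
# Assumption: the built-in iris data set is used as the example dataset
data(iris)

n_units     <- nrow(iris)   # one row per unit (here, a flower)
n_variables <- ncol(iris)   # one column per variable

# Univariate: one variable at a time
summary(iris$Sepal.Length)

# Bivariate: two variables together
r <- cor(iris$Sepal.Length, iris$Petal.Length)

# Multivariate: many variables together (the four numeric columns)
cor_matrix <- cor(iris[, 1:4])
```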

5. Numerical Summaries of Data

Numerical measures are very useful in situations that require decision making and inferences.
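A few of the most common numerical summaries can be computed with base R functions. This is a minimal sketch on a made-up vector, not a complete list of measures.

```r
# Made-up sample values, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 11)

x_mean   <- mean(x)     # arithmetic mean
x_median <- median(x)   # middle value
x_var    <- var(x)      # sample variance
x_sd     <- sd(x)       # sample standard deviation
x_range  <- range(x)    # minimum and maximum

# Minimum, quartiles, median, mean, and maximum in one call
summary(x)
```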
So, this was all about Exploratory Data Analysis in R. We hope you liked our explanation.

6. Conclusion

Hence, in this tutorial on Exploratory Data Analysis in R, we studied the basic concept of Exploratory Data Analysis (EDA) and why exploratory graphs are used. We also discussed some important terminologies in EDA and touched on numerical summaries of data. If you have any questions, feel free to ask in the comment section.