R Statistics – Statistical Programming in R
In this tutorial, we will learn about R statistics in detail. We will cover the types of statistics and the types of data, and then look closely at the measures used to compare objects. We also use graphs and images to make the ideas easier to understand.
So, let's start the R Statistics tutorial.
2. Introduction to R Statistics
Statistics concerns data: their collection, analysis, and interpretation. It has the following two types:
Descriptive statistics concerns the summarization of data. Given a data set, we describe it by calculating numbers from the data, called descriptive measures, such as percentages, sums, averages, and so forth.
Inferential statistics does more. It uses the data set to make an inference, drawing a conclusion about the population from which the data originated.
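As a minimal sketch of the two types, the following R snippet computes a few descriptive measures and then one simple inference. The data and the variable names (heights, smoker) are made up for illustration, and the one-sample t-test is only one of many possible inferential tools.

```r
# Hypothetical sample: heights (cm) of five people and their smoking status.
heights <- c(160, 165, 170, 175, 180)
smoker  <- c("no", "yes", "no", "no", "yes")

# Descriptive statistics: numbers that summarize the sample itself.
total_height <- sum(heights)                     # sum
mean_height  <- mean(heights)                    # average
pct_smoker   <- 100 * prop.table(table(smoker))  # percentage per category

print(total_height)
print(mean_height)
print(pct_smoker)

# Inferential statistics: a 95% confidence interval for the mean height
# of the population the sample came from (one-sample t-test).
ci <- t.test(heights)$conf.int
print(ci)
```

The first three results describe only the five observations. The confidence interval is a statement about the wider population, which is what makes it inferential.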
3. Types of Data in Statistics
Whenever we work with statistics, it is very important to recognize the different types of data:
- numerical (discrete and continuous)
- categorical
- ordinal
Data are the actual pieces of information that you collect through your study.
Most data fall into one of two groups: numerical or categorical:
i. Numerical Data
Numerical data have meaning as a measurement, such as a person's height, weight, IQ, or blood pressure.
Numerical data can be further broken into two types:
a. Discrete data – It represents items that can be counted. The possible values can be listed out, and the list may be fixed or may go on to infinity.
b. Continuous data – It represents measurements. The possible values cannot be counted and can only be described using intervals on the real number line.
ii. Categorical Data
We use it to represent characteristics, such as a person's gender, marital status, or hometown.
It can take on numerical values, such as "1" indicating male and "2" indicating female.
But those numbers don't have mathematical meaning. You couldn't add them together.
Qualitative data is another name for categorical data. Yes/No data is a common example.
There is one more type of data, called ordinal data. Let's look at it next:
iii. Ordinal data
It mixes numerical and categorical data. The data fall into categories, but the numbers placed on the categories have meaning.
For example, rating a restaurant on a scale of 0 to 4 stars gives ordinal data.
Ordinal data are often treated as categorical, with the groups kept in order when creating graphs and charts.
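The data types above map naturally onto R's basic objects. The sketch below shows one possible representation, with integers for discrete data, doubles for continuous data, a factor for categorical data, and an ordered factor for ordinal data. All values and variable names are hypothetical.

```r
# Numerical data
children <- c(0L, 2L, 1L, 3L)              # discrete: counts, stored as integers
height   <- c(162.5, 171.0, 180.2, 168.9)  # continuous: measurements, stored as doubles

# Categorical data: the codes 1 and 2 are labels, not quantities.
gender <- factor(c(1, 2, 2, 1), levels = c(1, 2), labels = c("male", "female"))

# Ordinal data: categories with a meaningful order (0 to 4 stars).
stars <- factor(c(4, 2, 0, 3), levels = 0:4, ordered = TRUE)

print(mean(height))         # arithmetic is meaningful for numerical data
print(table(gender))        # counting is the natural summary for categorical data
print(stars[1] > stars[2])  # order comparisons work for ordinal data
```

Storing categorical values as a factor rather than as plain numbers stops R from treating the codes as quantities, which matches the point that such numbers cannot be added together.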
4. Distance Measures (Similarity, dissimilarity, correlation)
Distance measures are mathematical approaches for measuring the distance between objects. Computing a distance lets us compare objects, and the comparison can be expressed from three standpoints:
Similarity - a measure that ranges from 0 to 1: [0, 1]
Dissimilarity - a measure that ranges from 0 to infinity: [0, Infinity]
Correlation - a measure that ranges from -1 to +1: [-1, +1]
i. What is a correlation?
Correlation, written r, measures how strongly two variables move together.
If r is close to 0, there is little or no linear relationship between the two variables.
If r is positive, then as one variable gets larger, the other gets larger.
If r is negative, then as one variable gets larger, the other gets smaller.
The r value can be interpreted as follows:
+.70 or higher: Very strong positive relationship
+.40 to +.69: Strong positive relationship
+.30 to +.39: Moderate positive relationship
+.20 to +.29: Weak positive relationship
+.01 to +.19: No or negligible relationship
0: No relationship
-.01 to -.19: No or negligible relationship
-.20 to -.29: Weak negative relationship
-.30 to -.39: Moderate negative relationship
-.40 to -.69: Strong negative relationship
-.70 or lower: Very strong negative relationship
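The interpretation scale above can be turned into a small lookup helper. The function below, interpret_r, is a hypothetical name introduced here for illustration. It assumes each band starts at its lower bound (0.20, 0.30, 0.40, 0.70), so a value that falls between two listed bands, such as 0.195, is assigned to the weaker one.

```r
# Map correlation values to the verbal labels of the scale.
# interpret_r is a hypothetical helper, not part of base R.
interpret_r <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  strength <- c("Weak", "Moderate", "Strong", "Very strong")
  # findInterval() gives 0 below 0.20, 1 for [0.20, 0.30), ..., 4 from 0.70 up.
  band <- findInterval(abs(r), c(0.20, 0.30, 0.40, 0.70))
  direction <- ifelse(r > 0, "positive", "negative")
  ifelse(
    r == 0, "No relationship",
    ifelse(band == 0, "No or negligible relationship",
           paste(strength[pmax(band, 1)], direction, "relationship"))
  )
}

print(interpret_r(c(0.85, -0.45, 0.10, 0)))
```

Such cut-offs are conventions rather than fixed rules, so the labels are best read as a rough guide.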
There are several methods for computing the correlation measure r, of which Pearson's correlation coefficient is a commonly used one.
So, let’s first understand this:
ii. What is Pearson’s Correlation Coefficient?
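Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example age and blood pressure. Pearson's correlation coefficient (r) measures the strength of the linear association between the two variables.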
Values of Pearson’s correlation coefficient
Pearson's correlation coefficient (r) for continuous (interval level) data ranges from -1 to +1.
Graphically, correlations look like this:
If r = -1, the data lie on a perfectly straight line with a negative slope.
If r = +1, the data lie on a perfectly straight line with a positive slope.
If r = 0, there is no linear relationship between the variables.
Positive correlation – Both variables increase or decrease together.
Negative correlation – As one variable increases, the other decreases.
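In R, Pearson's coefficient is available through the built-in cor() function. The sketch below uses made-up age and blood pressure values, checks cor() against the textbook definition, and shows the two extreme cases of a perfect straight line.

```r
# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age         <- c(25, 35, 45, 55, 65)
systolic_bp <- c(118, 121, 127, 133, 140)

# Built-in Pearson correlation.
r <- cor(age, systolic_bp, method = "pearson")

# The same value from the definition: the sum of products of deviations,
# divided by the square root of the product of the sums of squared deviations.
dev_age <- age - mean(age)
dev_bp  <- systolic_bp - mean(systolic_bp)
r_manual <- sum(dev_age * dev_bp) / sqrt(sum(dev_age^2) * sum(dev_bp^2))

print(r)
print(r_manual)

# Perfect straight lines give the extreme values of r.
x <- 1:5
r_up   <- cor(x, 2 * x + 1)    # positive slope
r_down <- cor(x, -3 * x + 10)  # negative slope
print(r_up)
print(r_down)
```

Because these blood pressure values rise steadily with age, r comes out positive and close to +1.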
iii. Formulas and Methods to Calculate Distance Measure
Now we turn to more complex objects, that is, objects with multiple attributes. The distance between such objects can be measured with:
- Euclidean distance
- Taxicab or Manhattan distance
- Minkowski distance
- Cosine similarity
- Mahalanobis distance
- Pearson's Correlation Coefficient (discussed in the section above)
a. Euclidean distance –
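It is the classical method for computing the distance between two objects A and B in Euclidean space: the length of the straight line joining them, found as the square root of the sum of squared differences between their attributes.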
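As a short illustration, Euclidean distance can be computed directly from its definition or with R's built-in dist() function. The two points a and b are hypothetical.

```r
# Two hypothetical objects with two attributes each.
a <- c(0, 0)
b <- c(3, 4)

# Square root of the sum of squared differences.
euclid_manual <- sqrt(sum((a - b)^2))

# dist() computes pairwise distances between the rows of a matrix.
euclid_dist <- as.numeric(dist(rbind(a, b), method = "euclidean"))

print(euclid_manual)
print(euclid_dist)
```

Both approaches give 5 here, the hypotenuse of a 3-4-5 right triangle.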
b. Taxicab or Manhattan distance –
It is like Euclidean distance, with one difference: the distance is calculated by traversing only vertical and horizontal lines, as in a grid-based system. In other words, it is the sum of the absolute differences between the attributes of the two points.
The name comes from a city laid out in blocks, where travelling between two points means following the streets around the buildings rather than cutting straight through them.
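A minimal sketch of Manhattan distance in R, again with two hypothetical points, is shown below. It uses both the definition and the "manhattan" method of the built-in dist() function.

```r
# Two hypothetical points on a grid.
a <- c(0, 0)
b <- c(3, 4)

# Sum of absolute differences along each axis.
manhattan_manual <- sum(abs(a - b))

# The same distance through dist().
manhattan_dist <- as.numeric(dist(rbind(a, b), method = "manhattan"))

print(manhattan_manual)
print(manhattan_dist)
```

The result is 7, that is, 3 blocks in one direction plus 4 in the other. The straight-line Euclidean distance between the same points is shorter, at 5.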
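c. Minkowski distance –
This distance is a metric on Euclidean space. We can consider it a generalization of both Euclidean and Manhattan distance. For two objects A = (a1, ..., an) and B = (b1, ..., bn), it is defined as
d(A, B) = (|a1 - b1|^r + ... + |an - bn|^r)^(1/r)
where r is a parameter.
When r = 1, it reduces to Manhattan distance.
When r = 2, it reduces to Euclidean distance.
As r approaches ∞, it tends to the supremum distance, the largest absolute difference in any single attribute.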
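The effect of the parameter r can be sketched with a small helper. The function name minkowski is hypothetical and defined here only for illustration. Base R offers the same calculation through dist(), where the parameter is called p.

```r
# Minkowski distance between two numeric vectors (hypothetical helper).
minkowski <- function(a, b, r) {
  if (is.infinite(r)) {
    return(max(abs(a - b)))  # limiting case: supremum distance
  }
  sum(abs(a - b)^r)^(1 / r)
}

a <- c(0, 0)
b <- c(3, 4)

d_r1   <- minkowski(a, b, 1)    # same as Manhattan distance
d_r2   <- minkowski(a, b, 2)    # same as Euclidean distance
d_rinf <- minkowski(a, b, Inf)  # supremum distance
d_r3   <- minkowski(a, b, 3)    # an intermediate value of r

print(c(d_r1, d_r2, d_r3, d_rinf))

# Built-in equivalents in base R.
d_r3_builtin   <- as.numeric(dist(rbind(a, b), method = "minkowski", p = 3))
d_rinf_builtin <- as.numeric(dist(rbind(a, b), method = "maximum"))
```

For these points the distance shrinks from 7 at r = 1 to 5 at r = 2 and approaches 4 as r grows, since larger values of r give more weight to the single largest difference.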
d. Cosine Similarity –
Cosine similarity is a measure that calculates the cosine of the angle between two vectors. This metric measures orientation, not size. For example, we can compare two documents, each represented as a vector, by the angle between them.
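Base R has no dedicated cosine similarity function, so the sketch below defines a hypothetical helper, cosine_similarity, from the standard formula: the dot product of the two vectors divided by the product of their lengths. The three "documents" are made-up term-count vectors.

```r
# Cosine of the angle between two numeric vectors (hypothetical helper).
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical term counts for three documents over the same three terms.
doc1 <- c(2, 1, 0)
doc2 <- c(4, 2, 0)  # same proportions as doc1, but twice as long
doc3 <- c(0, 0, 3)  # shares no terms with doc1

sim_same       <- cosine_similarity(doc1, doc2)
sim_orthogonal <- cosine_similarity(doc1, doc3)

print(sim_same)
print(sim_orthogonal)
```

doc1 and doc2 point in the same direction, so their similarity is 1 (up to floating-point rounding) even though doc2 is twice as long. doc1 and doc3 share no terms, so their similarity is 0.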
e. Mahalanobis distance –
It is used to measure the distance between two groups of objects, taking into account how the data in the groups vary and correlate. For example, given two clusters of points, Group1 and Group2, it gives the distance between them. This type of distance measure is helpful for classification and clustering.
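R provides mahalanobis() in the stats package. Note that it returns the squared distance. The sketch below measures the distance between the centers of two made-up groups, using a pooled covariance matrix. Pooling the covariance is an assumption made here for illustration, not something the text prescribes.

```r
# Two hypothetical groups of objects, two attributes each (one object per row).
group1 <- matrix(c(2, 2,  3, 4,  4, 3,  5, 5),  ncol = 2, byrow = TRUE)
group2 <- matrix(c(6, 7,  7, 9,  8, 8,  9, 10), ncol = 2, byrow = TRUE)

center1 <- colMeans(group1)
center2 <- colMeans(group2)

# Pooled covariance of the two groups (an assumption of this sketch).
n1 <- nrow(group1)
n2 <- nrow(group2)
pooled_cov <- ((n1 - 1) * cov(group1) + (n2 - 1) * cov(group2)) / (n1 + n2 - 2)

# mahalanobis() returns the SQUARED distance, so take the square root.
d_groups <- sqrt(mahalanobis(center1, center2, pooled_cov))
print(d_groups)

# With an identity covariance matrix it reduces to Euclidean distance.
d_identity <- sqrt(mahalanobis(c(3, 4), c(0, 0), diag(2)))
print(d_identity)
```

Unlike Euclidean distance, the result depends on the spread and correlation of the attributes, so a difference along a direction in which the data vary a lot counts for less than the same difference along a direction in which they vary little.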
So, this was all about R Statistics. We hope you liked our explanation.
5. Conclusion – R Statistics
Hence, we have made a detailed study of statistics. We learned about the types of statistics and the types of data, and we studied distance measures. I hope this content helps you understand R statistics better. Still, if you have any queries, ask in the comment tab.