SciPy Stats – Statistical Functions in SciPy


SciPy includes a sub-package for statistical functions, scipy.stats. It is mainly used for probability distributions and statistical operations, and it offers a wide range of probability functions. Because the library is open source, its statistical functionality keeps expanding.

It has functions for both continuous and discrete random variables and supports many distributions, such as the normal, uniform, and binomial distributions. We can also perform t-tests and compute the t-statistic (T-score). Let us learn more about SciPy Stats.

SciPy Stats

scipy.stats contains a large number of probability distributions and statistical functions. After running from scipy import stats, we can display them with help(stats). The module docstring also lists the available random variables (distributions).
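The distributions can also be listed programmatically. This is a minimal sketch, assuming that every built-in continuous distribution is exposed as an instance of rv_continuous and every discrete one as an instance of rv_discrete (true for objects such as norm and binom). The variable names are illustrative.

```python
from scipy import stats

# Collect the names of all continuous distribution objects exposed by scipy.stats
continuous = sorted(
    name for name in dir(stats)
    if isinstance(getattr(stats, name), stats.rv_continuous)
)

# Same idea for discrete distributions
discrete = sorted(
    name for name in dir(stats)
    if isinstance(getattr(stats, name), stats.rv_discrete)
)

print(len(continuous), "continuous distributions, e.g.", continuous[:5])
print(len(discrete), "discrete distributions, e.g.", discrete[:5])
```

The exact counts depend on the installed SciPy version, since new distributions are added over time.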

SciPy Stats provides the following three base classes for distributions:

1. rv_continuous

It is a generic base class through which we can construct specific distribution sub-classes and instances for continuous random variables.

2. rv_discrete

It is a generic base class through which we can construct specific distribution sub-classes and instances for discrete random variables.

3. rv_histogram

It builds a continuous distribution from a histogram, such as the (counts, bin edges) pair returned by np.histogram. It is itself a subclass of rv_continuous, so it inherits that class's methods.
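Both ideas can be sketched briefly. The first part subclasses rv_continuous and supplies only a private _pdf method for a hypothetical density f(x) = 2x on [0, 1]; SciPy then derives cdf, mean and the other methods numerically. The second part builds an rv_histogram from a hand-made (counts, bin_edges) tuple in the shape np.histogram returns. The class name LinearGen and all data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical custom distribution: density f(x) = 2x on the interval [0, 1]
class LinearGen(stats.rv_continuous):
    def _pdf(self, x):
        return 2.0 * x

# a and b set the support of the distribution
linear = LinearGen(a=0.0, b=1.0, name="linear")

# cdf and mean are computed numerically from _pdf
print(linear.cdf(0.5))   # exact value is 0.5**2 = 0.25
print(linear.mean())     # exact value is 2/3

# rv_histogram: a distribution defined by bin counts and bin edges
counts = np.array([1, 3])            # 1 observation in [0, 1), 3 in [1, 2]
edges = np.array([0.0, 1.0, 2.0])
hist_dist = stats.rv_histogram((counts, edges))

print(hist_dist.cdf(1.0))   # a quarter of the mass lies below 1.0
print(hist_dist.cdf(1.5))   # 0.25 plus half of the second bin's 0.75
```

Defining only _pdf keeps the subclass short, at the cost of slower numerical integration; supplying _cdf as well would make cdf exact and faster.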

SciPy provides ready-made distribution objects, such as norm, uniform, and binom, which we can import and use directly. Each is an instance of a subclass of one of these classes, so it inherits that class's methods (cdf, ppf, rvs, and so on). In practice, most distributions are built on either rv_continuous or rv_discrete.
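A quick sketch to confirm this: the imported objects are instances of the base classes and share one interface. The main naming difference is that continuous distributions have a density (pdf), while discrete ones have a probability mass function (pmf). The parameter values are arbitrary examples.

```python
from scipy import stats

# norm derives from rv_continuous, binom from rv_discrete
print(isinstance(stats.norm, stats.rv_continuous))   # True
print(isinstance(stats.binom, stats.rv_discrete))    # True

# Continuous: density at a point; discrete: probability of an exact value
print(stats.norm.pdf(0.0))                # 1 / sqrt(2 * pi)
print(stats.binom.pmf(1, n=2, p=0.5))     # P(X = 1) = 2 * 0.5 * 0.5 = 0.5

# Shared methods such as cdf work on both
print(stats.norm.cdf(0.0), stats.binom.cdf(1, n=2, p=0.5))
```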

Normal Continuous Random Distribution in SciPy

In a continuous probability distribution, the random variable can take any value in a continuous range (for the normal distribution, any real number). Hence, it is known as a continuous random variable.

Here we import norm, an instance of a subclass of rv_continuous. It provides the methods for working with the normal distribution, such as pdf, cdf, ppf, and rvs.

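We use norm.cdf to calculate the cumulative distribution function (CDF) of the standard normal distribution (mean 0, standard deviation 1) at each element of an array.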

from scipy.stats import norm
import numpy as np
a=np.array([2,-1,4,1,3,0])
print(norm.cdf(a))
 

Output

[0.97724987 0.15865525 0.99996833 0.84134475 0.9986501 0.5 ]
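The call above uses the standard normal distribution. As a small sketch, the loc and scale keywords shift and stretch it: for norm, loc is the mean and scale is the standard deviation. A "frozen" distribution stores those parameters once. The values of mu and sigma below are arbitrary examples.

```python
import numpy as np
from scipy.stats import norm

# Probability that a standard normal value lies within one standard deviation of the mean
within_one_sd = norm.cdf(1) - norm.cdf(-1)
print(within_one_sd)   # about 0.6827

# Normal distribution with mean 100 and standard deviation 15 (illustrative values)
mu, sigma = 100, 15
print(norm.cdf(115, loc=mu, scale=sigma))   # same as norm.cdf(1), since 115 is one sd above the mean

# A frozen distribution keeps loc and scale, so they need not be repeated
dist = norm(loc=mu, scale=sigma)
print(dist.cdf(np.array([85, 100, 115])))
print(dist.mean(), dist.std())
```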

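The Percent Point Function (PPF) is the inverse of the CDF: given a probability, it returns the value at which the CDF reaches that probability. Applying it to the CDF values above recovers the original inputs (up to rounding in the printed values), and ppf(0.5) gives the median of the distribution.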

from scipy.stats import norm
import numpy as np
a=np.array([0.97724987,0.15865525,0.99996833, 0.84134475, 0.9986501,0.5])
print(norm.ppf(a))

Output

[ 2.00000004 -1.00000002 4.00000928 1.00000002 2.99999956 0. ]
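Because the PPF inverts the CDF, a short sketch can check the round trip at full precision and read off the median as the value with cumulative probability 0.5. The quartile line, with mean 10 and standard deviation 2, is an extra illustration not taken from the original example.

```python
import numpy as np
from scipy.stats import norm

a = np.array([2, -1, 4, 1, 3, 0])

# ppf(cdf(x)) returns x, up to floating-point error
round_trip = norm.ppf(norm.cdf(a))
print(np.allclose(round_trip, a))   # True

# The median is the value with cumulative probability 0.5
print(norm.ppf(0.5))
print(norm.median())

# Quartiles of a normal distribution with mean 10 and standard deviation 2
print(norm.ppf([0.25, 0.5, 0.75], loc=10, scale=2))
```

The small deviations in the printed output above (such as 4.00000928) come from feeding rounded CDF values back into ppf, not from ppf itself.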

Uniform Distribution in SciPy

Similarly, we can work with the uniform distribution. We import the uniform object and compute the CDF of an array.

We can adjust the distribution with the loc and scale keywords. For uniform, loc is the lower bound and scale is the width, so the distribution covers the interval [loc, loc + scale]. (For norm, loc is the mean and scale is the standard deviation.)

from scipy.stats import uniform
import numpy as np
a=np.array([9,8,7,3,2])
# loc=5, scale=3 gives a uniform distribution on the interval [5, 8]
print(uniform.cdf(a, loc=5, scale=3))

Output

[1. 1. 0.66666667 0. 0. ]
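To make loc and scale concrete, this sketch compares the two distributions used so far with the same keyword values. For uniform, loc=5 and scale=3 give the interval [5, 8]; for norm, the same keywords set the mean and standard deviation.

```python
import numpy as np
from scipy.stats import norm, uniform

unif = uniform(loc=5, scale=3)   # uniform on [5, 8]
print(unif.support())            # lower and upper bounds: 5 and 8
print(unif.mean())               # midpoint: 5 + 3/2 = 6.5
print(unif.std())                # scale / sqrt(12)
print(unif.pdf([4, 6, 9]))       # 1/3 inside the interval, 0 outside

normal = norm(loc=5, scale=3)    # normal with mean 5 and standard deviation 3
print(normal.mean(), normal.std())
```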

Binomial Distribution in SciPy

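We can work with the binomial distribution by importing binom, an instance of a subclass of rv_discrete. It inherits its methods (pmf, cdf, ppf, rvs, and so on) from that class. Its parameters are n, the number of trials, and p, the probability of success in each trial, which must lie between 0 and 1.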

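from scipy.stats import binom
import numpy as np
a=np.array([9,8,7,3,2])
# p must lie in [0, 1]; with n=2 every value in a is >= n, so each CDF value is 1
print(binom.cdf(a, n=2, p=0.5))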

Output

[1. 1. 1. 1. 1.]
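As a further sketch, assuming 10 trials with success probability 0.5 (for example, 10 fair coin flips), the snippet below exercises other binom methods: pmf for the probability of an exact count, cdf for "at most k", the mean and variance, and random sampling. The seed 42 is arbitrary.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.5   # illustrative: 10 fair coin flips

# Probability of exactly 5 heads: C(10, 5) / 2**10 = 252 / 1024
print(binom.pmf(5, n, p))

# Probability of at most 5 heads: 638 / 1024
print(binom.cdf(5, n, p))

# Mean n*p and variance n*p*(1 - p)
print(binom.mean(n, p), binom.var(n, p))

# Reproducible random samples (counts of heads in 8 repetitions)
samples = binom.rvs(n, p, size=8, random_state=42)
print(samples)
```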

SciPy Descriptive Statistics

Descriptive statistics summarize a data set. These functions compute values such as the minimum, maximum, and mean of a NumPy array. Some of the descriptive functions in scipy.stats are:

  • describe() - returns summary statistics of an array: number of observations, minimum and maximum, mean, variance, skewness, and kurtosis
  • gmean() - returns the geometric mean along a specified axis of an array
  • hmean() - returns the harmonic mean along a specified axis of an array
  • sem() - returns the standard error of the mean
  • kurtosis() - returns the kurtosis of an array
  • mode() - returns the most common value in an array and how often it occurs
  • skew() - returns the skewness of an array (the separate skewtest() function performs a test for skewness)
  • zscore() - returns the z-score of each value relative to the sample mean and standard deviation
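The functions listed above can be tried on a small data set. This is a sketch with made-up data chosen so that the mean is 5 and the population standard deviation is 2, which makes several results easy to check by hand. Note the defaults: describe() and sem() use the sample variance (ddof=1), while zscore() uses ddof=0.

```python
import numpy as np
from scipy import stats

# Illustrative data: mean 5, population standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

summary = stats.describe(data)
print(summary)   # nobs, minmax, mean, variance (ddof=1), skewness, kurtosis

print(stats.gmean(data))      # geometric mean
print(stats.hmean(data))      # harmonic mean
print(stats.sem(data))        # standard error of the mean: sqrt(var / n) with ddof=1
print(stats.kurtosis(data))   # excess (Fisher) kurtosis; exact value for this data is -0.21875
print(stats.skew(data))       # sample skewness; exact value for this data is 0.65625
print(stats.zscore(data))     # (x - mean) / std with ddof=0

mode_result = stats.mode(data)
print(mode_result.mode, mode_result.count)   # most common value and its count
```

For positive data, the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.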

T-Test in SciPy

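We perform a t-test to check whether a difference between mean (average) values is statistically significant. A two-sample t-test compares the means of two arrays; a one-sample t-test, used below, compares the mean of one array with a hypothesized value. The test returns a t-statistic and a p-value; a small p-value (commonly below 0.05) suggests the difference is unlikely to be due to chance alone.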

T-score

The T-score (t-statistic) is a ratio: the difference between the means divided by the standard error of that difference, which reflects how much the data vary. The larger its absolute value, the larger the difference relative to the spread of the data; the closer it is to zero, the more similar the means.

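The t-test assumes the data are approximately normally distributed. The two samples do not need to have the same size, and when their variances differ, Welch's version of the test (stats.ttest_ind with equal_var=False) is the usual choice.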
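To make the ratio concrete, here is a minimal sketch of the one-sample case: the t-statistic is the sample mean minus a hypothesized mean, divided by the standard error of the mean. The sample values and the hypothesized mean 5.0 are made up for illustration, and the manual result is compared with stats.ttest_1samp.

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4])   # illustrative data
mu0 = 5.0                                                     # hypothesized population mean

# t = (sample mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation (ddof=1)
n = sample.size
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(sample, mu0)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```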

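from scipy import stats
# 10 x 5 array of random samples from a normal distribution with mean 2 and standard deviation 1
a = stats.norm.rvs(loc=2, scale=1, size=(10, 5))
# Test each column (axis=0) against the hypothesized mean 2.0; the data are random, so results vary per run
print(stats.ttest_1samp(a, 2.0))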

Output

Ttest_1sampResult(statistic=array([-0.82238541, 0.86996127, -0.62452709, -0.40478003, 1.41334689]), pvalue=array([0.43210532, 0.40692533, 0.54778478, 0.69509088, 0.19119386]))
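Comparing two arrays, as described at the start of this section, corresponds to the two-sample test stats.ttest_ind. This sketch assumes two groups drawn from normal distributions with different means, spreads, and sizes, using a fixed seed so the run is repeatable. Passing equal_var=False switches to Welch's t-test, which does not assume equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)                      # seed chosen arbitrarily
group_a = rng.normal(loc=5.0, scale=1.0, size=30)   # illustrative data
group_b = rng.normal(loc=6.0, scale=2.0, size=20)

# Student's t-test (assumes equal variances)
student = stats.ttest_ind(group_a, group_b)

# Welch's t-test (does not assume equal variances)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)

# A common (but arbitrary) convention treats p < 0.05 as significant
alpha = 0.05
print("significant (Welch):", welch.pvalue < alpha)
```

With unequal group sizes and variances, the two versions give different statistics and p-values; Welch's test is the safer default when the spreads may differ.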

Summary

The stats module is a very important part of SciPy. It is useful for working with probability distributions. SciPy Stats can generate discrete or continuous random numbers, and it also provides many functions for computing descriptive statistics.

We can work with both discrete and continuous random variables, and we have functions for many types of distributions. We can also perform t-tests to check whether a difference between means is significant, and use descriptive statistics to summarize data in more depth.
