Mathematical Building Blocks of Neural Networks


In this article, we will look at the basic mathematical building blocks of neural networks: scalars, vectors, matrices, probability, statistics, and a bit of calculus. Let's start!!

Introduction

Mathematics forms the core of machine learning: all of its principles are built on it. To learn Machine Learning one must have a good understanding of vectors, matrices, probability and statistics, and a bit of calculus (derivatives and partial derivatives).

While implementing a machine learning model we will not do all the math ourselves; the computer will do it for us. But we still need a solid understanding of these concepts to understand and control the behaviour of the models we build. Each of these topics is vast enough to fill a course of its own. Here, however, we will cover the parts that are essential before we take a deep dive into Deep Learning.

1. Scalars:

A scalar is simply a single real number. We can initialise a scalar as an integer or a float value stored in a variable, or as a tensor holding only one element.
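
For example, here is a minimal sketch of both options in PyTorch (the variable names are just for illustration):

import torch

# a scalar stored as a plain Python variable
x = 5.0

# the same scalar stored as a PyTorch tensor
s = torch.tensor(5.0)
print(s)         # tensor(5.)
print(s.item())  # 5.0 -- recover the plain Python number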

2. Matrices:

A rectangular arrangement of numbers in rows and columns is called a matrix. If a matrix has n rows and m columns, it is said to be a matrix of order "n x m" (read as "n by m"). Matrices are used to represent a dataset systematically, with the rows representing different data points and the columns representing their parameters. For example, if we have information about the age and height of three people, we can represent it as below:

           Age   Height (cm)
Person 1    23   180
Person 2    12   175
Person 3    45   160

For computational purposes this data can be represented in matrix form:

[[ 23 180]
 [ 12 175]
 [ 45 160]]

Now we will build the same matrix using PyTorch:

import torch

In [5]:

a=torch.tensor([[23,180],[12,175],[45,160]])

In [6]:

a

Out[6]:

tensor([[ 23, 180],
        [ 12, 175],
        [ 45, 160]])

Operations on Matrices:

1. Addition:

Addition of two or more matrices refers to adding the corresponding elements of the given matrices and storing the result in a new matrix. The order of the matrices to be added must be the same. We can do this operation using PyTorch, reusing the matrix a defined above and adding a second matrix b:

In [7]:

b=torch.tensor([[12,100],[67,190],[44,185]])

In [8]:

b

Out[8]:

tensor([[ 12, 100],
        [ 67, 190],
        [ 44, 185]])

In [9]:

a+b

Out[9]:

tensor([[ 35, 280],
        [ 79, 365],
        [ 89, 345]])

2. Subtraction:

Similar to addition, subtraction refers to element-wise subtraction of two matrices.

In [10]:

a-b

Out[10]:

tensor([[ 11,  80],
        [-55, -15],
        [  1, -25]])

3. Multiplication:

To multiply two matrices, the number of columns of the first matrix must equal the number of rows of the second matrix. The i'th row of the first matrix is multiplied element-wise with the j'th column of the second matrix, and the products are summed to obtain the entry at location (i, j) of the resulting matrix. We can perform matrix multiplication using PyTorch as follows:

In [13]:

torch.mm(a,b)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11472/3228927151.py in <module>
----> 1 torch.mm(a,b)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

Here we get an error because both matrices are of order 3 x 2: the number of columns of the first (2) does not match the number of rows of the second (3). Let us define a compatible matrix c of order 2 x 3 instead:

c=torch.tensor([[23,34,5],[56,357,8]])

In [19]:

torch.mm(a,c)

Out[19]:

tensor([[10609, 65042,  1555],
        [10076, 62883,  1460],
        [ 9995, 58650,  1505]])

4. Transpose:

To find the transpose of a matrix we interchange its rows and columns, i.e., the i'th row becomes the i'th column and the j'th column becomes the j'th row.

c

Out[20]:

tensor([[ 23,  34,   5],
        [ 56, 357,   8]])

In [23]:

c.t()

Out[23]:

tensor([[ 23,  56],
        [ 34, 357],
        [  5,   8]])

5. Inverse:

The inverse of a matrix A is the matrix which, when multiplied with A, gives the identity matrix, i.e., A * inv(A) = I. Note that torch.inverse() requires a floating point tensor, which is why d below is defined with float values.

In [45]:

d=torch.tensor([[12.,34,5],[44,7,6],[10,8,43]])

In [46]:

d

Out[46]:

tensor([[12., 34.,  5.],
        [44.,  7.,  6.],
        [10.,  8., 43.]])

In [48]:

e=torch.inverse(d)
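
The snippet above never displays e. A quick way to sanity-check the result (a minimal sketch) is to multiply d by its inverse and confirm that we get back, up to floating point round-off, the identity matrix:

# d multiplied by its inverse should give the identity matrix
identity = torch.mm(d, e)
print(identity)

# floating point round-off makes the product only approximately I,
# so compare against an exact identity matrix with a small tolerance
print(torch.allclose(identity, torch.eye(3), atol=1e-5))  # True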

3. Vectors:

A matrix having only one column (or one row) is called a vector. A vector can represent the values of all the parameters of a single data point. For example, in the table above, person 1 can be represented by the vector [23, 180].

Operations on Vectors:

1. Addition

Addition of the corresponding elements of two vectors of the same size.

A=torch.tensor([1,2,3])
B=torch.tensor([4,5,6])

In [51]:

A+B

Out[51]:

tensor([5, 7, 9])

2. Dot Product

Multiplication of the corresponding elements of two vectors, followed by summing the results. For A = [1, 2, 3] and B = [4, 5, 6] this gives 1*4 + 2*5 + 3*6 = 32.
In [52]:

torch.dot(A,B)

Out[52]:

tensor(32)

3. Scalar Multiplication:

Multiplying all the elements of a vector by a scalar.
In [53]:

3*A

Out[53]:

tensor([3, 6, 9])

4. Probability:

The likelihood of an event happening is called its probability.

Terminologies in Probability:

i. Sample Space:

The set of all possible outcomes of a random experiment.

ii. Random Variable:

A variable that assigns a numerical value to every possible outcome.

iii. Probability Mass Function:

A function that gives the probability of each possible value of a (discrete) random variable.

iv. Mean:

The average of the observed data; for a random variable, the probability-weighted average of its possible outcomes (the expected value).

v. Variance:

A measure of how far the outcomes are spread out from the mean, i.e., it gives the spread of the data.

vi. Conditional Probability:

Probability of an event A when event B has already occurred. It is represented as P(A|B), read as probability of A given B.

Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
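
As a small illustration, here is a hedged sketch of these quantities in PyTorch, using a fair six-sided die and made-up probabilities for Bayes' rule:

import torch

# probability mass function of a fair six-sided die
outcomes = torch.tensor([1., 2., 3., 4., 5., 6.])
pmf = torch.full((6,), 1/6)

# mean (expected value) and variance of the die roll
mean = torch.sum(pmf * outcomes)                  # 3.5
variance = torch.sum(pmf * (outcomes - mean)**2)  # ~2.9167
print(mean, variance)

# Bayes' rule with illustrative, made-up probabilities
p_a, p_b, p_b_given_a = 0.3, 0.5, 0.8
p_a_given_b = p_b_given_a * p_a / p_b             # 0.48
print(p_a_given_b)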

5. Statistics:

Statistics deals with collection, analysis, representation and interpretation of data.

Sampling – the process of selecting a subset of the statistical population for the purpose of analysis.

Sampling is of two types:

  • Probabilistic
  • Non-Probabilistic

For Machine Learning we are concerned only with probabilistic sampling which is further divided into three types.

Random:

When the sampling is done at random without any bias.

Systematic:

When selection starts at a random point and then proceeds at a fixed periodic interval.

Stratified:

Dividing the entire population based on some common properties and selecting the samples accordingly.
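
The three strategies can be sketched in a few lines of code (a minimal illustration; the population of 100 items and the two strata are made up):

import torch

population = torch.arange(100)   # a made-up population of 100 items

# random sampling: pick 10 items uniformly at random, without bias
random_sample = population[torch.randperm(100)[:10]]

# systematic sampling: start at a random point, then take every 10th item
start = torch.randint(0, 10, (1,)).item()
systematic_sample = population[start::10]

# stratified sampling: split the population into two groups that share
# some property, then sample 5 items from each group
strata = [population[:50], population[50:]]
stratified_sample = torch.cat([s[torch.randperm(50)[:5]] for s in strata])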

Now that we have collected the samples, we can analyze them using two kinds of statistical methods, namely descriptive and inferential statistics.

Descriptive Statistics:

It involves finding descriptive parameters that summarise the given sample, such as the mean, variance, and standard deviation.
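
For example, reusing the heights from the table above (a minimal sketch), PyTorch computes these summaries directly:

heights = torch.tensor([180., 175., 160.])
print(heights.mean())  # tensor(171.6667)
print(heights.var())   # sample variance (unbiased by default in PyTorch)
print(heights.std())   # sample standard deviation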

Inferential Statistics:

It involves generalising from the samples to draw conclusions about the population as a whole.

Hypothesis Testing:

A hypothesis is a proposed explanation of a phenomenon. In statistics we form a null hypothesis, which states that a particular parameter does not affect the target variable. We can never accept the null hypothesis: we can only reject it or fail to reject it. We also form an alternate hypothesis that negates the null hypothesis.

Suppose we want to determine if regular exercising can help us fight Covid. The null hypothesis would be that it does not help us fight Covid. And the alternate hypothesis would be that it does. Now we can perform different statistical methods and try to arrive at a conclusion.
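
One simple statistical method for this is a permutation test. The sketch below uses made-up recovery data for the two groups, so it only illustrates the mechanics, not a real conclusion:

import torch

# made-up recovery outcomes (1 = recovered) for the two groups
exercise    = torch.tensor([1., 1, 0, 1, 1, 1, 0, 1])  # 6/8 recovered
no_exercise = torch.tensor([1., 0, 0, 1, 0, 0, 1, 0])  # 3/8 recovered
observed = exercise.mean() - no_exercise.mean()

# permutation test: shuffle the group labels many times and count how
# often a difference at least as large arises by pure chance (the p-value)
pooled = torch.cat([exercise, no_exercise])
n_perm, count = 10000, 0
for _ in range(n_perm):
    idx = torch.randperm(16)
    diff = pooled[idx[:8]].mean() - pooled[idx[8:]].mean()
    if diff >= observed:
        count += 1
p_value = count / n_perm
# if p_value falls below a chosen threshold (commonly 0.05),
# we reject the null hypothesis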

6. Calculus:

The derivative of a single variable function represents how an infinitesimal change in the independent variable would affect the dependent variable.

Now if a function is multivariate, it depends on more than one variable. To study the effect of infinitesimal change in one variable on the dependent variable we keep all the other independent variables constant and vary the variable under consideration by a very small amount. This is called the partial derivative of the function with respect to the variable we are changing.
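
In PyTorch these derivatives are computed automatically via autograd, which is exactly the machinery used to train neural networks. A minimal sketch (the functions here are made up for illustration):

import torch

# single-variable function: f(x) = x**2, so df/dx = 2x
x = torch.tensor(3.0, requires_grad=True)
f = x ** 2
f.backward()
print(x.grad)  # tensor(6.) since 2 * 3 = 6

# multivariate function: g(x, y) = x**2 * y
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
g = x ** 2 * y
g.backward()
print(x.grad)  # partial dg/dx = 2*x*y = 12, holding y constant
print(y.grad)  # partial dg/dy = x**2 = 9, holding x constant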

Summary

Deep learning models are built on probability, statistics, and calculus, and the data is represented as vectors and matrices, which makes it easy to manipulate and use for predicting the target variable. Most of the mathematical operations we will need can be computed with a single line of Python, and PyTorch provides methods aimed at exactly these problems.
