Convolutional Neural Networks tutorial – Learn how machines interpret images
Free Machine Learning courses with 130+ real-time projects Start Now!!
Do you know researchers are continuously working to train machines to think as humans do, and they are succeeding too? The fundamental behind this is Neural Networks. Earlier DataFlair has shared an excellent tutorial on Recurrent Neural Networks, and today, we come to you with this Convolutional Neural Networks Tutorial. You will study how convolutional neural networks have become the backbone of the artificial intelligence industry and how CNNs are shaping industries of the future.
“Computers are able to see, hear and learn. Welcome to the future.”
-Dave Waters
Machine Learning has many algorithms that are responsible for imparting intelligence to the systems. There are a wide variety of applications of Machine Learning. One such application is that of computer vision. The goal of recognizing faces and other images are well performed with the help of a special type of neural networks called convolutional neural networks (CNNs). In this blog, we will see the working behind these CNNs as well as their history, and their applications.
What is CNN?
Convolutional Neural Networks are a type of Deep Learning Algorithm that take the image as an input and learn the various features of the image through filters. This allows them to learn the important objects present in the image, allowing them to discern one image from the other. For example, the convolutional network will learn the specific features of cats that differentiate from the dogs so that when we provide input of cats and dogs, it can easily differentiate between the two. One important feature of Convolutional Neural Network that sets it apart from other Machine Learning algorithms is its ability to pre-process the data by itself. Thus, you may not spend a lot of resources in data pre-processing. During cold-start, the filters may require hand engineering but with progress in training, they are able to adapt to the learned features and develop filters of their own. Therefore, CNN is continuously evolving with growth in the data.
Another important point that can be noted in CNNs is the consideration of the spatial pyramids in the data. This means that CNNs are capable of capturing the spatial and temporal relationship in an image by means of the features that are organized hierarchically. Thus, the deeper the network goes, the more complex patterns it can define by training on simpler patterns found in the previous layers. This hierarchical learning makes CNNs very useful in any tasks that require a high level of detail in the visual data.
If there is any machine learning concept which you are not aware of, then you must save this DataFlair Machine Learning Tutorial Series. Bookmark now and learn any time.
Now, after knowing what convolutional networks are, let’s move on towards the working of CNN.
How do Convolutional Neural Networks Work?
CNNs have similar performance to the ordinary fully connected Neural Networks. These convolutional networks have weights that can learn from the input and biases. Every neuron connected in the network receives an input and performs a dot product on it. This proceeds in a non-linear fashion. There is a singular differentiable score function at the end. This function consists of scores that we obtain from the various layers of the neural network. Finally, a loss function at the end to evaluate the performance of the model. The convolutional neural network is different from the standard Neural Network in the sense that there is an explicit assumption of input as an image.
This assumption helps the architecture to definition in a more practical manner. For example, unlike the linear arrangement of neurons in a simple neural network. These neurons have an overall structure of three dimensions – Length, Width, and Height. For instance, images in the CIFAR 10 dataset will contain images of dimensions 32x32x3 and the final output will have a singular vector of the images of dimensions 1x1x10. The architecture of the Convolutional Neural Network is as follows –
- INPUT – As discussed above, a typical image in the CIFAR 10 data will hold images if dimensions 32x32x3 where the depth denotes the number of channels (RGB) in the image.
- CONV layer is responsible for computing the dot product between the weights of the neuron and the region of the input image to which share a connection. Here, the dimensions become 32x32x12 denoting the 12 filters which the neural network makes use of.
- The third layer consists of RELU which is for the application of the activation function to our resultant dot product. The size of this result does not change.
- The fourth POOL layer will downsample the spatial dimensions of the image i.e. width and height. This will reduce the dimensions to 16x16x12.
- The fully connected layer will be responsible for computing the class score, leading to a final volume of 1x1x10. The 10 here represent the categories of CIFAR-10.
Have you checked our latest blog on Artificial Neural Networks?
The most important layer in the architecture of CNN is that of Convolutional Layer. The essential component of the CONV layer comprises of a learnable filter. Each filter on the CONV net has a size of 5x5x3. When the forward pass takes place, we perform sliding on each filter across the width as well as the depth of the input volume and finally we compute the dot product. This dot product will ultimately result in a 2-dimensional activation map which provides us with a response of the filter at every spatial position.
The network will then learn the various filters that come along the visual features such as an edge or blotches of some color in the first layer and generate a honey-comb like a pattern in the higher layer of the network. Each of the 12 filters in the ConvNet will produce a 2-dimensional activation map which we stack to produce the output volume.
Applications of Convolutional Neural Networks
- Convolutional Neural Networks or CNNs were developed for image recognition and therefore, are mostly in the field of computer vision where they are used for classifying images, segmenting them and also performing localization on the images.
- Videos are different from images in the sense that they have a temporal dimension. While more complicated than images, We can tweak these CNNs to accommodate these types of streaming visual inputs. Most frequently, convolutional networks are becoming highly popular with other algorithms like LSTM and Boltzmann Machines for boosting their performance at handling video input.
- Apart from visual inputs, CNNs are also being utilized in the field of Natural Language Processing for semantic parsing, sentence modeling, prediction as well as classification.
- More recently, companies like Google have been using CNNs alongside Recurrent Neural Networks and LSTMs for speech recognition.
- CNNs are also being used in drug discovery where they prove to be an efficient tool for identifying the interaction between the molecules and the biological proteins for the identification of potential treatments.
Summary
So, summarizing the article of convolutional neural networks. It is the most popular and important algorithm for working with image data. We learned how CNN is used to several other regions beyond image data to increase its range of applications. We also learned how CNNs work and how they can carry out a variety of operations. Hope you enjoyed reading this. If there is anything you want to ask or share about Convolutional Neural Networks, comment below.
What next? DataFlair brings an important machine learning project for you – Credit Card Fraud Detection using ML. Try this now!!
Did you like this article? If Yes, please give DataFlair 5 Stars on Google
Can i get the example code of CNN for the two type of data for classification
ok
it is an error typing