PyTorch Interview Questions


In this article, you will find some of the top PyTorch interview questions with answers. They will help you clear your PyTorch interviews. Let’s start!

Basic PyTorch Interview Questions

1. What is a Neural Network?

Inspired by the human brain, an Artificial Neural Network, or simply a Neural Network, is a Machine Learning model made of interconnected nodes that resemble the brain’s neurons, with each node extracting some information from the input given to it.

2. What is Gradient Descent?

Gradient descent is a method of minimising the loss iteratively. It starts at a random point in the parameter space, computes the gradient of the loss with respect to each parameter, and moves the parameters a small step in the direction opposite to the gradient, gradually approaching a minimum.
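As a minimal sketch of the idea, the loop below applies gradient descent to a single parameter. The function name gradient_descent, the quadratic loss and the learning rate are illustrative assumptions, not something the question prescribes.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # move a small step downhill
    return w


# Hypothetical loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), start=10.0)
print(w_min)  # very close to 3, the minimum of the loss
```

A smaller learning rate would need more steps to get equally close, which is the trade-off mentioned in the advanced question on linear regression.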

3. Explain the basic structure of a neural network.

A Neural Network has three kinds of layers, namely input, hidden, and output layers. The input layer takes the network’s input, and the output layer produces the output. The hidden layer may comprise one or more layers, with each layer capturing different characteristics of the input data.

4. What is a loss function?

A loss function measures the distance between the predicted output and the actual output. Its gradient gives the direction in which the parameters need to be tweaked to fit the training samples.

5. What is a computational graph?

A computational graph is a directed graph in which each node represents an operation or a variable, and the edges show how values flow from one node to the next.

6. What is a stride in a Convolutional Neural Network?

Stride is the number of pixel shifts in consecutive iterations of a convolutional layer.

7. Define backpropagation.

Once the output has been calculated, the loss is propagated backwards through the network to compute the gradient of the loss with respect to every model parameter, so that the parameters can be optimised to reduce the loss. This process is called backpropagation.

8. What is the role of autograd in PyTorch?

Autograd is PyTorch’s automatic differentiation engine. It makes PyTorch convenient to use by automatically calculating the gradient of the loss function with respect to all the parameters, which is what an optimiser needs in order to update the weights associated with all the nodes.
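A small sketch of autograd at work is given below. The one-weight model and the numbers are made up for illustration; only torch.tensor, requires_grad and backward() are PyTorch features.

```python
import torch

# Two trainable parameters and one made-up training sample.
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(10.0)

loss = (w * x + b - y) ** 2  # the forward pass records the graph
loss.backward()              # autograd fills in w.grad and b.grad

print(w.grad, b.grad)  # d(loss)/dw and d(loss)/db
```

Here the prediction is 7 and the error is -3, so by hand the gradients are 2 * (-3) * 3 = -18 for w and 2 * (-3) = -6 for b, which is what autograd reports.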

9. What is a recurrent neural network?

A neural network that takes into consideration both the present input and the previous state of the model is called a Recurrent Neural Network (RNN).

10. What is an RNN used for?

We can use Recurrent Neural Networks in applications where the present state of the network depends on the present input as well as the past state of the model. These applications include sentence completion, sentiment analysis, speech recognition and language translation.

11. What is the need for LSTM in Recurrent Neural Networks?

In Recurrent Neural Networks, the error signal can become very small as it is backpropagated through many time steps (the vanishing gradient problem). The weights then barely change, and the network fails to learn from words that appeared far back in the sequence. Therefore, we use LSTM (Long Short-Term Memory), whose gates let it forget unnecessary state information and remember only the relevant parts.

12. What is an activation function?

An activation function decides the output a neuron will produce based on the given input. It is a mathematical function applied to the weighted sum of the neuron’s inputs.

13. Explain the significance of activation function in a Neural Network.

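An activation function is an important part of a neuron in a neural network. It defines the behaviour of the neuron with which it is associated by converting the input into another form that helps identify the characteristics of the input. It also introduces non-linearity: without it, a stack of layers would collapse into a single linear transformation, however many layers the network has.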

14. What is an epoch?

An epoch is one complete pass of the entire training data through the neural network. The number of epochs is the number of such passes made during training.

15. Define stride with reference to a convolutional neural network.

Stride is the number of pixel shifts over the input matrix by the filter matrix between two consecutive convolution operations.
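The effect of the stride on the size of the output can be sketched with the usual output-size formula for a convolution without dilation. The helper name conv_output_size is an assumption made for this illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(28, 5, stride=1))  # 24
print(conv_output_size(28, 5, stride=2))  # 12
```

Doubling the stride roughly halves the width and height of the activation map, because the filter visits only every second position.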

16. List a few advantages of PyTorch.

The advantages of PyTorch are as follows:

a. PyTorch is easy to use and simple to implement as it employs Python as its development language.

b. Also, it has a very easy-to-understand syntax making it Pythonic.

c. PyTorch uses dynamic computational graphs, making it very flexible and robust.

d. There are many pre-trained models available for us to use, reducing our work and increasing efficiency.

e. It is easy to debug as we can use Python’s debugging tools.

17. Define a Kernel.

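In a convolutional neural network, a kernel (also called a filter) is a small matrix of learnable weights that slides over the input and is multiplied element-wise with each patch it covers to produce the activation map. In the context of GPU computing, the same word refers to a function that is executed on the GPU.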

18. What are the datatypes that can be stored in a tensor?

A PyTorch tensor can store integers, floating-point numbers, complex numbers and boolean values. All the elements of a tensor share a single data type (dtype).

Intermediate PyTorch Interview Questions

19. How does a convolution network work?

In a convolution network, some or all of the hidden layers are convolutional layers. These layers specialize in identifying different patterns like edges, contours, circles etc. They have filter matrices that convolve over the input matrix forming an activation map that holds information about the patterns of the images fed into the network.

20. How is a tensor stored in the storage?

A tensor is a multidimensional array, but in storage, it is stored just like a Python list in one dimension, occupying contiguous memory space, with additional information about the dimensionality, the offset and the per-dimension strides. So if we transpose any tensor, the storage locations of the elements do not change. Only the dimension and stride information changes.
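This can be checked directly, as in the sketch below. The tensor values are arbitrary; stride(), data_ptr() and is_contiguous() are standard tensor methods.

```python
import torch

t = torch.arange(6).reshape(2, 3)  # stored in memory as 0, 1, 2, 3, 4, 5
tt = t.t()                         # transpose: a view, not a copy

print(t.stride(), tt.stride())        # (3, 1) (1, 3)
print(t.data_ptr() == tt.data_ptr())  # True: both use the same memory
print(tt.is_contiguous())             # False: rows are no longer adjacent
```

Because the transpose is only a change of metadata, writing to tt also changes t.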

21. Why is padding required in CNN?

In the convolution process, the pixels at the fringes of the input matrix contribute to fewer elements of the activation map than the pixels at the centre. This may undermine the information about the patterns present at the border of the matrix, which might be important. Therefore, padding is required so that these fringe elements are adequately represented.
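To make the fringe effect concrete, the sketch below counts, for a one-dimensional input, how many filter positions touch each element with and without padding. The helper coverage and the sizes used are assumptions for illustration, with a stride of 1.

```python
def coverage(input_size, kernel_size, padding=0):
    """How many filter positions (stride 1) touch each input element."""
    counts = [0] * input_size
    padded = input_size + 2 * padding
    for start in range(padded - kernel_size + 1):
        for pos in range(start, start + kernel_size):
            i = pos - padding  # index in the unpadded input
            if 0 <= i < input_size:
                counts[i] += 1
    return counts


print(coverage(5, 3))             # [1, 2, 3, 2, 1]
print(coverage(5, 3, padding=1))  # [2, 3, 3, 3, 2]
```

Without padding the first and last elements are seen only once, while the centre is seen three times. One element of padding on each side narrows that gap.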

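22. Why do we need to set the gradients to zero before training on each batch?

PyTorch adds newly computed gradients to whatever is already stored in each parameter’s .grad attribute. So before training on a new batch, we first have to set the gradients to zero so that the gradients from the previous iteration do not accumulate in the current iteration. If we let the gradients accumulate, each update would be based on a mixture of stale and current gradients, and the network would find it difficult to optimise the weights, making our model inefficient.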
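The accumulation can be observed on a single parameter, as in the sketch below. The toy expression 3 * w is an assumption chosen so that the gradient is always 3.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()   # 3.0

(3 * w).backward()      # no reset: the new gradient is added to the old one
second = w.grad.item()  # 6.0

w.grad.zero_()          # manual reset of the stored gradient
(3 * w).backward()
third = w.grad.item()   # 3.0 again

print(first, second, third)
```

In a training loop, optimizer.zero_grad() performs this reset for every parameter the optimiser manages.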

23. Why are feedforward networks inefficient in text completion tasks?

Feed Forward Networks cannot establish relationships between the current and the past inputs, failing to recognise the context of the word. Therefore, we cannot use a feedforward network for such tasks.

24. Why do we need to shuffle the dataset before passing it through a neural network?

Sometimes the data we have for training purposes may not be random. For example, in making a cats vs dogs model, we may have all the images of cats and dogs separated out into different groups. In training our model using such data, the model may learn to accommodate only one group, and every test image given to such a model may turn out to be of only one category. So we need to shuffle our data so that it accounts for all the groups available.

25. Why are pooling layers required in a convolution network?

The pooling layers reduce the spatial dimensions of the activation map, thereby reducing the amount of computation and the number of parameters that the following layers need to learn and, consequently, the training time.

26. What do you mean by Distributed Data Parallel?

If we have a model that is small enough to fit on one GPU, we can train it on a very large dataset by copying the model onto multiple GPUs and running the copies in parallel. Each GPU trains on a different subset of the dataset and computes its own gradients, which are then synchronised across all the GPUs so that every copy applies the same update. This is called Distributed Data Parallel training.

27. What is the maximum dimension of a tensor that can be convolved using PyTorch?

PyTorch provides convolutional layers for up to three dimensions (nn.Conv1d, nn.Conv2d and nn.Conv3d). However, higher-dimensional convolution layers can be custom-built using PyTorch.

28. Can we apply CNN to problems that are not related to images?

Though CNNs were developed to tackle image processing tasks, they can be used on a wide variety of problems like text completion, sentiment analysis and so on. They sometimes even outperform RNNs, which specialise in such tasks.

Advanced PyTorch Interview Questions

1. Will the following command work? If not, explain why?

model = nn.Sequential(
    nn.Conv2d(1, 10, 5, padding=2),
    nn.ReLU(),
    nn.AvgPool2d(2, stride=2),

    nn.Conv2d(10, 20, 5, padding=0),
    nn.ReLU(),
    nn.AvgPool2d(2, stride=2),

    nn.Flatten(),
    nn.Linear(500, 250),
    nn.ReLU(),
    nn.Linear(255, 100),
    nn.ReLU(),
    nn.Linear(100, 10)
)

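Ans – The given model will not work. The nn.Sequential object itself is created without an error, because the layer shapes are not checked at construction time, but an error is shown as soon as an input is passed through it. The first linear layer produces 250 output features (out_features=250), whereas the second linear layer expects 255 input features (in_features=255). Changing nn.Linear(255,100) to nn.Linear(250,100) fixes the model.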
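A corrected version of the model might look like the sketch below. The 1x28x28 input size is an assumption (it is the size for which nn.Flatten() yields exactly 500 features), and the batch of zeros is dummy data.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 10, 5, padding=2),   # 1x28x28 -> 10x28x28
    nn.ReLU(),
    nn.AvgPool2d(2, stride=2),        # -> 10x14x14
    nn.Conv2d(10, 20, 5, padding=0),  # -> 20x10x10
    nn.ReLU(),
    nn.AvgPool2d(2, stride=2),        # -> 20x5x5
    nn.Flatten(),                     # -> 500 features
    nn.Linear(500, 250),
    nn.ReLU(),
    nn.Linear(250, 100),              # in_features now matches the 250 above
    nn.ReLU(),
    nn.Linear(100, 10),
)

out = model(torch.zeros(4, 1, 28, 28))  # batch of 4 dummy images
print(out.shape)  # torch.Size([4, 10])
```

Passing a dummy batch through a new model like this is a quick way to catch shape mismatches before training starts.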

2. How do we optimise a model in PyTorch? Support your answer with an example.

To optimise our model, we first import torch.optim, create an optimiser over the model’s parameters, and invoke optimizer.step() at the time of training.

First of all, we initialise the hyperparameters of the model and invoke the desired optimiser.

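import torch.nn as nn
import torch.optim as optim

l_rate = 0.0001
batch_size = 64
epoch = 10

cnn_model = ConvNet()  # ConvNet is our own nn.Module subclass, defined elsewhere
cel = nn.CrossEntropyLoss()
optimizer = optim.Adam(cnn_model.parameters(), lr=l_rate)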

Then we train the model using the training loop, setting the gradients to zero (optimizer.zero_grad()), backpropagating (loss.backward()) and optimising (optimizer.step()) in every iteration of the loop.

def training(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)
 
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
 
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

3. Explain backpropagation mathematically.

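After the output is calculated, the gradient of the loss is computed first with respect to the last layer of the network and is then propagated backwards, one layer at a time. Let f_i denote the output of layer i (with f_L the output of the last layer) and w_1 the weights of the first layer. Then

$$\frac{\partial \mathrm{Loss}}{\partial w_1} = \frac{\partial f_1}{\partial w_1} \cdot \frac{\partial \mathrm{Loss}}{\partial f_1}$$

Also,

$$\frac{\partial \mathrm{Loss}}{\partial f_1} = \frac{\partial f_2}{\partial f_1} \cdot \frac{\partial \mathrm{Loss}}{\partial f_2}$$

Using the chain rule repeatedly,

$$\frac{\partial \mathrm{Loss}}{\partial f_1} = \frac{\partial f_2}{\partial f_1} \cdot \frac{\partial f_3}{\partial f_2} \cdot \frac{\partial f_4}{\partial f_3} \cdots \frac{\partial f_L}{\partial f_{L-1}} \cdot \frac{\partial \mathrm{Loss}}{\partial f_L}$$

The gradients of the last layer are therefore obtained first, then those of the layer before it, and the process goes on until the first layer is reached. This process of computing the output using the present weights and then propagating backwards to adjust these weights layer by layer is called Backpropagation.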
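The chain rule can be verified numerically on a tiny example. The two-layer scalar network below (f1 = w1 * x, f2 = w2 * f1, squared-error loss) and its numbers are assumptions chosen to keep the arithmetic easy to follow.

```python
def forward(w1, w2, x, y):
    f1 = w1 * x           # layer 1
    f2 = w2 * f1          # layer 2 (the output)
    loss = (f2 - y) ** 2
    return f1, f2, loss


w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
f1, f2, loss = forward(w1, w2, x, y)

# Backward pass: start at the loss and multiply the local derivatives.
dloss_df2 = 2 * (f2 - y)
dloss_df1 = w2 * dloss_df2  # d(f2)/d(f1) = w2
dloss_dw1 = x * dloss_df1   # d(f1)/d(w1) = x

# Finite-difference estimate of the same derivative, as a sanity check.
eps = 1e-6
numeric = (forward(w1 + eps, w2, x, y)[2] - forward(w1 - eps, w2, x, y)[2]) / (2 * eps)

print(dloss_dw1, numeric)  # both are approximately 15
```

Each line of the backward pass corresponds to one factor in the chain-rule product.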

4. In Linear Regression, is there a better way of calculating the model parameters than gradient descent?

Gradient Descent is a very efficient algorithm for minimising the loss. However, it only approaches the minimum rather than giving it exactly, and it might also take a while to reach the minimum if the learning rate is low. The optimal values of the parameters can also be computed directly using the normal equation.

$$\Theta = (X^T X)^{-1} X^T Y$$

where X and Y are the input and output matrices, respectively, and $\Theta$ is the required vector of optimal parameters. This method gives the exact optimal values in a single step, and the computation time depends on the size of the matrix, so it becomes expensive when the number of features is large.
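A minimal sketch of the normal equation in PyTorch follows. The synthetic, noise-free data and the parameter values are assumptions. The linear system is solved with torch.linalg.solve instead of forming the inverse explicitly, which is numerically safer and gives the same result.

```python
import torch

torch.manual_seed(0)

# 50 samples: a bias column of ones plus two random features.
X = torch.cat(
    [torch.ones(50, 1, dtype=torch.float64),
     torch.rand(50, 2, dtype=torch.float64)],
    dim=1,
)
true_theta = torch.tensor([1.0, 2.0, -3.0], dtype=torch.float64)
Y = X @ true_theta  # noise-free targets

# Solve (X^T X) theta = X^T Y.
theta = torch.linalg.solve(X.T @ X, X.T @ Y)
print(theta)  # recovers true_theta up to rounding error
```

No learning rate or iteration count is needed, but the cost of solving the system grows quickly with the number of features.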

5. Explain the parameters of a tensor?

A tensor has the parameters: data, grad, grad_fn, is_leaf and requires_grad.

data	Stores the data in the tensor, a number or a single or multidimensional array.
grad	Stores the gradient of the tensor. It is None until backward() has been called.
grad_fn	A pointer that points to the node in the backward graph that created the tensor.
is_leaf	Either True or False. Indicates if the tensor is a leaf of the graph.
requires_grad	Indicates if it is required to calculate the gradient or not.

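6. If a=tensor([2.,3.],requires_grad=True) and b=tensor([5.,6.],requires_grad=False), what will be the value of the parameters of c=a+b?

The parameters of the tensor c are as follows. (Floating-point values are used because only floating-point and complex tensors can require gradients.)

data=tensor([7., 9.])
grad=None
grad_fn=AddBackward0
is_leaf=False
requires_grad=True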
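These values can be checked with a few lines of PyTorch. Floating-point inputs are assumed here, since integer tensors cannot have requires_grad=True.

```python
import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([5.0, 6.0], requires_grad=False)
c = a + b

print(c.data)           # tensor([7., 9.])
print(c.grad_fn)        # an AddBackward0 node
print(c.is_leaf)        # False: c was produced by an operation
print(c.requires_grad)  # True: inherited from a
```

The tensor c is not a leaf because it is the result of an operation on a, so its grad stays None unless retain_grad() is called on it before the backward pass.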

7. What is the hypothesis function of Logistic Regression?

The hypothesis function of Logistic Regression can be defined as follows.

$$h_\Theta(X) = \frac{1}{1 + e^{-X\Theta}}$$

where X is the feature matrix of the input dataset and $\Theta$ is the vector of model parameters.
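Assuming the usual sigmoid form of the hypothesis, 1 / (1 + e^(-z)) with z the dot product of a feature row and the parameter vector, a single prediction can be sketched as below. The function name hypothesis and the sample values are illustrative assumptions.

```python
import math


def hypothesis(x, theta):
    """Sigmoid of the dot product between one feature row and theta."""
    z = sum(xi * ti for xi, ti in zip(x, theta))
    return 1.0 / (1.0 + math.exp(-z))


print(hypothesis([1.0, 2.0], [0.0, 0.0]))  # 0.5
```

The output always lies between 0 and 1, so it can be read as the probability of the positive class.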

8. In the following code snippet, explain the significance of super(...).__init__() in line 3.

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()

Ans – The model in the above example inherits from the nn.Module class of PyTorch. However, when a subclass defines its own __init__, the constructor of the parent class is not run automatically. Therefore, super(ConvNet,self).__init__() is called to initialise the internal state of nn.Module (such as its registries of parameters and submodules) for the child class.
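The sketch below shows what happens when the call is left out. The class names WithSuper and WithoutSuper and the layer sizes are made up for this illustration.

```python
import torch.nn as nn


class WithSuper(nn.Module):
    def __init__(self):
        super().__init__()         # sets up nn.Module's internal registries
        self.fc = nn.Linear(4, 2)


class WithoutSuper(nn.Module):
    def __init__(self):
        self.fc = nn.Linear(4, 2)  # fails: nn.Module is not initialised yet


n_params = len(list(WithSuper().parameters()))
print(n_params)  # 2: the weight and the bias of fc

failed = False
try:
    WithoutSuper()
except AttributeError as err:
    failed = True
    print("AttributeError:", err)
```

In Python 3, super().__init__() without arguments is equivalent to the super(ConvNet, self).__init__() form used in the question.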

9. Suppose we have the following sentences available to us:

<start> There is a house. <end>
<start> Everything is scattered here and there. <end>
<start> There is the boy you were looking for. <end>

Which word is more likely to continue the sentence “I saw a house there ...”?

Ans – Counting only the word that follows “there” in the given sentences, the words appearing after it are “is” and “<end>”, with probabilities ⅔ and ⅓ respectively. Therefore, the sentence is more likely to continue with “is”, though it may not be grammatically correct.
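The count behind this answer can be reproduced with a few lines of Python. The tokenisation used here (lowercasing and dropping full stops) is an assumption made so that “There” and “there.” are treated as the same word.

```python
from collections import Counter

sentences = [
    "<start> There is a house. <end>",
    "<start> Everything is scattered here and there. <end>",
    "<start> There is the boy you were looking for. <end>",
]

followers = Counter()
for sentence in sentences:
    tokens = sentence.lower().replace(".", "").split()
    # Look at every pair of neighbouring words.
    for current, nxt in zip(tokens, tokens[1:]):
        if current == "there":
            followers[nxt] += 1

total = sum(followers.values())
probabilities = {word: count / total for word, count in followers.items()}
print(probabilities)  # "is" gets 2/3 and "<end>" gets 1/3
```

This is the simplest kind of language model: it looks only one word back, which is why the predicted continuation need not be grammatical.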

10. What can you infer from the following command?

from torchvision import datasets, transforms

train = datasets.MNIST("", train=True, download=True, transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True, transform=transforms.Compose([transforms.ToTensor()]))

Ans – The above command loads the MNIST dataset.

train=True/False – differentiates the training and test datasets.
download=True – downloads the dataset if it is not already available on the disk.
transform=transforms.Compose([transforms.ToTensor()]) – converts each image into a tensor so that it can be loaded on a GPU if needed.

Summary

Hope you liked the article and find it useful for your next PyTorch interview.
