ResNet-50: Introduction

Srinivas Rahul Sapireddy
Jun 30, 2023

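ResNet-50 is a variant of the ResNet model with 48 convolutional layers in its residual blocks, plus an initial convolutional layer and a final fully connected layer (50 weighted layers in total), along with 1 max-pool and 1 average-pool layer.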

Why ResNet?

The vanishing gradient problem makes deep neural networks hard to train. We use backpropagation to update the network weights, computing the gradients with the chain rule of derivatives. The repeated multiplications can make the gradients extremely small by the time they reach the earlier layers, so those layers barely update.
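To illustrate how repeated multiplication in the chain rule shrinks the backpropagated signal, here is a minimal sketch. It assumes a toy chain of sigmoid layers that all sit at the same pre-activation value and ignores the weights entirely; the function names are hypothetical and not part of any library.

```python
import math


def sigmoid_derivative(z: float) -> float:
    """Derivative of the sigmoid function at pre-activation z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def gradient_scale(num_layers: int, z: float = 0.0) -> float:
    """Product of num_layers sigmoid derivatives, all evaluated at z.

    This is the factor the chain rule applies to a gradient that travels
    back through num_layers such layers (toy assumption: weights ignored).
    """
    return sigmoid_derivative(z) ** num_layers


# The sigmoid derivative is at most 0.25, so the factor shrinks with depth.
for depth in (1, 5, 20, 50):
    print(f"depth={depth:2d}  gradient scale={gradient_scale(depth):.3e}")
```

Even in this best case for the sigmoid (its derivative peaks at z = 0), the factor falls off geometrically as the depth grows.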

Residual networks use the concept of skip connections. Using these skip connections, we can resolve the problem of vanishing gradient.

Skip Connections

In a typical network, the convolutional layers are stacked one after the other. With skip connections, the layers are still stacked one after the other, but we also add the original input to the output of the convolutional block. This is called a skip connection. Using skip connections, residual networks mitigate the vanishing gradient problem: because the gradient can flow back through the shortcut and bypass the skipped layers, it is not repeatedly multiplied down toward zero.
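A scalar sketch of why adding the input back helps: if a block computes y = x + F(x), its local derivative is 1 + F'(x) rather than F'(x) alone. The example below compares the two chain-rule products, assuming every block has the same small local derivative. The value 0.1 and the function names are illustrative assumptions.

```python
import math
from typing import List


def plain_chain_gradient(local_grads: List[float]) -> float:
    """Chain-rule product for stacked layers y = F(x): multiply each F'(x)."""
    return math.prod(local_grads)


def residual_chain_gradient(local_grads: List[float]) -> float:
    """Chain-rule product for residual blocks y = x + F(x): each factor is 1 + F'(x)."""
    return math.prod(1.0 + g for g in local_grads)


# Hypothetical setting: 10 blocks, each with a local derivative of 0.1.
local_grads = [0.1] * 10
plain = plain_chain_gradient(local_grads)
residual = residual_chain_gradient(local_grads)
print(f"plain stack:    {plain:.3e}")
print(f"residual stack: {residual:.3e}")
```

The "+ 1" contributed by each shortcut keeps the product from collapsing toward zero, which is the intuition behind the skip connection. Real networks work with matrices rather than scalars, so this is only a simplified picture.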

Skip connections were introduced to solve different problems in different architectures. They solved the degradation problem in the case of ResNets, whereas they ensured feature reusability in the case of DenseNets.

Residual Block

Y = F(X) + X. If F(X) = 0, then Y = X.

F(X): the residual function computed by the stacked layers of the block.

The block computes F(X) from the input and adds the input back to get the output Y. In traditional networks, the input travels layer by layer, and the output is whatever the last layer computes. As we can see in the figure, a residual block adds the original input to the output of its layers using a skip connection.

The network is easier to train, and the overall accuracy can improve, when it learns the difference between the output and the input (the residual F(X) = Y - X) instead of the full mapping.

The logic behind residual networks is to make the identity mapping (Y = X) easy to learn.

X is the input and Y is the output; in traditional neural networks, Y = F(X).

If we make F(X) = 0, then it is easy to make the output equal to the input:

Y = X + F(X)

Y = X + 0

Y = X

In typical networks, the layers learn Y directly, but in residual networks, they learn F(X). When the identity mapping is the best a block can do, the target is F(X) = 0, which makes the output equal to the input.
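The relation Y = X + F(X) can be sketched in a few lines. Here the residual function is passed in as a plain Python callable standing in for the block's stacked layers. The helper names (residual_block, zero_residual, small_residual) are hypothetical and chosen only for this illustration.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return Y = X + F(X), where residual_fn plays the role of F."""
    fx = residual_fn(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_residual(x: Vector) -> Vector:
    """F(X) = 0: the block reduces to the identity mapping."""
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    """A stand-in for learned layers that make a small correction to X."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
y_identity = residual_block(x, zero_residual)   # equals x
y_adjusted = residual_block(x, small_residual)  # x plus a small correction
print(y_identity)
print(y_adjusted)
```

With F(X) = 0 the block passes its input through unchanged, so a block that has nothing useful to add does not have to relearn the identity from scratch.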

ResNet Layer Structure

We have 50 layers in this ResNet architecture.

1 + 9 + 12 + 18 + 9 + 1 = 50 layers
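The sum above can be reproduced from the block layout. The sketch below assumes the standard ResNet-50 configuration, which this article does not spell out: four stages with 3, 4, 6, and 3 residual blocks, three convolutional layers per block, one initial convolution, and one final fully connected layer. The constant names are illustrative.

```python
# Assumed standard ResNet-50 layout (not stated explicitly in the article).
BLOCKS_PER_STAGE = [3, 4, 6, 3]  # residual blocks in each of the four stages
CONVS_PER_BLOCK = 3              # each block stacks three convolutions
INITIAL_CONV_LAYERS = 1          # the first 7*7 convolution
FULLY_CONNECTED_LAYERS = 1       # the final classifier layer

# Convolutional layers contributed by each stage.
stage_layers = [blocks * CONVS_PER_BLOCK for blocks in BLOCKS_PER_STAGE]
total_layers = INITIAL_CONV_LAYERS + sum(stage_layers) + FULLY_CONNECTED_LAYERS

print(stage_layers)   # [9, 12, 18, 9]
print(total_layers)   # 50
```

The pooling layers are not counted here because they have no learnable weights.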

ResNet-50 with skip connections

The filter sizes, filter counts, and strides are predefined for the ResNet-50 architecture. If we calculate the total number of layers, we will get 50 layers. Stride is used here to reduce the spatial size of the image. Using skip connections, we add the output of an earlier layer to the output of a later layer.

Calculating the output of the first layer of ResNet:

Filter size = 7*7 with 64 such filter maps

Stride = 2

Padding = 3

floor((n + 2p - f) / s) + 1 = floor((300 + 2*3 - 7) / 2) + 1 = 149 + 1 = 150, so the output size is 150*150*64

Here we can see that the image size is reduced from 300*300 to 150*150 because we use stride 2. This will be the input to the next layer.

Max Pooling Layer:

Here we have a 3*3 max pooling operation with stride 2 and padding 1. So, the image size will be reduced from 150*150 to 75*75. Overall, the image size is reduced from 300*300 to 75*75, with 64 channels.
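The two size calculations above follow the same rule, so they can be checked with one small helper. This is a sketch that assumes square inputs and floor rounding; the function name conv_output_size is hypothetical.

```python
def conv_output_size(n: int, f: int, s: int, p: int) -> int:
    """Spatial output size for input size n, filter size f, stride s, padding p.

    Implements floor((n + 2p - f) / s) + 1, which applies to both
    convolution and pooling layers.
    """
    return (n + 2 * p - f) // s + 1


# First layer: 7*7 convolution, stride 2, padding 3, on a 300*300 image.
after_conv = conv_output_size(300, f=7, s=2, p=3)

# Max pooling: 3*3 window, stride 2, padding 1.
after_pool = conv_output_size(after_conv, f=3, s=2, p=1)

print(after_conv)  # 150
print(after_pool)  # 75
```

The number of channels (64) is set by the number of filters in the first layer and is left unchanged by pooling.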

ResNet Blocks

[1] Identity Block

We use an identity block when the input and output of the block have the same size, so the input can be added to the output directly. When the input and output sizes differ, we use a convolutional block.

[2] Convolutional Block

When the input and output of the block do not have the same size, we use the convolutional block.

It adds a convolutional layer to the shortcut path of the identity block so that the size of the input matches the size of the output.

There are 2 options for matching the output size:

[1] padding the input.

[2] performing 1*1 convolutions.
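Both options can be sketched on a tiny feature map stored as nested lists. This is an illustration under stated assumptions: "padding the input" is interpreted as appending zero channels after spatial subsampling, the 1*1 convolution is written as a per-pixel linear map across channels, and all names and weight values are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [height][width][channels]


def pad_channels(x: FeatureMap, out_channels: int, stride: int = 1) -> FeatureMap:
    """Option 1: subsample spatially, then append zero channels."""
    return [
        [pixel + [0.0] * (out_channels - len(pixel)) for pixel in row[::stride]]
        for row in x[::stride]
    ]


def conv1x1(x: FeatureMap, weights: List[List[float]], stride: int = 1) -> FeatureMap:
    """Option 2: 1*1 convolution; weights has shape [out_channels][in_channels]."""
    return [
        [
            [sum(w * v for w, v in zip(w_row, pixel)) for w_row in weights]
            for pixel in row[::stride]
        ]
        for row in x[::stride]
    ]


def shape(x: FeatureMap) -> Tuple[int, int, int]:
    """Return (height, width, channels)."""
    return (len(x), len(x[0]), len(x[0][0]))


# A 4*4 input with 2 channels; the block output is assumed to be 2*2 with 4 channels.
x = [[[float(r + c), 1.0] for c in range(4)] for r in range(4)]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # illustrative values

padded = pad_channels(x, out_channels=4, stride=2)
projected = conv1x1(x, weights, stride=2)
print(shape(x), shape(padded), shape(projected))
```

Either way, the shortcut ends up with the same shape as the block's output, so the two can be added element by element. Padding adds no parameters, while the 1*1 convolution learns its weights during training.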

Uses of ResNet Architecture

[1] Image classification

[2] Object localization

[3] Object detection