🔌 Toolbox of short, reusable pieces of code and knowledge.
One of the tedious parts of constructing neural networks is figuring out the right shapes and sizes given your inputs. This note summarizes that math as presented in the CS231N notes. The Wikipedia page also has some nice arithmetic.
Size-wise, we aim to preserve the dimensions of the input after passing it through a convolutional layer.
A convolutional layer:

- accepts a volume of size \(W_{1} \times H_{1} \times D_{1}\),
- requires four hyperparameters: the number of filters \(K\), their spatial extent \(F\), the stride \(S\), and the amount of zero padding \(P\),
- and produces a volume of size \(W_{2} \times H_{2} \times D_{2}\), where \(W_{2}=\left(W_{1}-F+2P\right) / S+1\), \(H_{2}=\left(H_{1}-F+2P\right) / S+1\), and \(D_{2}=K\).
Choosing hyperparameters: a general rule of thumb is \(F=3, S=1, P=1\). To preserve the input size when \(S=1\), ensure that \(P=(F-1) / 2\).
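For example, here is a quick check of this arithmetic in PyTorch; the `conv_output_size` helper is just an illustration of the formula, not part of any library:

```python
import torch
import torch.nn as nn

def conv_output_size(w, f, s, p):
    """Output width/height of a conv layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# F=3, S=1, P=1 preserves the spatial size, e.g. 32 -> 32.
print(conv_output_size(32, f=3, s=1, p=1))  # 32

# Sanity check with PyTorch: a 3x32x32 input stays 32x32 spatially,
# while the depth becomes the number of filters K (here 16).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 16, 32, 32])
```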
In CNN architectures, we usually insert a few pooling layers in between the convolutional layers. These are responsible for reducing the spatial size of the input, thereby decreasing the number of parameters and helping to prevent overfitting. As the CS231N notes put it: “pool layers are in charge of downsampling the spatial dimensions of the input.”
A pooling layer:

- accepts a volume of size \(W_{1} \times H_{1} \times D_{1}\),
- requires two hyperparameters: the spatial extent \(F\) and the stride \(S\),
- and produces a volume of size \(W_{2} \times H_{2} \times D_{2}\), where \(W_{2}=\left(W_{1}-F\right) / S+1\), \(H_{2}=\left(H_{1}-F\right) / S+1\), and \(D_{2}=D_{1}\).
General rule of thumb: \(F=2, S=2\).
Note: there are many types of pooling layers. It is common to use MaxPool2d.
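A minimal PyTorch sketch of the \(F=2, S=2\) rule of thumb, which halves each spatial dimension while leaving the depth unchanged:

```python
import torch
import torch.nn as nn

# F=2, S=2: each spatial dimension is halved, the depth D stays the same.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 16, 32, 32)   # N x D x H x W
print(pool(x).shape)             # torch.Size([1, 16, 16, 16])
```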
Activation functions connect the output of a previous layer to the input of the next. See here.
In general, there are two main layers of a neural network where an AF might be applied: hidden layers and output layers.
For hidden layers, common choices are tanh, sigmoid, or ReLU. For ConvNets, we tend to use ReLU, and that is what we will use here. Regarding the size, ReLU leaves the size and shape of the input volume unchanged.

For output layers, common choices are linear, logistic (sigmoid), or softmax. Our choice depends solely on the task we are performing. For regression, we usually output the value we are trying to predict directly using a linear layer/activation. For binary classification, we use a sigmoid activation. And for multi-class classification, we use a softmax activation. Machine Learning Mastery has a nice diagram for deciding.
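A small PyTorch sketch of these choices; the shapes and the ten-class logits are just illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

# ReLU is applied element-wise, so the output shape equals the input shape.
relu = nn.ReLU()
print(relu(x).shape)  # torch.Size([1, 16, 32, 32])

logits = torch.randn(1, 10)

# Output activations by task:
identity = nn.Identity()       # regression: linear / no activation
sigmoid = nn.Sigmoid()         # binary classification
softmax = nn.Softmax(dim=1)    # multi-class classification
print(softmax(logits).sum())   # probabilities sum to 1
```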
Putting it all together, the CS231N notes summarize the most common ConvNet architecture pattern as:

INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC
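As one concrete, hypothetical instance of this pattern (N=1, M=2, K=1), assuming 3×32×32 inputs and 10 classes:

```python
import torch
import torch.nn as nn

# [[CONV -> RELU] -> POOL] * 2 -> [FC -> RELU] -> FC
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),    # 3x32x32 -> 16x32x32
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                    # 16x32x32 -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),    # 16x16x16 -> 32x16x16
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                    # 32x16x16 -> 32x8x8
    nn.Flatten(),                                             # -> 32*8*8 = 2048
    nn.Linear(32 * 8 * 8, 128),
    nn.ReLU(),
    nn.Linear(128, 10),                                       # 10 output classes
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 10])
```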
It has become increasingly common to randomly “drop out” a fraction of the neurons between layers during training to reduce overfitting (regularization). Such layers are called dropout layers.
A dropout layer does not change the input size.
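A minimal sketch with PyTorch's nn.Dropout; p=0.5 is just an example value:

```python
import torch
import torch.nn as nn

# During training, each element is zeroed with probability p; the shape is unchanged.
dropout = nn.Dropout(p=0.5)

x = torch.randn(1, 128)
print(dropout(x).shape)  # torch.Size([1, 128])

# At evaluation time (dropout.eval()), dropout is a no-op.
```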