User guide example: CNN on MNIST

User 5217 | 5/22/2016, 4:19:09 PM

When printing net.layers following the user guide example of the Neuralnet Classifier, the convolution layer looks like this:

layer[0]: ConvolutionLayer
  init_random = xavier
  padding = 1
  stride = 2
  num_channels = 32
  num_groups = 1
  kernel_size = 3

To calculate the output size, I am using the standard formula [ (H - K + 2P)/S + 1 ; (W - K + 2P)/S + 1 ], where H = height of the input image, W = width of the input image, K = kernel size, P = zero-padding, S = stride (see http://slideplayer.com/slide/9059497/, slide 26).

Input images have size 28x28x1, so when I fill in the formula it gives (28 - 3 + 2*1)/2 + 1 = 14.5. How is it that this is not an integer??
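A quick check of that formula in plain Python (just the calculation above, nothing framework-specific):

```python
# Naive output-size formula: (H - K + 2*P) / S + 1
H, K, P, S = 28, 3, 1, 2           # input height, kernel size, padding, stride
out = (H - K + 2 * P) / S + 1      # float division
print(out)                         # 14.5 -- not an integer
```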

Comments

User 5159 | 5/23/2016, 10:19:30 PM

Hi @Emmal, in the implementation (see https://github.com/dmlc/mxnet/blob/master/src/operator/convolution-inl.h#L350), the output height is computed, for example, as: (h + 2 * pad_h - (1 * (kernel_h - 1) + 1)) / stride_h + 1 (the leading 1 is the dilation factor).

In your example, it will be: (28 + 2*1 - (1*(3 - 1) + 1)) / 2 + 1 = (28 + 2 - 3) / 2 + 1 = 27 / 2 + 1. Note that all numbers are of int type, so integer division gives 27 / 2 = 13, not 13.5. The final result is 13 + 1 = 14.
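A minimal sketch of the same computation in Python, using floor division to mimic the C-style integer arithmetic in the code linked above:

```python
# Output size with integer (floor) division, as in convolution-inl.h
h, pad_h, kernel_h, stride_h, dilate_h = 28, 1, 3, 2, 1
out_h = (h + 2 * pad_h - (dilate_h * (kernel_h - 1) + 1)) // stride_h + 1
print(out_h)  # (28 + 2 - 3) // 2 + 1 = 13 + 1 = 14
```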


User 5217 | 5/24/2016, 9:29:02 AM

Thanks for your reply! I am writing my master's thesis about CNNs, hence my question.

But when I try to draw this to see the number of neurons (with a receptive field of 3, a stride of 2, and zero-padding of 0), it seems like this does not fit the number of pixels?


User 5159 | 5/24/2016, 6:48:14 PM

Maybe you can try to draw a 1D convolution example; it is clearer than 2D.
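For example, here is a small 1D sketch in pure Python (values chosen to match the layer above: length 28, kernel 3, stride 2, padding 1) that enumerates the window start positions; the last padded position is never the start of a full window, which is why the formula floors:

```python
length, kernel, stride, pad = 28, 3, 2, 1

padded = pad + length + pad                  # 30 positions after zero-padding
starts = list(range(0, padded - kernel + 1, stride))
print(len(starts))                           # 14 output positions
print(starts[-1], starts[-1] + kernel - 1)   # last window covers padded indices 26..28
# padded index 29 is never covered by a full window -- it is dropped by the floor
```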


User 5217 | 5/25/2016, 8:21:12 AM


User 5159 | 5/25/2016, 11:33:06 PM

Hi Emmal, I think you made a mistake. There is only 1 neuron if the number of channels is set to 1.


User 5217 | 5/26/2016, 6:26:28 AM

How does the convolution layer work when only one neuron is used? Do you know of a source with a brief explanation, or a visualization? Thanks for your help :)


User 5222 | 5/26/2016, 7:51:53 AM

My calculation is the same as @EmmaL's. It's not a whole number. Try to adapt the hyper-parameters so that the 3D layers fit together.
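As a rough sketch of that suggestion (assuming a 28x28 input and the naive formula), this loop lists kernel/padding/stride combinations where the windows tile the input exactly, i.e. where (H - K + 2P) is divisible by S:

```python
H = 28
for K in (3, 5):
    for P in (0, 1, 2):
        for S in (1, 2, 3):
            if (H - K + 2 * P) % S == 0:
                out = (H - K + 2 * P) // S + 1
                print(f"kernel={K} pad={P} stride={S} -> output {out}x{out}")
```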


User 5159 | 5/26/2016, 7:45:46 PM

The shape of the convolution weight is (num_channel_out, num_channel_in, kernel_x, kernel_y). The number of neurons is determined by num_channel_out. Assume kernel_x = 1, kernel_y = 1, num_channel_out = 1, and num_channel_in = 3; then the weight is a vector of 3 numbers. What it does is take a dot product at each pixel of the input, mapping the input from a 3-dimensional vector to a 1-dimensional vector. If you set num_channel_out to a larger number, there will be more neurons sliding over the input.
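A minimal NumPy sketch of that description (not the MXNet implementation, just the arithmetic): a 1x1 convolution with num_channel_out = 1 and num_channel_in = 3 reduces each pixel's 3-vector to a single number with one dot product.

```python
import numpy as np

h, w = 4, 4
x = np.random.rand(3, h, w)           # input: 3 channels (num_channel_in = 3)
weight = np.random.rand(1, 3, 1, 1)   # (num_channel_out, num_channel_in, kernel_x, kernel_y)

# For a 1x1 kernel the "neuron" is just a dot product applied at every pixel.
out = np.einsum('chw,oc->ohw', x, weight[:, :, 0, 0])
print(out.shape)                      # (1, 4, 4): one output channel, same spatial size

# Equivalent explicit loop over pixels:
out2 = np.zeros((1, h, w))
for y in range(h):
    for z in range(w):
        out2[0, y, z] = np.dot(weight[0, :, 0, 0], x[:, y, z])
print(np.allclose(out, out2))         # True
```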