Invalid Convolution Layer handling

User 2042 | 6/10/2015, 5:44:35 AM


I am using GraphLab Create v1.4.0.gpu, and first of all I want to say that it's a great tool. There have been visible improvements since v1.3.0. But there's something I don't understand in graphlab.deeplearning.NeuralNet: how does it actually handle an invalid convolution layer in a neural network?

To give an example, let's say I have 256x256 images as the input of the net, followed by a convolution layer with kernel size 3, stride 2 and no padding. This shouldn't be structurally possible, since the output size, (256-3)/2 + 1 = 127.5, is not an integer. But it somehow works without giving any errors, and training even finishes successfully, with the neural net delivering decent results. So I would like to know what it does in the case of an invalid network architecture. Does it skip some pixels of the input so that the output size 'fits'? If it does, this should at least be reported in some way. Even better would be to automatically display the sizes of all the layers at the start of the training process.

Something else I would like to point out is that the activation between successive conv layers should be more visible. I suppose that if you stack two conv layers and don't explicitly set an activation function between them, sigmoid will be used, but this should at least be displayed at the start of the training process. I say this because I tried explicitly putting relu layers after the conv layers and the network performs really poorly. It should at least perform similarly, as relu has been shown to work better than sigmoid and tanh in convolutional networks.
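For reference, the arithmetic in the question can be sketched in a few lines of Python. This is only an illustration of the two rounding conventions that frameworks commonly use (round down or round up); the helper names are hypothetical, and nothing here is taken from GraphLab Create, whose actual behavior is not established at this point in the thread.

```python
import math

def conv_output_size(input_size, kernel, stride, pad=0, rounding="floor"):
    """Output width of a convolution along one axis (hypothetical helper)."""
    steps = (input_size + 2 * pad - kernel) / stride
    if rounding == "floor":
        return math.floor(steps) + 1
    if rounding == "ceil":
        return math.ceil(steps) + 1
    raise ValueError("rounding must be 'floor' or 'ceil'")

def last_index_covered(output_size, kernel, stride):
    """Zero-based input index touched by the right edge of the last window."""
    return (output_size - 1) * stride + kernel - 1

exact = (256 - 3) / 2 + 1                                 # 127.5, not an integer
floor_out = conv_output_size(256, 3, 2)                   # rounding down
ceil_out = conv_output_size(256, 3, 2, rounding="ceil")   # rounding up

# Valid input indices are 0..255.
floor_last = last_index_covered(floor_out, 3, 2)  # stops short of index 255
ceil_last = last_index_covered(ceil_out, 3, 2)    # reaches one index past 255

print(exact, floor_out, floor_last, ceil_out, ceil_last)
```

Rounding down leaves the last row and column of the image unused, while rounding up requires one extra row and column beyond the image, which would have to be supplied as padding.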

Thank you.


User 940 | 6/12/2015, 1:50:33 PM

Hi @"Ionel Alexandru Hosu",

First, to answer your questions. The input is padded automatically in this case. If you don't specify activation functions, there will simply be a linear activation function. Therefore, it is important to specify the non-linearity.
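A small sketch of why the non-linearity matters: with a linear (identity) activation, two stacked convolutions are equivalent to a single convolution with a combined kernel, so the extra layer adds no expressive power. The example below uses plain Python, 1-D signals, and made-up kernel values; the helper names are hypothetical and it does not use the GraphLab Create API.

```python
def conv1d_valid(signal, kernel):
    """1-D convolution without padding, stride 1 (kernel is flipped)."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def conv1d_full(a, b):
    """Full 1-D convolution, used here to combine two kernels into one."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def relu(values):
    return [max(0.0, v) for v in values]

# Illustrative values only.
x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0, -3.0]
k1 = [1.0, 0.0, -1.0]
k2 = [0.5, 1.0, 0.5]

# Two conv layers with a linear activation in between.
stacked_linear = conv1d_valid(conv1d_valid(x, k1), k2)
# One conv layer whose kernel is k1 convolved with k2.
single_equiv = conv1d_valid(x, conv1d_full(k1, k2))
# Two conv layers with a ReLU in between.
stacked_relu = conv1d_valid(relu(conv1d_valid(x, k1)), k2)

linear_collapses = all(
    abs(a - b) < 1e-9 for a, b in zip(stacked_linear, single_equiv)
)
relu_differs = any(
    abs(a - b) > 1e-9 for a, b in zip(stacked_relu, single_equiv)
)
print(linear_collapses, relu_differs)
```

The linear stack matches the single combined kernel exactly, whereas inserting ReLU produces an output that no single linear kernel reproduces on this input.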

Thanks for the suggestions, we will take your comments into account when making plans for improvements.

Cheers! -Piotr

User 2042 | 6/17/2015, 1:52:56 PM

Hi Piotr,

Thank you for your answers. I am afraid it is still unclear to me. Padding means adding zeros on all the margins of the image, so instead of 256x256 we would have an input of 258x258 if the padding size is 1. However, even with the padded input, (258-3)/2 + 1 = 128.5 is still not an integer. The only thing that would make the output size an integer would be to pad the input on one side only (left or right, top or bottom), but this is something I have not come across in my experience with CNNs. The correct move would be to resize the input images from the beginning to an odd number of pixels, for example 257x257.
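The cases discussed above can be checked with a short Python sketch. The function name and the idea of separate left and right padding amounts are assumptions made for illustration; this is not a description of what GraphLab Create does internally.

```python
def exact_output_size(input_size, kernel, stride, pad_left=0, pad_right=0):
    """Return the output size along one axis, or None if the windows do not
    tile the (padded) input exactly. Hypothetical helper for illustration."""
    span = input_size + pad_left + pad_right - kernel
    if span % stride != 0:
        return None
    return span // stride + 1

kernel, stride = 3, 2
cases = {
    "256, no padding": exact_output_size(256, kernel, stride),
    "256, pad 1 on both sides": exact_output_size(256, kernel, stride, 1, 1),
    "256, pad 1 on one side": exact_output_size(256, kernel, stride, 1, 0),
    "257, no padding": exact_output_size(257, kernel, stride),
}

for name, size in cases.items():
    print(name, "->", size if size is not None else "does not fit")
```

With kernel size 3 and stride 2, only the one-sided padding and the 257-pixel input tile exactly, which matches the reasoning in the post.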

On another note, I implemented the same network (using relu) in other frameworks and the results were much better, as they usually are when using relu rather than sigmoid or tanh. I would suggest looking into why this happens, because I doubt it is specific to my case.

Cheers and keep making it better! Ionel

User 1190 | 6/22/2015, 8:33:01 PM

Hi @"Ionel Alexandru Hosu"

Thanks for your comments. Would you mind posting the sample code of the network you are using and the expected accuracy of your reference implementation? We are interested in investigating it more.

Thanks, -jay