How to design network architecture manually

User 5222 | 5/25/2016, 10:58:32 AM

Why does every convnet I design by hand learn very poorly, even on the training data?

Comments

User 5159 | 5/25/2016, 5:21:54 PM

Designing a convnet requires some experience in choosing both the structure and the hyperparameters. I suggest replicating a famous structure first, e.g. AlexNet or Inception-BN, then trying to improve on it a little.


User 5222 | 5/25/2016, 5:43:29 PM

I did replicate the convnet from IDSIA, by Schmidhuber and collaborators. The structure and hyperparameters are exactly the same, but the accuracy still lags far behind that of the network Dato chooses automatically from the data.


User 5159 | 5/25/2016, 11:27:45 PM

There are minor differences between toolkits. For example, some toolkits normalize the gradient at the loss layer and some do not; in that case, even though the learning rate looks the same, it can effectively differ by a factor of 100. Please try a smaller learning rate. Also, if possible, please post your network in the forum so I can take a look.
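
A quick way to see this normalization effect: if one toolkit averages the loss over the batch while another sums it, the gradient, and hence the effective step size, differs by exactly the batch size. A minimal NumPy sketch (the batch size of 100 and the per-example gradients are illustrative assumptions):

import numpy as np

batch_size = 100
rng = np.random.default_rng(0)
per_example_grads = rng.standard_normal((batch_size, 10))  # hypothetical per-example gradients

# Toolkit A: averages the loss over the batch (normalized gradient at the loss layer).
grad_mean = per_example_grads.mean(axis=0)
# Toolkit B: sums the loss over the batch (unnormalized gradient).
grad_sum = per_example_grads.sum(axis=0)

lr = 0.001
# The same nominal learning rate produces steps that differ by a factor of batch_size:
step_a = lr * grad_mean
step_b = lr * grad_sum
print(np.allclose(step_b, batch_size * step_a))  # True: a 100x larger effective step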


User 5222 | 5/26/2016, 4:36:57 AM

Here it is:

layer[0]: ConvolutionLayer
    init_random = gaussian
    padding = 0
    stride = 1
    num_channels = 100
    num_groups = 1
    kernel_size = 3
layer[1]: RectifiedLinearLayer
layer[2]: MaxPoolingLayer
    padding = 0
    stride = 2
    kernel_size = 2
layer[3]: ConvolutionLayer
    init_random = gaussian
    padding = 0
    stride = 1
    num_channels = 150
    num_groups = 1
    kernel_size = 4
layer[4]: RectifiedLinearLayer
layer[5]: MaxPoolingLayer
    padding = 0
    stride = 2
    kernel_size = 2
layer[6]: ConvolutionLayer
    init_random = gaussian
    padding = 0
    stride = 1
    num_channels = 250
    num_groups = 1
    kernel_size = 3
layer[7]: RectifiedLinearLayer
layer[8]: MaxPoolingLayer
    padding = 0
    stride = 2
    kernel_size = 2
layer[9]: FlattenLayer
layer[10]: FullConnectionLayer
    init_sigma = 0.01
    init_random = gaussian
    init_bias = 0
    num_hidden_units = 200
layer[11]: RectifiedLinearLayer
layer[12]: FullConnectionLayer
    init_sigma = 0.01
    init_random = gaussian
    init_bias = 0
    num_hidden_units = 43
layer[13]: SoftmaxLayer

I replicated the ConvNet from the winner's paper: http://people.idsia.ch/~ciresan/data/ijcnn2011.pdf
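
For reference, here is how the spatial dimensions flow through that stack, assuming the 48x48 input used in the IDSIA paper (the input size is an assumption; adjust if yours differs). With zero padding, a convolution with kernel size k at stride 1 shrinks each side by k-1, and each 2x2 max-pool at stride 2 roughly halves it:

# Hypothetical helper to trace spatial sizes through the stack above.
def trace_sizes(input_size, layers):
    size = input_size
    for name, kernel, stride in layers:
        # Output size with zero padding: floor((size - kernel) / stride) + 1
        size = (size - kernel) // stride + 1
        print(f"{name:15s} kernel={kernel} stride={stride} -> {size}x{size}")
    return size

layers = [
    ("conv 100 maps", 3, 1),
    ("max-pool", 2, 2),
    ("conv 150 maps", 4, 1),
    ("max-pool", 2, 2),
    ("conv 250 maps", 3, 1),
    ("max-pool", 2, 2),
]
final = trace_sizes(48, layers)
print("flattened features:", 250 * final * final)  # 250 maps * 4 * 4 = 4000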


User 5222 | 5/26/2016, 9:11:17 AM

Here are the parameters:

network parameters
learning_rate = 0.001
momentum = 0.9
end network parameters
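
For context, those two numbers drive the classical SGD-with-momentum update. A minimal sketch of one step (the function and names are illustrative, not Dato's internals):

import numpy as np

def sgd_momentum_step(weights, grad, velocity, learning_rate=0.001, momentum=0.9):
    # Classical momentum: accumulate a decaying sum of past gradient steps...
    velocity = momentum * velocity - learning_rate * grad
    # ...and move the weights along the accumulated direction.
    return weights + velocity, velocity

# Example usage on a toy 3-parameter model:
w = np.zeros(3)
v = np.zeros(3)
w, v = sgd_momentum_step(w, np.array([1.0, -2.0, 0.5]), v)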


User 5222 | 5/26/2016, 9:21:01 AM

@Bing, I tried reducing the learning rate by two orders of magnitude as you suggested, but the model still refuses to learn anything new, even on the training set.


User 5159 | 5/26/2016, 7:48:37 PM

I think you could try adding BatchNormalization after each convolution to see whether it helps. Using BatchNormalization reduces the difficulty of choosing an initialization. Since your network is very wide, I suggest trying Xavier initialization first, then BatchNormalization.
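
For reference, a minimal NumPy sketch of Xavier (Glorot) initialization for a convolution kernel; the uniform variant shown here is one common form, and the kernel layout (out_ch, in_ch, kh, kw) is an assumption:

import numpy as np

def xavier_uniform(shape):
    # Xavier/Glorot uniform init for a conv kernel of shape (out_ch, in_ch, kh, kw):
    # sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    out_ch, in_ch, kh, kw = shape
    fan_in = in_ch * kh * kw
    fan_out = out_ch * kh * kw
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=shape)

# Example: the first conv layer above (100 maps, 3x3 kernels, RGB input assumed).
w0 = xavier_uniform((100, 3, 3, 3))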


User 5222 | 5/27/2016, 10:33:09 AM

@Bing, the Xavier init didn't work out. I'll replace the ReLUs with Batch Normalization as you suggest and will report the results to you as soon as I get them.


User 5159 | 5/30/2016, 2:20:29 AM