Neuralnet_classifier: Epoch vs. iteration

User 3125 | 1/25/2016, 7:12:31 PM

Hi, I wanted to experiment a bit with the learning rate schedule and the different parameters for exponential decay. However, I got confused by the usage of the words iteration and epoch in the documentation and the output of the neuralnet_classifier.create function.

My understanding (I could be wrong) from the literature is that an epoch is a full pass over all training examples, while an iteration is a full pass over one batch (whatever its size). However, the output I got from the neuralnet_classifier.create function in the progress table suggests that iteration and epoch are used as synonyms, as the value in the iteration column only changes once the number in the examples column reaches the size of my training set.

What does that mean for the learning_rate_schedule, learning_rate_step and learning_rate_gamma parameters? The documentation for exponential_decay says that the learning rate is decreased over iterations. I guess this means a decrease happens each time the iteration number increases? The other two parameters and the formula given (new_lr = lr * lr_gamma^(epoch/lr_step)) to calculate the updated learning rate refer to epochs. Would it be correct to assume that epoch and iteration are used as synonyms? Thus, reducing the step size of the ImageNet net from 100000 to 100 and leaving the gamma at 0.1 would give me a learning rate of approx. 0.000977 at epoch (iteration) 2 and 0.000955 at epoch 3 when starting with a learning rate of 0.001?
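To make the numbers concrete, this is how I am reading the formula (just a quick sketch of the arithmetic, taking new_lr = lr * lr_gamma^(epoch/lr_step) literally; at which epoch each value actually kicks in is part of what I am unsure about):

    lr = 0.001        # starting learning rate
    lr_gamma = 0.1    # decay factor
    lr_step = 100     # step size reduced from 100000 to 100

    for epoch in range(1, 4):
        new_lr = lr * lr_gamma ** (epoch / float(lr_step))
        print("after %d decay step(s): %.6f" % (epoch, new_lr))

    # after 1 decay step(s): 0.000977
    # after 2 decay step(s): 0.000955
    # after 3 decay step(s): 0.000933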

Comments

User 940 | 1/26/2016, 4:41:22 PM

Hi @x2748 ,

Thank you for pointing out the inconsistency in our documentation. In our code, iteration is equivalent to epoch; both refer to a full pass over the training data. This is for consistency with other tools in our product.

I hope this helps!

Cheers! -Piotr


User 3125 | 2/5/2016, 8:59:33 AM

Hi,

thanks so far, but it seems I got something wrong or overlooked something when I tried to train a net with some adjusted parameters. I hope you can give some advice.

My goal was to increase the batch size to better utilize the GPU memory (I use two 4 GB GTX 980s). With a batch size of 500 I now utilize about 99% on the first one and 98% on the second one.

In a paper by Alex Krizhevsky I read that when increasing the batch size by a factor k you should also increase the learning rate by that factor, which gave me a new learning rate of 0.033.
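For my setup that rule works out as follows (just the arithmetic, assuming the scaling is applied linearly to the tutorial's base learning rate of 0.01 and base batch size of 150):

    base_lr = 0.01
    base_batch_size = 150
    new_batch_size = 500

    k = new_batch_size / float(base_batch_size)   # ~3.33
    new_lr = base_lr * k                          # ~0.033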

The first time I trained a neural net with GraphLab I used the standard settings as described in your tutorial (https://dato.com/learn/gallery/notebooks/buildimagenetdeeplearning.html), except for an increased batch size (340 instead of 150). With regard to learning rate decay, the relevant settings from the model.conf file were:

learning_rate: 0.01
learning_rate_gamma: 0.1
learning_rate_step: 100000

The last one I found strange, as your documentation says "update the learning rate every learning_rate_step number of epochs (default 1)". I would interpret this setting as: update the learning rate every 100000 epochs, which does not make any sense, since the max_iterations parameter was set to 35 and we already established above that iteration and epoch are used as synonyms.

Coming from this formula given in the documentation

new_lr = lr * lr_gamma^(epoch / lr_step)

could you please clarify how the learning rate decay progresses with the above settings? Could it be that the learning rate step here actually does not refer to epochs but to mini-batch updates (meaning one "epoch" in this sense is a pass over the images of a single batch, so with 1.2 million pictures in total the learning rate would be adapted every 100,000 / (1,200,000 pictures / 150 batch size) = 12.5 "epochs")? I have sketched the two readings below.
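Here is my back-of-the-envelope sketch of the two readings (my own assumptions: roughly 1.2 million training images, batch size 150, and the model.conf settings quoted above):

    num_images = 1200000
    batch_size = 150
    lr_step = 100000
    max_iterations = 35

    updates_per_epoch = num_images / batch_size   # 8000 mini-batch updates per full pass

    # Reading 1: lr_step counts epochs -> the first decay would happen after
    # 100000 epochs, i.e. never within max_iterations = 35.
    # Reading 2: lr_step counts mini-batch updates -> the first decay would happen after
    epochs_until_first_decay = lr_step / float(updates_per_epoch)   # 12.5 epochs, i.e. around epoch 13 of 35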

Do you have a suggestion for some sensible settings with my configuration (especially with a batch size of 500)?

Some other suggestions:

Would it be possible to get a printout of the parameters used together with the model checkpoint file in a future version?

Would it be possible to get a printout of the progress after each epoch instead of only after the training has completed?