Training a Deep Learning model

User 1319 | 3/6/2015, 3:05:42 AM

Hi,

I am training a deep learning model (for image classification) with a modified <i class="Italic">Imagenet</i> net (I changed the <i class="Italic">num_hidden_units</i> in layer 22 to match the number of classes in my application). I'm using GraphLab Create 1.3 with EC2's Nvidia GRID K520 GPU (3072 cores, 8GB). The training set is around 30K images (resized to 256 by 256) with 5 classes.

I would greatly appreciate it if you kindly answer the following questions:

1- Is it possible to retrain a deep learning model by passing a trained network to the <i class="Italic">net</i> argument in <i class="Italic">gl.neuralnet_classifier.create</i>? This would save a lot of time when more iterations are required to improve the model accuracy, or when new observations become available.

2- What is the stopping criterion for GraphLab's deep learning neural networks?

3- From your experience, what would be an educated guess for <i class="Italic">max_iterations</i>?

4- Given the same number of observations for two datasets, does a difference in the number of classes between them greatly increase the required <i class="Italic">max_iterations</i> (for example, two datasets with 50K images each, but one with 10 classes and the other with 100 classes)?

5- I've noticed that <i class="Italic">gl.neuralnet_classifier.create</i> does not have a <i class="Italic">class_weights</i> argument. Does <i class="Italic">gl.neuralnet_classifier.create</i> handle an imbalanced dataset automatically, or do I have to under-sample my dataset to balance the classes before I train the model?

6- In my model I kept <i class="Italic">validation='auto'</i>. Is the 5% validation set chosen using stratified random sampling (to maintain a similar class distribution)?

7- To make the results of deep learning models reproducible, is it possible to have a <i class="Italic">seed</i> argument in <i class="Italic">gl.neuralnet_classifier.create</i> (similar to <i class="Italic">random_split(0.8, seed=5)</i>)? This would also be useful for other stochastic models (e.g. Boosted Trees).

Thank you for your time.

Tarek

Comments

User 1190 | 3/6/2015, 7:13:16 PM

Hi @Tarek,

Thank you for your detailed questions.

  1. No. We currently provide feature extraction, which lets the same model be used for different problem domains. We are planning to add retraining (warm start) soon.

  2. Training stops after max_iterations.

  3. It's really problem dependent. You will probably have to monitor the validation accuracy. A good starting point would be around 20 iterations. For a more complicated problem like Imagenet, we use 45 iterations.

  4. Again, it is problem dependent and architecture dependent. If you only modify the last layer to adapt to the number of classes in your problem, then from the model's perspective the difference in class count only affects the last (output) layer: the more classes you have, the more parameters in that layer. However, you often don't want to modify just the last layer when the problems differ vastly in more than the number of classes. The Imagenet dataset has 1 million images and 1000 classes, and the architecture has 3-4 convolution + pooling layers and 2-3 fully connected layers, with 1000 hidden units in the last layer. That's a lot of parameters and requires a lot of data to train. If you instead have a dataset with 30K images and 10 classes, you probably want to simplify the network in addition to adjusting the output layer.
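To make the output-layer point concrete, here is a small sketch of how parameter count grows with class count for a fully connected output layer (the 4096-unit penultimate layer below is an assumption for illustration, similar to common Imagenet-style nets, not your exact architecture):

```python
# Sketch: a dense layer with H inputs and C output units has
# H*C weights plus C biases.
def output_layer_params(num_hidden, num_classes):
    return num_hidden * num_classes + num_classes

# Hypothetical 4096-unit penultimate layer:
print(output_layer_params(4096, 10))   # 40970
print(output_layer_params(4096, 100))  # 409700
```

So going from 10 to 100 classes multiplies the output layer's parameters by roughly 10x, while the rest of the network is unchanged.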

  5. No, it doesn't have special handling for class weights.

  6. Currently it is uniform sampling. We are changing it to stratified sampling. Right now, as a workaround, you can use random_sample_by_user in the recommender toolkit (treating the class label as the user in this case) to generate your own validation set.

  7. This is difficult because parallel execution is involved, and a random seed is not enough to guarantee identical results during model training.

Thanks -jay


User 1319 | 3/7/2015, 12:03:51 AM

Hi @Jay ,

Thank you for your quick and informative reply.

Just quick questions,

1 - Correct me if I am wrong: can we compensate for a relatively small dataset by increasing <i class="Italic">max_iterations</i> when training a complex net?

2 - Do you plan to add <i class="Italic">class_weights</i> to the GraphLab Create deep learning model?

3 - If I want to set the class weights for a Boosted Trees model, say I have 4 classes (0, 1, 2 and 3), can I do that as follows: <i class="Italic">class_weights = {'0': 0.15, '1': 0.15, '2': 0.2, '3': 0.5}</i>?
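For context on where such weight values might come from: one common heuristic (my illustration, not something GraphLab prescribes) is inverse class frequency, normalized so the weights sum to 1, so rarer classes get larger weights:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency; normalize to sum to 1."""
    counts = Counter(labels)
    inverse = {cls: 1.0 / n for cls, n in counts.items()}
    total = sum(inverse.values())
    return {cls: w / total for cls, w in inverse.items()}

# Toy imbalanced label column: class "3" is the rarest.
labels = ["0"] * 50 + ["1"] * 30 + ["2"] * 15 + ["3"] * 5
weights = inverse_frequency_weights(labels)
# The rarest class "3" receives the largest weight.
```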

Cheers,

Tarek


User 1190 | 3/9/2015, 6:56:34 PM

  1. Increasing max_iterations cannot compensate for a small dataset. The network has to be simplified.
  2. Yes.
  3. Yes.

User 1319 | 3/9/2015, 7:10:35 PM

Thanks a lot Jay. Tarek