Questions regarding Deep Learning parameters.

User 2001 | 8/21/2015, 2:53:34 PM

I'm trying to compare different deep learning libraries, and am relatively unfamiliar with GraphLab, so I have a few questions!

For weight initialisation, I notice that the options are restricted to gaussian and xavier. Am I correct in assuming that 'xavier' refers to Glorot initialisation, as in 'Xavier Glorot'? If so, is this the normal or the uniform variant of Glorot initialisation?

Secondly, is optimisation limited to SGD? I'd like to try AdaGrad, AdaDelta, RMSprop, etc., as available in other libraries.

Finally, what loss function does the classifier use for training?

Theano-based libraries such as Keras and Lasagne allow for other initialisations, including both uniform and Glorot uniform in addition to normal. They also include a wider range of optimisers and cost functions. Are we likely to see options such as these in GraphLab in the future?

Comments

User 940 | 8/22/2015, 9:39:15 AM

Hi @Joeetaku ,

You are correct in saying that 'Xavier' refers to 'Xavier Glorot', and this is the uniform version of that initialization.
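
For reference, here is a minimal sketch of what uniform Glorot initialization usually means, written in plain Python. It is not GraphLab's implementation; the function names and the `fan_in`/`fan_out` arguments are assumptions chosen for illustration. The uniform and normal variants target the same variance, 2 / (fan_in + fan_out), and differ only in the distribution they sample from.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out matrix sampled from U(-limit, limit).

    limit = sqrt(6 / (fan_in + fan_out)), so each weight has variance
    limit**2 / 3 = 2 / (fan_in + fan_out).
    """
    rng = rng or random.Random()
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def glorot_normal_std(fan_in, fan_out):
    """Standard deviation used by the normal variant of the same scheme."""
    return math.sqrt(2.0 / (fan_in + fan_out))


# Hypothetical layer sizes, used only to exercise the functions.
weights = glorot_uniform(256, 128, random.Random(0))
limit = math.sqrt(6.0 / (256 + 128))
```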

For now, optimization is performed by SGD.

The loss function is the standard cross-entropy loss.
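
As a rough illustration of what that means for a single example, the sketch below computes softmax cross-entropy in plain Python. It assumes the network's final layer produces raw scores (logits) that are passed through a softmax; the function names are illustrative and this is not the toolkit's own code.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_class])


# The loss is small when the correct class has the highest score
# and large when it has the lowest.
loss_confident = cross_entropy([4.0, 0.5, 0.1], true_class=0)
loss_wrong = cross_entropy([4.0, 0.5, 0.1], true_class=2)
```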

We are always looking for ways to improve our product, and feature requests are welcome. Do you have a use case in mind where these different options might be beneficial?

Cheers! -Piotr


User 2001 | 8/24/2015, 8:41:25 AM

Hi Piotr,

Thank you for your response. I also forgot to ask: does SGD use Nesterov momentum?

Admittedly I don't have other use cases in mind. All I know is that the other two libraries I'm looking at (Keras and Lasagne), both based on Theano, provide more options for the optimisation method, the initialisation method, and the loss function. If you want GraphLab to match the functionality of other libraries, this is something you may wish to consider. The MNIST example for Keras uses AdaGrad instead of SGD and performs pretty well.
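
To make the difference from plain SGD concrete, here is a minimal sketch of the AdaGrad update on a toy one-dimensional problem. It is plain Python rather than Keras code, and the function names, learning rate and objective f(w) = w**2 are assumptions picked purely for illustration.

```python
import math


def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


def adagrad_step(w, accum, lr=1.0, eps=1e-8):
    """One AdaGrad update: scale the step by the root of the summed squared gradients."""
    g = grad(w)
    accum += g * g                       # running sum of squared gradients
    w -= lr * g / (math.sqrt(accum) + eps)
    return w, accum


w, accum = 5.0, 0.0
for _ in range(500):
    w, accum = adagrad_step(w, accum)
```

The effective step size shrinks as gradients accumulate, which is the main behavioural difference from SGD with a fixed learning rate.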

Thanks again,

Joe


User 2001 | 8/24/2015, 11:01:51 AM

Sorry, another question! Am I correct in understanding that ConvolutionNet only supports image data at the moment? If so, are there plans to extend functionality to other popular deep learning tasks such as audio classification and natural language processing?


User 940 | 8/24/2015, 7:39:21 PM

Hi @Joeetaku,

Our SGD uses what could be called classical momentum. We do not yet have Nesterov momentum.
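
For anyone unfamiliar with the distinction, the sketch below shows the two update rules side by side on a toy objective. It is plain Python and not our solver's code; the function names, learning rate and momentum value are illustrative assumptions. The only difference is where the gradient is evaluated: at the current weights (classical) or at the look-ahead point `w + mu * v` (Nesterov).

```python
def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w


def classical_momentum_step(w, v, lr=0.1, mu=0.9):
    """Classical momentum: gradient taken at the current weights."""
    v = mu * v - lr * grad(w)
    return w + v, v


def nesterov_momentum_step(w, v, lr=0.1, mu=0.9):
    """Nesterov momentum: gradient taken at the look-ahead point."""
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v


def run(step, w=5.0, v=0.0, n_steps=300):
    """Apply the given update rule repeatedly and return the final weight."""
    for _ in range(n_steps):
        w, v = step(w, v)
    return w


w_classical = run(classical_momentum_step)
w_nesterov = run(nesterov_momentum_step)
```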

Currently, one can use all layer types on images, and every layer type except convolution and pooling on array-type input. However, it's possible to interpret many types of input as an image. For instance, audio can be turned into a spectrogram, which can then be treated as image-type input.
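
As a rough sketch of the spectrogram idea, the plain-Python example below slices a signal into overlapping frames, applies a Hann window, and takes the magnitude of a discrete Fourier transform of each frame. The result is a 2-D grid (time by frequency) that could be treated as a one-channel image. The function name, frame size and hop length are assumptions for illustration, and loading the result into the toolkit is not shown.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes (bins 0..frame_size/2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]  # Hann window
    n_bins = frame_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(n_bins):
            # Naive DFT of one frequency bin; fine for a small illustration.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic test tone: 128 Hz sampled at 1024 Hz (made-up values).
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)
```

With these values each frequency bin spans 16 Hz, so the energy of the 128 Hz tone concentrates in bin 8 of every frame.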

We are actively thinking about what next steps we should take for our Deep Learning toolkit. We are splitting our focus between refining the user experience of our current Deep Learning toolkit and adding functionality. Most likely, we'll add support for other deep learning tasks before we add new kinds of solvers.

Any suggestions and feature requests are always welcome!

Cheers! -Piotr


User 2001 | 8/25/2015, 12:40:23 PM

Thanks yet again @piotr!

I'm also a bit confused regarding the choice of terminology for channels:

Looking at this example: https://dato.com/learn/gallery/notebooks/buildimagenetdeeplearning.html

The number of channels is set to numbers such as 96, 256, 384, etc. Surely this is the number of filters and NOT the number of channels? I thought channels referred to the number of colour values, i.e. monochrome is 1 channel, RGB is 3 channels, etc. This is how the resize method of imageanalysis seems to interpret them: https://dato.com/products/create/docs/generated/graphlab.imageanalysis.resize.html

Can you please clarify this? Perhaps it would be more helpful to call this 'num_filters' if this is the case?

Kind regards,

Joe


User 940 | 8/25/2015, 6:01:14 PM

Hi @Joeetaku ,

I believe that 'channels' and 'filters' are interchangeable in the context of Convolution Layers.

I'll investigate this though.
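
In the meantime, here is a small plain-Python sketch of why the two terms tend to get conflated. Each filter in a convolution layer spans all input channels and produces one output feature map, so a layer with K filters emits K output channels, which become the input channels of the next layer. This is a generic illustration with made-up sizes, not our layer's code.

```python
def conv2d(image, filters):
    """Valid, stride-1 convolution (cross-correlation).

    image:   [in_channels][height][width]
    filters: [num_filters][in_channels][kh][kw]
    returns: [num_filters][height - kh + 1][width - kw + 1]
    """
    in_channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    output = []
    for filt in filters:
        assert len(filt) == in_channels  # every filter covers all input channels
        feature_map = [
            [sum(image[c][i + di][j + dj] * filt[c][di][dj]
                 for c in range(in_channels)
                 for di in range(kh)
                 for dj in range(kw))
             for j in range(width - kw + 1)]
            for i in range(height - kh + 1)
        ]
        output.append(feature_map)  # one output channel per filter
    return output


# 3-channel (RGB-like) 8x8 input and 4 filters of size 3x3.
rgb_image = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]
filters = [[[[0.1] * 3 for _ in range(3)] for _ in range(3)] for _ in range(4)]
feature_maps = conv2d(rgb_image, filters)  # 4 output channels of size 6x6
```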

Cheers! -Piotr