GPU usage by NeuralNetClassifier.extract_features

User 2144 | 8/6/2015, 3:08:56 AM

Does extract_features from the neural net classifier make use of the GPUs if I have the GPU version of GraphLab Create installed? If yes, is there a way (a parameter, perhaps) to ensure that? I don't see it making use of the GPUs right now.

Comments

User 1190 | 8/6/2015, 5:39:42 PM

Hi @ajinkya,

Please make sure you are using the GPU egg and have the latest CUDA driver installed. To verify the GPU is in use, try creating a neuralnet classifier; it should print to screen the device that's in use. If it says it is using the CPU, take a look at the server log, where you may find error messages related to the failure to use the GPU device. Most likely the cause is a device driver that is too old.
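For example, something along these lines (an untested sketch; `train_sf` and its `label` column are placeholders for your own labeled data) will print the device at the top of the training output:

```python
import graphlab as gl

# Untested sketch: train a throwaway neuralnet classifier on a small slice of
# your data just to see which device GraphLab reports ("gpu" vs. "cpu").
# 'train_sf' and its 'label' column are placeholders, not real names.
model = gl.neuralnet_classifier.create(train_sf.head(100),
                                       target='label',
                                       max_iterations=1)
```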

Thanks, -jay


User 2144 | 8/6/2015, 7:29:55 PM

It would be nice if the extract_features API showed progress/debug info when extracting features from a large training set. Any chance of this happening anytime soon?


User 1359 | 8/7/2015, 4:59:27 PM

Thanks for the feedback! We'll definitely look into adding this feature.


User 2144 | 8/9/2015, 7:50:00 PM

extract_features fails for me on large image datasets, and I am not sure where to look for logs. None of the GraphLab server logs have any information on why it's failing. Any suggestions?


User 2144 | 8/10/2015, 7:51:13 PM

@"Jay Gu" @"Dick Kreisberg" any suggestions for the log files ?


User 1190 | 8/12/2015, 6:03:57 PM

@ajinkya, what is the error message when extract_features fails?


User 2144 | 8/15/2015, 2:50:53 AM

There was no error in the regular console logs. The process simply vanished, which is why I was curious whether there were other GraphLab server logs I could use to debug what the error was.


User 15 | 8/17/2015, 6:07:23 PM

Hi @ajinkya,

When the server starts, it should print out the location where the GraphLab server log is written. Is this the "regular console log" you referred to? If not, I would expect the answer to be in the server log. Also note that the real file may have a ".0" appended to its name: we use log rotation and install a symlink without the ".0" while the process is running. If the process vanished, the log without the ".0" will not be there, but the one with it still will be. Let me know if you can find the right log.
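If it helps to locate them, something along these lines lists the most recent server logs (the /tmp path below is only a guess; use the directory printed when your server starts):

```python
import glob
import os

# Untested sketch: list the newest GraphLab server logs, including rotated
# ".0" files. The /tmp location is an assumption -- check the path printed
# at server startup.
logs = glob.glob('/tmp/graphlab_server_*.log*')
for path in sorted(logs, key=os.path.getmtime)[-5:]:
    print path
```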

Evan


User 2144 | 8/18/2015, 4:50:34 PM

Hi @EvanSamanas, thanks for the reply. I was referring to the server logs. I am not sure I am able to get much out of them, unless I am missing something. Here is the log:

1439879144 : INFO: (start:226): Function entry
1439879145 : INFO: (construct_from_sframe_index:78): Construct sframe from location: sframe/train_shuffle
1439879145 : INFO: (create_arrays_for_reading:135): Opening Frame for Reading of size (29979,3)
1439879145 : INFO: (head:665): Function entry
1439879145 : INFO: (create_arrays_for_writing:236): Opening Frame for writing to with 1 segments and 3 columns
1439879145 : INFO: (construct_from_sframe:72): Function entry
1439879145 : INFO: (lazy_astype:1163): Function entry
1439879145 : INFO: (remove_column:470): Args: 3
1439879145 : INFO: (head:665): Function entry
1439879145 : INFO: (create_arrays_for_writing:236): Opening Frame for writing to with 1 segments and 3 columns
1439879145 : INFO: (materialize:219): Materializing: digraph G { "26412624" [label="B: transform"] "26413264" [label="D: SF(S3)"] "26415184" [label="A: SF(S1,S2)"] "26415344" [label="C: UP(A:0;B:0;A:1)"] "26415184" -> "26415344" "26412624" -> "26415344" "26413264" -> "26412624" }
1439879145 : INFO: (materialize:224): Optimized As: digraph G { "26412624" [label="B: transform"] "26413104" [label="C: UP(A:0;B:0;A:1)"] "26413264" [label="D: SF(S3)"] "26415184" [label="A: SF(S1,S2)"] "26415184" -> "26413104" "26412624" -> "26413104" "26413264" -> "26412624" }
1439879145 : INFO: (construct_from_sframe:72): Function entry
1439879145 : INFO: (load_model:94): Load model from data/imagenet_model
1439879145 : INFO: (load_model:114): Model name: neuralnet_classifier
1439879146 : INFO: (run_toolkit:192): Running toolkit: supervised_learning_get_value
1439879146 : INFO: (get_value:383): Function entry
1439879146 : INFO: (save:38): Function entry
1439879146 : INFO: (run_toolkit:192): Running toolkit: supervised_learning_feature_extraction
1439879146 : INFO: (extract_feature:99): Function entry
1439879146 : INFO: (select_columns:395): Function entry
1439893105 : INFO: (main:611): Quiting with received character: -1 feof = 1
1439893105 : INFO: (~comm_server:207): Function entry
1439893105 : INFO: (stop:234): Function entry
1439893111 : INFO: (save:38): Function entry

Here is the code I am trying:

```python
import graphlab as gl
from IPython.display import Image

path_to_dir = 'data/'
image_sf = gl.SFrame('sframe/train_shuffle')
image_sf = image_sf
print image_sf.head()
print image_sf.num_rows()
pretrained_model = gl.load_model(path_to_dir + 'imagenet_model')
extracted_features = pretrained_model.extract_features(image_sf[['image']])
image_sf['features'] = extracted_features
image_sf.save('feature_sf')
print image_sf.head()
print image_sf.num_rows()
```

I am trying to do this for around 30k images.


User 940 | 8/20/2015, 9:43:26 PM

Hi @ajinkya ,

This certainly sounds like a problem, but I am unable to reproduce it with my own data. Would it be possible to share the SFrame you are doing this on, to try and reproduce the problem? Have you noticed situations where extract_features has worked/where it hasn't? What OS are you using? Are you using the gpu egg? Which version of GraphLab Create is this?

Thanks for your patience.

Cheers! -Piotr


User 2144 | 8/20/2015, 10:00:12 PM

This is the 1.5.2 GPU version. It works if I change line 6 to image_sf = image_sf[0:20000], which restricts the dataset to 20k points. So I am wondering if it's a memory limit issue or something similar? The logs are not helpful enough to conclude that.
OS: Ubuntu 14.04.3 LTS (Release: 14.04)
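For now, the workaround I have in mind is to slice the SFrame and extract features in chunks, roughly like this (untested sketch, reusing image_sf and pretrained_model from the code posted above; the chunk size is arbitrary):

```python
# Untested sketch: bound memory use by running extract_features over slices
# of image_sf and appending the resulting SArrays. Reuses image_sf and
# pretrained_model from the script above.
chunk_size = 5000
features = None
for start in range(0, image_sf.num_rows(), chunk_size):
    chunk = image_sf[start:start + chunk_size][['image']]
    part = pretrained_model.extract_features(chunk)
    features = part if features is None else features.append(part)
image_sf['features'] = features
```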


User 2144 | 8/20/2015, 10:01:32 PM

@piotr GPU details:

+------------------------------------------------------+
| NVIDIA-SMI 346.82     Driver Version: 346.82         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 780 Ti  Off  | 0000:04:00.0     N/A |                  N/A |
| 43%   52C    P0    N/A /  N/A |    932MiB /  3071MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


User 940 | 8/24/2015, 6:58:03 PM

@ajinkya

I suspect you might be right that this is a memory limit issue, but I'll investigate today to be sure. Thanks again for your patience.

Cheers! -Piotr


User 940 | 8/24/2015, 8:07:33 PM

@ajinkya

So I just used your code to extract features from 50k images of cats and dogs, also using a GTX 780 with driver 346, with no problems. It's possible that the problem is data specific; is there any way of sharing your data with us?

Also, do older versions of GraphLab Create work? That could help us narrow down the problem. You could try 1.4.1:

pip install --upgrade --no-cache-dir http://static.dato.com/files/graphlab-create-1.4.1.gpu.tar.gz

Cheers! - Piotr


User 2144 | 8/24/2015, 8:50:21 PM

@piotr Thanks for looking into this! I will try the older version and let you know. Also, how long did my code take to run for the 50k images you tried?


User 2144 | 8/24/2015, 8:57:47 PM

For me it takes 3 hours for the 20k images. One more thing: I am not able to confirm from the logs that extract_features is using the GPU, and I didn't find any API to make it specifically use the GPU, so I am not sure whether it is. Is there a way to confirm that? I don't see any GraphLab process in the nvidia-smi dump.


User 940 | 8/25/2015, 1:10:56 AM

@ajinkya

20k images on a GTX-780 should not take more than a few minutes. This is a pretty clear indicator that the GPU is not being used for some reason.

You can check whether the model uses a GPU with `pretrained_model['device']`.

The output should be 'gpu'. If not, you should pull down the model by running:

pretrained_model = graphlab.load_model('http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45')

Cheers! -Piotr


User 2144 | 8/25/2015, 2:05:42 AM

I was not aware you can check the pretrained model's device! Thanks for the info. You were right, the model's device came up as 'cpu'. But downloading the model from dato-datasets fails:

```python
imagenet_model = graphlab.load_model('http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/dato-env/local/lib/python2.7/site-packages/graphlab/toolkits/model.py", line 70, in load_model
    unpickler = gl_pickle.GLUnpickler(location)
  File "/dato-env/local/lib/python2.7/site-packages/graphlab/gl_pickle.py", line 453, in __init__
    raise IOError('%s is not a valid file name.' % filename)
IOError: http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45 is not a valid file name.
```

Is it down?


User 940 | 8/25/2015, 5:58:13 PM

@ajinkya

Could you try again? Everything seems to be working for me.

Cheers! -Piotr


User 2144 | 8/25/2015, 6:14:04 PM

@piotr Unfortunately I still get the same error.


User 2144 | 8/25/2015, 6:17:34 PM

@piotr Is there any other way I can download the model? wget didn't seem to work either.


User 940 | 8/26/2015, 12:07:26 AM

@ajinkya

Could you send us the logs from your attempt to download the model? This is a bit odd.

I've zipped up the model. Now you should be able to just grab it with wget: https://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_iter_45.zip

I've checked that this model has a 'gpu' value for the device attribute.

Cheers! -Piotr


User 2144 | 8/26/2015, 12:18:53 AM

@piotr Thanks, the zip worked. These are the logs you asked for:

1440525626 : INFO: (init_extensions:181): Autoloading of dato-env/lib/python2.7/site-packages/graphlab/libgpu_count.so
1440525626 : INFO: (load_toolkit:385): Attempt loading of dato-env/lib/python2.7/site-packages/graphlab/libgpu_count.so
1440525626 : INFO: (load_toolkit:425): Library load of dato-env/lib/python2.7/site-packages/graphlab/libgpu_count.so
1440525626 : INFO: (load_toolkit:478): Adding function: get_gpu_count
1440525626 : INFO: (register_toolkit_function:41): Function entry
1440525626 : INFO: (start:226): Function entry
1440525627 : INFO: (load_model:94): Load model from http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45
1440525627 : PROGRESS: (download_url:49): Downloading http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45/dir_archive.ini to /var/tmp/graphlab-ajkale/2528/000000.ini
1440525627 : PROGRESS: (download_url:49): Downloading http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45/objects.bin to /var/tmp/graphlab-ajkale/2528/000001.bin
1440526043 : INFO: (load_model:114): Model name: neuralnet_classifier
1440526043 : ERROR: (operator():9): CUDA Error: out of memory
1440526043 : ERROR: (operator():132): Unable to load model from http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45: CUDA Error: out of memory
1440526992 : INFO: (load_model:94): Load model from http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45
1440526992 : PROGRESS: (download_url:49): Downloading http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45/dir_archive.ini to /var/tmp/graphlab-ajkale/2528/000003.ini
1440526993 : PROGRESS: (download_url:49): Downloading http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45/objects.bin to /var/tmp/graphlab-ajkale/2528/000004.bin
1440527069 : INFO: (load_model:114): Model name: neuralnet_classifier
1440527069 : ERROR: (operator():9): CUDA Error: out of memory
1440527069 : ERROR: (operator():132): Unable to load model from http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45: CUDA Error: out of memory
1440528962 : INFO: (load_model:94): Load model from http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45
1440528962 : INFO: (load_model:114): Model name: neuralnet_classifier
1440528962 : ERROR: (operator():9): CUDA Error: out of memory
1440528962 : ERROR: (operator():132): Unable to load model from http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45: CUDA Error: out of memory


User 2144 | 8/26/2015, 1:58:09 AM

@piotr It seems like it's something to do with my GPU acting up. I still get the CUDA error even with the pre-downloaded imagenet model. I will restart and check whether that resolves it. Sorry for the trouble; the S3 link in load_model was never the problem :/
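For the record, this is how I'm checking free GPU memory before loading the model (assuming these nvidia-smi query flags are available with this driver version):

```python
import subprocess

# Print used/free GPU memory; a GPU that is already mostly full would explain
# the "CUDA Error: out of memory" seen while loading the imagenet model.
# Assumes nvidia-smi supports --query-gpu on this driver version.
print subprocess.check_output(
    ['nvidia-smi', '--query-gpu=memory.used,memory.free', '--format=csv'])
```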


User 2724 | 12/9/2015, 6:46:28 AM

@piotr When I run pretrained_model['device'] after loading the imagenet model, I get 'cpu' as the output. Is that right?

When I try to run a nearest neighbour classifier, neural net classifier, or logistic classifier on a set of images, will there be a problem if the device is a CPU?


User 940 | 12/11/2015, 11:10:27 PM

Hi @goelsvaibhav ,

That depends! Generally, there is nothing wrong with using a CPU rather than a GPU, but it can be slower for neural net classifiers. Do you have a GPU in your computer? What are you trying to do?

Cheers! -Piotr


User 2724 | 12/14/2015, 5:42:42 AM

Hi @piotr

I am trying to create a model for classifying a set of images based on features extracted via DeepFeatureExtraction.

I am currently using a MacBook with a 2.5 GHz Intel Core i5 processor. I was wondering whether there is a difference, in terms of feature extraction, between a CPU and a GPU. If yes, which GPU do you recommend?

Thanks! Vaibhav


User 1190 | 12/14/2015, 8:47:36 PM

Yes, the difference between using a CPU and a GPU is at least an order of magnitude. For choosing a GPU for deep learning, please take a look at this thread: http://forum.dato.com/discussion/881/advise-regarding-buying-a-gpu and this: http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/