boosted_trees_classifier and MemoryError: std::bad_alloc

User 2568 | 3/11/2016, 1:15:56 AM

I'm working with a data set of 76,020 rows and 307 features, and I'm looping over `boosted_trees_classifier.create` for a range of parameters:

```python
import graphlab as gl

# train and validate are SFrames created in the notebook's initialisation cells
params = {'target': 'TARGET', 'validation_set': validate, 'verbose': False,
          'random_seed': 8923, 'early_stopping_rounds': 10, 'max_iterations': 300}

model = {}
for d in [2, 4, 6]:
    for r in [1, 0.8, 0.6]:
        for c in [1, 0.8, 0.6]:
            model[d, r, c] = gl.boosted_trees_classifier.create(
                train, max_depth=d, row_subsample=r, column_subsample=c, **params)
            print "depth:", d, "row:", r, "col:", c, "auc:", model[d, r, c]['validation_auc']
```

After 10 iterations I get `MemoryError: std::bad_alloc`.

Does this just mean I'm using all the available memory on my server?
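If only the validation AUC is needed from each run, one option is to keep just the score instead of holding all 27 models in a dict. The sketch below assumes a hypothetical `fit_and_score` callable standing in for `gl.boosted_trees_classifier.create(...)` followed by reading `validation_auc`; the grid values are the ones from the question. Whether this alone avoids the error depends on whether dropping the model reference actually frees its memory, which this thread does not settle.

```python
import itertools

# Grid values taken from the question: tree depth, row and column subsampling
DEPTHS = [2, 4, 6]
ROW_SUBSAMPLES = [1, 0.8, 0.6]
COL_SUBSAMPLES = [1, 0.8, 0.6]


def grid_search(fit_and_score):
    """Return {(depth, row, col): score} without keeping any trained model.

    fit_and_score is a hypothetical callable that trains one model with the
    given settings and returns a single number (e.g. validation AUC).
    """
    scores = {}
    for d, r, c in itertools.product(DEPTHS, ROW_SUBSAMPLES, COL_SUBSAMPLES):
        scores[(d, r, c)] = fit_and_score(max_depth=d, row_subsample=r,
                                          column_subsample=c)
    return scores


def best_setting(scores):
    # Highest score wins; ties resolve to whichever key max() encounters first
    return max(scores, key=scores.get)
```

The best combination can then be retrained once, so at most one full model needs to be kept at the end.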


User 1190 | 3/11/2016, 6:29:37 PM

Hi Kevin,

Can you please provide the following information?

  1. Does it happen consistently?
  2. GLC version
  3. Operating system
  4. Amount of memory on your machine
  5. Amount of available memory before iteration 10

Thanks, -jay

User 2568 | 3/12/2016, 3:20:54 AM

  1. Yes
  2. '1.8.4'
  3. AWS Linux - Linux x86_64 x86_64 x86_64 GNU/Linux
  4. I ran this on a few different AWS EC2 instances. One had 1 GB and another 8 GB. The error came earlier on the first.
  5. Not sure how to answer this.
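For question 5, one way to check available memory on a Linux instance without extra packages is to read the `MemAvailable` line of `/proc/meminfo` (present on Linux kernels 3.14 and later) just before each `create` call. A minimal sketch, assuming such a kernel; the function names are hypothetical:

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        # Lines look like: "MemAvailable:    3021456 kB"
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def read_mem_available_kib(path="/proc/meminfo"):
    # Linux-only: the kernel's estimate of memory available to new work
    with open(path) as f:
        return mem_available_kib(f.read())
```

Calling `print(read_mem_available_kib())` at the top of the innermost loop would show how much headroom is left before each model is trained.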

My work is in this repo. The data is in `Data` and the offending code is in `In[5]`. Make sure you run the initialisation cells at the start first.

User 1190 | 3/12/2016, 9:25:15 PM

Thanks for the information. 1 GB of memory is not sufficient to train boosted trees in memory on a dataset of that size. Please use a larger instance, e.g. one with 8 GB.
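For a sense of scale, the raw data alone can be estimated with simple arithmetic. This sketch assumes every feature is stored as a dense 8-byte float; it ignores GraphLab's own bookkeeping and any extra copies or structures built during training, so it is only a lower bound, not a measurement:

```python
def dense_size_bytes(n_rows, n_cols, bytes_per_value=8):
    # Raw size of an n_rows x n_cols table stored as dense 8-byte floats;
    # ignores per-column overhead, indexes and copies made during training
    return n_rows * n_cols * bytes_per_value


size = dense_size_bytes(76020, 307)
print("%d bytes = %.1f MiB" % (size, size / float(2 ** 20)))
```

With the numbers from the question this gives 186,705,120 bytes, about 178 MiB, before the OS, Python process, notebook and training structures are counted.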

User 2568 | 3/12/2016, 10:19:12 PM

Jay, it looks more like a memory leak to me. I've run this on an EC2 T1.medium, which has 3 GB of memory. I get through 10 iterations before I get `MemoryError: std::bad_alloc`. If I immediately run the command again, the MemoryError is immediate and I get no iterations.

While I am storing the models in memory, these are deleted before the next run.

User 1190 | 3/17/2016, 8:38:12 PM

Hi Kevin,

You identified an important memory leak in the tree models. The trained model still keeps state that is used only during training, and this state is not freed when training finishes. Fortunately, saving the model and loading it back clears the unnecessary state.

So, here is a workaround that I've tested, and it works pretty well:

```python
import graphlab as gl
import psutil

sf = gl.SFrame('/data/mnist2')
train, test = sf.random_split(0.8)

# Baseline: keep each model exactly as returned by create()
models = []
memory_used = []
before = psutil.virtual_memory().used
for i in range(10):
    m = gl.boosted_trees_classifier.create(train, 'label', max_iterations=1)
    models.append(m)
    after = psutil.virtual_memory().used
    memory_used.append(after - before)
print memory_used

# Workaround: save and reload each model, which drops the training-only state
models = []
memory_used = []
before = psutil.virtual_memory().used
for i in range(10):
    m = gl.boosted_trees_classifier.create(train, 'label', max_iterations=1)
    m.save('tmp_model')
    models.append(gl.load_model('tmp_model'))
    after = psutil.virtual_memory().used
    memory_used.append(after - before)
print memory_used
```
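The two loops above print cumulative memory deltas. To compare them, it can help to turn each list into the extra memory added by each iteration. The helpers below are a hypothetical sketch; the `steady_growth` test is a rough heuristic (system-wide memory readings are noisy), and skipping the first step assumes it also contains one-off start-up costs:

```python
def per_iteration_growth(cumulative):
    """Turn cumulative memory deltas into the memory added by each iteration."""
    growth = []
    previous = 0
    for total in cumulative:
        growth.append(total - previous)
        previous = total
    return growth


def steady_growth(cumulative, threshold_bytes):
    # True if every iteration after the first added more than threshold_bytes;
    # the first step is skipped since it may include one-off start-up costs
    steps = per_iteration_growth(cumulative)[1:]
    return bool(steps) and all(s > threshold_bytes for s in steps)
```

If the baseline list shows steady growth while the save-and-reload list does not, that is consistent with the leak described above.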

I will work on a fix to clear the state of a trained tree model.

Thank you for reporting the issue. -jay


User 3252 | 3/17/2016, 10:14:00 PM

Well done, Jay.

I have a question: I am not very familiar with Python. I am using an AWS instance for the Coursera course. When I tried to import psutil, I got the following error message:

`ImportError: No module named psutil`

How do I import psutil?

User 1190 | 3/17/2016, 11:31:15 PM

`conda install psutil` or `pip install psutil`

User 3252 | 3/22/2016, 12:53:27 AM

These commands do not work on my AWS instance (the Coursera GraphLab image). Both failed with this error:

```
conda install psutil
SyntaxError: invalid syntax

pip install psutil
SyntaxError: invalid syntax
```
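A `SyntaxError` like this is what Python reports when a shell command such as `pip install psutil` is typed at a Python prompt or into a notebook code cell. These commands are meant for a terminal on the instance; in an IPython/Jupyter cell they can be run by prefixing them with `!`. From inside Python, a common idiom is to run pip through the current interpreter. A minimal sketch, with hypothetical helper names; it assumes pip is installed for that interpreter and that you have write access to its environment:

```python
import subprocess
import sys


def pip_install_command(package):
    # Use the running interpreter's own pip, so the package lands in the
    # same environment that will later import it
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    # Raises subprocess.CalledProcessError if pip exits with an error
    subprocess.check_call(pip_install_command(package))
```

After `pip_install("psutil")` succeeds, `import psutil` should work in the same session.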