GLC cannot handle "wide" SFrames.

User 2032 | 9/28/2015, 1:27:59 PM

GLC constantly fails on basic operations (join, filter_by, save) with error 113 or odd "missing lambda" errors when confronted with a wide SFrame. This happens on both 1.5.3 and 1.6.1. There is plenty of disk space, and these are my runtime settings:

import graphlab as gl
gl.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 50)
gl.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY', 2147483648 * 20) #40GB
gl.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE', 134217728 * 400) #50GB
gl.set_runtime_config('GRAPHLAB_ODBC_BUFFER_MAX_ROWS', 10000) 
gl.set_runtime_config('GRAPHLAB_ODBC_BUFFER_SIZE', 2147483648 * 5) #10GB
gl.set_runtime_config('GRAPHLAB_SFRAME_JOIN_BUFFER_NUM_CELLS', 52428800 / 100)
gl.set_runtime_config('GRAPHLAB_SFRAME_SORT_PIVOT_ESTIMATION_SAMPLE_SIZE', 2000000 / 100 )
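As a quick sanity check on the size comments in the settings above, the byte values can be converted to GiB with plain arithmetic. This is a small sketch that only reproduces the multiplications from the config lines; the dictionary and the `to_gib` helper are illustrative names, not part of GraphLab. It shows that the per-file cache value works out to 50 GiB rather than 40.

```python
# Convert the byte values passed to gl.set_runtime_config into GiB.
# Plain arithmetic only: no GraphLab import is needed.
GIB = 1024 ** 3

# Hypothetical name: the same expressions as in the config lines.
settings = {
    "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY": 2147483648 * 20,
    "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE": 134217728 * 400,
    "GRAPHLAB_ODBC_BUFFER_SIZE": 2147483648 * 5,
}


def to_gib(num_bytes):
    """Return a byte count expressed in GiB."""
    return num_bytes / GIB


for name, value in settings.items():
    print("%s = %d bytes = %.0f GiB" % (name, value, to_gib(value)))
```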

The machine has 70 GB of RAM.

In my case, a 'wide' SFrame has 1M rows and 100 columns, 10 of which are of type dict. One of those dict columns can contain up to 200K entries per row, although the median is much lower.

I had to resort to ugly shims (add_row_number -> left outer join on a column subset -> add_columns) to handle joins, but it should not have to work like this!
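The shim described above can be sketched in plain Python to show the idea: number the wide rows, run the join on the narrow (row number, key) projection only, then re-attach the wide columns by row number. This is not GraphLab code. Tables are modelled as lists of dicts, and `narrow_left_join` is a hypothetical name. In GLC the steps would presumably map onto `SFrame.add_row_number`, `SFrame.join(..., how='left')` on a column subset, and `SFrame.add_columns`, but treat that mapping as an assumption.

```python
def narrow_left_join(wide_rows, other_rows, key):
    """Left outer join `wide_rows` to `other_rows` on `key`, using only the
    narrow (row_id, key) projection of the wide table during the join."""
    # Columns contributed by the right-hand table (everything except the key).
    other_cols = {c for row in other_rows for c in row if c != key}

    # 1. Number the wide rows and keep only the narrow projection.
    narrow = [(row_id, row[key]) for row_id, row in enumerate(wide_rows)]

    # 2. Left outer join the narrow projection against the other table.
    index = {}
    for row in other_rows:
        index.setdefault(row[key], []).append(row)
    narrow_joined = []
    for row_id, value in narrow:
        matches = index.get(value) or [{}]  # no match -> one row of None values
        for match in matches:
            extra = {c: match.get(c) for c in other_cols}
            narrow_joined.append((row_id, extra))

    # 3. Re-attach the wide columns by row number.
    return [dict(wide_rows[row_id], **extra) for row_id, extra in narrow_joined]


wide = [
    {"user": "a", "features": {"x": 1}},
    {"user": "b", "features": {}},
]
other = [{"user": "a", "label": 1}]
print(narrow_left_join(wide, other, "user"))
```

The wide dict columns never pass through the join step; they are only looked up by position at the end, which is the point of the workaround.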

Comments

User 1178 | 9/28/2015, 3:42:39 PM

Hi Johnny,

Thanks for reporting this! This looks like a bug; we will look into it as soon as possible.

Ping