Performance problem in Predictive Service when reading data from an external key:value store for classification.

User 2032 | 8/14/2015, 2:41:16 PM

Hi Guys,

Background (simplified):
  • I have a very frequent background job that refreshes feature data for each example as new data keeps coming in.
  • This background job stores the data in a key-value store (Aerospike).
  • I have a Predictive Object that takes the key as input and runs a GraphLab model (classifier) on the value (a dict of features).

def predict(key):
    retrived_dict = get_from_kv(key)  # fast Aerospike lookup
    sf = sf_from_dict(retrived_dict)  # <--- performance problem
    return {'prediction': list(model.predict(sf))[0]}
  • I supply about 250 features to the classifier (retrieval is super fast: ~300 picoseconds).
  • Converting the dict to an SFrame takes ages (ca. 1 s).

Here is the code to reproduce the problematic line (along with my two attempts at getting an SFrame from a dict):

import random
retrived_dict = {str(i) + "_some_very_long_feature_name": i if i % 2 else random.random() for i in xrange(0, 250)}
%timeit sf = gl.SFrame([retrived_dict]).unpack('X1', column_name_prefix='', na_value=0)
%timeit sf = gl.SFrame({k: [v] for k,v in retrived_dict.iteritems()})

The question is:

Is there a better way to feed the classifier that would:

a) work directly with a dict (preferred), or
b) offer a faster conversion from dict to SFrame (without creating the frame on disk)?

I feel this is crucial for any serious deployment (in our case, the models and features are too big to deploy a new Predictive Object every time the data changes).

Please reply ASAP - we plan a deployment on Sunday and this is a blocker.

Kind regards, Jan

Comments

User 2032 | 8/14/2015, 3:04:16 PM

I have now found a way to speed it up to ca. 100 ms (still not acceptable for our use case):

import pandas as pd

df = pd.DataFrame([retrived_dict])
%timeit gl.SFrame(df)

User 2032 | 8/14/2015, 3:20:50 PM

In case it helps, here is the config of the GraphLab Create instance I tested this on (v1.5.3):

{'GRAPHLAB_CACHE_FILE_HDFS_LOCATION': '',
 'GRAPHLAB_CACHE_FILE_LOCATIONS': '/tmp',
 'GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS': 16,
 'GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS': 50,
 'GRAPHLAB_FILEIO_ALTERNATIVE_SSL_CERT_DIR': '/etc/pki/tls/certs',
 'GRAPHLAB_FILEIO_ALTERNATIVE_SSL_CERT_FILE': '/etc/pki/tls/certs/ca-bundle.crt',
 'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY': 2147483648,
 'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE': 2684354560,
 'GRAPHLAB_LIBODBC_PREFIX': '/usr/lib/x86_64-linux-gnu',
 'GRAPHLAB_ML_DATA_STATS_PARALLEL_ACCESS_THRESHOLD': 1048576,
 'GRAPHLAB_ML_DATA_TARGET_ROW_BYTE_MINIMUM': 262144,
 'GRAPHLAB_NEURALNET_DEFAULT_GPU_DEVICE_ID': 'auto',
 'GRAPHLAB_ODBC_BUFFER_MAX_ROWS': 10000,
 'GRAPHLAB_ODBC_BUFFER_SIZE': 2147483648,
 'GRAPHLAB_SFRAME_CSV_PARSER_READ_SIZE': 52428800,
 'GRAPHLAB_SFRAME_DEFAULT_BLOCK_SIZE': 65536,
 'GRAPHLAB_SFRAME_DEFAULT_NUM_SEGMENTS': 50,
 'GRAPHLAB_SFRAME_FILE_HANDLE_POOL_SIZE': 3072,
 'GRAPHLAB_SFRAME_GROUPBY_BUFFER_NUM_ROWS': 37950124,
 'GRAPHLAB_SFRAME_IO_READ_LOCK': 0,
 'GRAPHLAB_SFRAME_JOIN_BUFFER_NUM_CELLS': 524288,
 'GRAPHLAB_SFRAME_MAX_BLOCKS_IN_CACHE': 800,
 'GRAPHLAB_SFRAME_MAX_LAZY_NODE_SIZE': 10000,
 'GRAPHLAB_SFRAME_READ_BATCH_SIZE': 128,
 'GRAPHLAB_SFRAME_SORT_BUFFER_SIZE': 24288079872,
 'GRAPHLAB_SFRAME_SORT_MAX_SEGMENTS': 768,
 'GRAPHLAB_SFRAME_SORT_PIVOT_ESTIMATION_SAMPLE_SIZE': 20000,
 'GRAPHLAB_SFRAME_WRITER_MAX_BUFFERED_CELLS': 33554432,
 'GRAPHLAB_SFRAME_WRITER_MAX_BUFFERED_CELLS_PER_BLOCK': 262144,
 'GRAPHLAB_SGRAPH_BATCH_TRIPLE_APPLY_LOCK_ARRAY_SIZE': 1048576,
 'GRAPHLAB_SGRAPH_DEFAULT_NUM_PARTITIONS': 8,
 'GRAPHLAB_SGRAPH_HILBERT_CURVE_PARALLEL_FOR_NUM_THREADS': 50,
 'GRAPHLAB_SGRAPH_INGRESS_VID_BUFFER_SIZE': 3145728,
 'GRAPHLAB_SGRAPH_TRIPLE_APPLY_EDGE_BATCH_SIZE': 1024,
 'GRAPHLAB_SGRAPH_TRIPLE_APPLY_LOCK_ARRAY_SIZE': 1048576,
 'GRAPHLAB_USE_GL_DATATYPE': 0}

User 2032 | 8/14/2015, 3:23:42 PM

Another option would be to keep the SFrame binary in Aerospike (it cannot be pickled), but I'm not sure how to do this.
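A minimal sketch of that idea, assuming a tar-a-saved-directory approach (an SFrame cannot be pickled, but sf.save() writes a directory that can be archived into a single blob; the actual Aerospike store/fetch calls are left out):

import os, shutil, tarfile, tempfile
import graphlab as gl

def sframe_to_bytes(sf):
    # Save the SFrame in its binary on-disk format, then tar it into one blob.
    tmp = tempfile.mkdtemp()
    try:
        sf.save(os.path.join(tmp, 'sf'))
        archive = os.path.join(tmp, 'sf.tar')
        with tarfile.open(archive, 'w') as tar:
            tar.add(os.path.join(tmp, 'sf'), arcname='sf')
        with open(archive, 'rb') as f:
            return f.read()
    finally:
        shutil.rmtree(tmp)

def bytes_to_sframe(blob):
    # Unpack the blob and load the SFrame back. The directory must outlive the
    # returned SFrame, since an SFrame references its files on disk lazily.
    tmp = tempfile.mkdtemp()
    archive = os.path.join(tmp, 'sf.tar')
    with open(archive, 'wb') as f:
        f.write(blob)
    with tarfile.open(archive, 'r') as tar:
        tar.extractall(tmp)
    return gl.load_sframe(os.path.join(tmp, 'sf'))

Whether the round trip would actually beat rebuilding the SFrame from a dict would need measuring.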


User 1178 | 8/14/2015, 4:58:28 PM

Hi Johnny,

Unfortunately, there is some overhead in initializing an SFrame. We are looking at this issue now and should have an improvement soon. There is one way to make things a little faster, but it may still not fit your requirements:

 sf = gl.SArray([retrived_dict]).unpack(column_name_prefix='')

In the meantime, may I suggest a solution for your case?

Predictive Service internally maintains a cache layer so that it can serve requests faster once the cache has been warmed. Could you add functionality to your background job layer (the one that populates the Aerospike cache) to also warm up the Predictive Service cache for any new keys it adds? That way, all requests coming from your Predictive Service client would get their answer directly from the cache, which should be very fast.
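For instance, the background job could fire a throw-away query for each key right after writing it to Aerospike, so the cache is warm before a real client asks. A minimal sketch, where ps_client and the 'predict' endpoint name are illustrative stand-ins for your Predictive Service client and deployed Predictive Object, and store_in_kv stands in for your Aerospike write:

def refresh_features(key, features):
    store_in_kv(key, features)           # existing Aerospike write
    try:
        ps_client.query('predict', key)  # throw-away request to warm the PS cache
    except Exception:
        pass  # warming is best-effort; never fail the refresh job over it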

Let me know how it goes!

Ping


User 2032 | 8/14/2015, 5:11:43 PM

Hi Ping,

Your solution won't work, since every request is unique and I cannot pre-load them all into the cache. In simplifying, I said I use just one key from Aerospike; in fact I use two, which gives ca. 30M x 2M possible requests to cache - impossible.

I found a workaround (not finished yet, but promising): I pack all the features into a dict when training the classifier. Creating a single dict column is much, much faster (ca. 10 ms) and hence acceptable.
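For anyone following along, a minimal sketch of this workaround using the pack_columns SFrame API; the data source and column names are illustrative:

import graphlab as gl

# Training: collapse the ~250 individual feature columns into one dict column.
train = gl.SFrame.read_csv('train.csv')  # illustrative data source
feature_cols = [c for c in train.column_names() if c != 'label']
train = train.pack_columns(columns=feature_cols,
                           new_column_name='features', dtype=dict)
model = gl.random_forest_classifier.create(train, target='label',
                                           features=['features'])

# Serving: build a one-row SFrame with a single dict cell -- no unpack needed.
def predict(key):
    retrived_dict = get_from_kv(key)  # the Aerospike lookup from the original post
    sf = gl.SFrame({'features': [retrived_dict]})
    return {'prediction': list(model.predict(sf))[0]}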

Will keep you posted, Jan


User 1178 | 8/14/2015, 5:26:04 PM

Hi Johnny,

Glad to hear that! Using a dict as the feature was one of the solutions we came up with in house too, but we thought it might be too big a change for you! :)

In the meantime, we fixed the pandas code path to make it faster (down from 100 ms to about 25 ms). We could deliver a private build to you if you need it; otherwise, we will roll it out in the next release.

Thanks a lot and please keep the good feedback coming!

Ping


User 2032 | 8/14/2015, 5:36:43 PM

Hi Ping,

I will stick with the dict solution if it works (it will require a lot of time to retrain the classifier :().

Please come up with a solution that takes less than 10 ms, preferably less than 1 ms (this is a single row, after all). We work in AdTech, so 25 ms is still ages for classifying a single row - classification itself is fast enough, so it would be such a waste to have 95-99% of the request time consumed by converting a dict to a single-row SFrame.

Another solution would be to convert (transpile) GraphLab classifiers after training into fast Python/Cython code that works on Python dicts - a kind of "single row" mode. This should not be overly complex for most classifiers. If I code it myself, I will let you know.
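To make the idea concrete, here is a minimal sketch of the kind of code such a transpiler could emit for one tree. The nested-dict tree representation (feature/threshold/left/right/leaf) is an assumption for illustration; extracting the actual trees from a trained GraphLab model is left out:

def tree_to_python(node, indent='    '):
    # Recursively emit nested if/else source for one decision tree.
    if 'leaf' in node:
        return indent + 'return %r\n' % node['leaf']
    src = indent + 'if row.get(%r, 0.0) < %r:\n' % (node['feature'], node['threshold'])
    src += tree_to_python(node['left'], indent + '    ')
    src += indent + 'else:\n'
    src += tree_to_python(node['right'], indent + '    ')
    return src

def compile_tree(node):
    src = 'def score(row):\n' + tree_to_python(node)
    namespace = {}
    exec src in namespace  # Python 2, matching the rest of this thread
    return namespace['score']

# Tiny illustrative tree; a real forest would vote/average over many of these.
tree = {'feature': 'f1', 'threshold': 0.5,
        'left': {'leaf': 0}, 'right': {'leaf': 1}}
score = compile_tree(tree)
print score({'f1': 0.7})  # -> 1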

Kind regards, Jan


User 2032 | 8/16/2015, 7:17:35 AM

Just a small correction of my metric-system error: "I supply about 250 features to the classifier (retrieval is super fast: ~300 picoseconds)" should obviously read "~300 microseconds".

P.S. I should finish a transpiler for a RandomForest classifier today and will let you know how it goes.


User 2032 | 8/16/2015, 2:10:31 PM

OK guys, I needed to go faster, so I have written the transpiler:

[https://www.dropbox.com/s/skkkf5yri52ho2a/graphlabrandomforesttranspiler.html?dl=0](https://www.dropbox.com/s/skkkf5yri52ho2a/graphlabrandomforesttranspiler.html?dl=0)

For 50 trees, depth 30, and 270 features, it classifies one row in ca. 0.5-1 ms, though you have to be careful with memory consumption for larger forests with deep trees. That makes it ~100x faster than running GraphLab on a single example.

Sorry to share it as HTML, but I have to make sure I don't leak any IP. You have to use your own model and test set for training.

Now I dare you to go faster :)

If you like it, please let me know how I should contribute it to the community. BTW, this is hackathon-quality code (I only had a few hours to write and test it), so please review it, write extra tests, etc.

Kind regards, Jan


User 2032 | 8/16/2015, 2:21:49 PM

Also please note that this will not work for nested features.


User 2032 | 12/15/2015, 4:38:29 PM

I see that the link to the transpiler has expired, so just in case, here it is again: [https://www.dropbox.com/s/fmf2gjh1zfczaub/graphlabrandomforesttranspiler.html?dl=0](https://www.dropbox.com/s/fmf2gjh1zfczaub/graphlabrandomforesttranspiler.html?dl=0)


User 1190 | 12/15/2015, 9:55:09 PM

Hi JohnnyM,

Thanks for your "transpiler" idea. We have a bunch of improvements to the tree models in the coming release and will keep you updated.

-jay