Pickling/PYTHONPATH problem when using graphlab inside celery jobs

User 2032 | 8/31/2015, 6:08:26 PM

I have a function which works fine when run from IPython.

The same function, when converted to a Celery task, fails in sarray.apply when we pass it a function defined just a line above.

# this is the function that works fine when run from IPython
@app.task
def update_item_features_task(**task_config):
    # This has to be moved inside the task because otherwise gl will hang
    import graphlab as gl
    from jobs import update_item_features

    for setting, value in GRAPHLAB_SETTING.iteritems():
        gl.set_runtime_config(setting, value)

    with IncrementalJobContext(
            job_log=logging.getLogger(task_config.get('job_name', __name__)),
            aerospike_config=AEROSPIKE_CONFIG,
            aerospike_namespace=AEROSPIKE_NAMESPACE,
            aerospike_job_set=AEROSPIKE_JOB_SET,
            **task_config) as context:
        context.job_result = update_item_features(context)

The bit of code that causes the problem is defined as follows:

def update_item_features(context):

    #.....

    def apply_list_to_bow(*args):
        def list_to_bow(l, format_key=lambda x: str(int(x))):
            from collections import defaultdict
            dd = defaultdict(lambda: 0)
            for i in l:
                dd[format_key(i)] += 1
            return dict(dd)
        return list_to_bow(*args)
    isf['text_bow'] = isf['tokenized'].apply(lambda r: apply_list_to_bow(r, lambda s: s.decode('ISO-8859-1').lower()))

    #....

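For background on why the unpickling fails in the worker, here is a small standard-library sketch (the function used is just an example) showing that pickle serializes a named function as a module-plus-attribute reference rather than as code. The unpickling process must therefore be able to import that module; that is exactly the `__import__(module)` call in `find_class` that raises `ImportError: No module named features.jobs` in the traceback.

```python
import pickle
from collections import namedtuple

# pickle stores a top-level function as a (module, name) reference,
# not as code: the payload literally contains the module's name.
payload = pickle.dumps(namedtuple)
print(b'collections' in payload)   # True: the defining module is embedded
print(b'namedtuple' in payload)    # True: so is the attribute name

# Unpickling calls __import__('collections') and looks the name up again.
# In a process where that module is not importable, this step raises
# "ImportError: No module named ...", just like in the traceback.
restored = pickle.loads(payload)
print(restored is namedtuple)      # True
```

So the error is not about the function's body at all; it is about whether the worker process can import the module that pickle recorded as the function's home.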
There were also previous versions that did not work:

  1. Imported list_to_bow from a separate module (there was no apply_list_to_bow, just a direct call)
  2. Defined list_to_bow above def update_item_features(context)
  3. Defined list_to_bow directly above the apply call
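One pattern worth trying in addition to the variants above (a hypothetical sketch, not from the original post) is to make the applied function fully self-contained: every import happens inside its body and it reads nothing from the enclosing module's globals, so a code-shipping pickler has no external names to resolve on the worker side. Whether this actually helps depends on how GraphLab's lambda pickler treats the function's `__module__` attribute, so treat it as something to test rather than a guaranteed fix.

```python
# hypothetical, self-contained variant of list_to_bow: all imports live
# inside the function body and no enclosing-module globals are referenced
def list_to_bow_standalone(tokens):
    from collections import defaultdict  # resolved at call time, on the worker
    counts = defaultdict(int)
    for t in tokens:
        counts[str(t).lower()] += 1
    return dict(counts)

# usage would mirror the original: isf['tokenized'].apply(list_to_bow_standalone)
print(list_to_bow_standalone(["A", "a", "b"]))  # {'a': 2, 'b': 1}
```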

The error I receive:

levelname: ERROR asctime:2015-08-31 19:40:29,018 name:celery.worker.job process:23137 thread:140088927409984 message:Task features.tasks.update_item_features_task[8e5aa5d0-d860-46d4-90c7-dfa09c7c75bf] raised unexpected: RuntimeError(RuntimeError('Runtime Exception. Runtime Exception. Traceback (most recent call last):\n  File "/usr/lib/python2.7/pickle.py", line 1382, in loads\n    return Unpickler(file).load()\n  File "/usr/lib/python2.7/pickle.py", line 858, in load\n    dispatch[key](self)\n  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global\n    klass = self.find_class(module, name)\n  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class\n    __import__(module)\nImportError: No module named features.jobs\n',),)
Traceback (most recent call last):
  File "/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/newrelic-2.54.0.41/newrelic/hooks/application_celery.py", line 66, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/production/code/current/tailor_ai/features/tasks.py", line 62, in update_item_features_task
    context.job_result = update_item_features(context)
  File "/home/production/code/current/tailor_ai/features/jobs.py", line 115, in update_item_features
    isf['text_bow'] = isf['tokenized'].apply(lambda r: list_to_bow(r, lambda s: s.decode('ISO-8859-1').lower()))
  File "/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/graphlab/data_structures/sarray.py", line 1466, in apply
    return SArray(_proxy=self.__proxy__.transform(fn, dtype, skip_undefined, seed))
  File "/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/graphlab/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
RuntimeError: Runtime Exception. Runtime Exception. Traceback (most recent call last): [...] ImportError: No module named features.jobs

Comments

User 2032 | 8/31/2015, 6:09:47 PM

Also, here is the GraphLab initialization log:

levelname: DEBUG asctime:2015-08-31 19:40:00,944 name:newrelic.core.data_collector process:23161 thread:140088785516288 message:Connecting to data collector to register agent with license_key='a088dc2c16f0bd84239573bcccb5a473d8add3f1', app_name='PROD features', linked_applications=[], environment=[('Agent Version', '2.54.0.41'), ('Admin Command', "['/home/production/.virtualenvs/tailor_core/bin/newrelic-admin', 'run-program', 'celery', '-A', 'features.tasks', 'worker', '-l', 'DEBUG', '-Ofair', '-Q', 'update_item_features', '-c', '1', '-E', '-n', 'update_item_features.%h']"), ('Arch', 'x86_64'), ('OS', 'Linux'), ('OS version', '3.16.0-46-generic'), ('Total Physical Memory (MB)', 15789.83203125), ('Logical Processors', 10), ('Physical Processor Packages', 10), ('Physical Cores', 10), ('Python Program Name', '/home/production/.virtualenvs/tailor_core/bin/celery'), ('Python Executable', '/home/production/.virtualenvs/tailor_core/bin/python'), ('Python Home', ''), ('Python Path', '/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/newrelic-2.54.0.41/newrelic/bootstrap:/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages/newrelic-2.54.0.41:/home/production/.virtualenvs/tailor_core/lib/python2.7/site-packages'), ('Python Prefix', '/home/production/.virtualenvs/tailor_core'), ('Python Exec Prefix', '/home/production/.virtualenvs/tailor_core'), ('Python Runtime', '2.7.6'), ('Python Implementation', 'CPython'), ('Python Version', '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]'), ('Python Platform', 'linux2'), ('Python Max Unicode', 1114111), ('Compiled Extensions', 'newrelic.core._thread_utilization'), ('Dispatcher', 'tornado'), ('Dispatcher Version', '(4, 1, 0, 0)'), ('Plugin List', ['warnings', 'code', 'types', 'pprint', 'string', 'SocketServer', 'common', 'cmd', 'multiprocessing', 'dis', 'abc', 'UserList', 'newrelic.hooks.external_requests', 'newrelic.hooks.datastore_redis', 'optparse', '_ctypes', 'glob', 'fnmatch', 'codecs', 
'StringIO', 'newrelic.hooks.framework_tornado', 'pkg_resources', 'weakref', 'base64', '_json', 'newrelic.hooks.functools', 'tokenize', 'newrelic.hooks.framework_tornado.stack_context', 'smtplib', 'socket', 'shelve', 'sre_parse', 'pickle', 'newrelic.hooks.external_httplib', 'newrelic.hooks.external_urllib2', 'newrelic.hooks.external_urllib3', 'numbers', 'shutil', 'csv', 'htmlentitydefs', '_weakrefset', 'newrelic.hooks.framework_tornado.newrelic', 'functools', 'sysconfig', 'newrelic.hooks.framework_tornado.gen', 'uuid', 'tempfile', 'httplib', 'decimal', 'token', 'newrelic', 'flower', 're', 'shlex', 'newrelic.hooks.framework_tornado.web', 'logging', 'traceback', 'features', 'parser', 'codeop', 'yaml', '_LWPCookieJar', 'posixpath', 'locale', 'hashlib', 'keyword', 'stringprep', 'newrelic.hooks.framework_tornado.weakref', '_curses', 'newrelic.hooks.framework_tornado.httputil', '_hashlib', '__main__', 'newrelic.hooks.framework_tornado.sys', 'linecache', 'bz2', 'babel', '_ssl', 'tornado', 'newrelic.hooks.external_urllib', '_virtualenv_distutils', 'newrelic.hooks.framework_tornado.logging', 'hmac', '_multiprocessing', 'random', 'datetime', 'copy', 'importlib', 'celery', 'cProfile', 'newrelic.hooks.framework_tornado.template', 'zipfile', 'newrelic.hooks.framework_tornado.iostream', 'ssl', 'newrelic.hooks.re', '_lsprof', 'cookielib', 'resource', 'aerospike', 'bisect', 'threading', '_csv', 'newrelic.hooks.framework_tornado.traceback', 'atexit', 'calendar', 'urllib', '_MozillaCookieJar', 'email', 'Queue', 'ctypes', '_billiard', 'opcode', 'sre_compile', 'pkgutil', 'platform', 'sre_constants', 'json', 'certifi', 'newrelic.hooks.framework_tornado.types', 'raven', 'copy_reg', 'subprocess', 'site', 'io', 'newrelic.hooks.urlparse', 'rfc822', 'requests', 'urlparse', 'billiard', 'gzip', 'heapq', 'distutils', 'newrelic.hooks.framework_tornado.itertools', 'struct', '_abcoll', 'collections', 'kombu', 'textwrap', 'newrelic.hooks.newrelic', 'ConfigParser', 'quopri', 'stat', 'redis', 
'newrelic.hooks.application_celery', 'symbol', 'cgi'


User 2032 | 8/31/2015, 6:10:01 PM

'browser_monitoring.auto_instrument': True, 'enabled': True, 'debug.log_autorum_middleware': False, 'feature_flag': set([]), 'cross_process_id': None, 'analytics_events.transactions.enabled': True, 'strip_exception_messages.whitelist': [], 'max_stack_trace_lines': 50, 'debug.log_data_collector_calls': False, 'application_id': None, 'agent_limits.sql_query_length_maximum': 16384, 'utilization.detect_aws': True, 'thread_profiler.enabled': True, 'linked_applications': [], 'transaction_tracer.enabled': True, 'log_level': 20, 'debug.log_transaction_trace_payload': False, 'slow_sql.enabled': True, 'agent_limits.xray_profile_overhead': 0.05, 'transaction_tracer.generator_trace': [], 'synthetics.enabled': True, 'utilization.detect_docker': True, 'agent_limits.data_compression_threshold': 65536, 'agent_limits.slow_sql_data': 10, 'browser_monitoring.loader_version': None, 'debug.local_settings_overrides': [], 'debug.log_data_collector_payloads': False, 'debug.log_thread_profile_payload': False, 'developer_mode': False, 'agent_limits.synthetics_events': 200, 'debug.ignore_all_server_settings': False, 'agent_limits.slow_sql_stack_trace': 30, 'js_agent_loader': None, 'process_host.display_name': None, 'debug.log_raw_metric_data': False, 'agent_limits.data_compression_level': None, 'debug.log_agent_initialization': False, 'browser_monitoring.enabled': True, 'log_file': None}.


User 1394 | 8/31/2015, 6:31:38 PM

Have you seen this Forum thread with a potential workaround to this issue?

http://forum.dato.com/discussion/comment/3259/#Comment_3259

Essentially, there might be an incompatibility between Celery logging and GraphLab Create logging. Can you try the workaround described there and let me know if it works?

Thanks,

Rajat


User 2032 | 8/31/2015, 7:59:03 PM

I don't think that is the issue. I solved the hanging problem from the thread above by making sure GraphLab is not imported outside of task definitions. The problem I'm struggling with now is that the lambda workers cannot unpickle the functions passed to the apply method, because GraphLab does not see the module in which those functions are defined.
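For anyone hitting the same ImportError: since the unpickling side runs `__import__('features.jobs')`, the usual remedy is to make that package importable in the worker processes. Below is a minimal sketch, assuming the project root visible in the traceback paths (/home/production/code/current/tailor_ai) and assuming the lambda workers inherit the parent process's environment; the exact mechanism may differ in your deployment, so this is something to try, not a confirmed fix.

```python
import os
import sys

# assumed project root, taken from the paths in the traceback above
PROJECT_ROOT = "/home/production/code/current/tailor_ai"

# make the `features` package importable in this process...
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

# ...and in any child process that inherits the environment, which is how
# spawned lambda workers would pick it up (assumption: they inherit env)
os.environ["PYTHONPATH"] = PROJECT_ROOT + os.pathsep + os.environ.get("PYTHONPATH", "")

print(PROJECT_ROOT in sys.path)  # True
```

Running this early in the Celery task (before the apply call) gives the unpickler a chance to resolve the module by the same name it was pickled under.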