Serialization Errors, shipping a Classifier to Amazon.

User 2411 | 10/14/2015, 5:35:12 PM

I'd like to ship a graphlab classifier model to EC2 and create an endpoint that I can hit from a website (so I can share my model). It's a simple classifier that takes in a structured wine tasting note and guesses what wine it is.

I've successfully connected my Amazon account, and added a simple function:

def helloWorld():
    return "Hello, World!"

ps.add('helloWorld', helloWorld)
ps.apply_changes()

ps.query('helloWorld') --> {u'response': u'Hello, World!', u'uuid': u'8b3dce34-57e5-4f3f-a5d4-7dc7250f7d49', u'version': 1}

I've also successfully added a function that takes an input:

ps.query('returnString', string='WHOA!') --> {u'response': u'WHOA!', u'uuid': u'a7ab7c42-282b-45fc-a507-15c5309eeaa5', u'version': 1}

However when I try to use a function I've added that takes an array (features in a wine) and classifies it (country and grape), I get a JSON serializable error:

ps.query('guessWineSimple',tastingnote=tastingnote_arr) -->

TypeError                                 Traceback (most recent call last)
<ipython-input-66-c23ac260e0aa> in <module>()
----> 1 ps.query('guessWineSimple', tastingnote=tastingnote_arr)

/Library/Python/2.7/site-packages/graphlab/deploy/predictiveservice/predictiveservice.pyc in query(self, po_name, **kwargs)
    712         try:
    713             timeout = self.query_timeout if hasattr(self, 'query_timeout') else 10
--> 714             return self.environment.query(po_name, self.api_key, timeout=timeout, **kwargs)
    715         except _NonExistError:
    716             return "Predictive Object '%s' can not be found. If you just deployed "\

/Library/Python/2.7/site-packages/graphlab/deploy/predictiveservice/predictiveserviceenvironment.pyc in query(self, po_name, api_key, timeout, **kwargs)
    106
    107         self._client_connection.set_query_timeout(timeout)
--> 108         return self._client_connection.query(po_name, kwargs)
    109
    110     def feedback(self, request_id, api_key, timeout, **kwargs):

/Library/Python/2.7/site-packages/graphlab/deploy/predictiveservice/predictiveclient.pyc in query(self, uri, **kwargs)
    162
    163         internal_data = {'api_key': self.api_key, 'data': kwargs}
--> 164         response = self._post('query/%s' % uri, internal_data, timeout=self.query_timeout)
    165         if response.status_code == 200:
    166             return response.json()

/Library/Python/2.7/site-packages/graphlab/deploy/predictiveservice/predictiveclient.pyc in _post(self, path, data, timeout)
    216
    217         if data:
--> 218             data = json.dumps(data)
    219
    220         if not timeout or not isinstance(timeout, int):

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    241         cls is None and indent is None and separators is None and
    242         encoding == 'utf-8' and default is None and not sort_keys and not kw):
--> 243         return _default_encoder.encode(obj)
    244     if cls is None:
    245         cls = JSONEncoder

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in encode(self, o)
    205         # exceptions aren't as detailed.  The list call should be roughly
    206         # equivalent to the PySequence_Fast that ''.join() would do.
--> 207         chunks = self.iterencode(o, _one_shot=True)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
    268                 self.key_separator, self.item_separator, self.sort_keys,
    269                 self.skipkeys, _one_shot)
--> 270         return _iterencode(o, 0)



FYI: If you are using Anaconda and having problems with NumPy

Hello everyone,

I ran into an issue a few days ago and found something that may be affecting many GraphLab users who run it with Anaconda on Windows: NumPy was unable to load, and consequently nothing that requires it (Matplotlib, etc.) could load either.

It turns out that the current NumPy build (1.10.4) for Windows is problematic (more info here).

Possible workarounds are downgrading to build 1.10.1 or forcing an upgrade to 1.11.0 if your dependencies allow it. Downgrading was easy for me using conda install numpy=1.10.1.
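If you want to guard against the bad build in your own setup scripts, a minimal sketch (the helper name and the x.y.z version-string format are assumptions; pass it numpy.__version__ once NumPy imports):

```python
def numpy_build_is_problematic(version_string):
    # Flag the NumPy 1.10.4 Windows build discussed above.
    parts = tuple(int(p) for p in version_string.split('.')[:3])
    return parts == (1, 10, 4)

print(numpy_build_is_problematic('1.10.4'))  # True
print(numpy_build_is_problematic('1.10.1'))  # False
print(numpy_build_is_problematic('1.11.0'))  # False
```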

Thanks for your attention!



User 1178 | 10/16/2015, 6:11:00 PM

Hi Rajiv,

Not all Python types are JSON serializable. You can convert your input to a value that is JSON serializable and then pass it to your model. One way to do that is to wrap your model in a custom function and have the function do the type conversion:

# create my model
my_classifier_model = gl.classifier.create(...)

# define a custom function to wrap the model
def my_classifier(input_list):
    # input_list must be JSON serializable; convert it to the value the classifier expects
    input_converted = convert_to_array(input_list)
    model_result = my_classifier_model.classify(input_converted)
    return model_result

# add my custom function as a model in the Predictive Service
ps.add("my_classifier", my_classifier)
ps.test_query("my_classifier", <put your input here>)

Note that my_classifier() above is pseudocode; you will need to modify it to work with your scenario. test_query() is a great way to validate your function before deploying it to the service.
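To make the conversion step concrete, here is a standard-library sketch of why the original call failed and why a plain list works (array.array stands in for whatever non-serializable type tastingnote_arr is; no GraphLab needed):

```python
import array
import json

# An array.array (much like a NumPy array) is not JSON serializable,
# so a query payload containing it fails inside json.dumps().
tastingnote_arr = array.array('d', [0.2, 0.8, 0.5])

try:
    json.dumps({'data': {'tastingnote': tastingnote_arr}})
except TypeError as e:
    print('fails:', e)

# Converting to a plain list first makes the payload serializable;
# the wrapper function can convert it back to an array on the server side.
tastingnote_list = list(tastingnote_arr)
payload = json.dumps({'data': {'tastingnote': tastingnote_list}})
print('ok:', payload)
```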

Hope this helps!