Integrating GraphLab with a standalone web visualization tool

User 2165 | 8/12/2015, 8:40:29 PM

My team is creating a graph visualization tool as a web application built with JavaScript, HTML and other basic web technologies. But the actual machine learning and data analysis for the data we are visualizing runs on a GraphLab server. What would be the best way to connect to GraphLab from the web application? Does a simple API call work to retrieve the data as JSON objects? What are the best practices? Ideally we'd like users to select different GraphLab queries directly from the visualization app, have the app pass the query along, retrieve the output, and then somehow get it into a JSON format to visualize it for the user.

Any suggestions? I haven't used GraphLab much at all, and I work mostly on the engineering side of projects like this rather than the hardcore machine learning side.


User 4 | 8/13/2015, 5:56:33 AM

Enabling web applications to sit on top of GraphLab Create (and make use of SFrame, SGraph, and Model objects) is something I have been giving a lot of thought to! GraphLab Canvas (what comes up when you call .show on an object in GraphLab Create) is a web application and uses JSON as the transport mechanism, so it is very similar to what you are describing. However, GraphLab Canvas is currently closed source (so I can't provide any specific code samples from our code) and is not currently extensible (so there is no way to add the functionality you are describing to GraphLab Canvas). But I hope that by sharing the general approach we can enable others to build useful web apps on top.

GraphLab Create data structures are generally JSON serializable once you convert them to built-in Python types. An SFrame is generally trivially convertible to a list: list(sf). An SGraph is convertible to two lists (which you could then put in a dict, if you wanted to group them together): {'vertices': list(sg.vertices), 'edges': list(sg.edges)}.

Be aware that this approach only works for relatively small data. SFrame and SGraph can hold terabytes, but you will be limited by the size of RAM once you convert to built-in Python types (and in practice, anything much over a megabyte can get pretty slow going back and forth to a web app, with both serialization and network overhead).

Once you have the data in built-in Python types like list or dict, you can serialize it to JSON using the library of your choice: popular choices include Python's built-in json module and simplejson. Many server frameworks (including Tornado, Flask, and Dato's own Predictive Services) will take care of the JSON serialization part for you, as long as the data you pass in is trivially serializable (composed of a subset of Python built-in types). There are also some gotchas in serializing data to JSON from Python (namely around datetime and other types not supported by JSON, as well as things like out-of-range floating point values), but these can happen with any Python data and stem from a mismatch between Python's type system and the types JSON supports. You can work around them by massaging the data prior to serialization (using a recursive function to clean the data) or by writing custom serialization code (see json.JSONEncoder).
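
To make the two workarounds concrete, here is a minimal sketch using only the standard library. The names clean_for_json and DateTimeEncoder are made up for illustration, and the rows are hand-written stand-ins assumed to be shaped like the list of dicts you would get from list(sf); nothing here comes from GraphLab Create itself. Mapping NaN and infinity to null is one possible policy, not the only reasonable one.

```python
import datetime
import json
import math


def clean_for_json(value):
    """Recursively convert a value into something the json module can encode."""
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return {str(k): clean_for_json(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [clean_for_json(v) for v in value]
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        # NaN and Infinity are not valid JSON; this sketch maps them to null.
        return None
    return value


class DateTimeEncoder(json.JSONEncoder):
    """Alternative approach: a custom encoder for datetime values.

    default() is only called for objects the encoder cannot handle itself,
    so it never sees NaN or Infinity floats; those still need cleaning.
    """

    def default(self, obj):
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)


# Hypothetical rows standing in for list(sf).
rows = [
    {"id": 1, "score": 0.5, "seen": datetime.datetime(2015, 8, 12, 20, 40, 29)},
    {"id": 2, "score": float("nan"), "seen": None},
]

# allow_nan=False makes json.dumps raise instead of emitting invalid JSON.
payload = json.dumps(clean_for_json(rows), allow_nan=False)
encoded_dates = json.dumps({"seen": rows[0]["seen"]}, cls=DateTimeEncoder)
```

Passing allow_nan=False is a cheap safety net: if a NaN slips past the cleaning step, you get a ValueError on the server rather than a payload that the browser's JSON.parse rejects.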

To get GraphLab Create to respond to queries over HTTP (REST) and return JSON, there are several possible approaches. The simplest would be to use Dato's Predictive Services to wrap your desired queries (as simple Python functions) in a Custom Predictive Object. With this approach, you get all the benefits of Predictive Services (including a load balancer, a runtime-configurable number of hosts behind the load balancer, and caching of responses), with your custom REST APIs underneath. Predictive Services is a paid product separate from GraphLab Create, so if you would rather build it yourself in Python, I would recommend Tornado or Flask as the server layer. These packages provide custom routing and handling of HTTP requests with a lot of configurability. From the function that handles the request, you can run Python code to perform the desired operation, then return the result (depending on the framework, it may serialize for you, or you may need to serialize to JSON first).
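
As a rough sketch of the do-it-yourself route, the example below maps URL paths to plain Python query functions and returns their results as JSON. To stay dependency-free it uses Python 3's standard library http.server rather than Tornado or Flask, which you would likely prefer in practice. The route /queries/top_vertices, the function top_vertices, and its data are all hypothetical; a real query function would call GraphLab Create and convert the result to built-in types first.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def top_vertices(params):
    """Hypothetical query; a real one would call GraphLab Create here."""
    limit = int(params.get("limit", ["10"])[0])
    vertices = [{"id": i, "degree": 10 - i} for i in range(10)]
    return {"vertices": vertices[:limit]}


# Map URL paths to query functions (routes and names are illustrative).
QUERIES = {"/queries/top_vertices": top_vertices}


class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        query = QUERIES.get(url.path)
        if query is None:
            self._send_json(404, {"error": "unknown query"})
            return
        # parse_qs returns a dict mapping each parameter to a list of values.
        self._send_json(200, query(parse_qs(url.query)))

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="127.0.0.1", port=8000):
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), QueryHandler)
```

With make_server().serve_forever() running, a GET request to /queries/top_vertices?limit=2 on port 8000 should come back as a JSON object with a two-element vertices list. Keeping the queries in a dict means the visualization app can only run queries you have explicitly registered, rather than passing arbitrary code to the server.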

From the web app client side, you would treat it as a REST API just like any other; this one happens to be using GraphLab Create underneath, but from the web app client perspective, it's just REST -- make an HTTP request, get back an HTTP response with a JSON payload.

Hope that helps! Let me know if I can provide any more advice or if you have any more specific questions.