Graphlab and Java support

User 690 | 12/17/2014, 3:49:04 AM

Hi everybody, I remember that GraphLab initially came with a Java API, but GraphLab Create seems to have a Python-only API. Has the GraphLab Java API been discontinued? Is a GraphLab Create Java API on the roadmap? Thanks, Sunil.

Comments

User 10 | 12/17/2014, 7:47:45 PM

Hey Sunil -

A GraphLab Create Java API is on the roadmap, and you are correct: GraphLab Create does not currently offer a Java API.

Can you share a little more about why a Java API is important to you? Do you have existing projects in Java that use a different ML platform that you are looking to port to GraphLab Create? Are you thinking Java because of company/team policy or existing infrastructure?


User 690 | 12/18/2014, 5:26:09 AM

Thanks Rajat for the response. You said that GraphLab Create does not currently have a Java API but that one is on the roadmap. <b class="Bold">You didn't answer the other part of my question, about the status of the original GraphLab (not GraphLab Create) Java API, which I remember existed.</b> I am interested in Java because I want to use it from a JVM-hosted language (Clojure, to be precise). Even with basic GraphLab Create, I feel one cannot use a package as is and will need some (sometimes a lot of) glue code, which will usually involve compute-intensive work, and I am not convinced I can do that in Python without loss of performance. For now GraphLab Create works fine, but when I want to productionize it, I will need some glue code.


User 6 | 12/18/2014, 4:32:37 PM

Hi, while GraphLab Create's interface is in Python, the compute engine is implemented in C++ and is way more efficient than Java. You are welcome to try it out. We would love to set up a phone call with you to understand your production setup, as we are now adding support for other platforms to make it easier to deploy GraphLab in production. Email me if interested.


User 18 | 12/22/2014, 7:48:35 PM

Hi Sunil,

We're phasing out support for GraphLab PowerGraph and replacing it with GraphLab Create. You can still access the <a href="https://github.com/graphlab-code/graphlab">PowerGraph source on GitHub</a>.

Re: Java API, what is it that you need the glue code to do that requires a lot of computation? Is it data ingest/export/cleaning? Model performance tracking? Or something else? If it's an essential part of production, then it's something that we'd like to understand and support down the road.

Alice


User 1276 | 2/12/2015, 6:44:33 PM

I'd like to use a GraphLab Create Java API from a Scala environment, e.g. using Spark Notebook instead of IPython Notebook.

Vu Ha CTO, Semantic Scholar http://www.quora.com/What-is-Semantic-Scholar-and-how-will-it-work


User 1375 | 3/6/2015, 7:00:05 PM

Piggybacking on this thread: we have a commercial search application that needs to apply a trained model (or models, including feature pre-processing) to a large set of documents (~100K, perhaps more) during a search session with a hard duration limit on the order of 100ms. Our stack is JVM-based (Solr, Lucene). In this context, going through Python to access GraphLab Create (GLC) or using REST to query Dato Predictive Services (DPS) both seem prohibitively slow. Given that, how would you advise deploying our GLC models in this JVM environment with low-latency constraints? One thought we had was to write a JNI extension to talk to the GLC process directly. Does something like this already exist? Any other ideas?

Thank you


User 1394 | 3/6/2015, 7:51:01 PM

Hey msainz -

There are 2 points to address in your question. Let me address them separately.

<b class="Bold">1) JNI extension</b>

This is not something the unity_server process is designed for. It is not currently architected to have non-Python clients connect to it.

<b class="Bold">2) Integrating into low-latency environments (&lt; 100ms latency)</b>

Have you tried this with Dato Predictive Services yet? Depending on the model and the data, we regularly see end-to-end latency between 10 and 100ms.

When thinking of Dato Predictive Services, consider using Custom Predictive Objects and making aggressive use of the caching features. The 100K documents can (potentially) be processed once and cached, so that the call on the model doesn't require preprocessing 100K documents on each request.

My assumption is that the 100K documents are from a larger corpus of documents (say 1M) that isn't changing with each request. So as a separate process each document can be preprocessed and the results cached in the Predictive Service, making the overall request much faster.
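
As a rough illustration of the caching idea above, here is a minimal sketch in plain Python. It does not use the Dato Predictive Services or GraphLab Create APIs; every name in it (`FeatureCache`, `static_features`, `query_features`, `score`, `rank`) is hypothetical, and the toy linear scorer only stands in for a trained model. The assumption is that per-document features that do not depend on the query are computed once and cached, so each request only computes the query-dependent part.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: none of these names come from GraphLab Create
# or Dato Predictive Services.
Features = List[float]


def static_features(doc: str) -> Features:
    """Query-independent features (assumed expensive, computed once per document)."""
    words = doc.split()
    return [float(len(words)), float(len(set(words)))]


def query_features(query: str, doc: str) -> Features:
    """Query-document features (have to be computed on every request)."""
    doc_words = set(doc.split())
    overlap = sum(1 for w in query.split() if w in doc_words)
    return [float(overlap)]


def score(features: Features, weights: Features) -> float:
    """Toy linear model standing in for a trained model."""
    return sum(f * w for f, w in zip(features, weights))


class FeatureCache:
    """Caches query-independent features so later requests skip that work."""

    def __init__(self, compute: Callable[[str], Features]) -> None:
        self._compute = compute
        self._cache: Dict[str, Features] = {}
        self.misses = 0  # number of times features had to be computed

    def get(self, doc_id: str, doc: str) -> Features:
        if doc_id not in self._cache:
            self.misses += 1
            self._cache[doc_id] = self._compute(doc)
        return self._cache[doc_id]


def rank(query: str, docs: Dict[str, str], cache: FeatureCache,
         weights: Features) -> List[str]:
    """Score every document and return document ids, best first."""
    scored = []
    for doc_id, doc in docs.items():
        # Cached static features plus freshly computed query features.
        feats = cache.get(doc_id, doc) + query_features(query, doc)
        scored.append((score(feats, weights), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

The cache could also be filled ahead of time by a separate process, so that no request pays for the query-independent work.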

For docs on Custom Predictive Objects, check out: http://dato.com/learn/userguide/index.html#Deployment (search for Defining a Custom Predictive Object) and https://dato.com/products/create/docs/generated/graphlab.deploy.predictiveservice.predictiveservice.PredictiveService.add.html#graphlab.deploy.predictiveservice.predictiveservice.PredictiveService.add

If you would like to discuss offline, please feel free to email me at rajat@dato.com.


User 1375 | 3/6/2015, 8:00:06 PM

Thank you @"Rajat Arya". The corpus is large (>20M) but distributed (SolrCloud). There are features which are search query-independent and relatively unchanging, and thus can be preprocessed as you suggest, but there are also features which are query-document specific. We'll do more math and also try Dato Predictive Services and Custom Predictive Objects in due time. Thank you