Hey msainz -
There are two points to address in your question. Let me take them separately.
<b class="Bold">1) JNI extension</b>
This is not something the unity_server process is designed for. It is not currently architected to have non-Python clients connect to it.
<b class="Bold">2) Integrating into low-latency environments (&lt; 100ms)</b>
Have you tried this with Dato Predictive Services yet? Depending on the model and the data, we regularly see end-to-end latency between 10 and 100ms.
When using Dato Predictive Services, consider Custom Predictive Objects and make aggressive use of the caching features. The 100K documents can (potentially) be processed once and cached, so the call to the model doesn't require preprocessing all 100K documents on each request.
My assumption is that the 100K documents come from a larger corpus (say 1M documents) that isn't changing with each request. In that case, each document can be preprocessed in a separate process and the results cached in the Predictive Service, making each request much faster.
For docs on Custom Predictive Objects, check out: http://dato.com/learn/userguide/index.html#Deployment (search for Defining a Custom Predictive Object) and https://dato.com/products/create/docs/generated/graphlab.deploy.predictiveservice.predictiveservice.PredictiveService.add.html#graphlab.deploy.predictiveservice.predictiveservice.PredictiveService.add
If you would like to discuss offline, please feel free to email me at email@example.com.