User 2955 | 1/5/2016, 7:11:56 AM

Hi,

I have a client who I'm building an app for. I created a linear_regression model using the Dato tool via an IPython Notebook, and implementing runtime prediction was pretty straightforward by exporting the linear_regression model['coefficients'].
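For reference, runtime prediction from exported linear-regression coefficients can be sketched like this (the coefficient names and values below are made up, not from an actual export):

```python
def linear_predict(intercept, coefficients, features):
    # Prediction is just the intercept plus the dot product of the
    # exported coefficients with the input feature values.
    return intercept + sum(coefficients[name] * value
                           for name, value in features.items())

coefs = {'sqft': 150.0, 'bedrooms': 10000.0}  # hypothetical export
linear_predict(50000.0, coefs, {'sqft': 1000.0, 'bedrooms': 3})
# -> 230000.0
```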

boosted_trees_regression is performing much better than linear_regression, so I'm exploring implementing that, but I don't fully understand how to export the model and predict at run time in other languages (i.e. away from Dato or a real-time service).

How would I go about doing this?

User 940 | 1/5/2016, 9:02:06 PM

Hi @"Joe Booth" ,

There are two ways to run predictions at run time in other languages. One is using our product Predictive Services, where you can access your trained models via a REST API. We also have clients for several languages.

Alternatively, the trees are encoded in JSON within the model; you can access them via model['trees_json'].

I hope this helps! Cheers! -Piotr

User 2955 | 1/6/2016, 12:49:35 AM

@piotr thank you.... I need this to run in real time within the app without a service call; performance is key.

Where do I find the algorithm / example code for how to predict at run time?

Also, where can I find comparative benchmarks of prediction time across different algorithms / ML approaches? I'm trying to compare them and struggling to find articles on this (I'm just looking for directional information; obviously there are many factors that will influence it).

Many thanks

User 940 | 1/6/2016, 7:56:56 PM

@"Joe Booth" ,

I'll get sample code for you, and post it by tomorrow.

As for your second question, I'll try to answer it in very general terms.

Non-parametric models like nearest neighbors can be very slow at predict time: they compute pairwise distances between the query and most of the points in the training set, and this can get very expensive. Most parametric models, by contrast, are simply matrix multiplications, so the cost depends on how big, and how many, those multiplications are. Linear models, like logistic regression, are very fast. Neural networks, OTOH, depend tremendously on depth and size.
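To make the cost difference concrete, here's a minimal sketch on toy NumPy data (all names and sizes here are illustrative) contrasting the per-prediction work of a linear model with that of a nearest-neighbor lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10000, 20))  # toy "training set"
w = rng.normal(size=20)                 # toy "trained" linear weights
x = rng.normal(size=20)                 # one query point

# Parametric (linear) model: one dot product per prediction, O(d).
linear_score = x @ w

# Non-parametric (nearest neighbor): a distance to every training
# point for each prediction, O(n * d).
dists = np.linalg.norm(X_train - x, axis=1)
nearest = int(np.argmin(dists))
```

The linear prediction touches 20 numbers; the nearest-neighbor lookup touches all 200,000, which is the directional gap Piotr describes.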

I hope this helps!

Cheers! -Piotr

User 2955 | 1/7/2016, 6:00:37 AM

Great, thank you on both points! Look forward to the code!!

User 940 | 1/7/2016, 7:38:59 PM

Hi @"Joe Booth" ,

Here's a pseudo-code sample in Python. You could use any other language once you've pulled out the JSON-encoded trees.

```python
from math import exp

import graphlab


def predict_single_tree(tree, input):
    '''Get the margin for a single tree: traverse the tree to a leaf
    node and return the leaf value. (The traversal details depend on
    the JSON encoding of the trees.)'''


def multi_class_predict(trees, classes, input):
    '''
    trees:   the list of JSON-encoded trees (e.g. trees 0, 3, 6
             belong to class 0 in a three-class model)
    classes: the list of classes, e.g. [0, 1, 2]
    input:   the input to predict

    Returns a list of probabilities by class; they sum to 1.
    '''
    k = len(classes)
    margin = [0.0] * k  # could be a numpy array instead
    for c in range(k):
        # trees are interleaved by class: i goes 0, 3, 6 for
        # class 0; 1, 4, 7 for class 1; and so on
        for i in range(c, len(trees), k):
            margin[c] += predict_single_tree(trees[i], input)
    # softmax-normalize the margins into probabilities
    soft_max = [exp(m) for m in margin]
    soft_max_sum = sum(soft_max)
    return [s / soft_max_sum for s in soft_max]


model = graphlab.random_forest_classifier.create(
    training, features=['feature list'], target='valid')
trees = model.get('trees_json')
```

Let me know if this is what you were looking for.
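If it helps, here is one possible shape for the predict_single_tree traversal, assuming a hypothetical node layout (internal nodes carry 'feature', 'threshold', 'left', 'right'; leaves carry 'value') -- the actual 'trees_json' encoding may use different key names, so adapt accordingly:

```python
def predict_single_tree(tree, input):
    node = tree
    while 'value' not in node:  # descend until we reach a leaf
        if input[node['feature']] < node['threshold']:
            node = node['left']
        else:
            node = node['right']
    return node['value']  # the leaf's margin contribution

# Toy one-split tree: feature 0 compared against threshold 0.5.
toy = {'feature': 0, 'threshold': 0.5,
       'left':  {'value': -1.0},
       'right': {'value':  2.5}}
predict_single_tree(toy, [0.3])  # -> -1.0
predict_single_tree(toy, [0.9])  # -> 2.5
```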

Cheers! -Piotr

User 2955 | 1/16/2016, 4:51:57 AM

@piotr - this is perfect, thank you!!!