Ensemble of machine learning models

User 2568 | 3/20/2016, 7:40:48 AM

GraphLab does not have the concept of an ensemble of models, where an Ensemble:

  • is a collection of models of the same class (i.e., classification or regression). For example, I might create an ensemble consisting of a random forest, a boosted tree, a logistic regression and a neural network.
  • extends predict/classify by blending the individual model predictions to create the ensemble prediction. For example, if the prediction is a probability, the blend might be the weighted average of all the predictions. If it's a class, it might be a simple voting algorithm. There are many other blending approaches, which should be added to Ensemble as standard methods, and it should be possible to pass an arbitrary blending method.
  • extends the evaluation functions to work with the Ensemble.
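
To make the three points above concrete, here is a rough sketch in plain Python. It is not GraphLab's API: the `Ensemble` class, the `blend` and `weights` arguments, and the helpers `weighted_average` and `majority_vote` are all hypothetical names, and the only assumption about member models is that each has a `predict(data)` method returning one prediction per row.

```python
from collections import Counter


def weighted_average(predictions, weights=None):
    """Blend numeric predictions (e.g. probabilities) row by row."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total = float(sum(weights))
    # zip(*predictions) regroups the per-model lists into per-row tuples.
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]


def majority_vote(predictions, weights=None):
    """Blend class labels by simple voting (weights are ignored here)."""
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]


class Ensemble(object):
    """Hypothetical wrapper around any objects that expose predict(data)."""

    def __init__(self, models, blend=weighted_average, weights=None):
        self.models = list(models)
        self.blend = blend      # any callable(predictions, weights)
        self.weights = weights

    def predict(self, data):
        # Collect one list of predictions per member model, then blend.
        per_model = [list(m.predict(data)) for m in self.models]
        return self.blend(per_model, self.weights)

    def evaluate(self, data, targets, metric):
        # metric is any callable(targets, predictions) returning a score.
        return metric(list(targets), self.predict(data))
```

With something like this, `Ensemble([rf, gbt, lr], weights=[2, 1, 1]).predict(data)` would return the weighted average of the three models' probabilities (where `rf`, `gbt` and `lr` stand in for already-trained models), and passing `blend=majority_vote` would switch to voting on class labels.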

Conceptually this isn't hard. I spent the afternoon creating this prototype (ensemble.py and a simple example) that does enough for my current work. Note, I'm a Python n00b, so the code is awful; however, it outlines the concept.

I'd really like to see this properly written and integrated into GraphLab. Ensembles are commonly used in Kaggle competitions, and adding an Ensemble model to GraphLab would simplify this.


User 1190 | 3/21/2016, 6:35:51 PM

Hi Kevin,

Thank you for your feedback. Ensemble models are an important technique in machine learning and have always been on our roadmap. On the other hand, as you said, because the concept is simple and can be expressed in a few lines of Python, we currently leave it up to our users to implement the ensembles they want. However, I certainly agree with you that it would be useful to have this in our API, and we will prioritize it based on user feedback.

Best, -jay