Feature weighting possible with GraphLab's classification algorithms?

User 2360 | 10/4/2015, 12:01:22 AM

First of all, kudos to the builders of GraphLab and the documentation base. I am jumping into machine learning and trying to get a working example for my use case before starting with more in-depth research. I am already seeing some amazing results from some simple experiments.

Is it possible to prioritize a feature when using any of GraphLab's classification algorithms?

I am trying to automate the mapping of database objects into WikiData. For every field of every object I process three pieces of information: the database field name, the contents of that field, and the context of the object. I want to predict the correct WikiData field name from those three pieces. The field name is by far the most important, the contents are of medium importance, and the additional context matters only a little.

I am having trouble with wrong predictions because in some cases the most important piece of information (the database field name) gets overwhelmed by the larger amount of content in the other features. Repeating the field name multiple times in the ngram feature doesn't seem to help.

Here is one write-up I found on feature weighting (for nearest neighbour classifiers): http://www.fon.hum.uva.nl/praat/manual/kNNclassifiers111__Feature_weighting.html

Comments

User 940 | 10/6/2015, 6:26:21 PM

Hi @eesahe,

Thanks for the kind words about GraphLab Create!

We do support feature weighting with our Nearest Neighbor Classifier. Specifically, as described at https://dato.com/products/create/docs/generated/graphlab.nearestneighborclassifier.create.html#graphlab.nearestneighborclassifier.create, you can specify a composite distance with custom weights for different features. This is useful if you have prior knowledge about feature importance, as you do here.
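To illustrate the idea of a weighted composite distance, here is a minimal sketch in plain Python. It does not call GraphLab Create: the helper names (`jaccard`, `composite_distance`, `knn_predict`), the toy rows, and the weights 5.0 / 2.0 / 0.5 are all assumptions made up for the example. Each feature gets its own distance function and weight, and the total distance is the weighted sum.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard distance between two collections of tokens."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# A composite distance as a list of (feature, distance function, weight).
# The weights are illustrative guesses, not tuned values.
UNWEIGHTED = [("field_name", jaccard, 1.0),
              ("contents", jaccard, 1.0),
              ("context", jaccard, 1.0)]
WEIGHTED = [("field_name", jaccard, 5.0),
            ("contents", jaccard, 2.0),
            ("context", jaccard, 0.5)]


def composite_distance(x, y, components):
    """Weighted sum of the per-feature distances."""
    return sum(w * fn(x[f], y[f]) for f, fn, w in components)


def knn_predict(train, query, components, k=1):
    """Majority label among the k rows closest to the query."""
    ranked = sorted(train,
                    key=lambda row: composite_distance(row, query, components))
    votes = Counter(row["label"] for row in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training rows: tokenized field name, contents and context.
train = [
    {"field_name": ["birth", "date"],
     "contents": ["1950", "01", "02"],
     "context": ["person", "artist", "painter"],
     "label": "date of birth"},
    {"field_name": ["death", "date"],
     "contents": ["1990", "05", "17"],
     "context": ["person", "museum", "curator"],
     "label": "date of death"},
]

# The query shares its field name with the first row, but its contents
# and context with the second row.
query = {"field_name": ["birth", "date"],
         "contents": ["1990", "05", "17"],
         "context": ["person", "museum", "curator"]}

print(knn_predict(train, query, UNWEIGHTED))
print(knn_predict(train, query, WEIGHTED))
```

With equal weights the matching contents and context dominate and the query lands on the second row; once the field name carries the largest weight, the first row becomes the nearest neighbor. The same effect is what a composite distance with custom weights is meant to give you in the classifier, though the exact argument format should be taken from the linked documentation.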

Hope this helps! Let us know if you have any other questions!

Cheers! -Piotr