How to make a new feature for each of the selected_words

User 2429 | 10/26/2015, 6:18:09 AM

I want to build a new feature for each of the words in selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate'], and then use those features in selected_words_model = graphlab.logistic_classifier.create(train_data, target='sentiment', features="? ")

But I do not know how to make new features from [selected_words]. Please help me!

Comments

User 1592 | 10/26/2015, 6:25:30 AM

We are almost there. Instead of writing your own code, you can use our sentiment analysis toolkit: https://dato.com/products/create/docs/generated/graphlab.sentimentanalysis.create.html#graphlab.sentimentanalysis.create

If you really want to write your own code, read on. It seems you are creating a list of positive and negative sentiment words. Instead, you should prepare a labeled training set of this form:

I really like this place => 1
I hate this restaurant => 0
the food was wow => 1

and so on. The classifier will then learn on its own which words indicate positive sentiment. A fixed mapping from a single word to a score is deterministic, and you do not need machine learning for that. It is a rule-based approach that can be used, but it will be less accurate because your list is limited in scope. I suggest reading the following blog post, which goes into more detail: http://blog.dato.com/practical-text-analysis-using-deep-learning
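To make the contrast concrete, here is a minimal sketch of the rule-based approach described above, where each word carries a fixed score. The split of the words into positive and negative sets and the `rule_based_sentiment` helper are assumptions made for illustration; they are not part of any toolkit.

```python
# Hypothetical split of the selected words into positive and negative sets.
POSITIVE = {'awesome', 'great', 'fantastic', 'amazing', 'love', 'wow'}
NEGATIVE = {'horrible', 'bad', 'terrible', 'awful', 'hate'}

def rule_based_sentiment(text):
    """Return 1 if positive words outnumber negative ones, else 0."""
    words = text.lower().split()
    # Each positive word adds one to the score, each negative word subtracts one.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0
```

With this sketch, "the food was wow" is labeled 1 and "I hate this restaurant" is labeled 0, but "I really like this place" is also labeled 0 because "like" is not in the list. That is the kind of gap a classifier trained on labeled sentences can close.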


User 4 | 10/26/2015, 8:57:37 PM

Hi @"Azadeh Esmaily",

Assuming you already have a bag-of-words representation of the text as a column named word_count, in an SFrame named sf, you can create a new feature (representing the count of a single word) using something like the following:

selected_word = 'foo'
sf[selected_word] = sf['word_count'].apply(lambda bag_of_words: bag_of_words[selected_word] if selected_word in bag_of_words else 0)

This will create a new column in the SFrame whose name is the value of selected_word (in this case foo), and whose value is the count of the word foo, as stored in the column word_count. To do this for each of a set of selected words, you could use a for loop to repeat the process for different words.
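The loop over the selected words could look roughly like the sketch below. It is only an illustration: plain Python dicts stand in for the rows of the SFrame so the snippet runs without GraphLab, and `rows` and `count_word` are hypothetical names. With a real SFrame, the loop body would presumably be the `sf[word] = sf['word_count'].apply(...)` line shown above, with `word` in place of `selected_word`.

```python
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love',
                  'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

# Stand-in for the SFrame: each row holds a bag-of-words dict under 'word_count'.
rows = [
    {'word_count': {'i': 1, 'love': 2, 'this': 1}},
    {'word_count': {'awful': 1, 'and': 1, 'bad': 1}},
]

def count_word(bag_of_words, word):
    """Return how often `word` occurs in the bag, or 0 if it is absent."""
    return bag_of_words.get(word, 0)

# Add one new "column" per selected word, named after the word itself.
for word in selected_words:
    for row in rows:
        row[word] = count_word(row['word_count'], word)
```

Once the columns exist, the list `selected_words` itself should be usable as the `features` argument of the classifier, since each column is named after its word.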