Problem with classifier model

User 5324 | 6/24/2016, 2:47:06 AM

In the week 3 assignment of the Machine Learning Specialization (this course: https://www.coursera.org/learn/ml-foundations/home/welcome), I have a problem creating the classifier features from the selected words:

```python
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
```

When I run these commands, they work:

```python
import graphlab

products = graphlab.SFrame("amazon_baby.gl")
products['count_word'] = graphlab.text_analytics.count_words(products['review'])
# Drop neutral (3-star) reviews and label ratings of 4 or 5 as positive
products = products[products['rating'] != 3]
products['sentiement'] = products['rating'] >= 4
train_data, test_data = products.random_split(.8, seed=0)
word_list = ['awesome', 'wow', 'great']
for word in word_list:
    # Count of this word in each review, or 0 if it does not occur
    products[word] = products['count_word'].apply(lambda x: x[word] if word in x else 0L)
```
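For readers without GraphLab installed, the following pure-Python sketch mimics what these commands compute on a few made-up reviews. The `count_words` function here is a simplified, hypothetical stand-in for `graphlab.text_analytics.count_words` (lowercasing plus whitespace splitting is an assumption, though it is consistent with tokens such as `'parents!!'` in the output of `products`), and `add_word_features` plays the role of the `apply(lambda ...)` loop. It is written for Python 3, so the Python 2 literal `0L` becomes `0`.

```python
from collections import Counter


def count_words(text):
    """Simplified, hypothetical stand-in for graphlab.text_analytics.count_words:
    lowercase the text and count whitespace-separated tokens (punctuation stays attached)."""
    return dict(Counter(text.lower().split()))


def add_word_features(rows, words):
    """Add one column per selected word holding its count in 'count_word' (0 if absent)."""
    for row in rows:
        for word in words:
            # Same logic as the lambda in the question: x[word] if word in x else 0
            row[word] = row['count_word'].get(word, 0)
    return rows


# Made-up reviews for illustration (not taken from amazon_baby.gl)
products = [
    {'review': 'Wow great product, great price', 'rating': 5.0},
    {'review': 'Awful and terrible', 'rating': 1.0},
    {'review': 'It is okay', 'rating': 3.0},
]
for row in products:
    row['count_word'] = count_words(row['review'])

# Drop neutral 3-star reviews, as the question's code does
products = [row for row in products if row['rating'] != 3]

word_list = ['awesome', 'wow', 'great']
add_word_features(products, word_list)
print([(row['awesome'], row['wow'], row['great']) for row in products])  # [(0, 1, 2), (0, 0, 0)]
```

Under this sketch's tokenization, a word with punctuation attached (for example `great,`) is counted as a different token from `great`.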

And when I run `products`, that also works and gives me this output:

| name | review | rating | count_word | sentiement | awesome | wow | great |
|---|---|---|---|---|---|---|---|
| Planetwise Wipe Pouch | it came early and was not disappointed. i love ... | 5.0 | {'and': 3, 'love': 1, 'it': 2, 'highly': 1, ... | 1 | 0 | 0 | 0 |
| Annas Dream Full Quilt with 2 Shams ... | Very soft and comfortable and warmer than it ... | 5.0 | {'and': 2, 'quilt': 1, 'it': 1, 'comfortable': ... | 1 | 0 | 0 | 0 |
| Stop Pacifier Sucking without tears with ... | This is a product well worth the purchase. I ... | 5.0 | {'ingenious': 1, 'and': 3, 'love': 2, ... | 1 | 0 | 0 | 0 |
| Stop Pacifier Sucking without tears with ... | All of my kids have cried non-stop when I tried to ... | 5.0 | {'and': 2, 'parents!!': 1, 'all': 2, 'puppet.': ... | 1 | 0 | 0 | 1 |
| Stop Pacifier Sucking without tears with ... | When the Binky Fairy came to our house, we didn't ... | 5.0 | {'and': 2, 'cute': 1, 'help': 2, 'doll': 1, ... | 1 | 0 | 0 | 1 |
| A Tale of Baby's Days with Peter Rabbit ... | Lovely book, it's bound tightly so you may no ... | 4.0 | {'shop': 1, 'be': 1, 'is': 1, 'it': 1, 'as': ... | 1 | 0 | 0 | 0 |
| Baby Tracker® - Daily Childcare Journal, ... | Perfect for new parents. We were able to keep ... | 5.0 | {'feeding,': 1, 'and': 2, 'all': 1, 'right': 1, ... | 1 | 0 | 0 | 0 |
| Baby Tracker® - Daily Childcare Journal, ... | A friend of mine pinned this product on Pinte ... | 5.0 | {'and': 1, 'help': 1, 'give': 1, 'is': 1, ... | 1 | 0 | 0 | 0 |
| Baby Tracker® - Daily Childcare Journal, ... | This has been an easy way for my nanny to record ... | 4.0 | {'journal.': 1, 'all': 1, 'standarad': 1, ... | 1 | 0 | 0 | 0 |
| Baby Tracker® - Daily Childcare Journal, ... | I love this journal and our nanny uses it ... | 4.0 | {'all': 1, 'forget': 1, 'just': 1, "daughter's": ... | 1 | 0 | 0 | 0 |

[166752 rows x 8 columns]
Note: Only the head of the SFrame is printed. You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Notice that the `awesome`, `wow` and `great` columns are there.

But when I create the model:

```python
model = graphlab.logistic_classifier.create(train_data, target='sentiement', features=word_list, validation_set=test_data)
```

It gives me a strange error:

```
ToolkitError: The following columns were expected but are missing: ['great', 'awesome', 'wow']
```

This happens even though the columns are visible when I run `products`. I really don't know what is going on.
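The message lists feature names that the classifier could not find among the columns of the SFrame passed as its training data. A small check like the one below, a hypothetical helper rather than part of GraphLab, shows which requested features are missing before calling `create`; with GraphLab you would pass it `train_data.column_names()`. The column list used here is an assumption based on the question's code: it is what `train_data` contains at the moment `random_split` runs.

```python
def missing_features(column_names, features):
    """Return the requested feature names that are not columns of the dataset.

    Hypothetical helper: with GraphLab, column_names would be train_data.column_names().
    """
    available = set(column_names)
    return [f for f in features if f not in available]


# Columns train_data has when random_split runs, i.e. before the word columns exist
train_columns = ['name', 'review', 'rating', 'count_word', 'sentiement']
word_list = ['awesome', 'wow', 'great']

print(missing_features(train_columns, word_list))  # ['awesome', 'wow', 'great']
```

GraphLab's own error reports the same three names, only in a different order.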

The whole error:


```
ToolkitError                              Traceback (most recent call last)
<ipython-input-11-0d04a516a3d0> in <module>()
      1 model =graphlab.logistic_classifier.create(train_data , target='sentiement' ,
----> 2                   features = word_list, validation_set = test_data)

/home/zynab/miniconda2/envs/dato-env/lib/python2.7/site-packages/graphlab/toolkits/classifier/logistic_classifier.pyc in create(dataset, target, features, l2_penalty, l1_penalty, solver, feature_rescaling, convergence_threshold, step_size, lbfgs_memory_level, max_iterations, class_weights, validation_set, verbose)
    306         lbfgs_memory_level = lbfgs_memory_level,
    307         max_iterations = max_iterations,
--> 308         class_weights = class_weights)
    309 
    310     return LogisticClassifier(model.__proxy__)

/home/zynab/miniconda2/envs/dato-env/lib/python
```

FYI: If you are using Anaconda and having problems with NumPy

Hello everyone,

I ran into an issue a few days ago and found something that may affect many GraphLab users who run it with Anaconda on Windows: NumPy was unable to load, and consequently neither was anything that depends on it (Matplotlib, etc.).

It turns out that the current NumPy build (1.10.4) for Windows is problematic (more info here).

Possible workarounds are downgrading to build 1.10.1 or, if your dependencies allow it, forcing an upgrade to 1.11.0. Downgrading was easy for me with `conda install numpy=1.10.1`.

Comments

User 940 | 6/24/2016, 8:39:04 PM

Hi @zynab,

This is because you split into training and test data before adding the word features to the original `products` SFrame. `random_split` returns new SFrames, which are unaffected by the word-count columns you add to `products` afterwards, so the features have to be added before the split.

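```python
# The split happens here, so train_data and test_data are created
# before the word columns exist
train_data, test_data = products.random_split(.8, seed=0)
word_list = ['awesome', 'wow', 'great']
# These columns are added to products only, not to train_data or test_data
for word in word_list:
    products[word] = products['count_word'].apply(lambda x: x[word] if word in x else 0L)
```

I hope this helps.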
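The effect described here can be reproduced without GraphLab. In this sketch, `random_split` is a hypothetical pure-Python stand-in that, like `SFrame.random_split`, returns new objects (here, copies of the rows), so columns added to `products` afterwards never reach `train_data` or `test_data`. Real SFrames are column-oriented, so this models only the observable behavior, not GraphLab's internals; the tiny dataset is made up.

```python
import random


def random_split(rows, fraction, seed=0):
    """Hypothetical stand-in for SFrame.random_split: returns two NEW lists of
    row copies, so later changes to `rows` are not reflected in them."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(dict(row))
    return first, second


def add_word_columns(rows, words):
    """Add a count column for each word, like the apply(lambda ...) loop."""
    for row in rows:
        for word in words:
            row[word] = row['count_word'].get(word, 0)


def fresh_products():
    """Tiny made-up dataset holding only the word-count dictionaries."""
    return [{'count_word': {'great': 1}}, {'count_word': {'wow': 2}},
            {'count_word': {}}, {'count_word': {'awesome': 1}}]


word_list = ['awesome', 'wow', 'great']

# Order from the question: split first, then add the columns to products only
products = fresh_products()
train_data, test_data = random_split(products, .8, seed=0)
add_word_columns(products, word_list)
print(all('wow' in row for row in train_data + test_data))  # False

# Fixed order: add the word columns first, then split
products = fresh_products()
add_word_columns(products, word_list)
train_data, test_data = random_split(products, .8, seed=0)
print(all('wow' in row for row in train_data + test_data))  # True
```

With GraphLab, the same fix means moving the `random_split` call below the `for` loop that adds the word columns.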

Cheers! -Piotr


User 5324 | 6/24/2016, 10:20:19 PM

Thanks @piotr, it works now.