tfidf bug?

User 1326 | 5/7/2015, 12:40:10 PM

Hi everyone,

Today I encountered some really strange behaviour in the function graphlab.textanalytics.tfidf. I created a very simple dataset consisting of three documents:

	[{'s1': 3.0}, {'s2': 4.0}, {'s1': 5.0}]

After computing the tfidf scores, I got

	[{'s1': 0.0}, {'s2': 4.394449154672438}, {'s1': 0.0}]

According to the documentation, the tfidf score is given by the following formula

	TF-IDF(w,d)=tf(w,d)∗log(N/f(w))

So for the second document: 4.0 ∗ log(3.0/1.0) = 4.394449154672439, which matches, but for the first one: 3.0 ∗ log(3.0/2.0) = 1.2163953243244932, not the 0.0 I got.
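To double-check the arithmetic, here is a minimal sketch in plain Python that applies the documented formula by hand. It assumes the natural logarithm and that f(w) is the number of documents containing w. The helper name `expected_tfidf` is made up for illustration and is not part of GraphLab Create.

```python
import math

def expected_tfidf(docs):
    """Apply TF-IDF(w, d) = tf(w, d) * log(N / f(w)) to a list of {word: count} dicts.

    Hypothetical helper for checking results by hand; not a GraphLab Create API.
    """
    n_docs = len(docs)

    # f(w): number of documents that contain word w (assumed definition)
    doc_freq = {}
    for doc in docs:
        for word in doc:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    # float() keeps N / f(w) a true division on both Python 2 and Python 3
    return [
        {word: tf * math.log(float(n_docs) / doc_freq[word])
         for word, tf in doc.items()}
        for doc in docs
    ]

docs = [{'s1': 3.0}, {'s2': 4.0}, {'s1': 5.0}]
scores = expected_tfidf(docs)
print(scores)
```

Under these assumptions, both 's1' entries come out non-zero, since log(3/2) is about 0.405.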

Am I missing something, or is this a bug?

Comments

User 398 | 5/7/2015, 5:15:10 PM

Hi ziky,

Thanks for using GraphLab Create. This is a known issue and we've addressed it in the latest version of GraphLab Create (1.4) which is to be released very, very soon (hopefully before the end of the week). Sorry for the inconvenience.

Robert


User 1326 | 5/10/2015, 12:06:57 PM

Ok, thanks.