tfidf bug?

User 1326 | 5/7/2015, 12:40:10 PM

Hi everyone,

Today I encountered some really strange behaviour in the function graphlab.textanalytics.tfidf. I created a really simple dataset consisting of three documents:

	[{'s1': 3.0}, {'s2': 4.0}, {'s1': 5.0}]

after computing tfidf scores, I got

	[{'s1': 0.0}, {'s2': 4.394449154672438}, {'s1': 0.0}]

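According to the documentation, the tf-idf score is given by the following formula:

$$\mathrm{tfidf}(w, d) = \mathrm{tf}(w, d) \cdot \log\frac{N}{f(w)}$$

where tf(w, d) is the count of word w in document d, N is the number of documents, and f(w) is the number of documents that contain w.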
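So for the second document, 4.0 log(3.0/1.0) = 4.394449154672439, which matches the output. But for the first one I would expect 3.0 log(3.0/2.0) = 1.2163953243244932, not 0.0 (and likewise a nonzero 5.0 log(3.0/2.0) for the third).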
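To double-check the expected values by hand, here is a minimal pure-Python sketch (not GraphLab code) that applies tf(w, d) * log(N / f(w)) directly to the three documents. It assumes the natural logarithm, which is what reproduces the 4.394449154672439 value above; the function name `tfidf` is purely illustrative.

```python
import math

def tfidf(docs):
    """Hand-rolled tf-idf: tf(w, d) * ln(N / f(w)), where f(w) is the
    number of documents containing word w (illustrative helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = {}
    for doc in docs:
        for word in doc:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    # Scale each raw count by the natural-log inverse document frequency.
    return [
        {word: count * math.log(n_docs / doc_freq[word]) for word, count in doc.items()}
        for doc in docs
    ]

docs = [{'s1': 3.0}, {'s2': 4.0}, {'s1': 5.0}]
for scores in tfidf(docs):
    print(scores)
```

Since 's1' appears in two of the three documents, its idf is log(3/2) > 0, so neither the first nor the third score should come out as 0.0.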

Am I missing something, or is this a bug?


User 398 | 5/7/2015, 5:15:10 PM

Hi ziky,

Thanks for using GraphLab Create. This is a known issue, and we've addressed it in the latest version of GraphLab Create (1.4), which will be released very, very soon (hopefully before the end of the week). Sorry for the inconvenience.


User 1326 | 5/10/2015, 12:06:57 PM

Ok, thanks.