[GL Create] TopicModel.evaluate() returns {'perplexity': nan} if the BOW contains integers instead of strings

User 2032 | 6/15/2015, 4:11:58 PM

Bug: m.evaluate(...) will return {'perplexity': nan} if the BOW contains integers as keys instead of strings.

To add to my confusion, with the same integer-keyed data:
- topicmodel.create will report an estimated perplexity if given a validation set
- topicmodel.create will not report any errors, and it computes both the number of documents and the vocabulary size correctly

Use case:
- treating topic modeling as a fuzzy clustering technique for multi-sets (not your average use case, I presume, but not uncommon, e.g. in ad tech)


User 19 | 6/15/2015, 8:29:59 PM

Hi Jan,

Thanks for the bug reports. We will address these in a future release.

We agree that topic modeling can definitely be used for multi-sets. We'd be very interested in hearing more about your particular use case! Feel free to get in touch.

Cheers, Chris

User 2032 | 6/15/2015, 8:35:31 PM

Hi Chris,

This is not critical anymore, since I just converted the integers to strings, and I'm not seeing any significant overhead from doing so right now.
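A minimal sketch of the integer-to-string workaround, in plain Python. It assumes each document is a dict mapping word IDs to counts, a layout the thread implies but does not show. The helper names `stringify_bow` and `stringify_corpus` are hypothetical, and no GraphLab Create call is used. For data held in an SArray, the per-row function could presumably be passed to `SArray.apply`, but that step is untested here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def stringify_bow(doc: Dict[Hashable, int]) -> Dict[str, int]:
    """Return a copy of one bag-of-words dict with every key cast to str.

    If two keys collapse to the same string (e.g. 7 and "7"), their counts
    are summed instead of one silently overwriting the other.
    """
    out: Counter = Counter()
    for word, count in doc.items():
        out[str(word)] += count
    return dict(out)


def stringify_corpus(docs: Iterable[Dict[Hashable, int]]) -> List[Dict[str, int]]:
    """Apply stringify_bow to every document in a corpus (hypothetical helper)."""
    return [stringify_bow(d) for d in docs]


if __name__ == "__main__":
    # Integer word IDs, as in the multi-set use case from the original post.
    corpus = [{101: 3, 202: 1}, {202: 2, 303: 5}]
    print(stringify_corpus(corpus))
```

Summing on key collisions is a design choice for this sketch: the original post does not say whether mixed integer and string keys ever occur in the same document.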

I'd say the behavior of validation_set and Est. perplexity, as described in my other posts, is the more pressing issue.

I'm in touch with Rajat and we might do a blog post on our use case some time in the future.

User 19 | 6/16/2015, 5:42:57 PM

OK, sounds good.