[GL Create] TopicModel.evaluate() = { perplexity:nan } if BOW contains integers instead of strings.

Bug: m.evaluate(...) will return {'perplexity': nan} if the BOW contains integers as keys instead of strings.

To add to my confusion:
- topicmodel.create will provide estimated perplexity if given a validation set
- topicmodel.create will not report any errors and computes both the number of docs and the vocabulary correctly

Use case: treating topic_modeling as a fuzzy clustering technique for multi-sets (not your average use case, I presume, but not uncommon, e.g. in ad tech).


Thanks for the bug reports. We will address these in a future release.

We agree that topic modeling can definitely be used for multi-sets. We'd be very interested in hearing more about your particular use case! Feel free to get in touch.

This is not critical anymore since I just converted integers to strings and I'm not seeing any significant overhead doing it right now.
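
For anyone hitting the same thing, here is a minimal sketch of that conversion in plain Python. The helper name `stringify_keys` and the sample documents are made up for illustration; it assumes each document is a dict mapping token to count.

```python
def stringify_keys(bow):
    """Return a copy of a bag-of-words dict with every key converted to str.

    Hypothetical helper, not part of GraphLab Create. Counts are summed
    if two keys collapse to the same string (e.g. 1 and "1").
    """
    out = {}
    for token, count in bow.items():
        key = str(token)
        out[key] = out.get(key, 0) + count
    return out


# Illustrative documents with integer item ids as keys.
docs = [{1: 2, 7: 1}, {7: 3, 42: 1}]
string_docs = [stringify_keys(d) for d in docs]
print(string_docs)
```

If the documents live in an SArray of dicts, the same function can probably be applied row by row with `SArray.apply` before calling create or evaluate, though I have only sketched the pure-Python part here.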

I'd say the behavior of validation_set and Est. perplexity as described in my other posts is a more pressing issue.

I'm in touch with Rajat and we might do a blog post on our use case some time in the future.

