Topic model gets corrupted after saving

User 3309 | 3/6/2016, 3:07:46 AM

After I train a topic model and save it with the save method, the model seems to get corrupted. When I load it with load_model and export the topic words with the get_topics method, they look completely different from what they were before saving. Is this a known problem?

Comments

User 3309 | 3/6/2016, 3:12:42 AM

If you can fix the problem, that will be great. I will be happy to provide an example upon request. BTW, I have the latest version of GraphLab Create.


User 19 | 3/6/2016, 4:22:34 AM

Hi fenwei,

Yes, would you mind providing a reproducible example? That will help us track down the issue.

Chris


User 3309 | 3/6/2016, 5:02:45 PM

Please look at the CSV file before saving and then after loading the model.


User 3309 | 3/6/2016, 5:22:05 PM

I forgot to attach the CSV files generated by the model, but I couldn't attach them here; there seems to be some problem with attachments ...


User 3309 | 3/6/2016, 5:25:59 PM

Here they are, CSV files are not allowed :)


User 3309 | 3/9/2016, 3:42:08 PM

Could anyone reproduce this problem? Thanks


User 19 | 3/9/2016, 5:02:56 PM

Hi fenwei,

Could you send the code snippet you used to produce the CSVs?

I have not been able to reproduce this using GLC version 1.8.3.

Thanks! Chris


User 3309 | 3/9/2016, 7:47:19 PM

It is in the notebook of previous attachment (zip) file.


User 19 | 3/9/2016, 11:07:24 PM

Hi fenwei,

Whoops, I see it now!

Running your code, I get identical CSVs from both the model and the loaded version of it using GLC 1.8.3.

    dell_model.save('dell_model_final')
    dell_model2 = gl.load_model('dell_model_final')
    print dell_model.get_topics(num_words=30)
    print dell_model2.get_topics(num_words=30)

gives

```
+-------+-------------+------------------+
| topic |     word    |      score       |
+-------+-------------+------------------+
|   0   |     data    |  0.04358176641   |
|   0   | information | 0.0312366235104  |
|   0   |    status   | 0.0176507627313  |
|   0   |    hadoop   | 0.0173095653145  |
|   0   |   capacity  | 0.0139131001197  |
|   0   |   cloudera  | 0.0101909464816  |
|   0   |   business  | 0.0096636413829  |
|   0   |   mobility  | 0.00856250426497 |
|   0   |     make    | 0.00831436068909 |
|   0   |     big     | 0.00806621711322 |
+-------+-------------+------------------+
[450 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

+-------+-------------+------------------+
| topic |     word    |      score       |
+-------+-------------+------------------+
|   0   |     data    |  0.04358176641   |
|   0   | information | 0.0312366235104  |
|   0   |    status   | 0.0176507627313  |
|   0   |    hadoop   | 0.0173095653145  |
|   0   |   capacity  | 0.0139131001197  |
|   0   |   cloudera  | 0.0101909464816  |
|   0   |   business  | 0.0096636413829  |
|   0   |   mobility  | 0.00856250426497 |
|   0   |     make    | 0.00831436068909 |
|   0   |     big     | 0.00806621711322 |
+-------+-------------+------------------+
```
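If comparing the two printouts by eye gets tedious, the check can be automated. Below is a minimal sketch in plain Python; `topics_match` is a hypothetical helper, not part of GraphLab Create. It assumes each topic table has been converted to a list of dicts with `topic`, `word` and `score` keys (for example via `list(model.get_topics(num_words=30))`, assuming that iterating the returned SFrame yields one dict per row). The sample rows reuse two values from the output above.

```python
def topics_match(before, after, tol=1e-9):
    """Return True if both tables hold the same (topic, word) rows
    with scores equal to within tol. Hypothetical helper."""
    if len(before) != len(after):
        return False
    # Index the second table by (topic, word) so row order does not matter.
    scores_after = {(r["topic"], r["word"]): r["score"] for r in after}
    for row in before:
        other = scores_after.get((row["topic"], row["word"]))
        if other is None or abs(row["score"] - other) > tol:
            return False
    return True


# Sample rows taken from the printed output in this thread.
original = [
    {"topic": 0, "word": "data", "score": 0.04358176641},
    {"topic": 0, "word": "information", "score": 0.0312366235104},
]
# A faithful reload should give the same rows, possibly in another order.
reloaded = list(reversed(original))
# A corrupted reload would show different words or scores.
corrupted = [
    {"topic": 0, "word": "data", "score": 0.5},
    {"topic": 0, "word": "information", "score": 0.0312366235104},
]

print(topics_match(original, reloaded))
print(topics_match(original, corrupted))
```

If the first check fails right after `save` and `load_model`, that points to the kind of mismatch described at the top of the thread.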

What operating system are you using? Thanks, Chris


User 3309 | 3/10/2016, 6:13:29 PM

I will try it again. I'm using Windows 10.


User 3309 | 3/11/2016, 11:03:05 PM

After upgrading, the issue seems to go away. Thanks!


User 1207 | 3/14/2016, 9:28:05 PM

Glad to be of help! Yes, we fixed this bug in the latest release. Thanks for helping us track it down!

-- Hoyt