Can GraphLab Create's clustering methods progressively save clustering status?

User 2978 | 1/10/2016, 12:45:41 AM

How can I save the K-means clustering centers (and other data like cluster_info) after every N epochs using GraphLab Create's kmeans?

I would also like to know how I can resume the clustering from its current state (or the latest saved state) if there is an unexpected power outage on my server.

Comments

User 2593 | 1/12/2016, 9:05:00 PM

Hi @JonBaker,

Assuming you have created a kmeans model with one iteration, you can access the cluster centers as an SFrame using model['cluster_info'] and save that SFrame to disk by calling its save('path/to/file') method. For the second iteration, you can load that SFrame from disk and pass it to a new kmeans model using the 'initial_centers' argument. You can repeat this N times in a loop, where each pass reads the centers saved by the previous pass and writes the updated centers back to disk.
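Here is a rough sketch of that loop, in case it helps. The function names (checkpointed_kmeans, make_graphlab_callbacks), the checkpoint path, and the features list are placeholders I made up for illustration. The GraphLab calls are written as I remember the API (kmeans.create with initial_centers and max_iterations, model['cluster_info'], SFrame.save), so please check them against your installed version. One assumption in particular: I keep only the feature columns of cluster_info before reusing it, since the initial centers are expected to contain the same features as the input data.

```python
import os


def checkpointed_kmeans(fit_round, load_centers, save_centers,
                        checkpoint_path, num_rounds):
    """Run num_rounds short k-means fits, saving the centers after each one.

    fit_round(initial_centers) must return the updated centers;
    initial_centers is None on a cold start.
    """
    # Resume from the last checkpoint if one exists (e.g. after a power outage).
    if os.path.exists(checkpoint_path):
        centers = load_centers(checkpoint_path)
    else:
        centers = None

    for _ in range(num_rounds):
        centers = fit_round(centers)
        save_centers(centers, checkpoint_path)
    return centers


def make_graphlab_callbacks(data, features, num_clusters, iterations_per_round=1):
    """Hypothetical helper: build the three callbacks on top of GraphLab Create."""
    import graphlab as gl  # imported lazily; only needed when this helper is called

    def fit_round(initial_centers):
        model = gl.kmeans.create(
            data,
            num_clusters=num_clusters,
            features=features,
            initial_centers=initial_centers,  # None means GraphLab picks the start
            max_iterations=iterations_per_round,
            verbose=False,
        )
        # Assumption: cluster_info also carries bookkeeping columns, so keep
        # only the feature columns before feeding it back as initial centers.
        return model['cluster_info'][features]

    def load_centers(path):
        return gl.SFrame(path)

    def save_centers(centers, path):
        centers.save(path)

    return fit_round, load_centers, save_centers
```

You would call it roughly like this: fit, load, save = make_graphlab_callbacks(sf, ['x', 'y'], num_clusters=5), then checkpointed_kmeans(fit, load, save, 'centers_checkpoint', num_rounds=20). If the server goes down, running the same two lines again picks up from the last saved centers. Note that a crash in the middle of a save could leave an incomplete checkpoint, so you may prefer to write each round to a new path.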

Let me know if this works,

Thanks, Charlie


User 2978 | 1/13/2016, 9:35:25 PM

Hi @cloofa, yes, your method is one way to go. I hope Dato will provide this as a built-in feature for all clustering methods in the future.