I have been using GraphLab for the past couple of weeks and really love its functionality and capabilities. I know there is already a random split function for splitting data into training and testing sets, but for an extremely large number of records it would be really good to have something like caTools in R or train_test_split in Python (scikit-learn) for creating better training and testing sets.

Any suggestions or ideas?

Sample Python code for what I mean is below.

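import pandas as pd
# Older scikit-learn releases exposed this in sklearn.cross_validation
from sklearn.model_selection import train_test_split

url = ''  # path or URL of the CSV file
quality = pd.read_csv(url)

# Hold out 25% of the rows for testing; the fixed seed makes the split reproducible
train, test = train_test_split(quality, train_size=0.75, random_state=88)

qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)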
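
To make the request more concrete: as far as I understand it, the appeal of caTools' sample.split is that it keeps the label proportions the same in both sets. Below is a minimal sketch of that idea in plain Python. It is not GraphLab's API; the function name `stratified_split`, the `label_of` argument, and the toy rows are all hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_size=0.75, seed=88):
    """Split rows into (train, test), keeping label proportions roughly equal."""
    rng = random.Random(seed)  # local generator so the split is reproducible

    # Bucket the rows by their label.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    train, test = [], []
    # Visit labels in a fixed order so the same seed gives the same split.
    for label in sorted(groups, key=repr):
        members = groups[label]
        rng.shuffle(members)
        cut = int(round(len(members) * train_size))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical toy data: 80 rows labelled 0 and 20 rows labelled 1.
rows = [{"id": i, "label": 0 if i < 80 else 1} for i in range(100)]
train, test = stratified_split(rows, label_of=lambda r: r["label"])
```

With this toy data each label is split 75/25 on its own, so both sets keep the original 80/20 label mix, which a purely random split only gives approximately.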


