User 1703 | 5/7/2015, 10:00:58 PM
I find that GraphLab Create is amazing for datasets under 1 million rows. Unfortunately, I have an SFrame with 2.4 billion rows and roughly 20 columns that I am trying to aggregate. I believe the aggregated form would be close to 1.6 million rows when rolled up. However, when I try to add new columns or run filters on the SFrame, it takes too long and then times out on an xlarge AWS instance.
Should I be using something else to manipulate datasets this large? I am proficient in SQL, but I am not sure how to set up the best environment.