Difference in using Graph Structures vs Tabular data?

User 2490 | 10/27/2015, 4:03:20 AM

I'm very new to ML concepts and have been taking the Coursera Machine Learning signature track. Before this I had done some research into graph databases like Neo4j. Can you use ML libraries like Dato on graph structures instead of tabular ones? Are other tools necessary for doing machine learning on graph data sets? I appreciate anything you can add to the discussion.


User 1592 | 10/27/2015, 4:16:13 AM

Hi, GraphLab Create can handle both tabular data and graph data. Tabular data is stored in our data structure called SFrame (scalable data frame), which is disk-based. Graph data is stored in SGraph (scalable graph), a data structure built on top of SFrame. It is a lightweight structure that lets us run graph analytics on top of the tabular data.

Assuming you have an SFrame called data, you can always generate a graph from it. Each row becomes one edge, and the nodes are specified by a source column and a destination column. For example, assume you have a table of shopping data and you would like to generate a graph connecting users to the shops they bought from. This is very simple:

import graphlab
graph = graphlab.SGraph(edges=data, src_field='user', dst_field='shop')
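
To make the "each row is one edge" idea concrete without GraphLab Create installed, here is a minimal plain-Python sketch. It is not how SGraph is implemented; the rows and the helper name rows_to_graph are made up for illustration, and only the column names 'user' and 'shop' come from the example above.

```python
from collections import defaultdict

# Hypothetical shopping table: each row is one purchase, i.e. one edge.
rows = [
    {"user": "alice", "shop": "bakery"},
    {"user": "alice", "shop": "bookstore"},
    {"user": "bob", "shop": "bakery"},
]

def rows_to_graph(rows, src_field, dst_field):
    """Turn tabular rows into (vertices, adjacency); one row = one directed edge."""
    vertices = set()
    adjacency = defaultdict(list)
    for row in rows:
        src, dst = row[src_field], row[dst_field]
        vertices.update((src, dst))   # both endpoints become vertices
        adjacency[src].append(dst)    # edge points from source to destination
    return vertices, dict(adjacency)

vertices, adjacency = rows_to_graph(rows, src_field="user", dst_field="shop")
```

The point of the sketch is that the table itself already is the edge list; the graph structure only adds a way to look up vertices and their neighbors.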

After you generate the graph, you can use any of our graph analytics tools to analyze it. For example, this is how you compute PageRank:

pr = graphlab.pagerank.create(graph)
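
If you are curious what a PageRank computation does conceptually, here is a rough plain-Python sketch using power iteration. It is an illustration under my own assumptions, not GraphLab Create's implementation: the function, its defaults, and the sample edges are hypothetical, ranks are normalized to sum to 1 (the library may scale them differently), and vertices with no outgoing edges spread their rank evenly over all vertices.

```python
def pagerank(edges, reset_probability=0.15, max_iterations=100, threshold=1e-9):
    """Power-iteration PageRank over a list of (src, dst) directed edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iterations):
        # Rank held by vertices without outgoing edges is shared by everyone.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (reset_probability + (1 - reset_probability) * dangling) / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            targets = out_links[src]
            if targets:
                share = (1 - reset_probability) * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total change between iterations is small enough.
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank

# Hypothetical user -> shop edges, matching the shopping example.
edges = [("alice", "bakery"), ("alice", "bookstore"), ("bob", "bakery")]
ranks = pagerank(edges)
```

In this toy graph the bakery ends up with the highest rank because two users point to it, while the users themselves receive only the baseline share since nothing points to them.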

A more detailed code example is here: https://dato.com/learn/gallery/notebooks/gettingstartedwithgraphlabcreate.html

A detailed video that explains the differences between SFrame and SGraph and discusses their implementation: https://www.youtube.com/watch?v=uVaCCYh7JuU&feature=youtu.be&list=PLRu6_g339G-f-UuGxxPRzJeYHN6lItKCK