Loading from Matrix Market Format and JSON

User 1586 | 3/30/2015, 9:59:54 AM

Greetings all,

I am currently trying GraphLab Create with one of my graphs. I am not very experienced with SFrames, or even with pandas DataFrames.

My graph is stored as a matrix in a Matrix Market format (MMF) file. I also have its metadata (node attributes) in a JSON file. I read the MMF file line by line and load the JSON as a dict.

Inside the loop that reads the MMF file line by line, I build an SGraph with add_edges() and add_vertices(). Roughly, I am doing this: <pre class="CodeBlock"><code>import json
from graphlab import SGraph, Vertex, Edge

with open(metadataJSON) as f:
    metadata = json.load(f)  # vertex ID -> label
g = SGraph()
for line in mm_file:
    source, dest, weight = parse(line)  # parse() is my own helper
    g = g.add_vertices([Vertex(source, attr={'label': metadata[source]}),
                        Vertex(dest, attr={'label': metadata[dest]})])
    g = g.add_edges(Edge(source, dest, attr={'weight': weight}))</code></pre>

If I add only the edges, without adding the vertices explicitly (and therefore without attributes), it runs quite fast: about 6 minutes for 16M edges. Adding the vertices explicitly, as shown, is quite slow. My question is: what is the correct way to add the attributes to the SGraph while keeping it fast?


User 1586 | 3/30/2015, 2:44:19 PM

Correction: I was actually appending the edges to a list and adding them to the graph after the loop. That is how I got the 6-minute loading time. Doing it as shown in my pseudocode is much slower.
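The same batching can be applied to the vertices. Below is a minimal, stdlib-only sketch, assuming the input has already been parsed into `(source, dest, weight)` triples and that the metadata JSON maps vertex IDs to labels (JSON object keys are always strings, so IDs are looked up as strings). The function name `collect_graph_data` is hypothetical. It makes one pass, records each vertex only once with its label, and keeps the edges in a list, so nothing is added to the graph inside the loop.

```python
def collect_graph_data(triples, metadata):
    """Gather vertex and edge attributes in one pass, without touching a graph.

    triples:  iterable of (source, dest, weight) tuples
    metadata: dict loaded from the JSON file; keys are vertex IDs as strings
    Returns (vertices, edges):
      vertices: dict mapping vertex ID -> {"label": ...}, one entry per vertex
      edges:    list of (source, dest, {"weight": ...}) tuples
    """
    vertices = {}
    edges = []
    for source, dest, weight in triples:
        for vid in (source, dest):
            if vid not in vertices:  # record each vertex only once
                # JSON keys are strings, so look the ID up as a string;
                # IDs missing from the metadata get a label of None
                vertices[vid] = {"label": metadata.get(str(vid))}
        edges.append((source, dest, {"weight": weight}))
    return vertices, edges
```

With GraphLab Create, the two collections could then be wrapped as `Vertex(vid, attr=attrs)` and `Edge(src, dst, attr=attrs)` objects and passed to `g.add_vertices(...)` and `g.add_edges(...)` once each, after the loop. How this compares in speed with the SFrame route suggested in the reply below has not been measured here.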

Anyway, I am still looking for the fastest way to add attributes to the vertices.

User 1190 | 4/6/2015, 6:40:10 PM

Hi @palosgrafos,

The fastest way to construct an SGraph is from SFrames.

A Matrix Market file can easily be loaded into an SFrame with <code>gl.SFrame.read_csv</code>.
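One way to prepare the file for `read_csv` is to first rewrite it as a plain CSV with a header row, so that every row is an edge and the Matrix Market banner and size line never reach the CSV parser. This is a stdlib-only sketch. It assumes a coordinate-format file: a `%%MatrixMarket` banner, optional `%` comment lines, one `rows cols entries` size line, then one `i j value` line per entry. The function name `matrix_market_to_csv` and the column names `src`, `dst`, `weight` are illustrative choices, not anything GraphLab requires.

```python
import csv


def matrix_market_to_csv(mm_lines, out):
    """Write Matrix Market coordinate entries to `out` as CSV rows src,dst,weight.

    Skips blank lines, the %%MatrixMarket banner and % comments, and the
    first remaining line (the "rows cols entries" size line).
    Returns the number of edge rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["src", "dst", "weight"])
    seen_size_line = False
    count = 0
    for line in mm_lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # blank line, banner, or comment
        if not seen_size_line:
            seen_size_line = True  # "rows cols entries", not an edge
            continue
        fields = line.split()
        # "pattern" matrices have no value column; default the weight to 1
        weight = fields[2] if len(fields) > 2 else "1"
        writer.writerow([fields[0], fields[1], weight])
        count += 1
    return count
```

Indices are kept 1-based as in the file, and a `symmetric` matrix stores only one triangle, so mirrored edges are not generated here. The resulting file could then be loaded with something like `gl.SFrame.read_csv('edges.csv')` and passed as `edges=` to the `SGraph` constructor with `src_field='src'` and `dst_field='dst'`; check the GraphLab Create API reference for the exact parameters in your version.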

We also support loading JSON into an SFrame.
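For the vertex side, a minimal sketch, assuming the metadata JSON is a single object mapping vertex IDs to labels (as described in the question), is to reshape it into equal-length `id` and `label` columns. A dict of lists in that shape can be passed to the SFrame constructor. The function name `metadata_to_vertex_columns` and the column names are hypothetical. JSON object keys are always strings, so the IDs are converted to integers here to match integer edge endpoints; drop that conversion if your IDs are not numeric.

```python
import json


def metadata_to_vertex_columns(json_text):
    """Reshape {"<id>": <label>, ...} into {"id": [...], "label": [...]}.

    Assumes every key is a numeric vertex ID; keys are converted to int so
    they match integer source/destination IDs on the edge side.
    Rows are sorted by ID only, so labels never need to be comparable.
    """
    metadata = json.loads(json_text)
    rows = sorted(((int(key), label) for key, label in metadata.items()),
                  key=lambda row: row[0])
    return {
        "id": [vid for vid, _ in rows],
        "label": [label for _, label in rows],
    }
```

With GraphLab Create, the result could become `gl.SFrame(metadata_to_vertex_columns(text))` and be passed as `vertices=` to the `SGraph` constructor with `vid_field='id'`, alongside the edge SFrame. For a large file, the built-in JSON loading mentioned above may be preferable to going through Python lists; the sketch only shows the target table shape.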

If you can share a toy dataset, I'm happy to write example code for "How to build an SGraph from Matrix Market and JSON data".

Thanks. -jay