How to load a graph from a DHT file that contains all the vertices and another with the edges?

User 334 | 1/22/2015, 12:00:57 PM

Hi guys,

I am wondering how to make new graphs based on existing ones, but without some of the edges. I need this to perform structure learning.

The way I thought I could do that was to store the edges in one DHT (Distributed Hash Table) object and the vertices in another, and then generate the new graph from those structures. This way I would be able to keep two slightly different graphs in memory, compare them, and keep the best fit.

So, do you guys know how to load a graph from a DHT file that contains all the vertices and another with the edges? Or another way to achieve what I mentioned before?

Thank you for your time :smiley:


User 1774 | 9/11/2015, 2:34:22 PM

Hi, A dramatic solution! The GraphLab SGraph is really an abstraction over two SFrames - one for edges, one for vertices.

```python
import graphlab as gl

# nodes with node data
node_data = {"id": ["w", "x", "y", "z"], "rank": [13, 99, 78, 15]}
node_data_sf = gl.SFrame(node_data)

# edges with edge data
edge_data = {"src": ["x", "x", "y"], "dst": ["y", "w", "z"], "weight": [50, 12, 29]}
edge_data_sf = gl.SFrame(edge_data)

# Let's print the inner SFrames
graph = gl.SGraph().add_edges(edge_data_sf, src_field="src", dst_field="dst")
print graph
print
print graph.vertices
print
print graph.edges
```

Now, if you want to generate a new graph, you can create a subgraph based on your nodes:

```python
graph.get_neighborhood(["x"], radius=0)  # only x is chosen
graph.get_neighborhood(["x"], radius=1)  # x + neighbors are chosen. this is the default
```
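To make the radius idea concrete, here is a rough plain-Python sketch of a radius-limited neighborhood. It is not GraphLab's implementation: `neighborhood` is a hypothetical helper, and I am assuming edges are treated as undirected when collecting neighbors.

```python
# Plain-Python sketch of a radius-limited neighborhood; not GraphLab's code.
# Assumption: edges are treated as undirected when collecting neighbors.

edges = [("x", "y"), ("x", "w"), ("y", "z")]


def neighborhood(edge_list, seeds, radius=1):
    """Return (vertex set, induced edge list) within `radius` hops of `seeds`."""
    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(radius):
        step = set()
        for src, dst in edge_list:
            if src in frontier:
                step.add(dst)
            if dst in frontier:
                step.add(src)
        # only vertices not seen before form the next frontier
        frontier = step - reached
        reached |= frontier
    # keep every edge whose endpoints both fall inside the neighborhood
    kept = [(s, d) for s, d in edge_list if s in reached and d in reached]
    return reached, kept


only_x = neighborhood(edges, ["x"], radius=0)
x_and_neighbors = neighborhood(edges, ["x"], radius=1)
```

With `radius=0` only the seed vertex survives and no edges are kept; with `radius=1` the seed, its direct neighbors, and the edges among them are kept.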

As far as I can see, there is no need to store anything anywhere besides the SFrames. Note that the vertex / edge data (i.e. vertex rank / edge weight in this example) is not copied to the new graph. You can add it like this:


```python
# choose nodes and neighbors as before
graph2 = graph.get_neighborhood(["x"])
graph2 = graph2.add_vertices(graph2.vertices.join(node_data_sf, on={"__id": "id"}))
print graph2.vertices
```
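The join in that step is essentially an inner join of the subgraph's vertex IDs against the original attribute table. A minimal plain-Python sketch of the same idea, where `attach_attributes` is a hypothetical helper and not part of GraphLab:

```python
# Plain-Python sketch of re-attaching vertex attributes by ID (an inner join).
# attach_attributes is a hypothetical helper, not part of GraphLab.

node_data = {"id": ["w", "x", "y", "z"], "rank": [13, 99, 78, 15]}
subgraph_ids = ["x", "y", "w"]  # vertex IDs kept in the subgraph


def attach_attributes(ids, table, key="id"):
    """Return one dict per ID in `ids` that also appears in `table[key]`."""
    columns = [c for c in table if c != key]
    lookup = {}
    for row, vid in enumerate(table[key]):
        lookup[vid] = {c: table[c][row] for c in columns}
    result = []
    for vid in ids:
        if vid in lookup:  # inner join: IDs without attributes are dropped
            joined = {"__id": vid}
            joined.update(lookup[vid])
            result.append(joined)
    return result


rows = attach_attributes(subgraph_ids, node_data)
```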

You can also create a subgraph based on edges. Let's select 50% of the edges randomly and repeat the example:

```python
head, tail = graph.edges.random_split(0.5)
graph3 = gl.SGraph().add_edges(head)
graph3 = graph3.add_vertices(graph3.vertices.join(node_data_sf, on={"__id": "id"}))
print graph3.vertices
print graph3.edges
```
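If the aim is to compare the graph against variants that each lack one specific edge rather than a random half, the same "vertex table plus edge table" idea can be sketched in plain Python. This is only an illustration on the toy data used here; `drop_edge` and `candidate_graphs` are hypothetical names, and scoring the candidates is left out.

```python
# Plain-Python sketch (no GraphLab): a graph as a vertex table plus an edge table.
# drop_edge and candidate_graphs are hypothetical helpers for illustration only.

vertices = {"w": {"rank": 13}, "x": {"rank": 99}, "y": {"rank": 78}, "z": {"rank": 15}}
edges = [
    {"src": "x", "dst": "y", "weight": 50},
    {"src": "x", "dst": "w", "weight": 12},
    {"src": "y", "dst": "z", "weight": 29},
]


def drop_edge(edge_list, src, dst):
    """Return a new edge list without the (src, dst) edge; the input is untouched."""
    return [e for e in edge_list if (e["src"], e["dst"]) != (src, dst)]


def candidate_graphs(vertex_table, edge_list):
    """Yield (removed_edge, vertices, edges) for every single-edge removal."""
    for e in edge_list:
        removed = (e["src"], e["dst"])
        yield removed, vertex_table, drop_edge(edge_list, *removed)


candidates = list(candidate_graphs(vertices, edges))
for removed, _, remaining in candidates:
    print("%s -> %s removed, %d edges left" % (removed[0], removed[1], len(remaining)))
```

Each candidate shares the vertex table and only carries its own edge list, so keeping a few slightly different graphs in memory stays cheap.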

Does this answer your needs? If not, could you please explain your specific use case, why you need a DHT, and which DHT implementation you are going to use (your own?).