Runtime exception (std::bad_alloc) upon calling graph summary function

User 165 | 3/21/2014, 5:39:40 PM

Environment: Fedora 20 x86_64
Hardware: 2x 6-core Xeons w/ 256GB RAM (single machine)
Dataset: 31M vertices, 184M edges; csv with edge pairs is ~16GB

The dataset was parsed and loaded into the graph data structure rather quickly and (apparently) without error, but upon calling a simple print g.summary() the program terminated with the following error:

Traceback (most recent call last):
  File "graph.py", line 12, in <module>
    print g.summary()
  File "/usr/lib/python2.7/site-packages/graphlab/datastructures/graph.py", line 258, in summary
    ret = self.proxy.summary()
  File "graph.pyx", line 50, in graphlab.cython.graph.UnityGraphProxy.summary
  File "graph.pyx", line 51, in graphlab.cython.graph.UnityGraphProxy.summary
RuntimeError: Runtime Exception: 0. 2 exception raised: std::bad_alloc std::bad_alloc

Any help would be very much appreciated.

Comments

User 14 | 3/21/2014, 6:00:55 PM

Hi jcw005,

The Graph data structure is implemented lazily, which means the actual load does not happen until the summary function is called. The bad_alloc indicates an out-of-memory error; however, 256GB of RAM should be sufficient to hold the graph.

Here are a few things you can do to help us diagnose the problem:
- How long did the summary function take before the bad_alloc was thrown?
- What function do you use to load the csv?
- Can you try loading a subset of the csv?
- Can you try loading the csv into an SFrame first, and then use g = Graph().add_edges(sf) to load the graph?
- If you load the edge csv into an SFrame first, you should be able to use g = Graph().add_edges(sf.head(10000)) to load a subset of the edges.
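If it is easier to cut the file down before it reaches GraphLab at all, the subset test can also be done on the csv itself. The following is only a minimal sketch using the Python 3 standard library, not anything from GraphLab: the function name write_csv_head and the file names are made up for illustration, and it assumes the csv has a header row.

```python
import csv
from itertools import islice


def write_csv_head(src_path, dst_path, n_rows):
    """Copy the header plus the first n_rows data rows of src_path to dst_path.

    Hypothetical helper for illustration; assumes the first row is a header.
    Returns the number of data rows actually copied.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)
        copied = 0
        # islice stops after n_rows rows without reading the rest of the file
        for row in islice(reader, n_rows):
            writer.writerow(row)
            copied += 1
        return copied
```

For example, write_csv_head("edges.csv", "edges_head.csv", 10000) would produce a small file to load in place of the full one. Doubling n_rows on each attempt is one way to find roughly where the memory runs out.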

Thanks, -jay


User 165 | 3/21/2014, 8:58:15 PM

Wow, thanks for the very quick response. Here's what I've got.

1) Counting from after the parsing finished to the moment the error was raised took ~25 minutes.

2) As per the tutorial, I was calling the read_csv method of SFrame: gl.SFrame.read_csv()

3) I tried up to half the dataset, which worked; I didn't try anything larger than that.

4) I believe that's what I was already doing; I'm loading the data into vertex + edge SFrames via the read_csv method, instantiating the Graph data structure, and then calling the Graph data structure's add_vertices() and add_edges() functions with the two SFrames passed in as parameters.


User 14 | 3/21/2014, 9:49:45 PM

Hi jcw005,

Thank you for your quick response. Based on this information, it is an out-of-memory error. We are working on making the Graph scale significantly better on a single machine. Please stay in touch.

In the meantime, it is possible to reduce the memory footprint. Although the graph you are trying to load is not huge, the csv file itself is quite big, averaging about 90 bytes per edge. You can try to reduce the number and size of the attributes on the edges and vertices. For example, avoid adding unnecessary attributes by subselecting columns in the SFrame before adding it to the Graph, and use int/float types rather than strings where possible.
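To make the numbers and the column trimming concrete, here is a rough sketch in plain Python 3 (standard library only, not GraphLab API). The 16GB and 184M figures come from the original post; the helper names and the column names in the usage note are hypothetical, and the csv is assumed to have a header row.

```python
import csv


def average_bytes_per_edge(file_size_bytes, n_edges):
    """Back-of-the-envelope size of one edge row in the csv."""
    return file_size_bytes / float(n_edges)


def trim_edge_csv(src_path, dst_path, keep_columns):
    """Write only the columns named in keep_columns from src_path to dst_path.

    Hypothetical helper for illustration; assumes a header row.
    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" silently drops every column not in keep_columns
        writer = csv.DictWriter(dst, fieldnames=keep_columns, extrasaction="ignore")
        writer.writeheader()
        rows = 0
        for row in reader:
            writer.writerow(row)
            rows += 1
        return rows


# ~16GB of csv spread over 184M edges is close to 90 bytes per edge
estimate = average_bytes_per_edge(16e9, 184e6)
```

A call such as trim_edge_csv("edges.csv", "edges_small.csv", ["src_id", "dst_id"]) would keep just the two endpoint columns. Whether that is enough depends on which attributes the analysis actually needs.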

Finally, there is an INFO message on graphlab start that shows the location of the graphlab_server log file. If you are interested, you can take a look at that file to see how far the graph loading gets, and how much memory it uses on a successful load.

Let me know if it helps. Thanks, -jay


User 165 | 3/21/2014, 11:57:00 PM

Thanks for the suggestion to shrink the row length. I changed a few things and it works now; thanks!