Graphlab Config

User 190 | 1/12/2015, 5:00:41 PM

I'm wondering if the Graphlab-Create support team could make a small tutorial/explainer on how to properly configure graphlab via the config settings.

These are my current settings:

glconfig = {
    'GRAPHLAB_CACHE_FILE_LOCATIONS': '/GraphlabTemp',
    'GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS': 4,
    'GRAPHLAB_DEFAULT_NUM_GRAPH_LAMBDA_WORKERS': 4,
    'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY': 2147483648,
    'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE': 134217728,
    'GRAPHLAB_SGRAPH_DEFAULT_NUM_PARTITIONS': 32,
    'GRAPHLAB_SGRAPH_TRIPLE_APPLY_EDGE_BATCH_SIZE': 1024,
    'GRAPHLAB_SGRAPH_TRIPLE_APPLY_LOCK_ARRAY_SIZE': 1048576,
    'GRAPHLAB_SGRAPH_BATCH_TRIPLE_APPLY_LOCK_ARRAY_SIZE': 1048576,
    'GRAPHLAB_SGRAPH_INGRESS_VID_BUFFER_SIZE': 4194304,
    'GRAPHLAB_SFRAME_FILE_HANDLE_POOL_SIZE': 256,
    'GRAPHLAB_SFRAME_CSV_PARSER_READ_SIZE': 52428800,
    'GRAPHLAB_SFRAME_DEFAULT_BLOCK_SIZE': 65536,
    'GRAPHLAB_SFRAME_DEFAULT_NUM_SEGMENTS': 32,
    'GRAPHLAB_SFRAME_GROUPBY_BUFFER_NUM_ROWS': 1048576,
    'GRAPHLAB_SFRAME_IO_READ_LOCK': 1,
    'GRAPHLAB_SFRAME_JOIN_BUFFER_NUM_CELLS': 52428800,
    'GRAPHLAB_SFRAME_MAX_BLOCKS_IN_CACHE': 32,
    'GRAPHLAB_SFRAME_READ_BATCH_SIZE': 128,
    'GRAPHLAB_SFRAME_SORT_BUFFER_NUM_CELLS': 52428800,
    'GRAPHLAB_SFRAME_SORT_PIVOT_ESTIMATION_SAMPLE_SIZE': 100000,
    'GRAPHLAB_SFRAME_WRITER_MAX_BUFFERED_CELLS': 33554432,
    'GRAPHLAB_SFRAME_WRITER_MAX_BUFFERED_CELLS_PER_BLOCK': 262144,
}
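For readers wondering how a dict like this gets applied: a minimal sketch, assuming GraphLab Create's graphlab.set_runtime_config() call (which takes one setting name and value at a time) is available. Treat this as a config fragment, not a verified recipe:

```python
import graphlab

# Apply each setting before loading any large SFrames/SGraphs,
# since some options only take effect at engine start.
for name, value in glconfig.items():
    graphlab.set_runtime_config(name, value)
```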

I have reduced the number of pylambda workers and increased the number of segments for the graphs, but I keep running out of RAM, especially in triple_apply() functions.

My setup: 8-core i7 @ 2.4GHz, 16GB RAM / 16GB cache, 4GB GPU RAM, Ubuntu 14.04, running Graphlab 1.2.1 with GPU acceleration.

Comments

User 1189 | 1/12/2015, 7:42:57 PM

Hi,

That may be a good tutorial to write. We do hope to eventually improve things to the point where such tweaking is unnecessary. The tweaks you have made are basically correct. What are you doing in the triple_apply?

If you are doing something rather memory- or CPU-intensive on a very large graph, I might also suggest the SDK, which will allow you to write a triple_apply in C++; that will be more memory-efficient and faster.
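For readers unfamiliar with the pattern being discussed: triple_apply visits each (source vertex, edge, target vertex) triple and lets a user function mutate vertex fields. A minimal pure-Python model of that pattern, with illustrative names (this is not the GraphLab API itself, which operates on SGraphs in parallel):

```python
def triple_apply(vertices, edges, fn):
    """Apply fn to every (src_data, edge_data, dst_data) triple in place."""
    for (src, dst), edge_data in edges.items():
        fn(vertices[src], edge_data, vertices[dst])
    return vertices

def accumulate_weighted_degree(src, edge, dst):
    # Each endpoint accumulates the edge weight (weighted degree).
    src['wdeg'] += edge['weight']
    dst['wdeg'] += edge['weight']

vertices = {v: {'wdeg': 0.0} for v in 'abc'}
edges = {('a', 'b'): {'weight': 2.0}, ('b', 'c'): {'weight': 3.0}}
triple_apply(vertices, edges, accumulate_weighted_degree)
# vertex 'b' ends up with wdeg 5.0 (both its edges)
```

The real engine batches edges and locks vertices (hence the EDGE_BATCH_SIZE and LOCK_ARRAY_SIZE settings above), which is where the memory pressure comes from.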


User 190 | 1/12/2015, 9:09:55 PM

I'm essentially implementing a custom Laplacian centrality metric which leverages PageRank for one of the calculations.

I'd originally implemented it without triple_apply, but then I saw this: https://github.com/graphlab-code/how-to/blob/master/triple_apply_weighted_pagerank.py

I decided to give it a shot. As it stands, I keep getting communication errors and have gone back to my original, unweighted implementation. The graphs range in size, but generally have between 1M and 10M nodes and roughly 5× as many edges.
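For context, Laplacian centrality scores a vertex by the drop in the graph's Laplacian energy (E_L = Σ d_i² + Σ d_i for an unweighted graph) when that vertex is removed, which reduces to a closed form over degrees. A minimal pure-Python sketch of the unweighted case; this is an illustration of the metric, not the poster's actual implementation (which also folds in PageRank):

```python
from collections import defaultdict

def laplacian_centrality(edges):
    """Laplacian centrality for an undirected, unweighted graph.

    Removing v drops the Laplacian energy E_L = sum(d_i^2) + sum(d_i)
    by d_v^2 + d_v + 2 * (sum of neighbor degrees); scores are
    normalized by E_L of the full graph.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    deg = {v: len(ns) for v, ns in neighbors.items()}
    energy = sum(d * d + d for d in deg.values())
    return {
        v: (deg[v] ** 2 + deg[v] + 2 * sum(deg[u] for u in neighbors[v])) / energy
        for v in deg
    }

edges = [('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')]
lc = laplacian_centrality(edges)
# 'c' (in the triangle and holding the pendant) scores highest
```

The closed form is what makes the metric attractive for triple_apply: each vertex only needs its own degree and the sum of its neighbors' degrees.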

I've checked out the Graphlab SDK, and once I get a better handle on C++ I'll give it a shot there.

Keep up the good work!


User 6 | 1/13/2015, 4:32:03 PM

Hi, can you share a code example that reproduces the communication error? We would love to debug it and check what is wrong.

Thanks


User 190 | 1/13/2015, 7:55:08 PM

Just sent you guys some stuff via feedback@graphlab.com