Detailed settings for GraphLab set_runtime_config?

User 512 | 2/2/2015, 11:02:32 PM

I saw several posts mentioning graphlab.set_runtime_config, which seems quite useful. Is there any detailed documentation on it? I cannot find it in the GraphLab user guide.

Comments

User 92 | 2/2/2015, 11:24:00 PM

Hi Shuning,

Thanks for bringing this up! Adding more documentation for runtime configuration is on our radar. Please stay tuned.

Basically, tuning the runtime configuration helps the GraphLab Create unity server (the C++ side) use system resources (disk, memory, CPU, etc.) efficiently, in the way that best fits your needs. For example, you can configure where the cache files go, how big the cache is, how much memory to use for join or sort, etc.

For now, please let us know if you have any specific questions about the runtime configuration and we will be happy to answer them.

Ping


User 512 | 2/3/2015, 4:31:55 PM

Thanks, Ping! The points you mentioned sound interesting to me. I do want to set the cache file location, its size, and the memory used. Could you give me some examples of these?


User 92 | 2/3/2015, 6:58:07 PM

Hi Shuning,

Thanks to your feedback, we will have some documentation on this in an upcoming release. Please stay tuned.

In the meantime, here are a few commonly used configs:

<pre>
- *GRAPHLAB_CACHE_FILE_LOCATIONS*
 The directory in which intermediate SFrames/SArrays are stored, for
 instance "/var/tmp". Multiple directories can be specified, separated by a
 colon (e.g. "/var/tmp:/tmp"), in which case intermediate SFrames will be
 striped across both directories (useful for specifying multiple disks).
 Defaults to /var/tmp if the directory exists, /tmp otherwise.

- *GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY*
 The maximum amount of memory which will be occupied by *all* intermediate
 SFrames/SArrays. Once this limit is exceeded, SFrames/SArrays will be
 flushed out to temporary storage (as specified by
 `GRAPHLAB_CACHE_FILE_LOCATIONS`). On large systems increasing this as well
 as `GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE` can improve performance
 significantly. Defaults to 2147483648 bytes (2GB).

- *GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE*
 The maximum amount of memory which will be occupied by any individual
 intermediate SFrame/SArray. Once this limit is exceeded, the
 SFrame/SArray will be flushed out to temporary storage (as specified by
 `GRAPHLAB_CACHE_FILE_LOCATIONS`). On large systems, increasing this as well
 as `GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY` can improve performance
 significantly for large SFrames. Defaults to 134217728 bytes (128MB).

**Sort Performance Configuration**

- *GRAPHLAB_SFRAME_SORT_PIVOT_ESTIMATION_SAMPLE_SIZE*
 The number of random elements to sample from the SFrame to estimate the 
 sort partitioning pivots.

- *GRAPHLAB_SFRAME_SORT_BUFFER_SIZE*
 The maximum estimated amount of memory that sort is allowed to use.
 Increasing this will increase the size of each sort partition, and will
 improve performance at the cost of increased memory consumption.

**Join Performance Configuration**

- *GRAPHLAB_SFRAME_JOIN_BUFFER_NUM_CELLS*
 The maximum number of cells to buffer in memory. Increasing this will
 increase the size of each join partition and will improve performance
 at the cost of increased memory consumption.
 If you have very large cells (very long strings, for instance),
 decreasing this value will help reduce memory consumption.

**Groupby Aggregate Performance Configuration**

- *GRAPHLAB_SFRAME_GROUPBY_BUFFER_NUM_ROWS*
 The number of groupby keys cached in memory. Increasing this will improve
 performance at the cost of increased memory consumption.

</pre>
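
As a rough sketch of how the cache settings above might be applied together: the snippet below collects them in a dict and passes each one to `graphlab.set_runtime_config(name, value)`. The `runtime_settings` dict, the `apply_settings` helper, and the paths and sizes are illustrative assumptions, not recommended values, so adjust them to your own machine.

```python
# Illustrative values only: pick paths and sizes that suit your system.
GB = 1024 ** 3
MB = 1024 ** 2

runtime_settings = {
    # Stripe intermediate SFrames across two directories (example paths).
    "GRAPHLAB_CACHE_FILE_LOCATIONS": "/var/tmp:/tmp",
    # Memory for all intermediate SFrames/SArrays combined (default is 2 GB).
    "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY": 8 * GB,
    # Memory for any single intermediate SFrame/SArray (default is 128 MB).
    "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE": 512 * MB,
}


def apply_settings(settings, setter):
    """Hypothetical helper: call setter(name, value) for every entry.

    Returns the list of setting names in the order they were applied.
    """
    applied = []
    for name in sorted(settings):
        setter(name, settings[name])
        applied.append(name)
    return applied


try:
    import graphlab  # GraphLab Create; present only where it is installed
except ImportError:
    graphlab = None

if graphlab is not None:
    # Each call changes one runtime setting by name.
    apply_settings(runtime_settings, graphlab.set_runtime_config)
```

The sort, join, and groupby settings can be added to the same dict in the same way. It is probably safest to apply these before creating any large SFrames.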

Thanks!


User 512 | 2/3/2015, 7:25:46 PM

Thanks a lot! This is very helpful!