SFrame.join caused Runtime Exception. Fail to write. Disk may be full.

User 1296 | 2/19/2015, 9:54:41 PM

Hi,

A merge of two SFrame instances by calling the <code class="CodeInline">SFrame.join</code> method caused the following exception:

<blockquote class="Quote">File "/usr/local/lib/python2.7/site-packages/graphlab/datastructures/sframe.py", line 3886, in join
    return SFrame(proxy=self.proxy.join(right.proxy, how, joinkeys))
File "/usr/local/lib/python2.7/site-packages/graphlab/cython/context.py", line 39, in exit
    raise exctype(exc_value)
RuntimeError: Runtime Exception. Fail to write. Disk may be full.</blockquote>

I have OS X 10.10 (Yosemite) with 100GB+ of free space on my hard drive.

However, it works for me when I set: <pre>graphlab.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY', 100*1024*1024*1024)  # 100GB
graphlab.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE', 100*1024*1024*1024)  # 100GB</pre> (just a quick solution with hardcoded values)

Would it be possible for the <code class="CodeInline">join</code> method to check the amount of free space on its own? Sure, the amount of free space can change at any time, but a check should cover most trivial cases.

Thanks

Ondrej

Comments

User 1190 | 2/23/2015, 5:55:49 PM

Hi @ondrejj,

Thank you for your feedback. We strive to have GLC provide the best user experience for programmers and data scientists.

Yes, we should be smart and detect system resources and try the best we can. However, what you ask is not so obvious to implement. While detecting the amount of free space is not difficult, it is hard to take additional action when disk space is low. Shall we retry the operation with a different system configuration as you did? I'm not sure.
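As a rough illustration of the "detecting free space" part only, here is a minimal sketch using the standard library's `os.statvfs` (POSIX systems such as OS X and Linux). The helper names `free_bytes` and `has_room` are hypothetical and are not part of GraphLab Create; what to do when space is low is still the open question.

```python
import os


def free_bytes(path):
    """Return the bytes available to an unprivileged user on the
    filesystem that holds `path` (POSIX only)."""
    stats = os.statvfs(path)
    # f_bavail counts free blocks usable by non-root users;
    # f_frsize is the fundamental block size in bytes.
    return stats.f_bavail * stats.f_frsize


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` reports at least
    `needed_bytes` of free space right now."""
    return free_bytes(path) >= needed_bytes
```

For example, `free_bytes('/var/tmp') / 1024.0 ** 3` would give the free space in GiB on whichever partition holds `/var/tmp`. The result is only a snapshot and can change between the check and the write.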

When you set the cache to 100GB above, SFrame operations will hold at most 100GB of content in memory, which reduces the amount of disk space needed for shuffle and flushing. For most people, the memory budget is much tighter than the disk budget.

jay


User 15 | 2/23/2015, 7:00:36 PM

Hi @ondrejj,

Just wanted to add a few things to Jay's response.

As he said, the parameters you set actually influence the amount of memory we allow for caching SFrames. It's interesting that setting it to that level worked, but if the join would actually require 100 GB of memory, I'm guessing you would've run out of RAM on your system...unless you have that much RAM. I think something else is afoot here.

My hypothesis is that GLC could've run out of disk space, but did so simply because it was using a different partition...not the one with 100GB+ free. When you increased the cache size, you probably made the size of the cache bigger than the amount free on the partition GLC was using, but the problem was small enough to fit in your memory. Do you happen to have several partitions on your disk, and possibly a separate one for your system files? If not, I'll have to think of something else :). If you're not familiar with how to check this, execute "df -h" from your command line.

You can adjust the place in your filesystem that GLC uses to flush SFrame contents to disk by setting GRAPHLAB_CACHE_FILE_LOCATIONS either with set_runtime_config or an environment variable.
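A minimal sketch of the environment-variable route, with a writability check first. The helper `use_cache_dir` and the example path are hypothetical. It is also an assumption here that the variable is read when the library starts, so the sketch sets it before the import.

```python
import os
import tempfile


def use_cache_dir(path):
    """Point GRAPHLAB_CACHE_FILE_LOCATIONS at `path` after checking that
    the directory exists and that this process can write to it."""
    if not os.path.isdir(path):
        raise ValueError("not a directory: %s" % path)
    # Creating (and automatically removing) a scratch file shows the
    # process can write here; it raises OSError or IOError otherwise.
    with tempfile.TemporaryFile(dir=path):
        pass
    os.environ["GRAPHLAB_CACHE_FILE_LOCATIONS"] = path
    return path


# Hypothetical usage, assuming the variable must be set before the import:
# use_cache_dir("/path/with/plenty/of/space")
# import graphlab
#
# Or, after the import, the runtime-config route mentioned above:
# graphlab.set_runtime_config("GRAPHLAB_CACHE_FILE_LOCATIONS",
#                             "/path/with/plenty/of/space")
```

Checking writability up front turns a late "Fail to write" during a join into an immediate, more specific error.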

Hope this helps,

Evan


User 1296 | 3/1/2015, 11:11:24 AM

Hi Jay Gu and @EvanSamanas,

Thanks for your responses.

Based on my experience that setting the two properties helped, I would expect that setting them inside the join method would be helpful and not harmful. It would be a fixed part of the join method, so there would be no need to try the join without the settings first and then retry with them if it fails.

I have 8GB RAM and only one partition; the <code class="CodeInline">df -hl</code> output: <pre>Filesystem   Size   Used  Avail Capacity    iused    ifree %iused  Mounted on
/dev/disk1  233Gi  124Gi  108Gi      54% 32636274 28342540    54%  /</pre>

The following two lines <pre class="CodeBlock"><code>import graphlab
graphlab.get_runtime_config()</code></pre>

return the following result (shortened): <pre class="CodeBlock"><code>'GRAPHLAB_CACHE_FILE_LOCATIONS': '/var/tmp',
'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE': 134217728,
'GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY': 2147483648</code></pre>

It seems that the initial value of the GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY property is theoretical only (2L ** 31). Is it correct to set the property to the amount of disk space that is actually available?

Thanks

Ondrej


User 15 | 3/6/2015, 6:06:19 PM

Hi Ondrej,

No, it is not correct to set that property value to the available disk space. As I said previously, the variable refers to the amount of memory our cache is limited to. It doesn't have anything to do with disk space. We don't try to limit the amount of disk we use, other than making sure we use memory efficiently before we spill to disk. No setting will keep us from using all of it if you give us a big enough problem.

Execute <pre>graphlab.set_runtime_config?</pre> in a Python terminal for a full explanation of the settings there. It's true that raising that setting is often helpful to the join algorithm, but you should limit it to how much memory is available on your system.
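To make the memory-versus-disk point concrete, here is a small sketch that derives a cache size from physical RAM rather than from free disk space. The helper `cache_capacity` and the 50% default fraction are arbitrary illustrations, not a GraphLab Create recommendation; the 8GB and 100GB figures are the ones from this thread.

```python
GIB = 1024 ** 3


def cache_capacity(total_ram_bytes, fraction=0.5):
    """Pick a cache size in bytes as a fraction of physical memory.
    The fraction is an arbitrary choice for illustration."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_bytes * fraction)


ram = 8 * GIB                    # the 8GB machine described in this thread
requested = 100 * GIB            # the value used in the original workaround
capacity = cache_capacity(ram)   # half of physical RAM

print(requested > ram)           # True: far more than physical RAM
print(capacity // GIB)           # 4

# With GraphLab Create installed, the value could then be applied with:
# import graphlab
# graphlab.set_runtime_config('GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY', capacity)
```

For comparison, the defaults reported earlier in the thread are 2147483648 bytes (2 GiB) for the total cache and 134217728 bytes (128 MiB) per file.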

Can you give us a sense of how large these SFrames are that you're joining? Number of rows/columns, whether any data types of columns are particularly large (strings that are whole documents, images, large vectors, etc.). Do these SFrames approach the size of 100GB?

Thanks,

Evan


User 3073 | 1/19/2016, 5:26:03 PM

I'm struggling to get SFrame to change the cache file location. I've tried it on two different machines, but each gives a different error. Any ideas?

<pre class="CodeBlock"><code>os.environ["GRAPHLAB_CACHE_FILE_LOCATIONS"]='/illumina/scratch/tmp/users/philtedder'
allTR=iscall.join(bw1,how="outer",on=["CHROM","POS"])
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/home/ptedder/virtualenv/venv/lib/python2.7/site-packages/sframe/datastructures/sframe.py", line 4338, in join
    return SFrame(proxy=self.proxy.join(right.proxy, how, joinkeys))
  File "/home/ptedder/virtualenv/venv/lib/python2.7/site-packages/sframe/cython/context.py", line 49, in exit
    raise exctype(exc_value)
IOError: Fail to write. Disk may be full.</code></pre>

and

<pre class="CodeBlock"><code>os.environ["GRAPHLAB_CACHE_FILE_LOCATIONS"]='/illumina/scratch/tmp/users/philtedder'
print os.environ["GRAPHLAB_CACHE_FILE_LOCATIONS"]
/illumina/scratch/tmp/users/philtedder
allTR=iscall.join(bw1,how="outer",on=["CHROM","POS"])
1453223711 : FATAL: (createcurrentprocesstempdirectory:214): Unable to create temporary directories at /var/tmp/graphlab-ptedder/23944</code></pre>