Out of disk space error (the disk isn't full)

User 1129 | 4/20/2015, 12:37:58 PM

I work on a Mac. This is the output of df -h:

```
Filesystem      Size   Used  Avail  Capacity     iused     ifree  %iused  Mounted on
/dev/disk1     465Gi  356Gi  108Gi       77%  93458662  28387646     77%  /
devfs          190Ki  190Ki    0Bi      100%       656         0    100%  /dev
map -hosts       0Bi    0Bi    0Bi      100%         0         0    100%  /net
map auto_home    0Bi    0Bi    0Bi      100%         0         0    100%  /home
```

I am trying to find the giant component of a graph with ~1.3M nodes and 10M edges. I get the following exception: "RuntimeError: Runtime Exception. Fail to write. Disk may be full.", despite the fact that my HD still has 108Gi of free space. GraphLab's temp directory is the default /var/tmp.

Comments

User 4 | 4/22/2015, 8:14:42 PM

Hi @bgbg, there are a number of possible reasons that error might occur, and only some of them imply the disk is actually full (sorry, the actual message can be a bit misleading). A few cases in which this could occur:

  1. The space is filled with temporary files during execution, but the temp files are cleared out afterwards. You could monitor this by running df -h during execution, and see whether the available space shrinks over time until the error occurs (see the sketch after this list).
  2. It's possible you are hitting a kernel limit on the number of open files (file descriptors) or the number of inodes. You could try checking the current limit and setting a higher one by following the instructions here (also shown in the sketch below).
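
A minimal shell sketch of how you might check both of these while a job is running; the 60-second interval and the 4096 limit are only illustrative values, not recommendations from the GraphLab documentation:

```bash
# Watch free space on / while the GraphLab job runs (Ctrl-C to stop).
while true; do df -h /; sleep 60; done

# Check the per-process open-file limit for the current shell...
ulimit -n
# ...and raise it for this session before launching Python (illustrative value).
ulimit -n 4096

# On OS X, the kernel-wide limits can be inspected with sysctl.
sysctl kern.maxfiles kern.maxfilesperproc
```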

User 1768 | 5/22/2015, 7:38:14 AM

Hi @Zach. Unfortunately, I have encountered the same error on my MacBook. Whenever I run any query I get the macOS warning that the "startup disk is full", and in the Python notebook, "Fail to write. Disk may be full". Besides monitoring with df -h, do you have any suggestions for solving the problem? Unfortunately, I don't have any Amazon grant access. Do you have any other suggestions? I am in the middle of an experiment and need to solve this soon. I would appreciate your guidance.

Thanks


User 1768 | 5/22/2015, 8:28:09 AM

Hi again @Zach. I have now received an Amazon AWS grant. Can you please guide me on how to start using GraphLab Create on Amazon AWS?


User 4 | 5/22/2015, 5:15:26 PM

Hi @naoomi, there are two main ways you can use GraphLab Create on an EC2 machine in AWS:

  1. ssh into the EC2 machine, and run GraphLab Create like you normally would (pip install, then execute in Python). To do this you may need to satisfy some system requirements first using a combination of apt-get install and/or Anaconda's Python installer (a rough sketch follows this list).
  2. From your local machine, you can use Dato Distributed to coordinate the execution of custom code across one or more EC2 machines. For that, see the API documentation here: https://dato.com/products/create/docs/generated/graphlab.deploy.job.create.html#graphlab.deploy.job.create
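
For the first option, the workflow is roughly the following; the key file, hostname, and package names are placeholders and will differ depending on your instance's AMI:

```bash
# Connect to the EC2 instance (key file and hostname are placeholders).
ssh -i my-key.pem ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

# On the instance: install Python/pip prerequisites (Ubuntu example).
sudo apt-get update
sudo apt-get install -y python-pip python-dev

# Then install GraphLab Create as you would locally.
pip install graphlab-create
```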

User 1768 | 5/26/2015, 9:25:37 AM

Hi @Zach. Thanks a lot for your answer and guidance. I installed all the requirements, such as pip, Python, and Anaconda. But when I try to install GraphLab with either of these commands, pip install graphlab-create==1.4.0 or pip install --upgrade https://get.dato.com/GraphLab-Create/1.4.0/{email}/{key}/GraphLab-Create-License.tar.gz, it gives me this error:

Can you please guide me? Thanks again.


User 1768 | 5/26/2015, 11:07:27 AM

No, sorry for the disturbance. I found that with this command, pip install graphlab-create==1.4.0, I don't have this problem, since it has root permission. It is already solved. Thanks @zach.


User 4 | 5/26/2015, 5:25:34 PM

Hi @naoomi, I modified your above post to remove your license key. You should not share your license key with anyone since its use could directly result in billing to your account for paid products and services.

You appear to be trying to pip install into a system-wide installation of Python that you don't have permission to write to. You should use either virtualenv or conda env to create a local Python environment that your user has write access to.
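
For example, a minimal virtualenv-based setup might look like the following; the environment name gl-env is just a placeholder:

```bash
# Create and activate a local environment owned by your user (name is a placeholder).
virtualenv ~/gl-env
source ~/gl-env/bin/activate

# pip now installs into ~/gl-env instead of the system-wide Python.
pip install graphlab-create==1.4.0
```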


User 1768 | 5/27/2015, 8:11:26 AM

Oh, sorry @Zach. Since I had copied it from the Dato site, I thought it was a public trial key that everyone could use. Thanks for letting me know.


User 3182 | 2/6/2016, 3:25:18 PM

Hi, I have the same problem. It seems GraphLab creates a lot of logs in /tmp/. I wrote a cron job that deletes them every minute, but this is not fast enough. These have been a very frustrating three days: I am trying to run a script which fails at the very end, when writing the file.

Being a professional software engineer (Java EE) myself, I have to ask: why would you enable logging by default?!


User 5281 | 6/21/2016, 2:52:53 PM

Hi, so is there any other way of disabling the buildup of files in /var/tmp? I had the same problem and found 20GB of material inside the folder. Quite a pain.


User 940 | 6/22/2016, 4:52:54 PM

Hi @"Nikolay Kostadinov" and @ThusithaC ,

We have some environment variables you can set to help with this.

GRAPHLAB_LOG_PATH sets the log path. GRAPHLAB_LOG_ROTATION_INTERVAL sets the log rotation interval in seconds. GRAPHLAB_LOG_ROTATION_TRUNCATE sets how many log files to keep.
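
As a rough sketch, you would export these before starting Python so the engine picks them up at import time; the path, numeric values, and script name below are placeholders, not recommended settings:

```bash
# Placeholder values; adjust the path and numbers for your machine.
export GRAPHLAB_LOG_PATH=/Volumes/bigdisk/graphlab_logs   # write logs to a disk with room
export GRAPHLAB_LOG_ROTATION_INTERVAL=3600                # rotate every hour (seconds)
export GRAPHLAB_LOG_ROTATION_TRUNCATE=2                   # keep only the 2 newest log files

python my_graphlab_script.py
```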

I hope this helps!

Cheers! -Piotr