GraphChi - Bug while reading temp file

User 230 | 5/6/2014, 10:54:01 AM

Hi there,

GraphChi writes a lot of temporary files (edata, intervals, ...). After sharding completes, ALS encounters an error while trying to read a .apj file belonging to shard 152, which does not exist (151 being the last one).

I wonder if I did something wrong with userids and itemids? More details in the attachment (with command I ran).


User 6 | 5/6/2014, 3:10:50 PM

Hi Guy,

1) I recommend running with --clean_cache=1 to clean the old temp files, in case they were corrupted.

2) As for your specific error message, it seems that you have run out of open file descriptors. Please increase the limit on open file descriptors, for example: ulimit -n 100000. We know that the error message in this case is unclear; the issue is captured as issue 10 here:
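To illustrate the advice above, a small shell sketch for checking and raising the descriptor limit; raising the soft limit all the way to the hard limit is a safe variant of the explicit value Danny suggests (setting it higher than the hard limit would fail):

```shell
# Check the current soft limit on open file descriptors for this shell
ulimit -n

# Raise the soft limit up to the hard limit. Danny's suggestion was an
# explicit value, e.g. `ulimit -n 100000`, which works only if the hard
# limit is at least that high.
ulimit -n "$(ulimit -Hn)"
echo "new soft limit: $(ulimit -n)"
```

GraphChi keeps one file open per shard during a run, so heavily sharded graphs need a correspondingly high limit.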

User 230 | 5/7/2014, 1:11:22 PM

Thanks Danny. The new ulimit value got that error out of the way.

I was wondering if you knew what that error refers to and how to solve it (see attached)? I couldn't find a good answer on GitHub or on this forum.


User 24 | 5/7/2014, 3:42:22 PM

Guillaume, I have not seen that error before, but it is probably because the machine is running out of memory. You can try reducing the number of threads, or lowering the membudget_mb setting (see the conf/graphchi.cnf file).
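As a sketch of what that tuning might look like in the config file; the key names follow GraphChi's key = value config style, but the values here are illustrative assumptions, not taken from this thread:

```
# conf/graphchi.cnf -- illustrative fragment (values are assumptions)

# Cap GraphChi's in-memory buffers; lower this if the process is
# being killed for running out of memory.
membudget_mb = 800

# Fewer execution threads also reduce peak memory use.
execthreads = 2
```

The same keys can typically be overridden on the command line instead of editing the file.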

User 230 | 5/7/2014, 8:45:44 PM

Thanks Akyrola. Will try.

User 6 | 5/8/2014, 6:31:42 AM

Hi Guy, are you running in a virtual machine? That may mean there are not enough resources for the Linux OS.

User 230 | 5/8/2014, 10:32:27 AM

Yes, it runs on a big EC2 instance. I remember quadrupling membudget_mb (since I have 16GB of RAM) to speed up ALS (it would otherwise take over 8h to run 20 iterations on 20MM edges). I will try scaling it down.

User 6 | 5/8/2014, 10:36:43 AM

Assuming you have enough memory: 1) You can preload the whole problem into memory, instead of reading it from disk on each iteration. This is done using the --nshards=1 command-line flag. 2) You can disable compression by defining the following macro in your program code:


and then recompile. It will speed up execution.

User 230 | 5/8/2014, 5:55:32 PM

Hi Danny/Akyrola:

Thanks for your input. I got myself into trouble with the implicit-rating argument: I didn't understand that it adds N% of the user/item matrix as edges (i.e., N% * NUsers * NItems). With the wrong parameter I was effectively adding 5B edges, which explains the lengthy processing time.

With a proper parameter it runs in under 10s ... pretty impressive.

User 6 | 5/8/2014, 6:05:20 PM

Great to hear this! Keep us posted!