read_csv() only reads until memory limit is reached

User 3820 | 3/20/2016, 1:36:31 PM

Hi,

I have to read in a CSV file 80 GB in size. The problem is that the import always hangs when "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE" is reached. I thought it would then switch to disk and save parts of the data structure in the "GRAPHLAB_CACHE_FILE_LOCATIONS" directory, but this is not happening.
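
For reference, the import itself is just a plain read_csv call; a minimal sketch (the path here is hypothetical):

    import graphlab as gl

    # Hypothetical path; the real input is an ~80 GB CSV file.
    sf = gl.SFrame.read_csv("D:/data/big_file.csv")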

I see the import running

Read 349277212 lines. Lines per second: 259738
Read 350244986 lines. Lines per second: 259448

But it hangs at the point where there is no more memory left.

Any advice?

My config:

    gl.set_runtime_config("GRAPHLAB_CACHE_FILE_LOCATIONS", r"D:\dato\tmp")
    gl.set_runtime_config("GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY", 2e+10)
    gl.set_runtime_config("GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE", 1.0e+10)
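
(For completeness: the current values can be checked with gl.get_runtime_config(), which returns a dictionary; a minimal sketch:)

    import graphlab as gl

    # Print the cache-related runtime settings to confirm they were applied.
    cfg = gl.get_runtime_config()
    for key in ("GRAPHLAB_CACHE_FILE_LOCATIONS",
                "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY",
                "GRAPHLAB_FILEIO_MAXIMUM_CACHE_CAPACITY_PER_FILE"):
        print("%s = %s" % (key, cfg[key]))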

Best regards

David

Comments

User 1190 | 3/21/2016, 6:39:10 PM

Hi David,

Sorry to hear that you're having trouble. What does the memory usage look like when it reaches the max cache capacity?

-jay


User 3820 | 3/22/2016, 8:36:48 AM

Hi jay,

thanks for taking the time to help me.

Memory usage is fluctuating. It goes slightly up and down (slightly relative to 20 GB of memory) and the processor is doing something, but I can't see any disk usage.

One of our problems was that we only had a small main disk and a separate big disk (the D: drive, which is why I had to change the cache directory). I have changed this now (one large main disk) and set everything up again.

At the moment we are using a smaller dataset (a 15 GB CSV file), which loads quite quickly.

Another question that might be related to the same problem: we created a graph from the dataset and are applying a triple_apply function. How long should this take? It is now peaking at 20 GB of RAM and has been running for over 10 hours.
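
For context, the pattern we use follows the standard degree-count example from the docs (this sketch is illustrative, not our actual update function):

    import graphlab as gl

    # Toy graph; our real graph is built from the 15 GB dataset.
    edges = gl.SFrame({'src': [0, 0, 1], 'dst': [1, 2, 2]})
    g = gl.SGraph().add_edges(edges, src_field='src', dst_field='dst')

    # Initialize the vertex field that triple_apply will mutate.
    g.vertices['degree'] = 0

    def count_degree(src, edge, dst):
        # Runs once per edge; src/dst are mutable dicts of vertex fields.
        src['degree'] += 1
        dst['degree'] += 1
        return (src, edge, dst)

    g = g.triple_apply(count_degree, mutated_fields=['degree'])

As far as I understand, the function is executed once per edge in Python lambda workers, so runtime scales with the number of edges.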

System information (it's a virtual server):

- Windows Server 2012 R2
- 8 CPUs
- 32 GB RAM
- 500 GB disk

Best regards

David