Large SFrame appending error

User 2356 | 10/3/2015, 9:19:18 AM

On the following line, after running for many iterations, I get `Communication Failure: 113` and execution halts:

```python
featuresframe = featuresframe.append(sf)
```

The code appends a feature SFrame with 113 columns; it should grow to more than 10 million rows.

I have attached the log files.

Basically, I am trying to append to a feature-table SFrame for ML. I have millions of training points, hence the repeated appends to the main SFrame `featuresframe`.
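
Roughly, the loop looks like this (a sketch; `training_chunks` and `build_feature_sframe` are hypothetical stand-ins for my actual pipeline):

```python
import graphlab as gl

chunks = iter(training_chunks)                       # millions of training points overall
featuresframe = build_feature_sframe(next(chunks))   # 113 feature columns

for chunk in chunks:
    sf = build_feature_sframe(chunk)
    featuresframe = featuresframe.append(sf)  # fails here after many iterations
```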

I can't upload the huge server log, so I am uploading the `tail -1000` version along with the client log.

Comments

User 2356 | 10/3/2015, 10:57:41 AM

```python
featuresframe = featuresframe.append(sf)
```

```
  File "/usr/local/lib/python2.7/dist-packages/graphlab/data_structures/sframe.py", line 3791, in append
    return SFrame(proxy=self.__proxy__.append(processed_other_frame.__proxy__))
  File "/usr/local/lib/python2.7/dist-packages/graphlab/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
RuntimeError: Communication Failure: 113.
```


User 2356 | 10/6/2015, 7:44:21 AM

Why is there no reply to the posts?


User 940 | 10/6/2015, 5:37:54 PM

Hi @abby,

Sorry for the delay in replying.

Could you do

```python
features_frame.__materialize__()
```

every iteration or every couple of iterations and report what you get? It's possible that appending in a loop causes a stack overflow in the query tree.
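
For example, something along these lines (just a sketch; the interval and the chunk iterator are placeholders):

```python
MATERIALIZE_EVERY = 10  # placeholder interval; try every iteration first

for i, sf in enumerate(feature_chunks):  # however you produce each chunk
    features_frame = features_frame.append(sf)
    if i % MATERIALIZE_EVERY == 0:
        # force evaluation so the lazy query tree is collapsed
        # instead of growing by one append node per iteration
        features_frame.__materialize__()
```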

Cheers! -Piotr


User 2356 | 10/27/2015, 5:31:11 AM

Yes, this works, but the append operation also makes the program very sluggish and slow.


User 2356 | 10/31/2015, 7:45:26 AM

@piotr Why is the append operation so slow?


User 2356 | 11/16/2015, 6:00:21 AM

@piotr `features_frame.__materialize__()` was working earlier, but now the append operation is again throwing the same runtime error: Communication Failure: 113.

This is also very wasteful: the process takes 9-12 days to append millions of rows, only to later hit this communication failure, which discards all previous computation and forces us to retry on smaller data.


User 940 | 11/16/2015, 7:21:45 PM

@abby,

Again, sorry for the delay. We are looking into this currently. We will keep you posted.

Cheers! -Piotr


User 940 | 11/17/2015, 1:37:23 AM

@abby,

We have confirmed an unexpected slowdown, and are looking into it. Thank you for the pointer!

Cheers! -Piotr


User 2356 | 11/17/2015, 4:56:12 AM

@piotr Also, the following error occurs even during the `__materialize__()` call:

```
Traceback (most recent call last):
  ...
  File "/home/user/new-corpus/generatefeatures.py", line 226, in generatefeatures
    featuresframe.__materialize__()
  File "/home//anaconda/lib/python2.7/site-packages/graphlab/data_structures/sframe.py", line 3667, in __materialize__
    self.__proxy__.materialize()
  File "/home//anaconda/lib/python2.7/site-packages/graphlab/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
RuntimeError: Communication Failure: 113.
```


User 940 | 11/18/2015, 5:05:12 PM

@abby,

In this case, could you send us the server logs? At the beginning of spinning up GLC, there should be a line that says `Server log: <file_path>`. This should give us some information to help us debug.

Cheers! -Piotr


User 2356 | 11/23/2015, 6:25:12 AM

@piotr Attached the last 10,000 lines of the log, as the full log is too large (>32 MB).

Also, is there a way to periodically checkpoint the SFrame so that intermediate data is not lost? We have huge SFrames, and one such communication error means the entire SFrame is lost.


User 940 | 11/23/2015, 6:09:03 PM

@abby,

Thank you. We're taking a look at this now.

If you are appending in a loop, you can save the SFrame with `featuresframe.save(<insert_name_here>)` every few iterations.
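
For example (a sketch; the checkpoint path, interval, and chunk iterator are placeholders):

```python
SAVE_EVERY = 50  # placeholder checkpoint interval

for i, sf in enumerate(feature_chunks):  # however you produce each chunk
    featuresframe = featuresframe.append(sf)
    if i % SAVE_EVERY == 0:
        # writes the SFrame to disk in binary format
        featuresframe.save('featuresframe_checkpoint')

# after a failure, resume from the last checkpoint:
# featuresframe = graphlab.load_sframe('featuresframe_checkpoint')
```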

Cheers! -Piotr


User 2356 | 11/24/2015, 6:08:26 AM

This is what I was doing earlier. I hope the problem gets resolved.


User 2356 | 11/24/2015, 6:26:34 AM

Also, the append operations are very slow compared to a pandas DataFrame, almost 30-40x slower, even on a machine with 128 GB of RAM and a Haswell Xeon processor.