graph create stuck and not proceeding further

User 2356 | 10/6/2015, 8:59:03 AM

Below is the server log. The unity server process shows only ~1% utilization, and the classification task does not proceed at all, even for a dataset of 5000 samples with 111+ columns. It works well for around 1000 samples:

1444120955 : INFO: (constructfromsframeindex:69): Construct sframe from location: /home/tempuser/new-corpus/sframes/featuressframe497000.sframe
1444120955 : INFO: (append:770): Function entry
1444120955 : INFO: (constructfromsframeindex:69): Construct sframe from location: /home/tempuser/new-corpus/sframes/featuressframe498000.sframe
1444120955 : INFO: (append:770): Function entry
1444120955 : INFO: (constructfromsframeindex:69): Construct sframe from location: /home/tempuser/new-corpus/sframes/featuressframe499000.sframe
1444120955 : INFO: (append:770): Function entry
1444120955 : INFO: (removecolumn:473): Args: 87
1444120955 : INFO: (removecolumn:473): Args: 114
1444120955 : INFO: (sample:979): Args: 0.400061, 1444120955
1444120955 : INFO: (newcache:157): Cache Utilization:4096
1444120955 : INFO: (logicalfilter:741): Function entry
1444120955 : INFO: (groupbyaggregate:1036): Function entry
1444120955 : INFO: (groupbyaggregate:1040): Args: Keys:
1444120955 : INFO: (groupbyaggregate:1041): X1,
1444120955 : INFO: (groupbyaggregate:1042): Groups:
1444120955 : INFO: (groupbyaggregate:1049): Operations:
1444120955 : INFO: (groupbyaggregate:1051):
1444120955 : INFO: (groupby_aggregate:206): Filling group container:
1444120955 : INFO: (materialize:226): Materializing: digraph G { "24840400" [label="B: SF(S1)"] "24844400" [label="A: PR(0)"] "24840400" -> "24844400" }

It gets stuck at the "Materializing: digraph G {" line above.

Once I kill the process with Ctrl+C, it throws:

model = gl.classifier.create(data, target='label')
  File "/usr/local/lib/python2.7/dist-packages/graphlab/toolkits/classifier/classifier.py", line 64, in create
    verbose = verbose)
  File "/usr/local/lib/python2.7/dist-packages/graphlab/toolkits/supervisedlearning.py", line 565, in createclassificationwithmodelselector
    numclasses = dataset[target].unique().size()
  File "/usr/local/lib/python2.7/dist-packages/graphlab/datastructures/sarray.py", line 2519, in unique
    res = tmpsf.groupby('X1',{})
  File "/usr/local/lib/python2.7/dist-packages/graphlab/datastructures/sframe.py", line 4179, in groupby
    groupoutputcolumns, groupops))
  File "/usr/local/lib/python2.7/dist-packages/graphlab/cython/context.py", line 49, in exit
    raise exctype(excvalue)
RuntimeError: Runtime Exception. Cancelled by user.
[INFO] Stopping the server connection.
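The traceback above shows where the time goes: classifier.create first counts the classes with numclasses = dataset[target].unique().size(), and unique() is itself a groupby on a single key column with an empty operations dict (groupby('X1',{})), which matches the empty "Operations:" entry in the log. Here is a minimal pure-Python sketch of that idea, assuming an in-memory list of dict rows instead of an SFrame; groupby_keys and count_classes are hypothetical helpers, not GraphLab's API:

```python
from collections import OrderedDict


def groupby_keys(rows, key):
    """Distinct values of `key`, found by a group-by with no aggregations.

    Hypothetical helper: rows are plain dicts, not an SFrame.
    """
    groups = OrderedDict()
    for row in rows:
        # The key set is only complete after every row is read,
        # so this is a full pass over the data.
        groups.setdefault(row[key], None)
    return list(groups.keys())


def count_classes(rows, target):
    # unique() expressed as a key-only group-by, then counted (like .size())
    return len(groupby_keys(rows, target))


rows = [
    {"text": "a", "label": "spam"},
    {"text": "b", "label": "ham"},
    {"text": "c", "label": "spam"},
]
print(groupby_keys(rows, "label"))  # ['spam', 'ham']
print(count_classes(rows, "label"))  # 2
```

Because the class count needs a full pass over the target column, it may be one of the first places where deferred work on data actually has to run, which would fit the "Materializing:" entry logged right before the hang.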

Also, even if I reduce the number of examples, it still gets stuck at the following:

1444121716 : INFO: (constructfromsframe:63): Function entry
1444121716 : INFO: (deletepathimpl:299): Deleting cache://tmp/000009.frameidx
1444121716 : INFO: (append:770): Function entry
1444121716 : INFO: (deletepathimpl:299): Deleting cache://tmp/000008.0000
1444121716 : INFO: (deletepathimpl:299): Deleting cache://tmp/000008.sidx
1444121716 : INFO: (deletepathimpl:299): Deleting cache://tmp/000010.0000
1444121716 : INFO: (deletepathimpl:299): Deleting cache://tmp/000010.sidx
1444121716 : INFO: (runtoolkit:183): Running toolkit: supervisedlearningtrain
1444121716 : INFO: (train:64): Function entry
1444121716 : INFO: (materialize:226): Materializing: digraph G { "28762416" [label="D: SF(S2,...,S59,S1,S60,...,S114)"] "28764976" [label="E: transform"] "28765296" [label="F: S115"] "28765456" [label="C: transform"] "28767376" [label="A: Filter(D[C])"] "28767696" [label="B: UP(A:0,...,57,59,...,113)"] "28767376" -> "28767696" "28762416" -> "28767376" "28765456" -> "28767376" "28764976" -> "28765456" "28765296" -> "28764976" }
1444121716 : INFO: (materialize:231): Optimized As: digraph G { "28764976" [label="D: transform"] "28765296" [label="E: S115"] "28765456" [label="A: transform"] "28768336" [label="C: Filter(B[A])"]

Comments

User 2356 | 10/6/2015, 9:02:02 AM

updated question


User 940 | 10/6/2015, 6:44:30 PM

Hi @abby,

Do you have a small reproducible code snippet? This would greatly aid us in debugging your problem.

Cheers! -Piotr


User 2356 | 10/7/2015, 10:13:09 AM

I call model = gl.classifier.create(data, target='label'). The data has 113 columns of text features, and its size is about 5000k. It works on small datasets (<20k) but fails on larger ones (>20k). @piotr
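One way to turn "works below 20k, fails above 20k" into the small reproducible case Piotr asked for is to bisect on the row count. This is a sketch under two assumptions: a hypothetical predicate runs_ok(n) that you write yourself (for example, training on data.head(n) and returning False if it fails or exceeds a time limit you choose), and failures that are monotonic in n, so every size above the threshold also fails:

```python
def smallest_failing_size(runs_ok, lo, hi):
    """Return the smallest n in (lo, hi] for which runs_ok(n) is False.

    Assumes runs_ok(lo) is True, runs_ok(hi) is False, and that once a size
    fails, every larger size fails too. `runs_ok` is a hypothetical,
    caller-supplied callable.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            lo = mid  # mid still works, so the threshold is above it
        else:
            hi = mid  # mid fails, so the threshold is at or below it
    return hi


def fake_runs_ok(n):
    # Stand-in for illustration only: pretend anything above 20000 rows fails.
    return n <= 20000


print(smallest_failing_size(fake_runs_ok, 1000, 5000000))  # 20001
```

Each probe is a full training attempt, but bisection needs only a logarithmic number of them, and the size it returns is the smallest dataset worth sharing.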


User 940 | 10/7/2015, 5:00:58 PM

@abby, would it be possible to share the dataset, so I can run the exact code on my machine?

Also, have you tried running data.__materialize__() before that line?

We perform lazy evaluation, so the classifier.create call may not be what is crashing; it may instead be the creation of the SFrame, which is only carried out once classifier.create first needs the data.

Thanks for your patience! -Piotr
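To illustrate the lazy-evaluation point above, here is a toy model in plain Python. LazyTable, apply, materialize and bad_parse are hypothetical and much simpler than an SFrame: operations are only recorded when called and are run when materialize() is called or a query needs real data. Materializing explicitly right after building the data, as suggested with data.__materialize__(), makes a failure in the deferred steps surface at that line rather than inside classifier.create:

```python
class LazyTable(object):
    """Toy lazily evaluated table (hypothetical; not the SFrame implementation)."""

    def __init__(self, rows):
        self._rows = rows
        self._pending = []  # deferred operations, applied in order

    def apply(self, fn):
        # Record the operation only; nothing runs yet.
        out = LazyTable(self._rows)
        out._pending = self._pending + [fn]
        return out

    def materialize(self):
        # Run all deferred operations now; errors from any of them surface here.
        for fn in self._pending:
            self._rows = [fn(r) for r in self._rows]
        self._pending = []
        return self

    def unique(self, key):
        # A query that needs real data forces materialization first.
        self.materialize()
        return sorted(set(r[key] for r in self._rows))


def bad_parse(row):
    # Simulated bug in a feature-building step.
    if row["text"] == "":
        raise ValueError("empty text")
    return row


data = LazyTable([{"text": "a", "label": 1}, {"text": "", "label": 0}])
data = data.apply(bad_parse)  # no error yet: the step is only recorded

try:
    data.materialize()  # the deferred error is raised here, not at apply()
except ValueError as e:
    print("failed while materializing: %s" % e)
```

Without the explicit materialize() call, the same error would only appear at the first query, such as unique(), which in the real case sits inside classifier.create.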