GLC feature_engineering.SentenceSplitter() returns error (spaCy version problem?)

User 3147 | 5/9/2016, 4:02:38 PM

Hi all,

While working on a sentiment analysis recently, I ran into a problem with the SentenceSplitter() transformer of the GLC.feature_engineering toolkit. More specifically, with the default arguments selected, SentenceSplitter() raised:

AttributeError: 'module' object has no attribute 'default_model'

during the internal call it makes to import the spaCy package.

My current version of GLC is 1.9, and I have the spaCy package installed (version 0.100.7). [My Sputnik library is currently at version 0.9.3.]

Is this caused by a spaCy/Sputnik version incompatibility with GLC v1.9? Do I need to downgrade to an older version of the spaCy library?

P.S.: The problem is also reproducible with the GLC toolkits.text_analytics.split_by_sentence() method.

Comments

User 940 | 5/9/2016, 5:36:55 PM

Hi @theod,

I'm sorry you're having this issue. But yes, you're correct. spaCy moves quickly, and it looks like 0.100.7 was recently released, but we've only tested 0.100.6. Could you try downgrading to 0.100.6? We'll be making sure we integrate with 0.100.7 for the next release.
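For reference, here is a minimal sketch of how one might check which spaCy version is installed before deciding to downgrade. The helper names (`installed_version`, `matches_tested`) are made up for illustration and are not part of GLC or spaCy. It also assumes Python 3.8+ for `importlib.metadata`; on an older interpreter, `pip show spacy` reports the same information.

```python
from importlib.metadata import PackageNotFoundError, version

# Version named in this thread as the one tested with GLC 1.9.
TESTED_SPACY = "0.100.6"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_tested(installed, tested=TESTED_SPACY):
    """True only when the installed version is exactly the tested one."""
    return installed is not None and installed == tested


if __name__ == "__main__":
    found = installed_version("spacy")
    if matches_tested(found):
        print("spaCy %s matches the tested version" % found)
    else:
        # Hypothetical hint only; nothing is installed or removed here.
        print("spaCy is %s; consider: pip install spacy==%s" % (found, TESTED_SPACY))
```

The comparison is deliberately an exact string match, since only 0.100.6 is known to be tested here.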

Cheers! -Piotr


User 3147 | 5/10/2016, 7:44:41 PM

Hi @piotr

You are absolutely right. By downgrading spaCy to version 0.100.6 I managed to overcome the problem with GLC.feature_engineering.SentenceSplitter(). However, the subsequent sentiment_analysis.create() call raised the following unexpected exception:


RuntimeError: Runtime Exception. Unable to load model from /var/tmp/model_cache/sentiment-combined/1: Archive does not contain a model.


But I have never trained such a sentiment model before, nor have I saved one to this path on my disk. Could you please take a look at what is happening?


User 940 | 5/11/2016, 6:14:19 PM

Hi @theod,

If you don't provide a target variable to sentiment_analysis.create(), a pre-trained model is loaded and used. Now, I'm not super familiar with the code here, but it appears that a download of the sentiment-analysis model to the model cache was corrupted. I would suggest removing the /var/tmp/model_cache/sentiment-combined/1 directory, and then it should try downloading the model from S3 instead.
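Something along these lines should clear the cached copy. This is only a sketch: the path is copied from the error message above, and `clear_cached_model` is a hypothetical helper, not a GLC function.

```python
import os
import shutil

# Path copied from the RuntimeError message in this thread.
CACHED_MODEL_DIR = "/var/tmp/model_cache/sentiment-combined/1"


def clear_cached_model(path=CACHED_MODEL_DIR):
    """Delete a cached model directory; return True if something was removed."""
    if os.path.isdir(path):
        shutil.rmtree(path)  # removes the directory and everything under it
        return True
    return False  # nothing cached at that path, so nothing to do
```

After calling `clear_cached_model()`, rerunning sentiment_analysis.create() without a target should trigger a fresh download.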

Let me know if this helps!

Cheers! -Piotr


User 3147 | 5/11/2016, 6:23:11 PM

Hi @piotr,

You are absolutely right. I was just about to write back with an update. After I removed the corrupted cached model in the /var/tmp/model_cache/sentiment-combined/1 directory, everything worked fine again.

Thank you for your help! theod