Extract feature weights from pre-trained imagenet classifier

User 2255 | 9/16/2015, 8:21:36 PM

In this example, you show how to train a deep neural network on ImageNet data and build a classifier using known labels.

For the purposes of building an anomaly detector, I would like to simply extract the high-level feature weights corresponding to each input image, rather than look at results from the classifier. Is there a straightforward way of doing this?


User 91 | 9/16/2015, 8:23:03 PM

Take a look at the deep feature extractor in feature engineering. https://dato.com/learn/userguide/feature-engineering/deepfeatureextractor.html

User 2255 | 9/16/2015, 8:47:58 PM

Thanks! I'll give it a look.

User 2255 | 9/17/2015, 1:39:35 AM

Hi Srikris,

Perhaps you can help me sort out an error. Working line by line through the example you linked, I get failures. In the first instance, this line:

extractor = gl.feature_engineering.DeepFeatureExtractor(features = 'image')

results in an error:

TypeError: __init__() got an unexpected keyword argument 'features'

which seems consistent with the output of help(gl.feature_engineering.DeepFeatureExtractor). However, when I simply pass 'image' as a positional argument (not as a keyword), I get an error suggesting that, for the MNIST data, the feature extractor is looking for a non-existent, pickled ImageNet model. I get an identical error if I supply data in an SFrame built from my own images scaled to 256x256:

IOError                                   Traceback (most recent call last)
<ipython-input-25-d91f27947128> in <module>()
     10
     11 # extractor = gl.feature_engineering.DeepFeatureExtractor(features = 'image')
---> 12 extractor = gl.feature_engineering.DeepFeatureExtractor('image')
     13
     14 # Fit the encoder for a given dataset.

/usr/lib/python2.7/site-packages/graphlab/toolkits/feature_engineering/deep_feature_extractor.pyc in __init__(self, feature, model, output_column_name)
    150         "http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45"
    151         import graphlab as gl
--> 152         self.state['model'] = gl.load_model(model_path)
    153         if type(self.state['model']) is not _NeuralNetClassifier:
    154             raise ValueError("Model parameters must be of type NeuralNetClassifier " +

/usr/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in load_model(location)
     68     else:
     69         # Not a ToolkitError so try unpickling the model.
---> 70         unpickler = gl_pickle.GLUnpickler(location)
     71
     72     # Get the version

/usr/lib/python2.7/site-packages/graphlab/gl_pickle.pyc in __init__(self, filename)
    451     else:
    452         if not _os.path.exists(filename):
--> 453             raise IOError('%s is not a valid file name.' % filename)
    454
    455     # GLC 1.3 Pickle file

IOError: http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45 is not a valid file name.

User 91 | 9/17/2015, 4:37:41 AM

It looks like a typo in our user guide. It should be feature, not features.

extractor = gl.feature_engineering.DeepFeatureExtractor(feature = 'image')

User 2255 | 9/17/2015, 7:13:02 PM

Well, I should have spotted that. It solved my problem. Thanks again.

Using my own images, scaled to 256 x 256, I've managed to extract 4096 features, which I suppose correspond to weights for layer 19. I infer this from the model description printed when I trained a classifier on the same images.
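As a quick sanity check on what came back, I ran something like the following (the feats array here is a random stand-in; in the real run it would be the extractor's output SArray converted to a dense numpy array, one row per image):

```python
import numpy as np

# Stand-in for the real extracted features: one 4096-d row per image.
rng = np.random.RandomState(0)
feats = rng.rand(10, 4096)

assert feats.shape[1] == 4096  # matches the reported layer-19 width

# Fraction of units that are zero for every image (dead ReLU outputs);
# a very high fraction would make the vectors poor anomaly features.
dead = np.mean(np.all(feats == 0, axis=0))
print("dead units: %.3f" % dead)

# Per-dimension variance: near-constant dimensions carry no signal.
low_var = np.mean(feats.var(axis=0) < 1e-8)
print("near-constant units: %.3f" % low_var)
```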

help(DeepFeatureExtractor) begins with:

class DeepFeatureExtractor(graphlab.toolkits.feature_engineering.feature_engineering.TransformerBase)
 |  Takes an input dataset, propagates each example through the network,
 |  and returns an SArray of dense feature vectors, each of which is the
 |  concatenation of all the hidden unit values at layer[layer_id] ...

but does not indicate how I might actually specify the layer_id to control which feature weights are returned.

Most important to me is that I'm trying to use an isolation forest to detect anomalous images in a batch. Much to my surprise, iForest using raw pixels does pretty well, while iForest using the feature weights returned from the extractor does considerably worse (very poorly, in fact). Do I need to explicitly subtract a mean from each image before the fit/transform steps?
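For reference, this is the kind of pipeline I have in mind, sketched with scikit-learn's IsolationForest as a stand-in and synthetic vectors in place of the real feature matrix:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Stand-in data: 200 "normal" feature vectors plus one obvious outlier.
# In the real setup these rows would be the 4096-d extracted features
# (or flattened raw pixels) for each image.
X = rng.normal(0.0, 1.0, size=(200, 16))
outlier = np.full((1, 16), 8.0)

forest = IsolationForest(n_estimators=100, contamination=0.05,
                         random_state=0).fit(X)

print(forest.predict(outlier))            # -1 flags an anomaly, 1 is normal
print(forest.decision_function(outlier))  # more negative = more anomalous
```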

Any suggestions would be most welcome.

User 940 | 9/17/2015, 9:38:29 PM

Hi @mw0,

You're right, you can't currently specify the layer to extract features from in the Deep Feature Extractor. This is a bug; thanks for pointing it out.

For now, you can do the following:

pretrained_model = graphlab.load_model('http://s3.amazonaws.com/dato-datasets/deeplearning/imagenet_model_iter45')
data['extracted_features'] = pretrained_model.extracted_features(data, <layer_id>)

As for the extracted features, no, you should not need to subtract a mean image.

It is a bit surprising that the extracted features don't do well, but raw pixels do. What kind of images are they?

Hope this helps!

Cheers! -Piotr

User 2255 | 9/17/2015, 10:30:57 PM

Thanks Piotr for pointing me to load_model. I'll try different layers to see what I get.

My batch of images was dominated by patches of the moon (grayscale), with a few grayscale sections of the Golden Gate Bridge used for the anomalous photos. Clearly I'll need to try it out with different sets of color images to ensure that something simple like contrast or mean density isn't driving the result. For now, it remains a genuine curiosity that the outcomes (raw pixels vs. high-level features) work out this way.
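As a first control, I plan to check whether mean density and contrast alone already separate the anomalies; a pure-numpy sketch of that check, with synthetic stand-ins for the moon and bridge patches:

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins: 50 dim, low-contrast "moon" patches and
# 3 brighter, higher-contrast "bridge" patches, all 256x256 grayscale.
moon = rng.normal(60.0, 10.0, size=(50, 256, 256)).clip(0, 255)
bridge = rng.normal(120.0, 40.0, size=(3, 256, 256)).clip(0, 255)
batch = np.concatenate([moon, bridge])

# Two cheap per-image statistics: mean density and contrast (std).
means = batch.reshape(len(batch), -1).mean(axis=1)
stds = batch.reshape(len(batch), -1).std(axis=1)

# Standardize each image's mean against the normal (moon) population.
# If these z-scores alone flag the anomalies, a raw-pixel iForest may
# just be keying on brightness, not image content.
z = (means - means[:50].mean()) / means[:50].std()
print(np.where(np.abs(z) > 5)[0])  # indices far outside the normal range
```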