Feature engineering images, zoom into target object

User 2266 | 9/29/2015, 2:21:30 AM

Hi, I came across an idea for predicting the number of levels in a house from Google Street View.

Just to test the idea, I started with the Caltech Pasadena houses dataset: http://www.vision.caltech.edu/html-files/archive.html

It looked great and worked well with Dato's neural net classifier.

I developed a routine that captures the house within Street View.

So far I have collected about 20,000 house images with metadata (labeled manually); each picture is tagged with floors = 1, 2, 1.5, etc.

Then I used Dato to train a neural net on the data.

The result, however, is not as good as the result from the Pasadena homes exercise.

I applied the same technique, yet the result was significantly lower (the Pasadena set has only about 150 examples, whereas my Street View set has over 20k!).

I think it's because the picture quality in the Pasadena homes sample is far superior to Google Street View.

As a next step, I am wondering: if there is a way to recognize the house as an object in the Street View picture and zoom in on it, would I have a higher chance of predicting the number of levels?

==========

So far, I have only used greyscale images resized to roughly 400x600:

    train_sf['image'] = gl.image_analysis.resize(train_sf['image'], 580, 389, channels=1)

at this point any idea is appreciated. I will try images in different color spaces this week ( thank god, I have Titan X)
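Concretely, here is a minimal sketch of how I could generate the color variants to compare (the HSV conversion goes through PIL, since as far as I know gl.image_analysis only handles resizing and channel counts; the path handling is made up):

    import graphlab as gl
    from PIL import Image

    # RGB and greyscale variants straight from graphlab
    train_sf['image_rgb'] = gl.image_analysis.resize(train_sf['image'], 580, 389, channels=3)
    train_sf['image_grey'] = gl.image_analysis.resize(train_sf['image'], 580, 389, channels=1)

    # HSV variant via PIL, working from the original files on disk
    def to_hsv(path):
        img = Image.open(path).convert('HSV')
        # PIL won't save HSV mode directly, so relabel the three channels
        # as RGB bytes before writing the file back out
        out = Image.frombytes('RGB', img.size, img.tobytes())
        out_path = path.replace('.jpg', '_hsv.png')
        out.save(out_path)
        return out_path

    train_sf['hsv_path'] = train_sf['path'].apply(to_hsv)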

Comments

User 2266 | 9/29/2015, 3:01:24 PM

OK, I tried the same data, but using the ImageNet iter45 model to extract features and building a simple classifier on top. The result was more or less the same.
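For reference, this is roughly the recipe I followed (the model URL is the one from Dato's deep learning tutorial; adjust it to wherever your copy of the iter45 checkpoint lives):

    import graphlab as gl

    # pretrained ImageNet model, 45-iteration checkpoint from the Dato tutorials
    imagenet_model = gl.load_model(
        'https://static.turi.com/products/graphlab-create/resources/models/python2.7/imagenet_model_iter45')

    # the pretrained net expects 256x256 RGB input
    train_sf['image'] = gl.image_analysis.resize(train_sf['image'], 256, 256, channels=3)

    # use the net's second-to-last-layer activations as generic image features
    train_sf['deep_features'] = imagenet_model.extract_features(train_sf)

    # simple classifier on top of the extracted features
    clf = gl.classifier.create(train_sf, target='Building_storeys',
                               features=['deep_features'])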

I have a feeling that my NN is not complex enough.

network layers

    layer[0]: ConvolutionLayer
      init_random = gaussian
      padding = 0
      stride = 2
      num_channels = 10
      num_groups = 1
      kernel_size = 3
    layer[1]: MaxPoolingLayer
      padding = 0
      stride = 2
      kernel_size = 3
    layer[2]: FlattenLayer
    layer[3]: FullConnectionLayer
      init_sigma = 0.01
      init_random = gaussian
      init_bias = 0
      num_hidden_units = 100
    layer[4]: RectifiedLinearLayer
    layer[5]: DropoutLayer
      threshold = 0.5
    layer[6]: FullConnectionLayer
      init_sigma = 0.01
      init_random = gaussian
      init_bias = 0
      num_hidden_units = 3
    layer[7]: SoftmaxLayer

end network layers

which was created using

    # Create a default NeuralNet for the data.
    net = gl.deeplearning.create(sf_split_train, target='Building_storeys')

Could someone point me to a Dato example of creating a more complex net?
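In the meantime, here is my best guess at assembling a deeper net by hand (untested; I am assuming the constructors in gl.deeplearning.layers take the same parameters that show up in the printout above, and that a net's layer list can be rebuilt this way):

    import graphlab as gl
    from graphlab.deeplearning import layers

    # replace the auto-generated single conv stage with two conv/pool stages
    net = gl.deeplearning.NeuralNet()
    for layer in [
        layers.ConvolutionLayer(kernel_size=3, num_channels=32, stride=1, padding=1),
        layers.RectifiedLinearLayer(),
        layers.MaxPoolingLayer(kernel_size=3, stride=2),
        layers.ConvolutionLayer(kernel_size=3, num_channels=64, stride=1, padding=1),
        layers.RectifiedLinearLayer(),
        layers.MaxPoolingLayer(kernel_size=3, stride=2),
        layers.FlattenLayer(),
        layers.FullConnectionLayer(num_hidden_units=256),
        layers.RectifiedLinearLayer(),
        layers.DropoutLayer(threshold=0.5),
        layers.FullConnectionLayer(num_hidden_units=3),  # 3 storey classes
        layers.SoftmaxLayer(),
    ]:
        net.layers.append(layer)

    # train with the custom network
    model = gl.neuralnet_classifier.create(sf_split_train,
                                           target='Building_storeys',
                                           network=net,
                                           max_iterations=30)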

Thanks


User 2266 | 9/30/2015, 12:29:37 PM

It turns out the features extracted from the pictures with the ImageNet iter45 model were too similar from picture to picture (confirmed by training a random forest on them).
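The check was essentially this (a sketch; if a random forest on the deep features barely beats the majority-class baseline, the features carry little class signal):

    # split off a holdout set and fit a random forest on the deep features
    train, test = train_sf.random_split(0.8, seed=0)
    rf = gl.random_forest_classifier.create(train,
                                            target='Building_storeys',
                                            features=['deep_features'])
    print(rf.evaluate(test)['accuracy'])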

Would you guys suggest that I use higher-resolution images than 255x255x3?

Does that mean I have to train a new model on the higher-resolution images?


User 940 | 10/6/2015, 5:52:36 PM

Hi @sjl070707 ,

This is a cool project! As for recommendations on how to accomplish this task, here's an idea:

  1. Train on the MIT Places dataset with our ImageNet architecture. Here's a paper that shows how to do this: http://papers.nips.cc/paper/5349-learning-deep-features-for-scene-recognition-using-places-database.pdf

  2. Use those features and apply them to your problem scenario.

The reason I suggest this is that the Places dataset is much closer to your problem domain.

Cheers! -Piotr


User 2266 | 10/10/2015, 10:32:40 PM

Thank you Piotr, I will try that.

I think I will use the bow_window/outdoor images to train for the windows -> reapply the feature extractor on my dataset

Thank you again, hopefully, my next post will be a success story :smile:

So here is the new approach:

1) After reading the paper http://papers.nips.cc/paper/5349-learning-deep-features-for-scene-recognition-using-places-database.pdf
2) we will use the MIT scene dataset, http://places.csail.mit.edu/browser.html
3) train a model that will extract window features -> bow_windows
4) apply the window-feature model to Wies picture -> create feature column
5) build a classifier on top of it (a rough sketch of steps 3-5 follows below).
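Something like this is what I have in mind for steps 3-5 (all paths, the negative-category choice, and column names are placeholders; the Places images would have to be downloaded locally first):

    import graphlab as gl

    # 3) train a window-feature net on Places images: bow_window/outdoor as
    #    positives vs. other outdoor categories as negatives (placeholder path)
    places_sf = gl.image_analysis.load_images('places_subset/', with_path=True)
    places_sf['label'] = places_sf['path'].apply(
        lambda p: 'bow_window' if 'bow_window' in p else 'other')
    places_sf['image'] = gl.image_analysis.resize(places_sf['image'], 256, 256, channels=3)
    window_model = gl.neuralnet_classifier.create(places_sf, target='label')

    # 4) apply the window-feature model to the street view pictures
    train_sf['image'] = gl.image_analysis.resize(train_sf['image'], 256, 256, channels=3)
    train_sf['window_features'] = window_model.extract_features(train_sf)

    # 5) build the storey classifier on top of those features
    storey_clf = gl.classifier.create(train_sf, target='Building_storeys',
                                      features=['window_features'])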