After some investigation, this is what I've found.
Regarding Q3: yes, extracted features become non-deterministic.
For prediction purposes, the most robust approach is to average the predictions over several runs. In fact, this should give better results than using just the center crop. Here's a code snippet to do that:

    def averaged_prediction(model, sf, num_classes, num_samples=10):
        """Average out predictions based on random crops and random mirror."""
        # Sort by row and class so the scores line up across repeated calls.
        prob = model.predict_topk(sf, k=num_classes).sort(['row_id', 'class'])
        for i in range(num_samples - 1):
            print("Making prediction : %s" % i)
            sample = model.predict_topk(sf, k=num_classes).sort(['row_id', 'class'])
            prob['score'] = prob['score'] + sample['score']
        # Divide the accumulated scores by the number of prediction passes.
        prob['score'] = prob['score'] / float(num_samples)
        return prob

The issue was not present in GraphLab Create 1.1, but that version also did not support string target types, so you would have to enumerate your targets.
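If you do go back to 1.1, enumerating the targets could look roughly like the sketch below. It is plain Python and not a GraphLab API: the helper name `enumerate_targets` and the sample labels are made up for illustration, and you would still need to apply the mapping to your own target column.

```python
def enumerate_targets(labels):
    """Map each distinct string label to an integer id.

    Hypothetical helper for illustration; not part of GraphLab Create.
    Labels are sorted first so the ids are stable between runs.
    """
    classes = sorted(set(labels))
    label_to_id = {label: idx for idx, label in enumerate(classes)}
    encoded = [label_to_id[label] for label in labels]
    return encoded, classes


# Made-up example labels.
labels = ["cat", "dog", "cat", "bird"]
encoded, classes = enumerate_targets(labels)

# classes[i] recovers the original string for a predicted integer id i.
decoded = [classes[i] for i in encoded]
```

Keep the `classes` list around, since it is the only way to translate the integer predictions back into the original strings.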
Sorry for the trouble.