BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format

User 1375 | 5/22/2015, 12:03:59 AM

```
In [21]: gl.connect.server.default_local_conf.version
Out[21]: '1.4.0'
```

load a transformer chain from disk

```
In [22]: chain = gl.load_model('/var/tmp/2015052109-xformer-chain-model.gl')
2015-05-21 23:51:59,214 [INFO ] [SimpleFeatures] setstate
2015-05-21 23:51:59,259 [INFO ] [JobTitleFeatures] setstate
2015-05-21 23:51:59,961 [INFO ] [JobCatFeatures] setstate
2015-05-21 23:51:59,963 [INFO ] [TopKQueryMetroId] setstate
2015-05-21 23:51:59,965 [INFO ] [TopKJobMetroId] setstate
2015-05-21 23:51:59,965 [INFO ] [EmpRunImpressionsImputer] setstate
2015-05-21 23:51:59,965 [INFO ] [EmpRunUniqueJobsPlusOneImputer] setstate
2015-05-21 23:51:59,971 [INFO ] [EmpRunUniqueJobsImputer] setstate
2015-05-21 23:51:59,978 [INFO ] [EmpRunCTRImputer] setstate
2015-05-21 23:51:59,978 [INFO ] [RunImpressionsImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [RunCTRImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [UserGuidRunImpressionsImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [UserRunImpressionsImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [UserGuidRunCTRImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [UserRunCTRImputer] setstate
2015-05-21 23:51:59,987 [INFO ] [Identity ] setstate
```

it looks happy

```
In [23]: map(lambda step: step.__class__, chain['steps'])
Out[23]:
[xformers.SimpleFeatures,
 xformers.JobTitleFeatures,
 xformers.JobCatFeatures,
 xformers.TopKQueryMetroId,
 xformers.TopKJobMetroId,
 xformers.EmpRunImpressionsImputer,
 xformers.EmpRunUniqueJobsPlusOneImputer,
 xformers.EmpRunUniqueJobsImputer,
 xformers.EmpRunCTRImputer,
 xformers.RunImpressionsImputer,
 xformers.RunCTRImputer,
 xformers.UserGuidRunImpressionsImputer,
 xformers.UserRunImpressionsImputer,
 xformers.UserGuidRunCTRImputer,
 xformers.UserRunCTRImputer,
 xformers.Identity]
```

now let's save it to s3

```
In [24]: chain.save('s3://pclick/p2click/2015052109-xformer-chain-model.gl')
---------------------------------------------------------------------------
BotoClientError                           Traceback (most recent call last)
<ipython-input-24-9b790b3fd902> in <module>()
----> 1 chain.save('s3://pclick/p2click/2015052109-xformer-chain-model.gl')

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in save(self, location)
    286         # Save to a temoporary pickle file.
    287         try:
--> 288             self._save_to_pickle(_make_internal_url(location))
    289         except IOError as err:
    290             raise IOError("Unable to save model. Trace (%s)" % err)

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in _save_to_pickle(self, filename)
    320         # Save the object.
    321         self._save_impl(pickler)
--> 322         pickler.close()
    323
    324     def _save_impl(self, pickler):

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/gl_pickle.pyc in close(self)
    363
    364         if self.s3_path:
--> 365             file_util.s3_recursive_delete(self.s3_path)
    366             file_util.upload_to_s3(self.gl_temp_storage_path, self.s3_path,
    367                                    is_dir = True)

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab_util/file_util.pyc in s3_recursive_delete(s3_path, aws_credentials)
    462     bucket = conn.get_bucket(s3_bucket_name, validate=False)
    463     matches = bucket.list(prefix=s3_key_prefix)
--> 464     bucket.delete_keys([key.name for key in matches])
    465
    466 def expand_full_path(path):

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/boto/s3/bucketlistresultset.pyc in bucket_lister(bucket, prefix, delimiter, marker, headers, encoding_type)
     32         rs = bucket.get_all_keys(prefix=prefix, marker=marker,
     33                                  delimiter=delimiter, headers=headers,
---> 34                                  encoding_type=encoding_type)
     35         for k in rs:
     36             yield k

[... remaining frames truncated in the original post ...]

BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format.
```


Comments

User 1375 | 5/22/2015, 12:38:50 AM

As a workaround I tried saving to S3 via the command-line tool `aws s3`, as follows:

```
(graphlab)root@ip-10-39-141-48:~# aws s3 cp /var/tmp/2015052109-xformer-chain-model.gl s3://pclick/p2click/ --recursive
upload: ../var/tmp/2015052109-xformer-chain-model.gl/01a7b1a3-22f7-48d5-841f-d541b9a1c598/dir_archive.ini to s3://pclick/p2click/01a7b1a3-22f7-48d5-841f-d541b9a1c598/dir_archive.ini
...
upload: ../var/tmp/2015052109-xformer-chain-model.gl/f4cf87ce-cec4-494d-be75-35cebeb22c9e/objects.bin to s3://pclick/p2click/f4cf87ce-cec4-494d-be75-35cebeb22c9e/objects.bin
upload: ../var/tmp/2015052109-xformer-chain-model.gl/fa726489-ddc5-4df6-9dbf-6e36b1727aa2/objects.bin to s3://pclick/p2click/fa726489-ddc5-4df6-9dbf-6e36b1727aa2/objects.bin

(graphlab)root@ip-10-39-141-48:~# aws s3 ls s3://pclick/p2click/2015052109-xformer-chain-model.gl
                           PRE 2015052109-xformer-chain-model.gl/
```

but that reveals a different problem (back in ipython):

```
In [8]: chain = load_model('s3://pclick/p2click/2015052109-xformer-chain-model.gl')
2015-05-22 00:37:30,142 [INFO ] [load_model ] <<< Starting >>>
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-55a8ca1ae3e3> in <module>()
----> 1 chain = load_model('s3://pclick/p2click/2015052109-xformer-chain-model.gl')

/root/src/utils.pyc in timed(*args, **kw)
     39         logger.info('<<< Starting >>>')
     40         ts = time.time()
---> 41         result = func(*args, **kw)
     42         te = time.time()
     43         logger.info('<<< Completed >>> in %2.2f secs', te - ts)

/root/src/utils.pyc in load_model(model_path)
    152 @timeit
    153 def load_model(model_path):
--> 154     return gl.load_model(model_path)
    155
    156 @timeit

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in load_model(location)
     54     internal_url = _make_internal_url(location)
     55     try:
---> 56         return glconnect.get_unity().load_model(internal_url)
     57     except IOError as e:
     58         unpickler = gl_pickle.GLUnpickler(internal_url)

cy_unity.pyx in graphlab.cython.cy_unity.UnityGlobalProxy.load_model()

cy_unity.pyx in graphlab.cython.cy_unity.UnityGlobalProxy.load_model()

RuntimeError: Runtime Exception. Unable to load model from s3://pclick/p2click/2015052109-xformer-chain-model.gl: Invalid directory archive. Please make sure the directory contains dir_archive.ini
```

whereas the local version is still ok:

```
In [9]: chain = load_model('/var/tmp/2015052109-xformer-chain-model.gl')
2015-05-22 00:38:16,291 [INFO ] [load_model ] <<< Starting >>>
2015-05-22 00:38:16,299 [INFO ] [SimpleFeatures] setstate
2015-05-22 00:38:16,346 [INFO ] [JobTitleFeatures] setstate
2015-05-22 00:38:17,063 [INFO ] [JobCatFeatures] setstate
2015-05-22 00:38:17,066 [INFO ] [TopKQueryMetroId] setstate
2015-05-22 00:38:17,068 [INFO ] [TopKJobMetroId] setstate
2015-05-22 00:38:17,068 [INFO ] [EmpRunImpressionsImputer] setstate
2015-05-22 00:38:17,068 [INFO ] [EmpRunUniqueJobsPlusOneImputer] setstate
2015-05-22 00:38:17,074 [INFO ] [EmpRunUniqueJobsImputer] setstate
2015-05-22 00:38:17,081 [INFO ] [EmpRunCTRImputer] setstate
2015-05-22 00:38:17,081 [INFO ] [RunImpressionsImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [RunCTRImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [UserGuidRunImpressionsImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [UserRunImpressionsImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [UserGuidRunCTRImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [UserRunCTRImputer] setstate
2015-05-22 00:38:17,089 [INFO ] [Identity ] setstate
2015-05-22 00:38:17,090 [INFO ] [load_model ] <<< Completed >>> in 0.80 secs
```
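
One possible explanation for the load failure, judging from the upload log above: `aws s3 cp <dir> s3://pclick/p2click/ --recursive` copied the *contents* of the model directory directly under `p2click/` (note the destination keys lack the `2015052109-xformer-chain-model.gl/` component), so `dir_archive.ini` never lands under the prefix that `load_model` reads. Copying to `s3://pclick/p2click/2015052109-xformer-chain-model.gl/` instead would preserve the expected layout.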


User 91 | 5/22/2015, 1:23:12 AM

A couple of things. We changed a few things between GLC-1.3 and GLC-1.4 to make the file formats a bit more efficient for larger models and SFrames. As you are aware, S3 does not have a concept of directories, so in order to save a directory we have to delete all the keys (recursively) that were present in your "directory".
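
For readers following along, here is a minimal sketch of what such a recursive prefix delete looks like in boto (not GraphLab's actual implementation; the bucket and prefix are taken from this thread, and credentials are assumed to be configured):

```python
import boto

# S3 has no real directories: "deleting a directory" means deleting
# every key that shares the directory's prefix.
conn = boto.connect_s3()  # assumes AWS credentials in the environment
bucket = conn.get_bucket('pclick', validate=False)
matches = bucket.list(prefix='p2click/2015052109-xformer-chain-model.gl/')
bucket.delete_keys([key.name for key in matches])
```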

Can you make sure there are no "files" in the directory "s3://pclick/p2click/2015052109-xformer-chain-model.gl" whose keys contain capital letters (say, from something saved using an older version)?

Second, can you give me a list of the files that are present in the folder "s3://pclick/p2click/2015052109-xformer-chain-model.gl"? I could not see the result of your ls command.
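
A quick way to answer both questions with boto (a sketch, assuming your credentials are configured; the prefix is the one from this thread):

```python
import boto

# List every key under the model prefix and flag any that contain
# upper-case characters, per the question above.
bucket = boto.connect_s3().get_bucket('pclick', validate=False)
for key in bucket.list(prefix='p2click/2015052109-xformer-chain-model.gl/'):
    marker = '  <-- contains upper-case characters' if key.name != key.name.lower() else ''
    print(key.name + marker)
```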


User 1487 | 5/29/2015, 12:47:59 AM

Here's an update: we can reproduce the error with a shorter S3 path. `aws s3 ls s3://pclick/test.gl` returns nothing.

In an ipython session:

```
In [14]: chain
Out[14]:
Class : TransformerChain

Steps
0  : SimpleFeatures()
1  : JobTitleFeatures()
2  : JobCatFeatures(jobcatbaseurl=s3://pclick/p2click/jobcat/, target=1grams, outputprefix=jobcat)
3  : TopKQueryMetroId()
4  : TopKJobMetroId()
5  : EmpRunImpressionsImputer()
6  : EmpRunUniqueJobsPlusOneImputer()
7  : EmpRunUniqueJobsImputer()
8  : EmpRunCTRImputer()
9  : RunImpressionsImputer()
10 : RunCTRImputer()
11 : UserGuidRunImpressionsImputer()
12 : UserRunImpressionsImputer()
13 : UserGuidRunCTRImputer()
14 : UserRunCTRImputer()
15 : Identity()
```

```
In [15]: chain.save('s3://pclick/test.gl')
---------------------------------------------------------------------------
BotoClientError                           Traceback (most recent call last)
<ipython-input-15-ebbbb665719f> in <module>()
----> 1 chain.save('s3://pclick/test.gl')

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in save(self, location)
    286         # Save to a temoporary pickle file.
    287         try:
--> 288             self._save_to_pickle(_make_internal_url(location))
    289         except IOError as err:
    290             raise IOError("Unable to save model. Trace (%s)" % err)

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/toolkits/model.pyc in _save_to_pickle(self, filename)
    320         # Save the object.
    321         self._save_impl(pickler)
--> 322         pickler.close()
    323
    324     def _save_impl(self, pickler):

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab/gl_pickle.pyc in close(self)
    363
    364         if self.s3_path:
--> 365             file_util.s3_recursive_delete(self.s3_path)
    366             file_util.upload_to_s3(self.gl_temp_storage_path, self.s3_path,
    367                                    is_dir = True)

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/graphlab_util/file_util.pyc in s3_recursive_delete(s3_path, aws_credentials)
    462     bucket = conn.get_bucket(s3_bucket_name, validate=False)
    463     matches = bucket.list(prefix=s3_key_prefix)
--> 464     bucket.delete_keys([key.name for key in matches])
    465
    466 def expand_full_path(path):

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/boto/s3/bucketlistresultset.pyc in bucket_lister(bucket, prefix, delimiter, marker, headers, encoding_type)
     32         rs = bucket.get_all_keys(prefix=prefix, marker=marker,
     33                                  delimiter=delimiter, headers=headers,
---> 34                                  encoding_type=encoding_type)
     35         for k in rs:
     36             yield k

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/boto/s3/bucket.pyc in get_all_keys(self, headers, **params)
    470         return self._get_all([('Contents', self.key_class),
    471                               ('CommonPrefixes', Prefix)],
--> 472                              '', headers, **params)
    473
    474     def get_all_versions(self, headers=None, **params):

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/boto/s3/bucket.pyc in _get_all(self, element_map, initial_query_string, headers, **params)
    396         response = self.connection.make_request('GET', self.name,
    397                                                 headers=headers,
--> 398                                                 query_args=query_args)
    399         body = response.read()
    400         boto.log.debug(body)

/root/anaconda/envs/graphlab/lib/python2.7/site-packages/boto/s3/connection.pyc in make_request(self, method, bucket, key, headers, data, query_args, sender, override_num_retries, retry_handler)
    652         auth_path = self.calling_format.build_auth_path(bucket, key)
    653

[... remaining frames truncated in the original post ...]

BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format.
```


User 1487 | 5/29/2015, 1:04:30 AM

It seems like there's something about that TransformerChain that prevents it from being serialized to S3 properly; other models and SFrames in binary format save just fine.


User 91 | 5/29/2015, 3:08:54 AM

I was able to reproduce the bug. The capital letters come from the fact that our S3 upload code appends your AWS credentials (which contain capital letters) to the path, and boto does not support S3 paths built that way. It appears to be a very silly bug on our side.
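
For context, here is a minimal reproduction of the boto behavior described above (a sketch; the bucket name is hypothetical, and credentials are assumed to be configured). boto's default sub-domain calling format rejects any bucket name containing upper-case characters, while path-style addressing does not:

```python
import boto
from boto.s3.connection import OrdinaryCallingFormat

conn = boto.connect_s3()
# Raises BotoClientError: "Bucket names cannot contain upper-case
# characters when using either the sub-domain or virtual hosting
# calling format." -- the error in this thread.
conn.get_bucket('Some-Uppercase-Bucket')

# Path-style (ordinary) addressing tolerates such names:
conn2 = boto.connect_s3(calling_format=OrdinaryCallingFormat())
```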

For now, can you save the object to your local disk and use boto to copy the files over to S3? If not, I can send you a hot-patch.
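
A sketch of that workaround (the paths and bucket are the ones from this thread; assumes boto credentials are configured):

```python
import os
import boto

# 1. Save the model locally -- the local path works fine.
chain.save('/var/tmp/2015052109-xformer-chain-model.gl')

# 2. Copy the saved directory to S3 with boto, preserving the
#    directory name in the destination keys.
local_root = '/var/tmp/2015052109-xformer-chain-model.gl'
dest_prefix = 'p2click/2015052109-xformer-chain-model.gl'
bucket = boto.connect_s3().get_bucket('pclick', validate=False)
for dirpath, _, filenames in os.walk(local_root):
    for fname in filenames:
        local_path = os.path.join(dirpath, fname)
        key_name = dest_prefix + '/' + os.path.relpath(local_path, local_root)
        bucket.new_key(key_name).set_contents_from_filename(local_path)
```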


User 1487 | 5/29/2015, 7:37:47 PM

A hot patch would help, thank you


User 1487 | 5/30/2015, 6:02:16 AM

The patch worked, thank you!


User 91 | 5/30/2015, 6:03:21 AM

Feel free to reach us if you have more questions.


User 91 | 6/16/2015, 6:33:12 PM

This issue should be fixed in GraphLab Create 1.4.1. Thanks for reporting it.