graphlab.SArray.apply

User 2570 | 6/30/2016, 5:43:53 PM

I am running the following to download the images in a dataset to my local machine: sf['img'] = sf['imUrl'].apply(lambda x: gl.Image(x)). It seems gl.Image supports only the JPG and PNG formats. Unfortunately, the dataset seems to contain at least one image in another format, which makes the call fail with the error below. How can I skip the rows with unsupported images?

error: C:\Anaconda2\lib\site-packages\graphlab\datastructures\sarray.pyc in apply(self, fn, dtype, skipundefined, seed) 1853 assert callable(fn), "Input function must be callable." 1854 -> 1855 dryrun = [fn(i) for i in self.head(100) if i is not None] 1856 if dtype == None: 1857 dtype = infertypeof_list(dryrun)

in (x) ----> 1 sf['img'] = sf['imUrl'].apply(lambda x: gl.Image(x))

C:\Anaconda2\lib\site-packages\graphlab\datastructures\image.pyc in init(self, path, format, **Imageinternalkwargs) 70 from ..util import makeinternalurl 71 from .. import extensions as extensions ---> 72 img = extensions.loadimage(makeinternal_url(path), format) 73 for key, value in list(img.dict__.items()): 74 setattr(self, key, value)

C:\Anaconda2\lib\site-packages\graphlab\extensions.pyc in (*args, kwargs) 166 167 def makeinjected_function(fn, arguments): --> 168 return lambda *args, kwargs: runtoolkitfunction(fn, arguments, args, kwargs) 169 170 def classinstancefromname(classname, *arg, **kwarg):

C:\Anaconda2\lib\site-packages\graphlab\extensions.pyc in runtoolkitfunction(fnname, arguments, args, kwargs) 155 if ret[0] != True: 156 if len(ret[1]) > 0: --> 157 raise ToolkitError(ret[1]) 158 else: 159 raise _ToolkitError("Toolkit failed with unknown error")

ToolkitError: Unsupported image format. Supported formats are JPG and PNG


RuntimeError Traceback (most recent call last) <ipython-input-12-e9c98a15eccb> in <module>() ----> 1 sf.head()

C:\Anaconda2\lib\site-packages\graphlab\datastructures\sframe.pyc in head(self, n) 2904 tail, printrows 2905 """ -> 2906 return SFrame(proxy=self.proxy.head(n)) 2907 2908 def todataframe(self):

graphlab\cython\cysframe.pyx in graphlab.cython.cysframe.UnitySFrameProxy.head()

graphlab\cython\cysframe.pyx in graphlab.cython.cysframe.UnitySFrameProxy.head()

RuntimeError: Runtime Exception. Exception in python callback function evaluation: ToolkitError('Unsupported image format. Supported formats are JPG and PNG',): Traceback (most recent call last): File "graphlab\cython\cypylambdaworkers.pyx", line 426, in graphlab.cython.cypylambdaworkers.evallambda File "graphlab\cython\cypylambdaworkers.pyx", line 169, in graphlab.cython.cypylambdaworkers.lambdaevaluator.evalsimple File "<ipython-input-11-9eb47a22ba77>", line 1, in <lambda> File "C:\Anaconda2\lib\site-packages\graphlab\datastructures\image.py", line 72, in init img = extensions.loadimage(makeinternalurl(path), format) File "C:\Anaconda2\lib\site-packages\graphlab\extensions.py", line 168, in <lambda> return lambda *args, **kwargs: runtoolkitfunction(fn, arguments, args, kwargs) File "C:\Anaconda2\lib\site-packages\graphlab\extensions.py", line 157, in runtoolkitfunction raise _ToolkitError(ret[1]) ToolkitError: Unsupported image format. Supported formats are JPG and PNG

Comments

User 16 | 6/30/2016, 6:11:18 PM

Assuming the url has the correct file extension, you could do something like this:

sf['img'] = sf['imUrl'].apply(lambda x: gl.Image(x) if x.endswith(('.jpg', '.png')) else None)
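If the extension cannot be fully trusted (upper-case suffixes, ".jpeg", query strings, or a mislabeled file), a slightly more defensive variant of the same idea might look like the sketch below. The helper names has_supported_extension and load_or_none are made up for illustration and are not part of GraphLab. The loader is passed in as an argument so that the snippet runs on its own.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def has_supported_extension(url, extensions=('.jpg', '.jpeg', '.png')):
    """Return True if the path part of the URL ends with a supported suffix."""
    path = urlparse(url).path          # drops any ?query or #fragment
    return path.lower().endswith(extensions)


def load_or_none(url, loader):
    """Call loader(url); return None for skipped or failing rows.

    `loader` is whatever turns a URL into an image, for example gl.Image.
    """
    if url is None or not has_supported_extension(url):
        return None
    try:
        return loader(url)
    except Exception:
        # Covers files whose content does not match their extension.
        return None
```

With GraphLab this could presumably be used as sf['img'] = sf['imUrl'].apply(lambda x: load_or_none(x, gl.Image)). The traceback above shows that apply infers the column type from the first 100 rows, so if those all come back as None it may be necessary to pass dtype explicitly.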


User 2570 | 6/30/2016, 6:34:45 PM

The code runs fine and downloads some of the images, but then it seems to stall. When I call sf.head() to check whether the image column was added, nothing happens.


User 16 | 7/1/2016, 12:14:37 AM

It's probably a download issue: one of the hosts you're downloading from may be very slow or offline, or may be throttling you.

I'd suggest downloading the files locally first. Print out each URL before you try to download it. That way you'll know which one it's hanging on.
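A rough sketch of that approach, using only the standard library, could look like this. The function name download_all, the numbered file names, and the 10-second timeout are assumptions chosen for illustration; the opener parameter exists only so the function can be tried without network access.

```python
import os

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2


def download_all(urls, dest_dir, timeout=10, opener=urlopen):
    """Download each URL into dest_dir, printing it first.

    Returns a list of local paths, with None for URLs that failed.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    local_paths = []
    for i, url in enumerate(urls):
        # Printed before the request, so a hang points at this URL.
        print('downloading %d: %s' % (i, url))
        try:
            response = opener(url, timeout=timeout)
            try:
                data = response.read()
            finally:
                response.close()
        except Exception as exc:
            print('  failed: %r' % (exc,))
            local_paths.append(None)
            continue
        # Keep the original extension, ignoring any query string.
        ext = os.path.splitext(url.split('?')[0])[1]
        path = os.path.join(dest_dir, '%06d%s' % (i, ext))
        with open(path, 'wb') as f:
            f.write(data)
        local_paths.append(path)
    return local_paths
```

Something like paths = download_all(list(sf['imUrl']), 'images') would then give local file paths that can be loaded afterwards. Note that the timeout applies to individual blocking socket operations, not to the total download time, so a host that trickles data slowly can still take a long while.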


User 2570 | 7/1/2016, 4:37:55 AM

Thank you @Toby