window function for SArray

User 2013 | 8/17/2015, 4:57:00 PM

I have a case where I need to transform my array into a new array using the history (the previous elements) of each element. How can I do this efficiently the GraphLab (gl) way? Right now I convert the SArray to a Python list and do the transformation in pure Python, but I am worried this won't scale to large volumes of data.

The transformation in detail is as follows, where orgarray is an SArray object:

<pre>
import graphlab as gl

orgarraypython = list(orgarray)
windowsize = 10
newlist = []
for i in range(orgarray.size()):
    start = max(0, i - windowsize)
    # I need to do some calculation for each element, like a sliding window operation
    tmpvalue = max(orgarraypython[start:i+1])
    newlist.append(tmpvalue)
newlistgl = gl.SArray(newlist)
</pre>
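The loop above re-scans up to `windowsize + 1` elements for every position, so it costs O(n·w). As a minimal sketch (not a GraphLab API, and not something proposed in this thread), the same trailing-window maximum can be computed in a single O(n) pass with a monotonic deque. `trailing_window_max` is a hypothetical helper name; it assumes the window is the current element plus up to `window_size` previous ones, exactly as in the slice `orgarraypython[start:i+1]`.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def trailing_window_max(values: Iterable[float], window_size: int) -> List[float]:
    """Return out with out[i] == max(values[max(0, i - window_size) : i + 1]).

    Same window as the loop in the question (current element plus up to
    `window_size` earlier ones), computed in O(n) with a monotonic deque.
    """
    if window_size < 0:
        raise ValueError("window_size must be non-negative")
    out: List[float] = []
    # (index, value) pairs; values strictly decrease from front to back,
    # so the front always holds the maximum of the current window.
    window: Deque[Tuple[int, float]] = deque()
    for i, v in enumerate(values):
        # Older entries that are <= v can never be a window max again.
        while window and window[-1][1] <= v:
            window.pop()
        window.append((i, v))
        # At most one index leaves the window per step: i - window_size - 1.
        if window[0][0] < i - window_size:
            window.popleft()
        out.append(window[0][1])
    return out
```

Because the helper only iterates over its input, an SArray could be passed in directly and the result wrapped with `gl.SArray(...)` as in the question. It is still single-threaded Python, and the trick is specific to max (or min, with the comparison flipped); other window calculations need their own approach.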



User 1592 | 8/17/2015, 6:28:17 PM

Hi Cong, we are working on a time series toolkit that will support sliding window aggregations. Please stay tuned; we will announce it in this forum once it is out.

User 2013 | 8/17/2015, 6:34:24 PM

Thanks for the update!

User 15 | 8/17/2015, 7:29:13 PM


You can actually run this code directly on an SArray, since you can slice and iterate over an SArray just as you can a Python list. It probably won't be very performant until we implement sliding window operations on the C++ side. Note, however, that we currently have a bug that only allows a limited number of appends, probably too few for your case. You could work around that bug by flushing to an SArray only periodically, or by writing all values to a file and loading it later as an SArray.
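The "flush periodically" workaround can be sketched as a small buffering helper. This is a minimal sketch under assumptions: `append_in_batches` and its `flush` callback are hypothetical names, the default batch size of 100,000 is an arbitrary placeholder (the actual append limit is not stated here), and it assumes the bug is triggered by the number of append calls rather than by total size.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def append_in_batches(values: Iterable[T],
                      flush: Callable[[List[T]], None],
                      batch_size: int = 100_000) -> int:
    """Buffer values in a plain Python list and hand them to `flush` in
    batches, so flush is called about len(values) / batch_size times
    instead of once per element. Returns the number of flush calls."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    calls = 0
    for v in values:
        buffer.append(v)
        if len(buffer) >= batch_size:
            flush(buffer)
            buffer = []  # fresh list; flush may keep a reference to the old one
            calls += 1
    if buffer:  # flush whatever is left over
        flush(buffer)
        calls += 1
    return calls

# Hypothetical GraphLab wiring (assumption, not run here): a flush callback could
# convert each batch with gl.SArray(batch) and append it to an accumulated SArray
# via SArray.append, giving one append per batch instead of one per element.
```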

Sorry for the inconvenience. We hope to have sliding operations soon.


User 2013 | 8/17/2015, 9:38:09 PM

Hi Evan:

Thanks for letting me know about the bug in SArray.append(). Now I understand why my session sometimes becomes unresponsive when I append values to an SArray iteratively.

By the way, there also seems to be a similar issue when I call sum() on a sliced SArray, with code like this:

<pre>
for i in range(arraysize):
    sumvalue = asarray[:i].sum()
    apythonlist.append(sumvalue)
</pre>

Is this a known bug? If not, I can create another post with more details.
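For reference, the loop above re-sums every prefix, so it does O(n²) work regardless of the append bug. A minimal pure-Python sketch of the same result keeps a running total instead (`exclusive_prefix_sums` is a hypothetical helper name). Since `asarray[:i]` excludes element `i`, these are exclusive prefix sums: the first entry is 0.

```python
from typing import Iterable, List


def exclusive_prefix_sums(values: Iterable[float]) -> List[float]:
    """Return [sum(values[:i]) for i in range(len(values))] in one pass.

    Equivalent to the slicing loop above, but keeps a running total
    instead of re-summing each prefix, so it is O(n) rather than O(n^2)."""
    sums: List[float] = []
    running = 0
    for v in values:
        sums.append(running)  # sum of everything strictly before v
        running += v
    return sums
```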

User 954 | 8/17/2015, 11:55:13 PM

Hi Cong,

As Evan mentioned, this is a known bug that only allows a certain number of appends, and your latest cumulative-sum script runs into the same issue. One simple workaround is to use our NumPy integration and calculate the cumulative sum at scale:

<pre>
import graphlab.numpy
n = graphlab.numpy.array(sf)
res = n.cumsum()
</pre>

I hope it helps.
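For data that fits in memory, plain NumPy gives the same kind of one-call cumulative sum. One detail worth checking: `numpy.cumsum` is inclusive (entry `i` is the sum of elements `0..i`), whereas the loop in the question sums `asarray[:i]`, which excludes element `i`. The sketch below shifts the result by one to reproduce that loop exactly; the helper name `exclusive_cumsum` is hypothetical, and it assumes a 1-D array and that `graphlab.numpy`'s `cumsum` follows NumPy's inclusive convention.

```python
import numpy as np


def exclusive_cumsum(a) -> np.ndarray:
    """Return out with out[i] == a[:i].sum(), matching the loop in the question."""
    inclusive = np.cumsum(np.asarray(a))  # inclusive[i] == a[:i + 1].sum()
    out = np.zeros_like(inclusive)        # out[0] == 0 (sum of the empty prefix)
    out[1:] = inclusive[:-1]              # out[i] == a[:i].sum() for i >= 1
    return out
```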

User 2013 | 8/18/2015, 12:12:12 AM