Is there a function to split an SFrame into a tuple of SFrames based on a boolean mask?

User 2568 | 4/14/2016, 10:14:46 PM

I have an SFrame that I'd like to split into two SFrames based on a boolean mask, e.g.,

def split(sf, mask):
	# GraphLab doesn't have a "logical not" or inverse function like numpy,
	# so build the inverse element-wise. mask is assumed to be an SArray of 0/1 values.
	inverse_mask = mask.apply(lambda x: 1 if x == 0 else 0)
	sf1 = sf[mask]
	sf2 = sf[inverse_mask]

	return sf1, sf2

Also, is there a better way to create the inverse of the mask?


User 15 | 4/15/2016, 8:57:31 PM

Hmm. I thought we had the inversion operator implemented, but I guess we don't. That's just a matter of adding an implementation for __invert__ in SArray. I'm guessing this hasn't been implemented because it only makes sense if the SArray is of type int, so the implementation would just have to check the type and fail appropriately. Should be straightforward. I'll create an issue to track.
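To make the idea concrete, here is a minimal pure-Python sketch of a type-checked __invert__. IntArray is a hypothetical stand-in class used only for illustration; it is not the real SArray, and the eventual implementation may look different.

```python
class IntArray(object):
    """Hypothetical stand-in for an int-typed SArray (illustration only)."""

    def __init__(self, values, dtype=int):
        self.values = list(values)
        self.dtype = dtype

    def __invert__(self):
        # Logical NOT only makes sense for integer (0/1) data, so reject other types.
        if self.dtype is not int:
            raise TypeError("~ is only supported for arrays of type int")
        # Map 0 to 1 and any nonzero value to 0.
        return IntArray([1 if v == 0 else 0 for v in self.values])


mask = IntArray([1, 0, 0, 1])
inverse_mask = ~mask
print(inverse_mask.values)
```

With something like this in place, the inverse of a mask could be written as ~mask instead of being rebuilt element by element.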

As far as splitting into a tuple with a boolean mask, I think you're doing it the best way as it stands. The closest thing we have is the _group method on SFrame, which is hidden because we still need to do some work in the backend implementation to make it performant. It groups a column or columns into a separate SFrame for each unique value of the group, so you could just add your boolean mask as a column and group on that. Its function is exactly the same as the TimeSeries group function. Once we make that bit performant, we will unhide/publicize that feature.
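The grouping idea can be sketched in plain Python. This is only an illustration of "add the mask as a column and group on it" using a list of dicts; group_rows is a hypothetical helper, and it does not reproduce the signature or behavior of the hidden _group method.

```python
from collections import defaultdict


def group_rows(rows, key):
    """Group a list of dict rows by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# The mask is stored as an ordinary column alongside the data.
rows = [
    {"x": 10, "mask": 1},
    {"x": 20, "mask": 0},
    {"x": 30, "mask": 1},
]
groups = group_rows(rows, "mask")
print(groups[1])  # rows where the mask is set
print(groups[0])  # rows where the mask is not set
```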

User 2568 | 4/16/2016, 1:40:04 AM

I propose that Dato introduce a function that splits an SFrame based on a mask, returning a tuple. This has precedent in dropna_split and random_split.

The benefits of a built-in function to split an SFrame based on a mask are:
1. Performance: it avoids traversing the SFrame twice and computing the inverse of the mask.
2. Expressiveness: it provides a simple and fast way to partition a data set.
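As a rough sketch of the semantics I have in mind, here is a plain-Python version that partitions in a single pass. split_by_mask is a hypothetical name, and it works on ordinary lists rather than SFrames; it only illustrates the intended behavior, not how it would be implemented in the backend.

```python
def split_by_mask(rows, mask):
    """Partition rows into (selected, rejected) in one pass over the data."""
    rows = list(rows)
    mask = list(mask)
    if len(rows) != len(mask):
        raise ValueError("rows and mask must have the same length")
    selected, rejected = [], []
    # Each row is visited once, and no inverse mask is ever built.
    for row, keep in zip(rows, mask):
        if keep:
            selected.append(row)
        else:
            rejected.append(row)
    return selected, rejected


selected, rejected = split_by_mask(["a", "b", "c", "d"], [1, 0, 0, 1])
print(selected)
print(rejected)
```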

User 4 | 4/18/2016, 1:21:14 AM

Hi @Kevin_McIsaac -- since this is an SFrame feature, please open a Feature Request (tagged as "enhancement") issue in the SFrame issues on GitHub. SFrame is an open source project so we'd like to keep all project management (including bug tracking and feature requests) there. If you'd prefer I can open it for you, but if you open it it will help us track the origin of feature requests and you will get updates from GitHub when there are changes to the issue. Thanks!

User 4 | 4/18/2016, 1:22:14 AM

And in case it was not clear, I agree, this sounds like a great feature request -- thanks! In my mind the only reason not to do it immediately would be relative prioritization with other feature work (so it may take a while for us to get to it, but since it's open source we'll welcome contributions from others too!).