SFrame filtering with "Like" function?

User 512 | 10/10/2014, 11:30:50 PM

Is there a way to filter an SFrame with something similar to the "like" function from SQL? For example, suppose I have the table below:

ID  Product
1   Apple Iphone
2   Apple Ipad
3   Samsung Galaxy

I want to filter out the records with "Apple" in the Product field, so the resulting table will be:

ID  Product
3   Samsung Galaxy

In SQL, I can simply use like '%Apple%'. Can I do something similar in GraphLab?


User 19 | 10/10/2014, 11:59:51 PM

This will be made easier in our upcoming version:

<pre>
import graphlab

sf = graphlab.SFrame({'id': ['Apple', 'Apple', 'Samsung'], 'product': ['Iphone', 'Ipad', 'Galaxy']})
sf.filter_by(['Apple'], 'id', exclude=True)

Columns:
	id	str
	product	str

Rows: 1

Data:
+---------+---------+
|    id   | product |
+---------+---------+
| Samsung |  Galaxy |
+---------+---------+
[1 rows x 2 columns]
</pre>


User 15 | 10/11/2014, 7:17:58 PM

What you're asking for can be done with regular expression matching, by calling 'apply' on the column with a Python lambda function. Here's an example where 'product' holds the company name and the model in the same column:


<pre>
import re
import graphlab as gl

sf = gl.SFrame({'id': [1, 2, 3], 'product': ['Apple Iphone', 'Apple Ipad', 'Samsung Galaxy']})

# match() anchors at the start of the string, so the pattern needs the leading '.*'
prog = re.compile('.*Apple.*')

# Keep (1) the rows that do not match, drop (0) the rows that do
sf[sf['product'].apply(lambda x: 0 if prog.match(x) else 1)]

Columns:
	id	int
	product	str

Rows: Unknown

Data:
+----+----------------+
| id |    product     |
+----+----------------+
| 3  | Samsung Galaxy |
+----+----------------+
[? rows x 2 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
</pre>


Of course, the query will probably be faster if the column entries are clean enough to assume that all Apple products start with 'Apple'. Then you can do:

<pre> sf[sf['product'].apply(lambda x: 0 if x[0:5] == 'Apple' else 1)] </pre>
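
If you want something closer to SQL's like syntax, one option is a small helper that turns a like pattern into a compiled regular expression and returns a 0/1 function you can hand to 'apply'. This is only a rough sketch: like_to_predicate is a made-up name, not part of GraphLab, and it assumes the usual wildcards ('%' for any run of characters, '_' for exactly one character) with no escape character. It is shown on a plain Python list so it runs without GraphLab:

```python
import re

def like_to_predicate(pattern, negate=False):
    """Hypothetical helper: build a 0/1 predicate from a SQL LIKE pattern."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')           # '%' matches any run of characters
        elif ch == '_':
            parts.append('.')            # '_' matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is taken literally
    # match() anchors at the start; \Z anchors at the end, so the whole value must fit
    prog = re.compile(''.join(parts) + r'\Z', re.DOTALL)

    def predicate(value):
        matched = prog.match(value) is not None
        # negate=True gives NOT LIKE behaviour
        return int(matched != negate)

    return predicate

products = ['Apple Iphone', 'Apple Ipad', 'Samsung Galaxy']
not_apple = like_to_predicate('%Apple%', negate=True)
kept = [p for p in products if not_apple(p)]
print(kept)
```

By analogy with the lambda examples above, the same function should be usable as sf[sf['product'].apply(not_apple)].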

User 512 | 10/13/2014, 3:19:29 PM

Thanks much to both of you! The regular expression matching is exactly what I need.