SFrame filtering with "Like" function?

User 512 | 10/10/2014, 11:30:50 PM

Is there any way to filter an SFrame with something similar to SQL's LIKE operator? For example, suppose I have the table below:

ID  Product
1   Apple Iphone
2   Apple Ipad
3   Samsung Galaxy

I want to filter out the records with "Apple" in the Product field, so the resulting table will be:

ID  Product
3   Samsung Galaxy

In SQL, I can simply write WHERE Product NOT LIKE '%Apple%'. Can I do something similar in GraphLab?

Comments

User 19 | 10/10/2014, 11:59:51 PM

If the company name is stored in its own column, this will be made easier in our upcoming version:

<pre>

import graphlab

sf = graphlab.SFrame({'id': ['Apple', 'Apple', 'Samsung'],
                      'product': ['Iphone', 'Ipad', 'Galaxy']})
# exclude=True drops the rows whose 'id' value appears in the list
sf.filter_by(['Apple'], 'id', exclude=True)

Columns:
    id       str
    product  str

Rows: 1

Data:
+---------+---------+
|    id   | product |
+---------+---------+
| Samsung |  Galaxy |
+---------+---------+
[1 rows x 2 columns]

</pre>
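filter_by compares whole cell values, so it only helps when the company name sits in its own column. A minimal pure-Python sketch of that idea, assuming the rows are plain dicts rather than an SFrame: split 'product' at the first space into a hypothetical 'company' column, then drop the rows whose company is in an exclusion list, which is the exact-match behavior of filter_by(..., exclude=True). The helper names split_company and exclude_values are illustrative only.

```python
# Hypothetical stand-in for the SFrame: a list of row dicts.
rows = [
    {'id': 1, 'product': 'Apple Iphone'},
    {'id': 2, 'product': 'Apple Ipad'},
    {'id': 3, 'product': 'Samsung Galaxy'},
]

def split_company(row):
    """Split 'product' at the first space into 'company' and 'model' (hypothetical helper)."""
    company, _, model = row['product'].partition(' ')
    return {'id': row['id'], 'company': company, 'model': model}

def exclude_values(rows, column, values):
    """Keep rows whose `column` value is NOT in `values` (exact match, no wildcards)."""
    blocked = set(values)
    return [r for r in rows if r[column] not in blocked]

split_rows = [split_company(r) for r in rows]
result = exclude_values(split_rows, 'company', ['Apple'])
print(result)  # [{'id': 3, 'company': 'Samsung', 'model': 'Galaxy'}]
```

On an SFrame, the split column could presumably be built with apply, e.g. sf['company'] = sf['product'].apply(lambda x: x.split(' ', 1)[0]), after which filter_by can be run on 'company'.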


User 15 | 10/11/2014, 7:17:58 PM

What you're asking for can be done with regular expression matching, which is easy to do using 'apply' with a Python lambda function. Here's an example where 'product' holds both the company name and the model in the same column:

<pre>

import re
import graphlab as gl

sf = gl.SFrame({'id': [1, 2, 3],
                'product': ['Apple Iphone', 'Apple Ipad', 'Samsung Galaxy']})

# '.*' plays the role of SQL's '%': any run of characters before or after 'Apple'
prog = re.compile('.*Apple.*')

# apply yields 0 for rows containing 'Apple' and 1 otherwise;
# indexing the SFrame with that SArray keeps only the nonzero rows
sf[sf['product'].apply(lambda x: 0 if prog.match(x) else 1)]

Columns:
    id       int
    product  str

Rows: Unknown

Data:
+----+----------------+
| id |    product     |
+----+----------------+
| 3  | Samsung Galaxy |
+----+----------------+
[? rows x 2 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.

</pre>
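SQL's '%' wildcard corresponds to '.*' in a regular expression, and '_' to '.'. Generalizing the regex approach, here is a minimal sketch of a hypothetical helper, like_to_regex, that translates a LIKE pattern into a compiled Python regex and escapes every other character so it matches literally. It assumes no ESCAPE clause and case-sensitive matching by default; some databases treat LIKE as case-insensitive, which the ignore_case flag approximates. It anchors with \Z and uses match so it should also run on older Python versions.

```python
import re

def like_to_regex(pattern, ignore_case=False):
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    '%' -> '.*' (any run of characters), '_' -> '.' (exactly one character);
    every other character is escaped. ESCAPE clauses are not supported.
    """
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '_':
            parts.append('.')
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    # match() anchors at the start; \Z anchors at the end, so the whole value must match
    return re.compile(''.join(parts) + r'\Z', flags)

def like(value, pattern, ignore_case=False):
    """True if the whole of `value` matches the LIKE pattern."""
    return like_to_regex(pattern, ignore_case).match(value) is not None

products = ['Apple Iphone', 'Apple Ipad', 'Samsung Galaxy']
# Equivalent of: WHERE product NOT LIKE '%Apple%'
kept = [p for p in products if not like(p, '%Apple%')]
print(kept)  # ['Samsung Galaxy']
```

With an SFrame, the pattern would be compiled once outside the lambda, e.g. prog = like_to_regex('%Apple%') followed by sf[sf['product'].apply(lambda x: 0 if prog.match(x) else 1)].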

Of course, the query will probably be faster if the column entries are clean enough that you can assume every Apple product starts with 'Apple'. Then you can do:

<pre>
# keep rows whose product does not start with 'Apple'
sf[sf['product'].apply(lambda x: 0 if x.startswith('Apple') else 1)]
</pre>
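For a '%Apple%' pattern, a regex is not strictly needed: Python's 'in' operator does a plain substring test, while str.startswith (or slicing the first five characters) covers the 'Apple%' prefix case. A minimal pure-Python sketch comparing these predicates on the sample values plus one hypothetical extra value, 'Case for Apple Ipad', added only to show where the two behave differently; on an SFrame, each predicate would go inside apply the same way.

```python
# Sample values, plus a hypothetical extra entry to show where the checks differ
products = ['Apple Iphone', 'Apple Ipad', 'Samsung Galaxy', 'Case for Apple Ipad']

# LIKE '%Apple%': drop values containing 'Apple' anywhere
contains = [p for p in products if 'Apple' not in p]

# LIKE 'Apple%': drop values starting with 'Apple'
prefix = [p for p in products if not p.startswith('Apple')]
# The slice comparison gives the same result as startswith here
prefix_slice = [p for p in products if p[0:5] != 'Apple']

print(contains)  # ['Samsung Galaxy']
print(prefix)    # ['Samsung Galaxy', 'Case for Apple Ipad']
```

The prefix test is cheaper but only correct when the data really is clean enough that 'Apple' always comes first.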


User 512 | 10/13/2014, 3:19:29 PM

Thanks much to both of you! The regular expression matching is exactly what I need.