Support Vector Machine

User 1102 | 4/15/2015, 12:24:04 PM

Hello all

I am using LIBSVM (the Java library) for a binary classification problem (0/1): medical entity relationship detection (e.g. between "chest pain" and "mild pain"). My features are a bag of words together with each word's POS tag, chunk tag, and others.

My training file looks like this: [Class Label] [word1ID : Count] [word2ID : Count] [word1POSID1] [word2POSID2] .... and so on.

Class label: 1 means the relationship exists, 0 means it does not.

Is it possible to use the GraphLab Support Vector Machine toolkit for this binary classification problem? If so, how can I use it with this set of features?

Comments

User 91 | 4/15/2015, 11:23:12 PM

You can store the features as "dictionaries", i.e. a sparse representation. Here is some code to parse your data in the LIBSVM format. The underlying implementation will leverage the sparsity of your data.

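<pre class="CodeBlock"><code>import graphlab as gl


def libsvm_to_sf(input_path):
    """
    Parse a file in the LIBSVM format.

    Parameters
    ----------
    input_path : str
      Full path to the file.

    Returns
    -------
    An SFrame with a 'target' column and a sparse 'features' column.
    """
    # Assuming the lines contain no commas, each one lands whole in column 'X1'.
    sf = gl.SFrame.read_csv(input_path, header=False)
    # The first token is the class label; any positive label is the positive class.
    sf['target'] = sf['X1'].apply(lambda x: int(x.split()[0]) > 0)
    # The remaining tokens are index:value pairs, kept as a sparse dictionary.
    sf['features'] = sf['X1'].apply(lambda x: {int(a.split(':')[0]): float(a.split(':')[1]) for a in x.split()[1:]})
    del sf['X1']
    return sf</code></pre>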
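
To see what the two lambdas do to a single row, here is a minimal sketch in plain Python with no GraphLab dependency. The helper name `parse_libsvm_line` and the sample line are made up for illustration, and the sketch assumes every feature token has the `index:value` form.

```python
def parse_libsvm_line(line):
    """Split one LIBSVM-style line into (target, sparse feature dict)."""
    tokens = line.split()
    # The first token is the class label; any positive label maps to True.
    target = int(tokens[0]) > 0
    # The remaining tokens are index:value pairs; absent indices are implicitly zero.
    features = {}
    for token in tokens[1:]:
        index, value = token.split(':')
        features[int(index)] = float(value)
    return target, features


# Hypothetical row: label 1, feature 3 has count 2, feature 17 has count 1.
target, features = parse_libsvm_line("1 3:2 17:1")
print(target, features)
```

Only the features that actually occur in a row are stored, which is what keeps a large bag-of-words vocabulary cheap. As far as I know, the resulting SFrame can then be passed to `gl.svm_classifier.create(sf, target='target', features=['features'])`, but please check the toolkit documentation for the exact parameters.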

User 1102 | 4/22/2015, 12:06:04 PM

Thank you very much!