Relation Extraction and Data Preprocessing with Raw Data

User 155 | 4/7/2014, 11:04:03 PM

I'm a big fan of graph-based machine learning, but it only works well with a clean dataset. Unfortunately, extracting relations from raw text is much harder than working with clean data, and I can't find any good methods or software packages that facilitate this process. Does anyone have any suggestions?


User 20 | 4/8/2014, 4:53:11 AM


Good question. This is really part of the general "data preprocessing" problem: the initial data cleaning can be a very difficult task.

To that end, in GraphLab Create we have been building the "SFrame", a scalable, external-memory table data structure. If we give the SFrame enough power, then in combination with a scalable Graph data structure, the task of extraction and conversion will be substantially simplified. We are also working on tighter integration between the SFrame and the Graph.
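
As a rough sketch of the table-to-graph conversion step in plain Python (this is not the SFrame or GraphLab Create API; the row layout and the helper names `normalize` and `rows_to_graph` are made up for illustration), the idea is to clean each row and then collapse duplicate mentions into vertices and weighted edges:

```python
from collections import defaultdict

# Hypothetical rows, as they might look after extraction from raw text:
# one (subject, relation, object) triple per row, with messy whitespace and case.
rows = [
    {"subject": " Alice ", "relation": "works_with", "object": "Bob"},
    {"subject": "alice", "relation": "works_with", "object": "bob"},
    {"subject": "Bob", "relation": "manages", "object": "Carol"},
    {"subject": "Carol", "relation": "manages", "object": ""},  # incomplete row
]


def normalize(name):
    """Trim whitespace and lowercase so duplicate mentions collapse to one vertex."""
    return name.strip().lower()


def rows_to_graph(rows):
    """Return (vertices, edges); edges maps (src, relation, dst) to a mention count."""
    vertices = set()
    edges = defaultdict(int)
    for row in rows:
        src = normalize(row["subject"])
        dst = normalize(row["object"])
        if not src or not dst:
            continue  # drop rows with a missing endpoint
        vertices.update((src, dst))
        edges[(src, row["relation"], dst)] += 1
    return vertices, dict(edges)


vertices, edges = rows_to_graph(rows)
```

Real data would need far more careful cleaning (entity resolution, for instance), but the shape of the step is the same: a table of rows goes in, a vertex set and an edge list come out.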


User 33 | 4/8/2014, 9:16:51 AM

GraphBuilder is an open-source project with a similar purpose.

But it may not be easy to use ^.^

User 155 | 4/10/2014, 1:07:40 AM

Thanks Yucheng and rongchen.

I'm guessing relation extraction is more of an NLP problem, but do most graph-based machine learning projects use clean, structured datasets? I might just be thinking of unusual examples, like finding relations between people from raw text.