Languages available for text analytics

User 5147 | 4/25/2016, 9:29:47 AM

Hello, I'd like to know whether GraphLab's text analytics functionality is available for languages other than English. If so, how do I specify the language of my text? Thanks.

Comments

User 19 | 4/25/2016, 5:37:12 PM

Hi Octavia,

You can use GraphLab's text functionality (e.g. CountWords, Tokenizer, etc.) on any piece of text, but your mileage may vary depending on the target language. For example, you may still be able to handle Latin-based languages even without a language-specific tokenizer.
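
To illustrate what "mileage may vary" can look like in practice, here is a minimal sketch in plain Python 3. It deliberately does not use GraphLab's API: `count_words` and `TOKEN_RE` are hypothetical names made up for this example, and the tokenizer is just an assumption that every run of Unicode word characters is a token.

```python
import re
from collections import Counter

# Hypothetical, language-agnostic tokenizer (not GraphLab's):
# every run of Unicode letters, digits or underscores is one token.
TOKEN_RE = re.compile(r"\w+")

def count_words(text):
    """Lowercase the text, extract word-like tokens, and count them."""
    return Counter(TOKEN_RE.findall(text.lower()))

french = "Le chat dort. Le chien dort aussi, mais l'oiseau chante tout l'été."
counts = count_words(french)

# Accented words survive intact, but elided forms are split at the apostrophe.
print(counts["le"], counts["été"], counts["l"])
```

Accented words such as "été" come through intact, but elisions like "l'oiseau" are split into "l" and "oiseau". That is the kind of detail a French-specific tokenizer would handle differently.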

What languages are you working with? What's your goal use case?

Happy to help, Chris


User 5147 | 4/26/2016, 3:54:20 PM

Hi Chris,

Thanks for your answer. I am particularly interested in French, and I am thinking about features derived from more language-dependent processing, such as lemmatisation and PoS tagging. Is this kind of functionality available within GraphLab?

Thank you, Octavia


User 19 | 4/26/2016, 11:06:37 PM

Hi Octavia,

In the next version of GraphLab Create (GLC) we will provide a tool for extracting specific parts of speech from text, but at first it will support English only. I'll add a feature request for French support.

Out of curiosity: what end task will these features be used for?

Happy to help, Chris