Two questions on GraphLab Data Feed?

User 512 | 4/15/2015, 11:13:14 PM

I have two questions on GraphLab data feed:

1) Is there any way to directly load data from Splunk to GraphLab SFrame?

2) What is the data compression rate for SFrame? For example, if I have 100TB data in Hadoop, what would be the rough size in GraphLab? I would like to estimate the storage needed for my analysis.

Thanks!

Comments

User 1592 | 4/17/2015, 1:51:36 PM

Hi,

1) We support multiple input formats. You can use the <a href="https://www.splunk.com/enus/solutions/solution-areas/business-analytics/odbc-driver.html">Splunk ODBC driver</a>. See an example of an ODBC connection here: https://github.com/dato-code/how-to/blob/master/sqlto_sframe.py
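A minimal sketch of the ODBC route, assuming GraphLab Create's `connect_odbc` / `SFrame.from_odbc` API and a locally configured Splunk ODBC data source (the DSN name, credentials, and query below are hypothetical placeholders):

```python
def build_odbc_conn_str(dsn, uid, pwd):
    # Assemble a standard ODBC connection string from its parts.
    return "DSN=%s;UID=%s;PWD=%s" % (dsn, uid, pwd)

def load_sframe_over_odbc(conn_str, query):
    # Open the ODBC connection and pull the query result into an SFrame.
    # Requires graphlab and a working ODBC driver; import lazily so the
    # helper above can be used without either installed.
    import graphlab as gl
    db = gl.connect_odbc(conn_str)
    return gl.SFrame.from_odbc(db, query)

# Hypothetical DSN and credentials -- substitute your own.
conn_str = build_odbc_conn_str("SplunkDSN", "admin", "changeme")
query = "SELECT * FROM my_search_results"
# sf = load_sframe_over_odbc(conn_str, query)  # run with a real DSN in place
```

The connection-string helper is just plain string assembly; the GraphLab calls are the part that needs the Splunk ODBC driver installed and configured.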

2) Compression rates depend on your data. Since we compress columns rather than rows, we typically achieve roughly 2x the compression you would get with gzip. So to be on the safe side, you can gzip your data and use that as a conservative estimate of our storage cost.
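One way to apply the gzip heuristic above: compress a representative sample of your data and extrapolate the ratio to the full 100 TB. A small sketch using only the standard library (the CSV-like sample line is made up):

```python
import gzip
import io

def gzip_ratio(raw_bytes):
    # Return compressed_size / raw_size for a single gzip pass in memory.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(raw_bytes)
    return len(buf.getvalue()) / float(len(raw_bytes))

# A repetitive, log-like sample; real data will compress less well,
# which is why this is only a rough, conservative estimate.
sample = b"2015-04-15T23:13:14,host-01,200,GET /index.html\n" * 10000
ratio = gzip_ratio(sample)

raw_tb = 100.0                  # raw size in Hadoop, in TB
estimated_tb = raw_tb * ratio   # conservative SFrame storage estimate
```

Since the answer above suggests SFrame's columnar compression is typically about twice as good as gzip, the gzip figure serves as an upper bound on the storage you should budget.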


User 512 | 4/22/2015, 3:12:40 PM

Thanks for the reply! But the Splunk ODBC driver only supports Windows, while GraphLab and Hadoop run on Linux, so I am not sure this would work.


User 1592 | 4/22/2015, 3:29:26 PM

We are releasing a Windows version in a few days. Would you like to try it out?


User 512 | 4/22/2015, 4:05:35 PM

You mean GraphLab Windows version? Sure, I would be happy to try it.


User 512 | 7/18/2015, 4:54:12 AM

Any update on the GraphLab Windows version?


User 4 | 7/18/2015, 7:06:29 AM

I am happy to announce that as of version 1.5.1, released hours ago, GraphLab Create is available natively for Windows (in beta)! You can follow the same download process as for other platforms, or if you know your product key, pip install directly on Windows and call graphlab.product_key.set_product_key.
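For the pip route mentioned above, the sequence would look roughly like this (the product key value is a placeholder; substitute your own):

```shell
# Install GraphLab Create natively on Windows via pip
pip install graphlab-create

# Register your product key (hypothetical placeholder key)
python -c "import graphlab; graphlab.product_key.set_product_key('YOUR-KEY-HERE')"
```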