GraphLab server communication

User 726 | 1/14/2015, 9:22:41 PM

I was wondering what services, if any, are run automatically on the GraphLab remote EC2 server?


User 16 | 1/15/2015, 9:59:42 PM

First, to clarify, an EC2 server only starts when you do one of the following:
1 - Call
2 - Call graphlab.deploy.predictiveservice.create(....)
3 - Create a job using an instance of graphlab.deploy.environment.EC2

I'm not sure exactly what you mean by "services". Our EC2 instances run on Linux, so all of the standard Linux services will be running. However, we automatically configure the firewall (AWS calls this the security group) to allow access only to what your GraphLab Create Client needs to communicate with the GraphLab Create Server on EC2.

Do you have specific concerns or is there a service you would like to run?

User 1375 | 3/7/2015, 1:18:45 AM

Is there a way to view the remote EC2 GraphLab Server logs? More generally, how does one set up ssh key-pair authentication so as to easily ssh into the EC2 machine? Unrelated, I noticed that the image provisioned via aws.launch_EC2 is not set up with ODBC connectivity. I suppose one would then need to ssh in and do it manually? Are there any plans to make ODBC setup part of the AMI? Sorry to spiral from one question to another, and thank you.

User 1394 | 3/9/2015, 11:49:29 PM

Hey msainz -

Great questions! Thanks for being such an active and engaged user! I will try to address one by one below.

<b class="Bold">1) View EC2 GraphLab Server logs</b>

It depends on how the EC2 instance is launched. If launched with then the logs are not exposed in an easy way. If you launch using graphlab.deploy.job.create(), then the logs are written to the S3 location specified in the EC2 Environment object.

<b class="Bold">2) ssh-ing to the EC2 instance.</b>

There are a few steps required, and we discourage it (mostly because it should not be required). Is the reason you want to ssh so you can configure ODBC? You are correct, using ODBC with is currently not supported. We would need to bake into the AMI all the different database drivers that could potentially be used - which would always be an incomplete list.

My suggestion instead is to launch an EC2 instance outside of GraphLab Create, ssh to it, and then pip install graphlab-create on it. This way you have full control over the database drivers to install and maintain. If you want to use GraphLab Canvas from this remote EC2 machine, we have some instructions on how to do that available <a href="">here</a>.
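As a rough illustration of the ssh step, here is a minimal Python sketch that only assembles the ssh command line for key-pair authentication and for running the pip install remotely. The host name, key path, and the "ubuntu" login user are hypothetical placeholders (the login user depends on the AMI you pick), and build_ssh_command is a helper made up for this example, not part of GraphLab Create.

```python
import shlex


def build_ssh_command(host, key_path, user="ubuntu", remote_command=None):
    """Return an argv list that invokes ssh with a private key file.

    `user` defaults to "ubuntu" purely as an assumption; use whatever
    login user your chosen AMI expects.
    """
    argv = ["ssh", "-i", key_path, "{}@{}".format(user, host)]
    if remote_command:
        # ssh passes this single string to the remote shell
        argv.append(remote_command)
    return argv


# Hypothetical values; substitute your instance's public DNS name and
# the private key of the key pair you chose when launching it.
host = "ec2-203-0-113-10.compute-1.amazonaws.com"
key_path = "/path/to/my-key.pem"

# Interactive login
login_cmd = build_ssh_command(host, key_path)

# One-off remote command that installs GraphLab Create
install_cmd = build_ssh_command(
    host, key_path, remote_command="pip install graphlab-create"
)

# Print shell-safe versions that can be pasted into a terminal
for cmd in (login_cmd, install_cmd):
    print(" ".join(shlex.quote(part) for part in cmd))
```

The argv lists can also be handed to subprocess.call() if you prefer to run them from Python. The instance's security group has to allow inbound SSH (port 22) from your IP address for either command to connect.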

Let me know if you have any other questions. Feel free to email me off-list as well.



User 1375 | 3/10/2015, 5:19:52 AM

Thank you for your help, Rajat. Tangentially, I am confused as to where the dividing line between GraphLab Create and Dato Distributed lies. For example, if I execute graphlab.deploy.job.create() or graphlab.modelparametersearch, am I tapping into Dato Distributed territory? Feel free to email me off-list (marcos at glassdoor dot com). Thank you.

User 1394 | 3/10/2015, 6:01:18 AM

Hey msainz -

Yes, this is confusing. We have not done a good job of making the package delineations clear when using GraphLab Create. Essentially, your intuition here is correct: executing Jobs is part of Dato Distributed.

We are in the process of making changes to make this delineation more apparent, so stay tuned for more details :smiley: