[ERROR] Could not enable cache on ec2

User 481 | 5/27/2015, 10:32:58 PM

I am getting the following error while creating an EC2 predictive service. The environment is created, but creating the predictive service (PS) itself fails.

Setting up the predictive service deployment ... 
[INFO] This commercial license of GraphLab Create is assigned to mjdata@mindjet.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-4818 - Server binary: /Library/Python/2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1432763976.log
[INFO] GraphLab Server Version: 1.4.0
[INFO] Launching Predictive Service with 1 hosts, as specified by num_hosts parameter
[INFO] Launching Predictive Service, with name: test-ps
[INFO] [Step 0/5]: Initializing S3 locations.
[INFO] [Step 1/5]: Launching EC2 instances.
[INFO] This commercial license of GraphLab Create is assigned to mjdata@mindjet.com.

[INFO] Launching an m3.xlarge instance in the us-west-2b availability zone, with id: i-ff336c09. You will be responsible for the cost of this instance.
[INFO] WARNING: Launching Predictive Service without SSL certificate!
[INFO] [Step 2/5]: Launching Load Balancer.
[INFO] [Step 3/5]: Configuring Load Balancer.
[INFO] [Step 4/5]: Waiting for Load Balancer to put all instances into service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster not fully operational yet, [0/1] instances currently in service.
[INFO] Cluster is fully operational, [1/1] instances currently in service.
[INFO] [Step 5/5]: Finalizing Configuration.
[ERROR] Could not enable cache on ec2-52-25-228-110.us-west-2.compute.amazonaws.com: (<requests.packages.urllib3.connectionpool.HTTPConnectionPool object at 0x1061c29d0>, 'Connection to ec2-52-25-228-110.us-west-2.compute.amazonaws.com timed out. (connect timeout=10)')
[ERROR] Cannot get node status from i-ff336c09, error: Cannot get status for host ec2-52-25-228-110.us-west-2.compute.amazonaws.com, error: (<requests.packages.urllib3.connectionpool.HTTPConnectionPool object at 0x1061eac10>, 'Connection to ec2-52-25-228-110.us-west-2.compute.amazonaws.com timed out. (connect timeout=10)')
[WARNING] Tearing down Predictive Service due to error launching
[INFO] Deleting load balancer: test-ps
[INFO] Terminating EC2 host(s) [u'i-ff336c09'] in us-west-2
[INFO] Deleting log files.
[INFO] Deleting model data.
Traceback (most recent call last):
File "/usr/local/bin/mjdatasvc", line 8, in 
execfile(file)
File "/bin/mjdatasvc", line 164, in 
main(sys.argv[1:])
File "/bin/mjdatasvc", line 126, in main
setup_env(envname, psname, s3bucketpath)
File "/bin/mjdatasvc", line 76, in setup_env
api_key='')
File "/Library/Python/2.7/site-packages/graphlab/deploy/predictive_service.py", line 297, in create
result.cache_enable(None, True)
File "/Library/Python/2.7/site-packages/graphlab/deploy/_predictive_service/_predictive_service.py", line 2284, in cache_enable
while not self._environment._is_cache_ok("healthy") and \
File "/Library/Python/2.7/site-packages/graphlab/deploy/_predictive_service/_predictive_service_environment.py", line 277, in _is_cache_ok
for x in status]
File "/Library/Python/2.7/site-packages/graphlab/deploy/_predictive_service/_predictive_service_environment.py", line 274, in healthy_or_disabled
str(cache_status))
RuntimeError: ('Unexpected value for cache_status: %s', 'None')
[INFO] Stopping the server connection.

No predictive service is being created. Any ideas?

Comments

User 1394 | 5/27/2015, 11:08:49 PM

Hey MSH -

Sorry to read this. Does this happen consistently for you? Can you try launching the PS a couple times and see if the error repeats?

Also, can you try a 3-node PS and see if the error repeats? The caching behavior differs between 1-node and 3-node deployments.

Thanks,

Rajat


User 481 | 5/28/2015, 12:00:28 AM

I tried it with 3 hosts but I am still getting the same error, only three times now, once for each host.


User 1394 | 5/28/2015, 12:24:51 AM

The root cause of this is a connectivity issue between the client machine and the EC2 instances. When making administrative changes to the deployment, we use port 9005 for communication. When we cannot communicate over port 9005, this is the error displayed.

We should improve the error message, but this is the root cause of this issue.

Can you confirm that you can make outbound connections on port 9005?
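
In case it helps, here is a minimal sketch for checking this from the client machine. It only tests whether a TCP connection to port 9005 can be opened, which is what the "timed out" errors in the log suggest is failing. The function name `can_connect` and the script layout are my own for illustration and are not part of GraphLab Create; the 10-second default mirrors the `connect timeout=10` shown in the log.

```python
import socket
import sys


def can_connect(host, port=9005, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only, not a GraphLab Create API.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Pass the public DNS name of a running instance as the first argument.
    if len(sys.argv) < 2:
        sys.exit("usage: python check_port.py <ec2-public-dns-name>")
    host = sys.argv[1]
    if can_connect(host):
        print("Port 9005 on %s is reachable." % host)
    else:
        print("Cannot reach port 9005 on %s; check firewall or proxy rules." % host)
```

The instance has to be running for the check to mean anything, so run it while the launch is at the "Finalizing Configuration" step or against another instance that listens on that port. If it fails on one network and succeeds on another, the block is on the client side.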

Thanks,

Rajat


User 481 | 5/28/2015, 6:22:55 AM

It was the connectivity issue. On a different network, the error did not occur. Thanks!