unity_server behind a firewall - how to configure a web or SOCKS5 proxy?

User 1217 | 1/21/2015, 4:00:10 PM

I'm running GraphLab Create (unity_server via ipython) behind a firewall. Calls to gl.SFrame.read_csv() fail because they try to download directly from the internet. How can I configure a web or SOCKS5 proxy that unity_server uses for all outbound downloads?

In the log file (/tmp/graphlabserver1421855458.log.0) I get:

1421855460 : INFO: (constructfromcsvs:90): Construct sframe from csvs at http://s3.amazonaws.com/dato-datasets/bitcoin/useredges2011-07-13.txt
1421855460 : INFO: (constructfromcsvs:97): Parsing config: commentchar: continueonfailure: 1 delimiter: doublequote: 1 escapechar: \ navalues: ["NA"] quotechar: " skipinitialspace: 1 storeerrors: 0 use_header: 0
1421855460 : INFO: (parsecsvstosframe:800): Adding CSV file http://s3.amazonaws.com/dato-datasets/bitcoin/useredges2011-07-13.txt to list of files to parse
1421855460 : PROGRESS: (downloadurl:33): Downloading http://s3.amazonaws.com/dato-datasets/bitcoin/useredges2011-07-13.txt to /var/tmp/graphlab-this/24974/000000.txt
1421855460 : PROGRESS: (downloadurl:48): Failed to download http://s3.amazonaws.com/dato-datasets/bitcoin/useredges2011-07-13.txt: Couldn't resolve host name
1421855460 : ERROR: (operator():71): Fail to download from http://s3.amazonaws.com/dato-datasets/bitcoin/useredges2011-07-13.txt. Couldn't resolve host name
1421855460 : ERROR: (operator():21): Cannot open http://s3.amazonaws.com/dato-datasets/bitcoin/useredges_2011-07-13.txt


User 15 | 1/21/2015, 11:37:10 PM

For a normal HTTP proxy, try setting your http_proxy environment variable to http://username:password@proxy.my.domain:port. You may need to restart GraphLab Create so that the server picks up your environment variable on startup. I believe that should work, but unfortunately I don't have a way to test it.
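If you would rather set the variable from inside Python than in your shell, something like the following might work. It is only a sketch: the helper name build_proxy_url and the port 3128 are made up for illustration, and it assumes the variable is set before GraphLab Create starts its server (that is, before the first import in a fresh session).

```python
import os
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL of the form scheme://user:pass@host:port.

    Hypothetical helper, not part of GraphLab Create. Credentials are
    percent-encoded so characters like '@' or ':' do not break the URL.
    """
    credentials = ""
    if username is not None:
        credentials = quote(username, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return "{}://{}{}:{}".format(scheme, credentials, host, port)


# Host, port and credentials are placeholders; substitute your own values.
os.environ["http_proxy"] = build_proxy_url(
    "proxy.my.domain", 3128, "username", "password"
)

# Import GraphLab Create only after the variable is set, so that a server
# started from this process can see it:
# import graphlab as gl
```

If the server is already running, restart the session so the variable is in place before the server starts.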

Thanks for using GraphLab Create!


User 1217 | 1/23/2015, 2:35:26 PM

It works! To be honest, I initially suspected it might work like that, but then I assumed the server gets started by ipython without propagating the environment. What library or classes are you using for the downloads? Is it libcurl?

Thanks for providing GraphLab Create!

User 15 | 1/23/2015, 5:14:35 PM

Yeah, you're pretty close. When the server is started on the same machine as the client, it inherits the client's environment. Our server does indeed use libcurl. The http_proxy environment variable is a widely followed convention rather than a formal standard, so it works with many other tools and libraries too.
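The inheritance itself is ordinary operating-system behavior and can be seen with a few lines of Python. This is a generic illustration, not the code GraphLab Create uses to launch its server; the function name inherited_value and the proxy address are assumptions made for the example.

```python
import os
import subprocess
import sys


def inherited_value(var_name):
    """Start a child Python process and return what it sees for var_name.

    No env argument is passed, so the child receives a copy of the
    parent's current environment.
    """
    child_code = "import os; print(os.environ.get({!r}, ''))".format(var_name)
    output = subprocess.check_output([sys.executable, "-c", child_code])
    return output.decode().strip()


# Placeholder proxy address, set in the parent process only.
os.environ["http_proxy"] = "http://proxy.my.domain:3128"

# The child process reports the same value without any extra plumbing.
print(inherited_value("http_proxy"))
```

A variable set after the child has already started is not visible to it, which is why a restart may be needed.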

User 1217 | 1/26/2015, 1:37:13 PM

I just managed to get it working even with a SOCKS5 proxy (with remote DNS resolution) by using the ALL_PROXY environment variable:

ALL_PROXY=socks5h://proxyname:proxyport ipython notebook --pylab=inline
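The trailing "h" in the scheme matters here: with libcurl, socks5h:// asks the proxy to resolve host names, while plain socks5:// resolves them locally, which would still fail with "Couldn't resolve host name" behind a firewall without DNS. A small check along these lines can catch the wrong scheme early. The function name is hypothetical, and the scheme list reflects my understanding of libcurl's proxy schemes rather than anything GraphLab-specific.

```python
from urllib.parse import urlparse

# libcurl proxy schemes where the proxy, not the client, resolves host names.
REMOTE_DNS_SCHEMES = {"socks5h", "socks4a"}


def resolves_dns_remotely(proxy_url):
    """Return True if the proxy URL scheme implies proxy-side DNS resolution."""
    return urlparse(proxy_url).scheme.lower() in REMOTE_DNS_SCHEMES


# "proxyname" and 1080 are placeholders for your own proxy host and port.
for url in ("socks5h://proxyname:1080", "socks5://proxyname:1080"):
    print(url, resolves_dns_remotely(url))
```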

User 15 | 1/26/2015, 5:51:14 PM

Good to know. Thanks for sharing!