Cannot start server: Unable to start local server.

User 327 | 5/28/2014, 4:01:21 PM

Hi, I got the following error when running Logistic Regression with GraphLab Create on an Ubuntu 14.04 machine:

ubuntu@ip-10-154-286-01:~$ python LogisticRegression.py
[ERROR] Cannot start server: Unable to start local server.
[INFO] GraphLab server shutdown
Traceback (most recent call last):
  File "ppp2.py", line 3, in <module>
    data = SFrame('train.csv')
  File "/usr/local/lib/python2.7/dist-packages/graphlab/datastructures/sframe.py", line 211, in init
    self.proxy = UnitySFrameProxy(glconnect.getclient())
  File "/usr/local/lib/python2.7/dist-packages/graphlab/connect/main.py", line 203, in getclient
    assert isconnected(), "Cannot connect to GraphLab Server"
AssertionError: Cannot connect to GraphLab Server

Any hint as to what the problem is? Thanks! Best, Suijian

Comments

User 14 | 5/28/2014, 5:00:13 PM

Hi Suijian,

Can you send us your platform details? Is it a 32-bit or 64-bit machine, and how many cores does it have?

Thanks


User 327 | 5/28/2014, 7:16:59 PM

Hi, it's a 64-bit machine with 1 core. Thanks.

-Suijian


User 14 | 5/28/2014, 9:25:03 PM

GraphLab Create v0.2 requires at least 2 cores. However, the problem is solved in v0.3 which will be out soon. Stay in touch!
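
A quick way to check how many cores a machine reports before installing is sketched below. It uses only the Python standard library; the 2-core threshold is taken from the statement above, and `has_enough_cores` is a hypothetical helper name, not part of GraphLab Create.

```python
import multiprocessing

# Minimum core count stated in this thread for GraphLab Create v0.2.
MIN_CORES_V02 = 2


def has_enough_cores(required=MIN_CORES_V02):
    """Return True if this machine reports at least `required` cores."""
    return multiprocessing.cpu_count() >= required


if __name__ == "__main__":
    print("cores: %d, enough for v0.2: %s"
          % (multiprocessing.cpu_count(), has_enough_cores()))
```

On a 1-core instance like the one described in this thread, the check would report that the machine falls short of the v0.2 requirement.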


User 327 | 5/28/2014, 9:56:31 PM

ok. I see. Thanks!


User 14 | 5/29/2014, 12:09:35 AM

Hi Suijian,

We just released 0.3 this afternoon. Give it a try and let us know how it goes.

Best, -jay


User 327 | 5/29/2014, 3:24:32 PM

Hi Jay, I installed version 0.3 and it works: the local server can be started now. However, when I tried to run the Logistic Regression example from http://graphlab.com/products/create/docs/graphlab.toolkits.regression.html:

from graphlab import logistic_regression, SFrame
data = SFrame('train-admissions-data.csv')
m = logistic_regression.create(data, response='admission', predictors=['GPA', 'SAT-score', 'essay-score'])
testdata = SFrame('test-admissions-data.csv')
predictions = m.predict(testdata)
print predictions

it failed with: TypeError: create() got an unexpected keyword argument 'response'

I'm not sure what the problem is.

Best, -Suijian


User 14 | 5/29/2014, 3:44:02 PM

There is a known issue that creating SFrame from csv has trouble with 1 core. We have a fix in a newer version which I can send you individually. Sorry about the inconvenience.


User 327 | 5/29/2014, 3:51:04 PM

Thanks Jay! By the way, what exactly is the format of the input dataset? I think it should look like the following:

$ cat input.csv
"Target","Feature1","Feature2",...,"FeatureN"
1, 0.2, 0.5,...,1.2
0, 0.3, 0.8,..., 0.9
......

Best -Suijian


User 14 | 5/29/2014, 6:44:45 PM

Yes, the default input format is a standard csv file.


User 14 | 5/29/2014, 6:59:12 PM

Hi Suijian,

I've attached GraphLab-Create 0.3.01 with the bug fix. To install it, you can do "pip install YOURDOWNLOADFOLDER/Graphlab-Create-0.3.01.tar.gz". If you installed the previous GraphLab-Create in a virtualenv, make sure you install this one in the same environment.

Also here is a csv format dataset that you can download to try out the logistic regression. http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

However, the csv file does not have a header, so you will need to tweak the code as follows:

data = SFrame.read_csv('wine.data', header=False, column_type_hints=float)
m = logistic_regression.create(data, target='X1', features=['X2', 'X3', 'X4'])

Notice that in 0.3 we changed the API for regression: 'response' => 'target' and 'predictors' => 'features'.
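
For scripts written against the older keyword names, the rename can be applied mechanically. The sketch below is plain Python and does not call GraphLab; `RENAMED` and `translate_kwargs` are hypothetical names used only for illustration, and the mapping contains just the two renames mentioned above.

```python
# Old keyword -> new (0.3) keyword, as described in this thread.
RENAMED = {'response': 'target', 'predictors': 'features'}


def translate_kwargs(kwargs):
    """Return a copy of `kwargs` with the old keyword names replaced."""
    return {RENAMED.get(key, key): value for key, value in kwargs.items()}


old_style = {'response': 'admission',
             'predictors': ['GPA', 'SAT-score', 'essay-score']}
new_style = translate_kwargs(old_style)
print(sorted(new_style))
```

The resulting dictionary could then be passed as `**new_style` to a call that expects the 0.3 names; any keyword not listed in `RENAMED` is passed through unchanged.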

Thanks -jay


User 327 | 5/29/2014, 7:07:41 PM

Hi Jay, for this small test dataset:

$ cat tiny.csv
"Activity","F1","F2","F3"
0,0.5,0.8,0.2
1,0.7,0.2,0.8
1,0.2,0.3,0.5
0,0.3,0.4,0.7
0,0.2,0.9,0.8

When I run the following:

from graphlab import logistic_regression, SFrame
data = SFrame('tiny.csv')
m = logistic_regression.create(data, target="Activity", features=['F1','F2','F3'])
testdata = SFrame('tiny.csv')
predictions = m.predict(testdata)
print predictions

It failed at the step logistic_regression.create(data, target="Activity", features=['F1','F2','F3']):

[INFO] GraphLab Server Version: 0.3.0
PROGRESS: Read 0 lines. Lines per second: 0
PROGRESS: Finish parsing 0 lines in 0.001989 secs.
Traceback (most recent call last):
  File "LogisticRegression.py", line 6, in <module>
    m = logistic_regression.create(data, target="Activity", features=['F1','F2','F3'])
  File "/usr/local/lib/python2.7/dist-packages/graphlab/toolkits/regression/logistic_regression.py", line 269, in create
    raise TypeError("Target column must be type int")
TypeError: Target column must be type int
[INFO] Stopping the server connection.
[INFO] GraphLab server shutdown

But the target column (the first column) already contains integers, so I'm confused about what the input format should be.

Best, -Suijian


User 14 | 5/29/2014, 7:31:35 PM

Try with data = SFrame('tiny.csv', column_type_hints=float). By default, all columns are parsed as str unless column_type_hints is used. If you type "data", you will see that the target column is str.
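
The same effect can be seen with Python's standard csv module, which also returns every field as a string until it is converted. This is only an illustration of the general behavior, not GraphLab's parser; the rows are copied from the tiny.csv posted above, and the variable names are made up for the example.

```python
import csv

# A few lines of the tiny.csv posted above, inlined so the sketch is
# self-contained.
LINES = [
    '"Activity","F1","F2","F3"',
    '0,0.5,0.8,0.2',
    '1,0.7,0.2,0.8',
]

reader = csv.reader(LINES)
header = next(reader)
raw_rows = list(reader)

# Every field comes back as a string, including the 0/1 target column.
assert all(isinstance(value, str) for row in raw_rows for value in row)

# Explicit conversion: int for the target, float for the features.
typed_rows = [[int(row[0])] + [float(value) for value in row[1:]]
              for row in raw_rows]
print(header)
print(typed_rows)
```

A value that looks like an integer in the file is still the string '0' or '1' after parsing, which is why a type check on the target column can fail.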


User 327 | 5/29/2014, 7:44:48 PM

Thanks Jay, I tried it but it failed with:

Traceback (most recent call last):
  File "LogisticRegression.py", line 5, in <module>
    data = SFrame('tiny.csv', column_type_hints=float)
TypeError: __init__() got an unexpected keyword argument 'column_type_hints'


User 14 | 5/29/2014, 8:32:47 PM

data = SFrame.read_csv('tiny.csv', column_type_hints=float)


User 327 | 5/29/2014, 8:39:32 PM

Solved it by using SFrame.read_csv() instead. Now the program runs successfully, but there's a new problem: I'm trying to process an 18MB input dataset in which each row has ~1700 features. In a test run where I picked out only 10 rows of the file, the program runs without a problem. But for the full dataset, it fails with:

Unable to reach server for 3 consecutive pings. Server is considered dead. Please exit and restart.
Traceback (most recent call last):
  File "LogisticRegression.py", line 7, in <module>
    632','D1633','D1634','D1635','D1636','D1637','D1638','D1639','D1640','D1641','D1642','D1643','D1644','D1645','D1646','D1647','D1648','D1649','D1650','D1651','D1652','D1653','D1654','D1655','D1656','D1657','D1658','D1659','D1660','D1661','D1662','D1663','D1664','D1665','D1666','D1667','D1668','D1669','D1670','D1671','D1672','D1673','D1674','D1675','D1676','D1677','D1678','D1679','D1680','D1681','D1682','D1683','D1684','D1685','D1686','D1687','D1688','D1689','D1690','D1691','D1692','D1693','D1694','D1695','D1696','D1697','D1698','D1699','D1700','D1701','D1702','D1703','D1704','D1705','D1706','D1707','D1708','D1709','D1710','D1711','D1712','D1713','D1714','D1715','D1716','D1717','D1718','D1719','D1720','D1721','D1722','D1723','D1724','D1725','D1726','D1727','D1728','D1729','D1730','D1731','D1732','D1733','D1734','D1735','D1736','D1737','D1738','D1739','D1740','D1741','D1742','D1743','D1744','D1745','D1746','D1747','D1748','D1749','D1750','D1751','D1752','D1753','D1754','D1755','D1756','D1757','D1758','D1759','D1760','D1761','D1762','D1763','D1764','D1765','D1766','D1767','D1768','D1769','D1770','D1771','D1772','D1773','D1774','D1775','D1776'])
  File "/usr/local/lib/python2.7/dist-packages/graphlab/toolkits/regression/logisticregression.py", line 303, in create
    ret = graphlab.toolkits.main.run("regressiontraininit", opts)
  File "/usr/local/lib/python2.7/dist-packages/graphlab/toolkits/main.py", line 73, in run
    (success, message, params) = unity.runtoolkit(toolkitname, options)
  File "cyunity.pyx", line 59, in graphlab.cython.cyunity.UnityGlobalProxy.runtoolkit
  File "cyunity.pyx", line 63, in graphlab.cython.cyunity.UnityGlobalProxy.runtoolkit
RuntimeError: Communication Failure: 113.
[INFO] Stopping the server connection.
Unable to reach server for 4 consecutive pings. Server is considered dead. Please exit and restart.
[WARNING] <type 'exceptions.IOError'>
[WARNING] <type 'exceptions.ValueError'>
[INFO] GraphLab server shutdown

It seems the program could not process this 18MB input file (not so big, I think). I'm now running on a 2-core machine with 7GB of memory. Any suggestions? Thanks!

Best -Suijian


User 14 | 5/29/2014, 8:41:23 PM

Can you send me your training and test files? Thanks.


User 327 | 5/29/2014, 8:45:50 PM

Yes, as in the attachment. Thanks Jay!


User 91 | 5/29/2014, 10:29:18 PM

The SFrame (like most other column stores out there) is not (as yet :) ) optimized to perform well on thousands of columns. We are working on it.

Most importantly, that doesn't mean that you can't work with thousands of features. For this, we provide list and dictionary types each of which can encode thousands of features. We are working on some educational notebooks to help understand how to best use these with our regression packages.

For example, the features D1, D2, D3 can be converted to a "single feature" which is a list of numbers [D1, D2, D3]. Or better yet, we can work with a dictionary (where we only need to store the non-zeros). Eg: {D1: 1, D2: 3, D3: 1}. Your data contains a lot of "0" entries so I would recommend that you use a dictionary type.

The following code will help you work with the dataset that you have. First, we will convert the data into the list and dictionary types and then train a logistic regression model.


import graphlab as gl

# Remove the header from your training file. Let's parse each line as a string.
sf = gl.SFrame.read_csv('trainwithout_head.csv', header=False, delimiter='None')

# Let's use the first character of each line as the target.
sf['target'] = sf['X1'].apply(lambda x: int(x[0]))

# Convert the string to a list type:
# 'a,b,c' => [a, b, c]
sf['features'] = sf['X1'].apply(lambda x: map(float, x[2:].split(',')))

# Optimization:
# Convert the list type to a dictionary type by using only the non-zeros.
sf['sparse_features'] = sf['features'].apply(lambda lst: {i:v for (i,v) in enumerate(lst) if v != 0})

# Train regression with the dictionary feature.
model = gl.logistic_regression.create(sf, 'target', ['sparse_features'])

# Train regression with the list feature.
# Same as the above but will be slower because your data is sparse.
model = gl.logistic_regression.create(sf, 'target', ['features'])
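
The parsing steps in the listing above can be tried on a single line without GraphLab installed. The sketch below is plain Python; `parse_line` is a hypothetical helper, and it splits on commas rather than slicing by character position, so it assumes only that the target is the first comma-separated field.

```python
def parse_line(line):
    """Split 'target,f1,f2,...' into (target, dense list, sparse dict)."""
    fields = line.strip().split(',')
    target = int(fields[0])
    features = [float(value) for value in fields[1:]]
    # Keep only the non-zero entries, keyed by column position.
    sparse = {i: v for i, v in enumerate(features) if v != 0}
    return target, features, sparse


target, features, sparse = parse_line("1,0.0,2.5,0.0,1.0")
print("target=%d dense=%d values sparse=%d values"
      % (target, len(features), len(sparse)))
```

For a row that is mostly zeros, the dictionary holds far fewer entries than the list, which is the reason the dictionary type is recommended for this dataset.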



User 327 | 5/30/2014, 3:41:48 PM

Thanks Srikris and Jay, it works now!

Best, -Suijian