A problem when running with HDFS

User 872 | 11/3/2014, 8:46:00 PM


When I run PageRank with data in HDFS, I hit the problem described in the tutorial, shown below.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://hydra:9000/home/lyuwei/hadoop-tmp/data, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:390)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:312)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:862)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:887)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:487)
Call to org.apache.hadoop.fs.FileSystem::listStatus failed!
WARNING: distributedgraph.hpp(loadfrom_hdfs:2235): No files found matching hdfs://hydra:9000/home/lyuwei/hadoop-tmp/data/

However, I don't know what "varify classpath includes all hadoop required folders" really means. I have added several Hadoop .jar files to the classpath, and my command is as below.

mpiexec -n 2 -hostfile ~/machines env GRAPHLABSUBNETID= CLASSPATH=~/hadoop-1.2.1/hadoop-core-1.2.1.jar:~/hadoop-1.2.1/lib/commons-logging-1.1.1.jar:~/hadoop-1.2.1/lib/commons-configuration-1.6.jar:~/hadoop-1.2.1/lib/commons-lang-2.4.jar:~/hadoop-1.2.1/lib/commons-collections-3.2.1.jar:~/hadoop-1.2.1/lib/commons-httpclient-3.0.1.jar /home/lyuwei/graphlab/release/toolkits/graphanalytics/pagerank --graph=hdfs://hydra:9000/home/lyuwei/hadoop-tmp/data/ --format=tsv --iterations=2 --ncpus=32 --saveprefix=/home/lyuwei/testfolder/tout

Could you help me find my error? Thanks a lot.

Best, Lyuwei


User 6 | 11/4/2014, 6:21:46 AM

I think you need to specify the env command twice, since you are setting both a subnet ID and a classpath. I also recommend giving full paths rather than paths that begin with "~". Namely, it should be something like:

mpiexec -n 2 -hostfile ~/machines env GRAPHLABSUBNETID= env CLASSPATH=~/hadoop-1.2.1/hadoop-core-1.2.1.jar

Please note, we have a newer version of GraphLab that will soon deprecate PowerGraph. It has HDFS support, and you can easily run PageRank there. See the documentation here: http://graphlab.com/products/create/docs/graphlab.toolkits.graph_analytics.html

User 872 | 11/4/2014, 9:41:13 AM

Hi, Danny. Thanks for your helpful instructions. After the modification, the problem still exists. Could you provide a list of the .jar files I should add to the classpath?

Also, are there any papers about the new version? I would like to read them.

User 6 | 11/4/2014, 10:40:03 AM

You should run the command hadoop classpath on those machines and then record the list of jars that are required.
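
A rough sketch of that step, assuming a POSIX shell on each machine. The helper names expand_classpath and has_core_site are hypothetical and not part of Hadoop or GraphLab. Two assumptions are built in. First, some Hadoop versions print wildcard entries such as lib/*, which a JVM embedded through JNI may not expand, so the sketch lists the jars explicitly. Second, the "expected: file:///" part of the error is typical of a client that did not find core-site.xml on its classpath and fell back to the local file system, so the sketch also checks for that file.

```shell
#!/bin/sh
# Hypothetical helpers for turning `hadoop classpath` output into an
# explicit CLASSPATH. Function bodies run in subshells, so their
# variables do not leak into the caller.

# Print the classpath with every "dir/*" entry replaced by the .jar files
# found in dir. All other entries (jars, conf directories) pass through.
expand_classpath() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        entry=${rest%%:*}               # text before the first ':'
        case $rest in
            *:*) rest=${rest#*:} ;;     # drop that entry and its ':'
            *)   rest= ;;               # last entry consumed
        esac
        case $entry in
            '') ;;                      # skip empty entries
            */\*)
                dir=${entry%/\*}
                for jar in "$dir"/*.jar; do
                    # An unmatched glob stays literal, so test existence.
                    if [ -e "$jar" ]; then out=${out:+$out:}$jar; fi
                done ;;
            *)  out=${out:+$out:}$entry ;;
        esac
    done
    printf '%s\n' "$out"
)

# Succeed if some classpath entry is a directory holding core-site.xml.
has_core_site() (
    rest=$1
    while [ -n "$rest" ]; do
        entry=${rest%%:*}
        case $rest in
            *:*) rest=${rest#*:} ;;
            *)   rest= ;;
        esac
        if [ -n "$entry" ] && [ -f "$entry/core-site.xml" ]; then
            exit 0                      # leaves only the function's subshell
        fi
    done
    exit 1
)

# Possible usage on a machine where the hadoop command is installed:
#   CP=$(expand_classpath "$(hadoop classpath)")
#   has_core_site "$CP" || echo "warning: no core-site.xml on CLASSPATH" >&2
#   mpiexec -n 2 -hostfile ~/machines env CLASSPATH="$CP" <pagerank command>
```

Passing the resulting string through env CLASSPATH=... keeps the mpiexec command short and avoids hand-copying jar names, which can differ between Hadoop versions.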

User 872 | 11/5/2014, 7:21:21 AM

Thank you Danny. The problem has been solved.