User 944 | 11/19/2014, 3:32:58 AM
My input file is stored in TSV format, like below:
9999 7943
9999 9358
The file is in HDFS and I have 3 distributed machines. I ran the command below and it completed successfully:
mpiexec -n 3 --hostfile /root/machines env CLASSPATH=$CLASSPATH /home/hongsibao/graphlab/graphlab-master/release/toolkits/graph_analytics/pagerank --graph=hdfs://10.67.238.65:9000/pr10W --format=tsv --iterations=9 --engine=synchronous --saveprefix=/home/xuke/1106/
My question is: I don't think the input file is split into 3 parts, with each machine handling its own part. When I check memory usage, each of the 3 machines uses almost the same amount as a single machine does when it runs the job alone.
So it looks like each machine processes the whole data set rather than a part of it. Why?
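To make clear what I expected, here is a small Python sketch. It is not GraphLab code: the function name `assign_edges`, the toy edge list, and the modulo-on-source-vertex scheme are only my own illustration of "each machine holds about a third of the edges", and I am not claiming this is how GraphLab partitions a graph.

```python
def assign_edges(edges, num_machines):
    """Assign each edge to exactly one machine by its source vertex id.

    Hypothetical scheme for illustration only, not GraphLab's partitioner.
    """
    parts = {m: [] for m in range(num_machines)}
    for src, dst in edges:
        parts[src % num_machines].append((src, dst))
    return parts


# Toy edge list standing in for the TSV input (one "src dst" pair per edge).
edges = [(i, (i * 7 + 1) % 30) for i in range(30)]

parts = assign_edges(edges, 3)
sizes = {m: len(p) for m, p in parts.items()}
print(sizes)  # each machine holds roughly a third of the edges
```

With a split like this I would expect the memory used on each machine to be well below the single-machine figure, which is not what I observe.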
Any help would be appreciated. BR,