Example for a simple distributed algorithm

User 241 | 4/22/2014, 11:48:17 AM

Hello, I just started using GraphLab. I went over the PageRank example that runs on a single machine. I want to run it on a distributed system (several machines). Where can I find an example or tutorial that explains what I need? Thanks

Comments

User 6 | 4/22/2014, 12:02:10 PM

Here is a distributed PageRank example: http://docs.graphlab.org/using_graphlab_distributed_graph_vertex_program.html
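For orientation, here is what the per-vertex PageRank computation looks like, written as a plain single-process Python sketch rather than GraphLab code. It assumes the unnormalized update rank(v) = 0.15 + 0.85 * sum(rank(u) / out_degree(u)) over the in-neighbors u of v; the function name pagerank and the fixed iteration count are illustrative choices. Each vertex only needs its in-neighbors' ranks and out-degrees, which is the property that lets a distributed engine spread vertices over machines.

```python
# Single-process PageRank sketch (illustration only, not GraphLab's API).
# Assumed update: rank(v) = 0.15 + 0.85 * sum(rank(u) / out_degree(u)).
from collections import defaultdict

RESET_PROB = 0.15


def pagerank(edges, iterations=50):
    """edges: list of (src, dst) pairs. Returns {vertex: rank}."""
    vertices = {v for edge in edges for v in edge}
    out_degree = defaultdict(int)
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    rank = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        new_rank = {}
        for v in vertices:
            # "Gather": sum the contributions of all in-neighbors.
            total = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
            # "Apply": combine the sum with the reset probability.
            new_rank[v] = RESET_PROB + (1 - RESET_PROB) * total
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(pagerank([(1, 2), (2, 3), (3, 1)]))
```

On a 3-cycle every vertex converges to 1.0 under this update, since 1.0 = 0.15 + 0.85 * 1.0.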


User 241 | 4/22/2014, 12:25:40 PM

Hello, thanks for the fast reply, but that is not what I was looking for. I have already run this example, and there the graph file is located on one machine. What I am trying to do is split the graph data across different machines, and also split the computation, so that the whole data set does not need to be transferred between machines. Maybe I am missing the concept of GraphLab, and redirecting me to a simple example would help me understand it.


User 6 | 4/22/2014, 1:47:26 PM

Hi, the code you are looking at is distributed. It runs on a single machine, but it is more general and can also run on a cluster. Basically, you need a shared file system such as NFS, with the graph file split into disjoint parts (you can use the Linux split -l command to do that). When GraphLab runs on multiple machines, each GraphLab node reads a disjoint subset of the graph.
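The splitting step is line-based, so every edge line stays whole in exactly one part. A minimal Python sketch of the same idea, assuming an edge-list file with one edge per line; the helper names split_by_lines and _write_part are hypothetical. Note that split -l names its outputs with alphabetic suffixes by default (graph.txt.aa, graph.txt.ab, ... when given the prefix graph.txt.); numeric suffixes .1, .2, ... are used here only for readability.

```python
# Sketch: split an edge list into disjoint parts of at most
# `lines_per_part` lines each, similar in spirit to `split -l`.


def split_by_lines(path, lines_per_part):
    """Write path.1, path.2, ... and return the list of part paths."""
    parts = []
    with open(path) as src:
        buffer = []
        for line in src:
            buffer.append(line)
            if len(buffer) == lines_per_part:
                parts.append(_write_part(path, len(parts) + 1, buffer))
                buffer = []
        if buffer:  # leftover lines form the last, shorter part
            parts.append(_write_part(path, len(parts) + 1, buffer))
    return parts


def _write_part(path, index, lines):
    """Write one part file named <path>.<index> and return its path."""
    part_path = f"{path}.{index}"
    with open(part_path, "w") as dst:
        dst.writelines(lines)
    return part_path
```

Concatenating the parts in order reproduces the original file, and no line appears in two parts.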

See Step 2(a) of the "GraphLab cluster deployment" tutorial here: http://graphlab.org/projects/tutorials.html

Note: if you don't have access to a shared file system, you can store all the input files locally on each node, and each node will still read only its part.
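To illustrate what "each node reads a disjoint subset" can mean, here is a hypothetical round-robin scheme in Python. It is not GraphLab's actual assignment logic; files_for_rank and the 0-based rank are assumptions for the example. Every file goes to exactly one process as long as all processes see the same sorted list of names.

```python
# Hypothetical scheme: process `rank` (0-based) out of `num_procs` reads
# every num_procs-th file from a shared, consistently ordered list.


def files_for_rank(files, rank, num_procs):
    """Return the part files that process `rank` would read."""
    ordered = sorted(files)  # all processes must agree on the ordering
    return [f for i, f in enumerate(ordered) if i % num_procs == rank]
```

Sorting matters for the local-copy case: if each machine lists its own directory, identical file names give identical assignments, so no part is read twice or skipped.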


User 241 | 4/22/2014, 2:10:39 PM

Thanks again, that makes it much simpler. I am looking at the tutorial to deploy my own cluster, and I have some general questions about it:
1. In the command "mpiexec -n 2 -hostfile ~/machines /path/to/pagerank --powerlaw=100000", what does "--powerlaw=100000" mean?
2. A naive question, just to make sure: is GraphLab on the main machine supposed to understand that the graph is distributed, that vertices on different machines can depend on each other, and to handle the needed communication between machines?


User 6 | 4/22/2014, 2:24:15 PM

1) --powerlaw is an optional command-line argument that lets you create a synthetic graph on the fly. My guess is that you will not need it, since you want to read the graph in disjoint parts from files.
2) Yes, this is done automatically: GraphLab knows which vertex IDs each edge connects, and the communication across the network between the machines holding different parts of the graph is handled for you.
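A toy model of point 2, in Python. It assumes edges are spread over machines by hashing each (src, dst) pair, which roughly resembles a random vertex cut; GraphLab's real partitioning and synchronization are more involved, and replicas and vertices_needing_sync are hypothetical names. A vertex whose edges land on several machines has a copy on each of them, and those copies are what must be kept consistent over the network.

```python
# Toy model (assumption): assign each edge to a machine by hashing it,
# then see which vertices end up with copies on more than one machine.
from collections import defaultdict


def replicas(edges, num_machines):
    """Map each vertex to the set of machines holding one of its edges."""
    placement = defaultdict(set)
    for src, dst in edges:
        machine = hash((src, dst)) % num_machines
        placement[src].add(machine)
        placement[dst].add(machine)
    return dict(placement)


def vertices_needing_sync(edges, num_machines):
    """Vertices replicated on 2+ machines, i.e. needing network traffic."""
    return sorted(v for v, machines in replicas(edges, num_machines).items()
                  if len(machines) > 1)
```

With a single machine nothing needs synchronization; as machines are added, high-degree vertices are the most likely to be replicated.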


User 241 | 4/22/2014, 4:23:38 PM

Thanks again. I've installed GraphLab on one machine and successfully run the PageRank algorithm there. Now I am trying to run it on 2 machines (GraphLab is only installed on the main one). When running "~/graphlab/scripts/mpirsync" I get this error:

bash: orted: command not found

A daemon (pid 32526) died unexpectedly with status 127 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.

mpiexec.openmpi: clean termination accomplished

bash: orted: command not found

A daemon (pid 32657) died unexpectedly with status 127 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.

mpiexec.openmpi: clean termination accomplished

What could be the problem?


User 6 | 4/22/2014, 4:55:29 PM

Are both nodes configured with the same Open MPI version? What is the content of the file ~/machines in your home folder? Basically, the mpirsync script makes two MPI calls: https://github.com/graphlab-code/graphlab/blob/master/scripts/mpirsync#L5
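For reference, ~/machines is an MPI hostfile. A minimal Python sketch, assuming the common Open MPI layout (one host per line, an optional slots=N, and # comments), that reads it and lists the hosts mpiexec would use; parse_hostfile is a hypothetical helper, and treating a host without slots=N as one slot is an assumption, since newer Open MPI versions may infer the slot count differently.

```python
# Sketch: parse an Open MPI-style hostfile, e.g. the text of ~/machines.
# It only inspects the file; it cannot check remote Open MPI installs.


def parse_hostfile(text):
    """Return a list of (hostname, slots) tuples."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption: no slots=N means one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts
```

"bash: orted: command not found" typically means the Open MPI daemon is not on the PATH of the remote machine at all, which no hostfile change can fix; installing the same Open MPI version there is the usual remedy.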


User 241 | 4/22/2014, 5:01:41 PM

I believe that is the problem. I will install Open MPI on the other machine (I will need IT to do that for me) and update here. Another thing that is not completely clear to me: what happens when I simply run "mpiexec -n 4 ./my_first_app"? Does the same program run as 4 instances without any interaction between them? Is there anything useful in doing that?


User 241 | 4/23/2014, 10:37:06 AM

I solved the Open MPI problem and the script is working, but there is another problem now. I followed these instructions:

cd ~/graphlab/release/toolkits
~/graphlab/scripts/mpirsync
cd ~/graphlab/deps/local
~/graphlab/scripts/mpirsync

That was not enough, so I added the following commands to copy the files for the app itself:

cd ~/graphlab/release/apps/pagerank
~/graphlab/scripts/mpirsync

Then I ran:

mpiexec -n 2 -hostfile ~/machines /graphlab/release/pagerank/pagerank --powerlaw=100

It didn't work. Running the app on the second machine (where GraphLab is installed) didn't work either, and I get the same error:

./pagerank: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

I tried setting:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/morada/graphlab/deps/local/lib/:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/

and updating the dependencies:

sudo apt-get update
sudo apt-get install gcc g++ build-essential libopenmpi-dev openmpi-bin default-jdk cmake zlib1g-dev git

but I still get the same error. What should I do?


User 6 | 4/23/2014, 10:42:35 AM

Hi, it seems that Java is not installed or not configured properly on the second machine. We recommend using identical machines. If you are not planning to use HDFS, I recommend configuring GraphLab with ./configure --no_jvm. Don't forget to recompile and copy the executables to the other machines (using mpirsync).

Disabling Java will solve this issue.
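If you later need the JVM (for HDFS), one way to check a machine is the hypothetical helper below, which lists the directories in an LD_LIBRARY_PATH-style string that actually contain a given library such as libjvm.so. It only scans the listed directories; the real dynamic loader also consults its cache and default paths like /lib and /usr/lib, so an empty result is a hint rather than proof.

```python
# Diagnostic sketch (hypothetical helper): which LD_LIBRARY_PATH entries
# contain `lib_name`? Defaults to the current process environment.
import os


def find_library(lib_name, ld_library_path=None):
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    return [d for d in ld_library_path.split(":")
            if d and os.path.isfile(os.path.join(d, lib_name))]
```

Also keep in mind that in bash a plain LD_LIBRARY_PATH=... assignment only reaches child processes such as ./pagerank if the variable is exported (export LD_LIBRARY_PATH=...), and with Open MPI, mpiexec -x LD_LIBRARY_PATH forwards it to the remote processes.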

Regarding your question about my_first_app: it is only a "hello world" demo, and running several copies of such a program has little practical use. In general, unlike in Hadoop, GraphLab processes communicate with each other directly, which gives much better performance than communicating by writing to and reading from disk.
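As a rough analogy (Python's multiprocessing, not MPI), "mpiexec -n 4" starts four copies of the same program, each knowing only its own rank and the total count. If the program never exchanges messages, the copies just run side by side, as in this sketch; hello and NUM_PROCS are illustrative names.

```python
# Analogy for `mpiexec -n 4 ./my_first_app`: four independent workers,
# each told its rank, with no communication between them.
from multiprocessing import Pool

NUM_PROCS = 4


def hello(rank):
    # Each worker only knows its own rank and the number of workers.
    return f"Hello from rank {rank} of {NUM_PROCS}"


if __name__ == "__main__":
    with Pool(NUM_PROCS) as pool:
        for line in pool.map(hello, range(NUM_PROCS)):
            print(line)
```

A real GraphLab program differs in that its processes exchange vertex data over the network during the computation, rather than only printing.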


User 241 | 4/23/2014, 12:18:54 PM

Thanks, in the meantime that solved the problem (I will have to solve it properly later to enable HDFS).
1. I am not sure how I am supposed to split the graph.txt file into 2 files. The PageRank example says: "The key behind the load() function is that its actual behavior is to load all files which begin with the name provided. In other words, if the graph file is cut into many smaller pieces such as graph.txt.1, graph.txt.2, graph.txt.3, etc, the system will load all the files matching graph.txt*". Does it mean I should create graph.txt.1 on machine 1 and graph.txt.2 on machine 2?
2. Regarding my previous question: if I run "mpiexec -n 4 ./page_rank" locally with a single input file (simply graph.txt), what really happens? I still don't fully understand the command (you can redirect me to another source of information if available). Thanks
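The prefix rule quoted in question 1 can be sketched in Python as below; matching_parts is a hypothetical helper that only shows which files qualify, not which process GraphLab assigns each one to.

```python
# Sketch of the quoted rule: every file in `directory` whose name starts
# with `prefix` counts as part of the graph.
import os


def matching_parts(directory, prefix):
    return sorted(name for name in os.listdir(directory)
                  if name.startswith(prefix))
```

Taken literally, the rule means an unsplit graph.txt also begins with "graph.txt", so leaving it next to graph.txt.1 and graph.txt.2 in the same directory would make it match too and its edges would be read twice.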