Total Subgraph Centrality with HDFS

User 459 | 7/10/2014, 6:55:07 AM

I am trying to modify the code of Total Subgraph Centrality (TSC) from the toolkit so that it can load data from HDFS and save data to HDFS. I followed the instructions in the documentation, but it doesn't seem to work.

To save the output, I wrote:

    // Writes one "vertex_id<TAB>TSC" line per vertex and nothing for edges.
    class graph_writer {
     public:
      std::string save_vertex(graph_type::vertex_type v) {
        std::stringstream strm;
        strm << v.id() << "\t" << v.data().TSC << "\n";
        return strm.str();
      }
      std::string save_edge(graph_type::edge_type e) { return ""; }
    };

and in the main function I added:

    graph.save("hdfs://localhost:8020/output", graph_writer(),
               false,   // set to true if each output file is to be gzipped
               true,    // whether vertices are saved
               false);  // whether edges are saved
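The writer's only job is to turn each vertex (and edge) into a line of text, which `graph.save` then writes out. The sketch below isolates that formatting logic so it can be checked without GraphLab; `vertex_data`, `mock_vertex`, and `mock_edge` are hypothetical stand-ins for the toolkit's vertex data and for `graph_type::vertex_type` / `graph_type::edge_type`, assumed here only so the snippet compiles on its own.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the TSC toolkit's per-vertex data.
struct vertex_data {
  double TSC;  // total subgraph centrality of the vertex
};

// Hypothetical stand-ins for graph_type::vertex_type and graph_type::edge_type.
struct mock_vertex {
  unsigned long vid;
  vertex_data vdata;
  unsigned long id() const { return vid; }
  const vertex_data& data() const { return vdata; }
};
struct mock_edge {};

// Same shape as the writer in the question: one "id<TAB>TSC" line per
// vertex, and an empty string for every edge so no edge lines are written.
struct graph_writer {
  std::string save_vertex(const mock_vertex& v) {
    std::stringstream strm;
    strm << v.id() << "\t" << v.data().TSC << "\n";
    return strm.str();
  }
  std::string save_edge(const mock_edge& /*e*/) { return ""; }
};
```

For example, a vertex with id 7 and a TSC of 0.5 becomes the line `7<TAB>0.5`. Since this part behaves as expected in isolation, a failing save is more likely a path or build problem than a writer problem.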

Please help me here.

Comments

User 459 | 7/10/2014, 5:04:18 PM

The logs show this message:

    distributed_graph.hpp(load_from_hdfs:2226) Attempting to load a graph from HDFS but GraphLab was built without HDFS.

In previous versions I believe we could pass a --hadoop flag, but the newer version has no option for Hadoop. How do I build the GraphLab project with HDFS support?
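The message suggests the HDFS code path was compiled out of the binary, not that the HDFS URL was unreachable. A rough sketch of that kind of guard is shown below. It assumes, as is typical, that HDFS paths are recognized by an `hdfs://` prefix, and it uses `HAS_HADOOP` as a hypothetical name for the build-time switch; neither detail is quoted from GraphLab's source. The same binary accepts local paths but refuses HDFS ones until it is rebuilt with the switch enabled. Under the prefix assumption, a path written as `hdfs:localhost:8020/...` would also not be treated as HDFS at all.

```cpp
#include <stdexcept>
#include <string>

// True when the path uses the "hdfs://" scheme (assumed prefix convention).
inline bool is_hdfs_path(const std::string& path) {
  const std::string scheme = "hdfs://";
  return path.compare(0, scheme.size(), scheme) == 0;
}

// Hypothetical dispatcher: HAS_HADOOP is an assumed macro standing in for
// whatever the build defines when HDFS support is compiled in.
inline std::string choose_backend(const std::string& path) {
  if (!is_hdfs_path(path)) return "local";
#ifdef HAS_HADOOP
  return "hdfs";
#else
  // Without HDFS support compiled in, an HDFS path can only fail.
  throw std::runtime_error(
      "Attempting to load a graph from HDFS but GraphLab was built without HDFS.");
#endif
}
```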


User 6 | 11/7/2014, 3:54:27 PM

Hi, you need to set up the environment in your MPI launch command so that it points to the Hadoop CLASSPATH.
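HDFS access from C/C++ normally goes through libhdfs, which starts a JVM over JNI and locates the Hadoop jars through the `CLASSPATH` environment variable. That is why every MPI rank needs it, and not just the shell that runs the launcher. A small sanity check like the one below, run at the start of `main` before any HDFS load or save, can make a missing setting obvious. The helper names `classpath_is_set` and `classpath_mentions` are hypothetical, not part of GraphLab.

```cpp
#include <cstdlib>
#include <string>

// True if CLASSPATH is present and non-empty in this process's environment.
inline bool classpath_is_set() {
  const char* cp = std::getenv("CLASSPATH");
  return cp != nullptr && cp[0] != '\0';
}

// True if CLASSPATH contains the given substring (e.g. "hadoop"), a rough
// hint that the Hadoop jars were actually exported to this rank.
inline bool classpath_mentions(const std::string& needle) {
  const char* cp = std::getenv("CLASSPATH");
  return cp != nullptr && std::string(cp).find(needle) != std::string::npos;
}
```

If the check fails on remote ranks but passes locally, the variable is probably not being forwarded by the MPI launcher. How to forward it (for example, an env-style option on `mpiexec`) depends on the MPI implementation in use.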