PEGASUS: A peta-scale graph mining system - implementation and observations. In this paper, we describe PEGASUS, an open source peta-scale graph mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components. As the size of graphs reaches several giga-, tera-, or petabytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on top of the Hadoop platform, the open source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with 6.7 billion edges.
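
The idea that graph mining tasks reduce to a repeated, generalized matrix-vector multiplication can be illustrated with a small single-machine sketch. This is not the PEGASUS or Hadoop implementation; the function and parameter names (`gim_v`, `combine2`, `combine_all`, `assign`, `connected_components`) are chosen here for illustration. The multiply, the sum, and the write-back of an ordinary matrix-vector product are replaced by three pluggable operations, and connected components then falls out as repeated minimum-label propagation.

```python
from collections import defaultdict


def gim_v(edges, v, combine2, combine_all, assign):
    """One generalized matrix-vector step over a sparse edge list.

    edges: iterable of (i, j, m_ij) entries of the matrix M.
    v: dict mapping node id -> current vector value.
    combine2 stands in for "multiply", combine_all for "sum",
    and assign decides how the old value is updated.
    """
    partial = defaultdict(list)
    for i, j, m_ij in edges:
        partial[i].append(combine2(m_ij, v[j]))
    return {i: assign(v[i], combine_all(i, partial[i])) for i in v}


def connected_components(undirected_edges, nodes):
    """Label every node with the smallest node id in its component."""
    # Store each undirected edge in both directions.
    edges = [(i, j, 1) for i, j in undirected_edges]
    edges += [(j, i, 1) for i, j in undirected_edges]
    v = {n: n for n in nodes}
    while True:
        new_v = gim_v(
            edges,
            v,
            combine2=lambda m_ij, v_j: v_j,  # pass the neighbour's label
            combine_all=lambda i, xs: min(xs, default=float("inf")),
            assign=min,  # keep the smaller of old and new label
        )
        if new_v == v:  # fixed point: labels stopped changing
            return v
        v = new_v


if __name__ == "__main__":
    # Ordinary matrix-vector product as a special case.
    product = gim_v(
        [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0)],
        {0: 1.0, 1: 2.0},
        combine2=lambda m_ij, v_j: m_ij * v_j,
        combine_all=lambda i, xs: sum(xs),
        assign=lambda old, new: new,
    )
    print(product)
    print(connected_components([(0, 1), (1, 2), (3, 4)], range(5)))
```

In a MapReduce setting, the grouping by row index `i` would presumably correspond to the shuffle between map and reduce; here a dictionary plays that role.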

References in zbMATH (referenced in 8 articles)

  1. Aydin, Kevin; Bateni, Mohammadhossein; Mirrokni, Vahab: Distributed balanced partitioning via linear embedding (2019)
  2. Ho, Qirong; Yin, Junming; Xing, Eric P.: Latent space inference of Internet-scale networks (2016)
  3. Slota, George M.; Madduri, Kamesh; Rajamanickam, Sivasankaran: Complex network partitioning using label propagation (2016)
  4. Koutra, Danai; Kang, U.; Vreeken, Jilles; Faloutsos, Christos: Summarizing and understanding large graphs (2015)
  5. Chiou, Tao-Wei; Tsai, Shi-Chun; Lin, Yi-Bing: Network security management with traffic pattern clustering (2014)
  6. Malliaros, Fragkiskos D.; Vazirgiannis, Michalis: Clustering and community detection in directed networks: a survey (2013)
  7. Li, Lei; Wang, Ding-Ding; Zhu, Shun-Zhi; Li, Tao: Personalized news recommendation: a review and an experimental investigation (2011)
  8. Plimpton, Steven J.; Devine, Karen D.: MapReduce in MPI for large-scale graph algorithms (2011)