Mercator

Mercator: A Scalable, Extensible Web Crawler. This paper describes Mercator, a scalable, extensible Web crawler written entirely in Java. Scalable Web crawlers are an important component of many Web services, but their design is not well‐documented in the literature. We enumerate the major components of any scalable Web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be comparable to that of other crawlers for which performance numbers have been published.


References in zbMATH (referenced in 14 articles)

  1. de Assis, Guilherme T.; Laender, Alberto H.F.; Gonçalves, Marcos André; da Silva, Altigran S.: A genre-aware approach to focused crawling (2009)
  2. Nasri, Mitra; Shariati, Saeed; Azgomi, Mohammad Abdollahi: Performance modeling of a distributed web crawler using stochastic activity networks (2008)
  3. Shen, Hong; Zhang, Yu: Improved approximate detection of duplicates for data streams over sliding windows (2008)
  4. Flanagan, Cormac; Freund, Stephen N.; Qadeer, Shaz; Seshia, Sanjit A.: Modular verification of multithreaded programs (2005)
  5. Ngu, Anne H.H.; Rocco, Daniel; Critchlow, Terence; Buttler, David: Automatic discovery and inferencing of complex bioinformatics web interfaces (2005)
  6. Ngu, Anne H.H.; Rocco, Daniel; Critchlow, Terence; Buttler, David: Automatic discovery and inferencing of complex bioinformatics web interfaces (2005)
  7. Baeza-Yates, Ricardo; Castillo, Carlos: Crawling the infinite web: Five levels are enough (2004)
  8. Kim, Sung Jin; Lee, Sang Ho: Implementation of a web robot and statistics on the Korean web (2003)
  9. Bergmark, Donna; Lagoze, Carl; Sbityakov, Alex: Focused crawls, tunneling, and digital libraries (2002)
  10. Flanagan, Cormac; Qadeer, Shaz; Seshia, Sanjit A.: A modular checker for multithreaded programs (2002)
  11. Zeinalipour-Yazti, Demetrios; Dikaiakos, Marios: Design and implementation of a distributed crawler and filtering processor (2002)
  12. Chang, George; Healey, Marcus J.; McHugh, James A.M.; Wang, Jason T.L.: Mining the World Wide Web. An information search approach (2001)
  13. Flanagan, Cormac; Leino, K.Rustan M.: Houdini, an annotation assistant for ESC/Java (2001)
  14. Herzog, Marcus; Gottlob, Georg: InfoPipes: A flexible framework for M-commerce applications (2001)