Automatic web information extraction in the ROADRUNNER system

Road Runner is a joint project of the Database Group of Università di Roma Tre and the Database Group of Università della Basilicata. The project investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. Many Web-based applications today use wrappers to extract data from HTML pages. These wrappers, however, are usually coded by hand, so generating and maintaining them is difficult and labor-intensive. To automate wrapper generation and data extraction, the Road Runner project aims to develop original techniques for generating wrappers automatically. A wrapper generation system has been implemented in a working prototype, which has been used to conduct a number of experiments on real-life data-intensive Web sites. These experiments confirm the feasibility of the approach. The system prototype has been implemented in Java.

References in zbMATH (referenced in 21 articles, 1 standard article)

Showing results 1 to 20 of 21.
Sorted by year.

  1. Han, Wook-Shin; Kwak, Wooseong; Yu, Hwanjo; Lee, Jeong-Hoon; Kim, Min-Soo: Leveraging spatial join for robust tuple extraction from web pages (2014)
  2. Fazzinga, Bettina; Flesca, Sergio; Tagarelli, Andrea: Schema-based Web wrapping (2011)
  3. Liu, Wei; Yan, Hualiang; Xiao, Jianguo: Automatically extracting user reviews from forum sites (2011)
  4. Nachouki, Gilles; Quafafou, Mohamed: MashUp web data sources and services based on semantic queries (2011)
  5. Álvarez, Manuel; Pan, Alberto; Raposo, Juan; Bellas, Fernando; Cacheda, Fidel: Finding and extracting data records from web pages (2010)
  6. Li, Qing; Chen, Jing; Wu, Yipu: Algorithm for extracting loosely structured data records through digging strict patterns (2009)
  7. Michelson, M.; Knoblock, C.A.: Creating relational data from unstructured and ungrammatical data sources (2008)
  8. Mukherjee, Saikat; Ramakrishnan, I.V.: Automated semantic analysis of schematic data (2008)
  9. Wong, Tak-Lam; Lam, Wai: Learning to extract and summarize hot item features from multiple auction web sites (2008)
  10. Zhu, Jun; Nie, Zaiqing; Zhang, Bo; Wen, Ji-Rong: Dynamic hierarchical Markov random fields for integrated web data extraction (2008)
  11. Barbançon, Francois; Miranker, Daniel P.: SPHINX: Schema integration by example (2007)
  12. Zhai, Yanhong; Liu, Bing: Extracting Web data using instance-based learning (2007)
  13. Deng, Xu-Bin; Zhu, Yang-Yong: L-Tree match: a new data extraction model and algorithm for huge text stream with noises (2005)
  14. Li, Zhao; Ng, Wee Keong; Sun, Aixin: Web data extraction based on structural similarity (2005)
  15. Tijerino, Yuri A.; Embley, David W.; Lonsdale, Deryle W.; Ding, Yihong; Nagy, George: Towards ontology generation from tables (2005)
  16. Crescenzi, Valter; Mecca, Giansalvatore: Automatic information extraction from large websites (2004)
  17. Klusch, Matthias; Bergamaschi, Sonia; Petta, Paolo: European research and development of intelligent information agents: The AgentLink perspective (2003)
  18. Neiling, Mattis; Schaal, Markus; Schumann, Martin: WrapIt: Automated integration of web databases with extensional overlaps (2003)
  19. Agarwal, P. K.; Bhattacharya, B. K.; Sen, S.: Improved algorithms for uniform partitions of points (2002)
  20. Crescenzi, Valter; Mecca, Giansalvatore; Merialdo, Paolo: Automatic web information extraction in the ROADRUNNER system (2002)

Further publications can be found at: