ROADRUNNER

Automatic web information extraction in the ROADRUNNER system

Road Runner is a joint project of the Database Group of Università di Roma Tre and the Database Group of Università della Basilicata. The project investigates techniques for extracting data from HTML sites through automatically generated wrappers. Many Web-based applications today use wrappers to extract data from HTML pages. These wrappers, however, are usually coded by hand, so generating and maintaining them is difficult and labor-intensive. To automate wrapper generation and data extraction, the Road Runner project aims to develop original techniques for generating wrappers automatically. A wrapper generation system has been implemented in a working Java prototype, which has been used to conduct a number of experiments on real-life data-intensive Web sites. These experiments confirm the feasibility of the approach.
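The description above does not spell out how wrappers are inferred. As a rough illustration only, the Java sketch below shows the simplest idea behind generating a wrapper from sample pages. Two pages produced by the same server-side template are tokenized and aligned. Tokens that agree are kept as template text, and text tokens that differ are generalized into a data field. The learned template is then applied to a third page to extract its values. The published RoadRunner work describes a richer matching that also turns tag mismatches into optional and repeated regions. This sketch does not do that and simply rejects such pages. The class `WrapperSketch`, its methods, the `#PCDATA` field marker, and the sample pages are hypothetical and are not taken from the Road Runner prototype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified wrapper inference: NOT the Road Runner implementation.
// Only string mismatches become fields; tag mismatches are rejected.
public class WrapperSketch {
    // Placeholder token marking a data field in the inferred template (assumed name).
    static final String FIELD = "#PCDATA";

    // Split HTML into tag tokens ("<...>") and trimmed, non-empty text tokens.
    public static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]+>|[^<]+").matcher(html);
        while (m.find()) {
            String t = m.group().trim();
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static boolean isTag(String token) {
        return token.startsWith("<");
    }

    // Align two pages token by token: equal tokens stay, differing text becomes a field.
    public static List<String> inferTemplate(List<String> a, List<String> b) {
        if (a.size() != b.size())
            throw new IllegalArgumentException("sketch handles only pages of equal length");
        List<String> template = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            String x = a.get(i), y = b.get(i);
            if (x.equals(y)) {
                template.add(x);
            } else if (!isTag(x) && !isTag(y)) {
                template.add(FIELD);
            } else {
                // Optional/repeated regions would be inferred here; out of scope for this sketch.
                throw new IllegalArgumentException("tag mismatch at token " + i);
            }
        }
        return template;
    }

    // Apply a template to a page and collect the text found at each field position.
    public static List<String> extract(List<String> template, List<String> page) {
        if (template.size() != page.size())
            throw new IllegalArgumentException("page does not match template");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < template.size(); i++) {
            String t = template.get(i), p = page.get(i);
            if (t.equals(FIELD)) {
                if (isTag(p)) throw new IllegalArgumentException("expected text at token " + i);
                values.add(p);
            } else if (!t.equals(p)) {
                throw new IllegalArgumentException("page does not match template at token " + i);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical pages generated from the same template.
        String p1 = "<html><b>Title:</b> Databases <i>Author:</i> Smith</html>";
        String p2 = "<html><b>Title:</b> Web Mining <i>Author:</i> Rossi</html>";
        String p3 = "<html><b>Title:</b> Wrappers <i>Author:</i> Bianchi</html>";
        List<String> template = inferTemplate(tokenize(p1), tokenize(p2));
        // Prints [<html>, <b>, Title:, </b>, #PCDATA, <i>, Author:, </i>, #PCDATA, </html>]
        System.out.println(template);
        // Prints [Wrappers, Bianchi]
        System.out.println(extract(template, tokenize(p3)));
    }
}
```

Positional alignment keeps the sketch short, but it works only when both sample pages have exactly the same tag structure. Handling missing or repeated blocks is precisely what a real matching algorithm has to add.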


References in zbMATH (referenced in 22 articles, 1 standard article)

Showing results 1 to 20 of 22, sorted by year.

  1. Gfrerer, Christine; Vajteršic, Marián; Kutil, Rade: Parallel algorithms to align multiple strings in the context of web data extraction (2017)
  2. Han, Wook-Shin; Kwak, Wooseong; Yu, Hwanjo; Lee, Jeong-Hoon; Kim, Min-Soo: Leveraging spatial join for robust tuple extraction from web pages (2014)
  3. Fazzinga, Bettina; Flesca, Sergio; Tagarelli, Andrea: Schema-based Web wrapping (2011)
  4. Liu, Wei; Yan, Hualiang; Xiao, Jianguo: Automatically extracting user reviews from forum sites (2011)
  5. Nachouki, Gilles; Quafafou, Mohamed: MashUp web data sources and services based on semantic queries (2011)
  6. Álvarez, Manuel; Pan, Alberto; Raposo, Juan; Bellas, Fernando; Cacheda, Fidel: Finding and extracting data records from web pages (2010)
  7. Li, Qing; Chen, Jing; Wu, Yipu: Algorithm for extracting loosely structured data records through digging strict patterns (2009)
  8. Michelson, M.; Knoblock, C. A.: Creating relational data from unstructured and ungrammatical data sources (2008)
  9. Mukherjee, Saikat; Ramakrishnan, I. V.: Automated semantic analysis of schematic data (2008)
  10. Wong, Tak-Lam; Lam, Wai: Learning to extract and summarize hot item features from multiple auction web sites (2008)
  11. Zhu, Jun; Nie, Zaiqing; Zhang, Bo; Wen, Ji-Rong: Dynamic hierarchical Markov random fields for integrated web data extraction (2008)
  12. Barbançon, Francois; Miranker, Daniel P.: SPHINX: Schema integration by example (2007)
  13. Zhai, Yanhong; Liu, Bing: Extracting Web data using instance-based learning (2007)
  14. Deng, Xu-Bin; Zhu, Yang-Yong: L-Tree match: a new data extraction model and algorithm for huge text stream with noises (2005)
  15. Li, Zhao; Ng, Wee Keong; Sun, Aixin: Web data extraction based on structural similarity (2005)
  16. Tijerino, Yuri A.; Embley, David W.; Lonsdale, Deryle W.; Ding, Yihong; Nagy, George: Towards ontology generation from tables (2005)
  17. Crescenzi, Valter; Mecca, Giansalvatore: Automatic information extraction from large websites (2004)
  18. Klusch, Matthias; Bergamaschi, Sonia; Petta, Paolo: European research and development of intelligent information agents: The AgentLink perspective (2003)
  19. Neiling, Mattis; Schaal, Markus; Schumann, Martin: WrapIt: Automated integration of web databases with extensional overlaps (2003)
  20. Agarwal, P. K.; Bhattacharya, B. K.; Sen, S.: Improved algorithms for uniform partitions of points (2002)

Further publications can be found at: http://www.dia.uniroma3.it/db/roadRunner/publications.html