TINTIN

TINTIN: a system for retrieval in text tables.

Tables form an important kind of data element in text retrieval. Often, the gist of an entire news article or other exposition can be captured concisely in tabular form. In this paper, we examine the utility of exploiting information other than the keywords in a digital document to provide users with more flexible and powerful query capabilities. More specifically, we exploit the structural information in a document to identify tables and their component fields, and let users query based on these fields. Our empirical results demonstrate that heuristic-based table extraction and component tagging can be performed effectively and efficiently. Moreover, our retrieval experiments with the TINTIN system strongly indicate that such structural decomposition allows a better representation of users' information needs and hence more effective retrieval of tables.
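
The abstract does not spell out the heuristics TINTIN uses, so the following is only a minimal sketch of what heuristic table extraction and field tagging on plain text might look like. It assumes, purely for illustration, that table rows are lines whose cells are separated by runs of two or more spaces and that the first row of a table is its header; the names `split_cells`, `extract_tables`, and `tag_fields` are hypothetical and not taken from the system.

```python
import re

# Assumption for this sketch: columns are separated by 2+ whitespace characters.
GAP = re.compile(r"\s{2,}")


def split_cells(line):
    """Split a line into cells on runs of two or more spaces."""
    return [cell for cell in GAP.split(line.strip()) if cell]


def extract_tables(text, min_rows=2):
    """Group consecutive multi-cell lines into tables (lists of rows)."""
    tables, current = [], []
    for line in text.splitlines():
        cells = split_cells(line)
        if len(cells) >= 2:
            current.append(cells)
        else:
            # A line that is not table-like ends the current candidate table.
            if len(current) >= min_rows:
                tables.append(current)
            current = []
    if len(current) >= min_rows:
        tables.append(current)
    return tables


def tag_fields(table):
    """Tag the first row as the header and the rest as data rows (assumed layout)."""
    header, *rows = table
    return {"header": header, "rows": rows}
```

With fields tagged this way, a query could be restricted to, say, header cells only, which is the kind of field-based retrieval the abstract describes.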


References in zbMATH (referenced in 5 articles)

  1. Zhang, Xi-Wen; Lyu, Michael R.; Dai, Guo-Zhong: Extraction and segmentation of tables from Chinese ink documents based on a matrix model (2007)
  2. Embley, David W.; Hurst, Matthew; Lopresti, Daniel; Nagy, George: Table-processing paradigms: a research survey (2006)
  3. Wei, Xing; Croft, Bruce; McCallum, Andrew: Table extraction for answer retrieval (2006)
  4. Shi, Z.; Milios, E.; Zincir-Heywood, N.: Learning stochastic regular grammars by means of a state merging method (2005)
  5. Shi, Z.; Milios, E.; Zincir-Heywood, N.: Post-supervised template induction for information extraction from lists and tables in dynamic Web sources (2005)