CrossMine: Efficient classification across multiple database relations
Most of today’s structured data is stored in relational databases. Such a database consists of multiple relations that are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines including financial decision making and medical research. However, most classification approaches only work on single “flat” data relations. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge “universal relation” or losing essential information. Previous works using Inductive Logic Programming approaches (recently also known as Relational Mining) have proven effective with high accuracy in multi-relational classification. Unfortunately, they fail to achieve high scalability w.r.t. the number of relations in databases because they repeatedly join different relations to search for good literals. In this paper we propose CrossMine, an efficient and scalable approach for multi-relational classification. CrossMine employs tuple ID propagation, a novel method for virtually joining relations, which enables flexible and efficient search among multiple relations. CrossMine also uses aggregated information to provide essential statistics for classification. A selective sampling method is used to achieve high scalability w.r.t. the number of tuples in the databases. Our comprehensive experiments on both real and synthetic databases demonstrate the high scalability and accuracy of CrossMine.
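The idea of tuple ID propagation mentioned in the abstract can be illustrated with a small sketch. This is not CrossMine's actual implementation: the relation layout, the function names `propagate_ids` and `literal_coverage`, and the toy loan/account data are all assumptions made for illustration. The sketch attaches to each tuple of a non-target relation the IDs of the target tuples it joins with, so that a candidate literal can be scored by counting positive and negative target tuples without materializing the join.

```python
from collections import defaultdict

def propagate_ids(target, target_key, other, other_key):
    """For each tuple in `other`, collect the IDs of target tuples that join with it.

    `target` maps tuple ID -> row dict; `other` is a list of row dicts.
    (Hypothetical layout chosen for this sketch.)
    """
    ids_by_key = defaultdict(set)
    for tid, row in target.items():
        ids_by_key[row[target_key]].add(tid)
    # Tuples of `other` with no joinable target tuple get an empty ID set.
    return [ids_by_key.get(row[other_key], set()) for row in other]

def literal_coverage(other, propagated, labels, predicate):
    """Count positive and negative target tuples covered by a literal on `other`."""
    covered = set()
    for row, ids in zip(other, propagated):
        if predicate(row):
            covered |= ids
    pos = sum(1 for tid in covered if labels[tid])
    return pos, len(covered) - pos

# Toy data (assumed): Loan is the target relation, Account is linked by account_id.
loans = {
    1: {"account_id": 124},
    2: {"account_id": 124},
    3: {"account_id": 108},
    4: {"account_id": 45},
    5: {"account_id": 45},
}
labels = {1: True, 2: True, 3: False, 4: False, 5: True}  # class label per loan ID
accounts = [
    {"account_id": 124, "frequency": "monthly"},
    {"account_id": 108, "frequency": "weekly"},
    {"account_id": 45, "frequency": "monthly"},
    {"account_id": 67, "frequency": "weekly"},
]

propagated = propagate_ids(loans, "account_id", accounts, "account_id")
coverage = literal_coverage(
    accounts, propagated, labels, lambda row: row["frequency"] == "monthly"
)
print(propagated)  # ID sets carried by each account tuple
print(coverage)    # (positives, negatives) covered by frequency == "monthly"
```

In this sketch the ID sets are computed once per link and reused for every literal tested on the `accounts` relation, which is the kind of saving over repeated physical joins that the abstract describes.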
References in zbMATH (referenced in 12 articles, 1 standard article)
- Schulte, Oliver; Qian, Zhensong; Kirkpatrick, Arthur E.; Yin, Xiaoqian; Sun, Yan: Fast learning of relational dependency networks (2016)
- Schulte, Oliver; Khosravi, Hassan; Kirkpatrick, Arthur E.; Gao, Tianxiang; Zhu, Yuke: Modelling relational statistics with Bayes nets (2014)
- Jiménez, Aída; Berzal, Fernando; Cubero, Juan-Carlos: Using trees to mine multirelational databases (2012)
- Rossi, Ryan A.; McDowell, Luke K.; Aha, David W.; Neville, Jennifer: Transforming graph data for statistical relational learning (2012)
- Schulte, Oliver; Khosravi, Hassan: Learning graphical models for relational data via lattice search (2012)
- Schulte, Oliver; Khosravi, Hassan; Man, Tong: Learning directed relational models with recursive dependencies (2012)
- Jiménez, Aída; Berzal, Fernando; Cubero, Juan-Carlos: POTMiner: mining ordered, unordered, and partially-ordered trees (2010)
- Guo, Hongyu; Viktor, Herna L.: Multirelational classification: a multiple view approach (2008)
- McClean, Sally; Scotney, Bryan; Morrow, Philip; Greer, Kieran: Integrating semantically heterogeneous aggregate views of distributed databases (2008)
- Yin, Xiaoxin; Han, Jiawei; Yu, Philip S.: CrossClus: user-guided multi-relational clustering (2007)
- Corrada Bravo, Héctor; Page, David; Ramakrishnan, Raghu; Shavlik, Jude; Santos Costa, Vitor: A framework for set-oriented computation in inductive logic programming and its application in generalizing inverse entailment (2005)
- Yin, Xiaoxin; Han, Jiawei; Yang, Jiong; Yu, Philip S.: CrossMine: Efficient classification across multiple database relations (2005)