DBLEARN

A heuristic for evaluating databases for knowledge discovery with DBLEARN

We propose a heuristic method for choosing databases on which to attempt knowledge discovery. The DBLEARN knowledge-discovery program uses an attribute-oriented inductive-inference method to discover potentially significant relations in a database. A concept forest defines the possible generalizations that DBLEARN can make for a database. The concept forest consists of trees, each of which represents a concept hierarchy for one attribute. We propose that the potential for discovery in a database be estimated by examining the complexity of its concept forest. One measure that has proven useful is based on the depths and heights of all the interior nodes in the trees of the forest. Higher values of this measure indicate more complex concept forests and thus, we believe, more potential for discovery. Given several databases and their concept forests, we rank them according to the heuristic measure and recommend that DBLEARN be applied to those with the highest values.
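
To make the ranking idea concrete, the following is a minimal sketch in Python. The abstract does not state the exact formula, so the score used here (the sum of depth plus height over every interior node) is only an illustrative assumption, not the measure from the paper. The names `Node`, `tree_score`, `forest_score`, and `rank_databases`, as well as the example hierarchies, are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One concept in a hierarchy; leaves stand for raw attribute values."""
    name: str
    children: List["Node"] = field(default_factory=list)


def height(node: Node) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def tree_score(root: Node) -> int:
    """Sum (depth + height) over interior nodes.

    ASSUMPTION: this formula is a stand-in; the source only says the
    measure is based on depths and heights of interior nodes.
    """
    total = 0
    stack = [(root, 0)]  # (node, depth of node below the root)
    while stack:
        node, depth = stack.pop()
        if node.children:  # interior node
            total += depth + height(node)
            stack.extend((child, depth + 1) for child in node.children)
    return total


def forest_score(forest: List[Node]) -> int:
    """A concept forest has one tree per attribute; add up the tree scores."""
    return sum(tree_score(root) for root in forest)


def rank_databases(forests: Dict[str, List[Node]]) -> List[str]:
    """Return database names ordered from highest to lowest forest score."""
    return sorted(forests, key=lambda name: forest_score(forests[name]), reverse=True)


# Hypothetical concept hierarchies for two attributes.
flat = Node("ANY", [Node("a"), Node("b")])
deep = Node("ANY", [
    Node("science", [Node("physics"), Node("math")]),
    Node("arts", [Node("music")]),
])

# Hypothetical databases, each described by its concept forest.
forests = {"db_flat": [flat], "db_deep": [deep, flat]}
ranking = rank_databases(forests)
print(ranking)
```

Under this assumed formula, a database whose attributes have only a single level of generalization scores low, while one with multi-level hierarchies scores higher and would be recommended first. Any other combination of depth and height could be substituted inside `tree_score` without changing the ranking procedure.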