Mining entity-identification rules for database integration
- Univ. of Minnesota, Minneapolis, MN (United States)
- Apertus Technologies, Inc., Eden Prairie, MN (United States)
Entity identification (EI) is the identification and integration of all records that represent the same real-world entity, and it is an important task in the database integration process. When a common identification mechanism for similar records across heterogeneous databases is not readily available, EI is performed by examining the relationships between various attribute values among the records. We propose the use of distances between attribute values as a measure of similarity between the records they represent. Record-matching conditions for EI can then be expressed as constraints on the attribute distances. We show how knowledge discovery techniques can be used to automatically derive these conditions (expressed as decision trees) directly from the data, using a distance-based framework.
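The distance-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes two hypothetical record sets (`db_a`, `db_b`) with made-up `name` and `zip` attributes, uses edit distance and absolute difference as illustrative attribute distances, and learns only a single-split decision tree (a decision stump) from hand-labelled record pairs, whereas the paper derives full decision trees.

```python
from itertools import product

def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def distance_vector(r, s):
    """Map a record pair to one distance per attribute (assumed schema)."""
    return (edit_distance(r["name"].lower(), s["name"].lower()),
            abs(r["zip"] - s["zip"]))

def best_threshold(vectors, labels):
    """Find the rule 'distance[k] <= t means match' with the fewest errors.

    Returns (errors, k, t). This is a one-level decision tree; a full
    tree would recurse on the two partitions the split produces.
    """
    best = None
    for k in range(len(vectors[0])):
        for t in sorted({v[k] for v in vectors}):
            errors = sum((v[k] <= t) != y for v, y in zip(vectors, labels))
            if best is None or errors < best[0]:
                best = (errors, k, t)
    return best

# Hypothetical records from two databases with no shared key.
db_a = [{"name": "John Smith", "zip": 55455},
        {"name": "Mary Jones", "zip": 55344}]
db_b = [{"name": "Jon Smith", "zip": 55455},
        {"name": "Mary Jonas", "zip": 55344},
        {"name": "Peter Brown", "zip": 97201}]

# Hand-labelled training pairs: indices (i, j) that denote the same entity.
true_matches = {(0, 0), (1, 1)}

pairs = list(product(range(len(db_a)), range(len(db_b))))
vectors = [distance_vector(db_a[i], db_b[j]) for i, j in pairs]
labels = [(i, j) in true_matches for i, j in pairs]

errors, attr, threshold = best_threshold(vectors, labels)
print(f"match if distance[{attr}] <= {threshold} ({errors} training errors)")
```

The learned rule is a constraint on one attribute distance, which is the form of record-matching condition the abstract describes. With more attributes and more labelled pairs, a standard decision-tree learner could take the place of `best_threshold`.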
- OSTI ID:
- 421296
- Report Number(s):
- CONF-960830-; TRN: 96:005928-0051
- Resource Relation:
- Conference: 2. international conference on knowledge discovery and data mining, Portland, OR (United States), 2-4 Aug 1996; Other Information: PBD: 1996; Related Information: Is Part Of Proceedings of the second international conference on knowledge discovery & data mining; Simoudis, E.; Han, J.; Fayyad, U. [eds.]; PB: 405 p.
- Country of Publication:
- United States
- Language:
- English
Similar Records
A probabilistic NF2 relational algebra for integrated information retrieval and database systems
GENOME ENABLED MODIFICATION OF POPLAR ROOT DEVELOPMENT FOR INCREASED CARBON SEQUESTRATION