A Survey of Probabilistic Models for Relational Data

Koutsourelakis, P S

doi:10.2172/900137

Technical Report · October 13, 2006

DOI: https://doi.org/10.2172/900137 · OSTI ID: 900137

Traditional data mining methodologies have focused on "flat" data, i.e., a collection of identically structured entities assumed to be independent and identically distributed. However, many real-world datasets are innately relational: they consist of multi-modal entities and multi-relational links, where each entity or link type is characterized by a different set of attributes. Link structure is an important characteristic of a dataset and should not be ignored in modeling efforts, especially when statistical dependencies exist between related entities. When the relational structure is appropriately leveraged, these dependencies can significantly improve the accuracy of inference and prediction (Figure 1). The need for models that can incorporate relational structure has been accentuated by new technological developments that make it easy to track, store, and access large amounts of data. Recently, there has been a surge of interest in statistical models for richly interconnected, heterogeneous data, fueled largely by information mining of web/hypertext data, social networks, bibliographic citation data, epidemiological data, and communication networks. Graphical models offer a natural formalism for representing complex relational data and for predicting the underlying evolving system in a dynamic framework. The present survey provides an overview of probabilistic methods and techniques developed over the last few years for dealing with relational data. Particular emphasis is placed on approaches pertinent to the research areas of pattern recognition, group discovery, entity/node classification, and anomaly detection. We start with supervised learning tasks, for which the two basic modeling approaches, discriminative and generative, are discussed. Several discriminative techniques are reviewed and performance results are presented. Generative methods are discussed in a separate survey. A special section is devoted to latent variable models because of their unique characteristics and their usefulness in static and dynamic frameworks and in both supervised and unsupervised learning. Section 4 contains a brief discussion of unsupervised learning techniques, with an emphasis on computational efficiency and large networks. Finally, Section 5 discusses performance metrics, with an emphasis on classification problems.

Research Organization: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: W-7405-ENG-48

OSTI ID: 900137

Report Number(s): UCRL-TR-225637; TRN: US200709%%547

Country of Publication: United States

Language: English

Similar Records

Final Technical Report - Applications of Machine Learning Techniques to Geothermal Play Fairway Analysis in the Great Basin Region, Nevada

Technical Report · February 10, 2024

Faulds, James E; Smith, Connor M; Brown, Stephen; +13 more

Unsupervised Group Discovery and Link Prediction in Relational Datasets: a nonparametric Bayesian approach

Technical Report · May 3, 2007

Koutsourelakis, P

Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems

Conference · February 6, 2017

Siegel, Charles M.; Daily, Jeffrey A.; Vishnu, Abhinav

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ACCURACY
CLASSIFICATION
COMMUNICATIONS
DETECTION
EFFICIENCY
FORECASTING
LEARNING
METRICS
MINING
PATTERN RECOGNITION
PERFORMANCE
SIMULATION
STATISTICAL MODELS
SURGES
