OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information

Title: Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations

Abstract

To realize the full potential of machine learning in diverse real-world domains, it is necessary for model predictions to be readily interpretable and actionable for the human in the loop. Analysts, who are the users but not the developers of machine learning models, often do not trust a model because of the lack of transparency in associating predictions with the underlying data space. To address this problem, we propose Rivelo, a visual analytic interface that enables analysts to understand the causes behind predictions of binary classifiers by interactively exploring a set of instance-level explanations. These explanations are model-agnostic, treating a model as a black box, and they help analysts in interactively probing the high-dimensional binary data space for detecting features relevant to predictions. We demonstrate the utility of the interface with a case study analyzing a random forest model on the sentiment of Yelp reviews about doctors.

Authors:
Tamagnini, Paolo; Krause, Josua W.; Dasgupta, Aritra; Bertini, Enrico
Publication Date:
May 2017
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1358512
Report Number(s):
PNNL-SA-124676
DOE Contract Number:
AC05-76RL01830
Resource Type:
Conference
Resource Relation:
Conference: Proceedings of the 2nd Workshop on Human-in-the-Loop Data Analytics (HILDA 2017) May 14-19, 2017, Chicago, Illinois, Article No. 6
Country of Publication:
United States
Language:
English

Citation Formats

Tamagnini, Paolo, Krause, Josua W., Dasgupta, Aritra, and Bertini, Enrico. Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations. United States: N. p., 2017. Web. doi:10.1145/3077257.3077260.
Tamagnini, Paolo, Krause, Josua W., Dasgupta, Aritra, & Bertini, Enrico. Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations. United States. doi:10.1145/3077257.3077260.
Tamagnini, Paolo, Krause, Josua W., Dasgupta, Aritra, and Bertini, Enrico. 2017. "Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations". United States. doi:10.1145/3077257.3077260.
@article{osti_1358512,
title = {Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations},
author = {Tamagnini, Paolo and Krause, Josua W. and Dasgupta, Aritra and Bertini, Enrico},
abstractNote = {To realize the full potential of machine learning in diverse real-world domains, it is necessary for model predictions to be readily interpretable and actionable for the human in the loop. Analysts, who are the users but not the developers of machine learning models, often do not trust a model because of the lack of transparency in associating predictions with the underlying data space. To address this problem, we propose Rivelo, a visual analytic interface that enables analysts to understand the causes behind predictions of binary classifiers by interactively exploring a set of instance-level explanations. These explanations are model-agnostic, treating a model as a black box, and they help analysts in interactively probing the high-dimensional binary data space for detecting features relevant to predictions. We demonstrate the utility of the interface with a case study analyzing a random forest model on the sentiment of Yelp reviews about doctors.},
doi = {10.1145/3077257.3077260},
place = {United States},
year = 2017,
month = 5
}


  • Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness which are characteristic of naive Bayes. We experimentally tested these approaches using benchmark problems from the U. C. Irvine repository, and compared them against C4.5, naive Bayes, and wrapper-based feature selection methods.
  • A graphical method of interpreting both the input and output of a statistical line design program is described. The automatic generation of diagrams and graphs illustrating the input data reduces the chance for input errors and provides a convenient summary. The output graphical comparisons prove to be a practical and economical approach to interpret the results of statistical line design programs. The user may easily identify weak links in the design and explore alternative solutions. The unique abilities of the HYSIM method permit the user maximum flexibility in modifying or redesigning the type of output graphics needed. The graphic displays are a valuable tool in the practical design of modern transmission lines.
  • Modern mass spectrometers reliably measure very small samples (i.e., 1 µmol of CO₂ and < 100 nanograms of Sr). It is now possible to analyze Sr, O, and C isotope ratios and Ca, Mg, Fe, Mn and Sr concentrations using a total sample of 1.0 mg. Such analyses can provide fine-scale geochemical trends that would be obscured by bulk-rock analyses. As a case study, diagenetic phases from a Late Devonian pinnacle reef from the Alberta basin have been microanalyzed to quantify water-rock interaction (marine cements, caliche, pendant, vadose cements, and sparry calcite cements). Unaltered marine cements have the lowest ⁸⁷Sr/⁸⁶Sr ratios and the highest δ¹⁸O values of all carbonate phases (0.70805 ± 2, −4.8‰ ± 0.5) as well as the highest concentrations of (Mg) and (Sr) (approximately 8,500 and 350 ppm, respectively). These are assumed to represent average, initial rock compositions. In ⁸⁷Sr/⁸⁶Sr-δ¹⁸O crossplots, altered marine cement data diverge from this value toward higher ⁸⁷Sr/⁸⁶Sr ratios and lower δ¹⁸O values (to 0.70842 and −8.0‰). Vadose cements, caliche, and meteoric phreatic spars have ⁸⁷Sr/⁸⁶Sr ratios and δ¹⁸O values identical to altered marine cement data, with some spars having the same compositions as unaltered marine cements. These trends can be explained by two-component mixing models that use low W/R ratios, reasonable Sr concentrations, and distribution coefficients (0.05). However, a source of radiogenic strontium for diagenetic fluids is required to change initial ⁸⁷Sr/⁸⁶Sr ratios to observed values.
  • Remote sensors like radar wind profilers equipped with Radio Acoustic Sounding Systems (RASS) are likely candidates for collecting the upper air meteorological data required for the PAMS network. Upper air winds and temperatures collected for PAMS will be used to analyze and model meteorological processes that accompany periods of high ozone concentrations; to initialize and evaluate the performance of air quality models; and to support analyses of emission control strategies. Profilers offer several advantages: they collect data continuously and unattended, providing improved temporal resolution at lower cost; data are available in near-real time, simplifying quality control (QC) activities; and profilers measure vertical velocity (w), which is an important parameter for diagnosing and accurately modeling many meteorological processes. Wind profilers measure wind speed, wind direction, and vertical velocity from approximately 100 m agl to altitudes as high as 3–5 km with a vertical resolution of 60–100 m; RASS measures temperature to altitudes of 1–2 km with the same vertical resolution. Profilers also produce lower-level information that is proving extremely useful for identifying and analyzing key atmospheric processes and features that accompany periods of poor air quality, such as mixing depth and turbulence information. Using a number of examples of the types of data provided by profilers, the authors describe uses of profiler data in recent air quality studies and discuss issues related to data management, quality control, and data interpretation.