Unpacking Black-Box Models: ExSum, a Mathematical Framework for Evaluating Explanations of Machine-Learning Models
MIT researchers create a mathematical framework to evaluate explanations of machine-learning models and quantify how well people understand them.
Modern machine-learning models, such as neural networks, are often referred to as “black boxes” because they are so complex that even the people who design them can’t fully understand how they make predictions.
To provide some insights, scientists employ explanation methods that seek to describe individual model decisions. They may, for example, highlight words in a movie review that affected the model’s judgment that the review was favorable.
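To make that concrete, here is a minimal sketch of such a word-highlighting explanation using LIME, the local-explanation method developed by co-author Marco Tulio Ribeiro. The tiny review corpus and classifier below are illustrative assumptions, not the setup used in the paper.

```python
# Minimal sketch: highlighting words that drive a sentiment prediction,
# using LIME (https://github.com/marcotcr/lime). The toy corpus and
# classifier are illustrative stand-ins, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy sentiment data: 1 = favorable review, 0 = unfavorable.
reviews = [
    "a gripping, beautifully acted film",
    "funny, warm, and thoroughly enjoyable",
    "dull plot and wooden performances",
    "a tedious, forgettable mess",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)

explainer = LimeTextExplainer(class_names=["unfavorable", "favorable"])
explanation = explainer.explain_instance(
    "a warm and gripping film despite a dull stretch",
    model.predict_proba,  # LIME perturbs the text and queries this function
    num_features=5,
)
# Each (word, weight) pair says how strongly that word pushed the
# prediction toward "favorable" (positive) or "unfavorable" (negative).
for word, weight in explanation.as_list():
    print(f"{word:>10s}  {weight:+.3f}")
```

An explanation like this describes one prediction on one review; it says nothing by itself about how the model behaves on the thousands of other reviews it will see.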
But these explanation methods don’t do any good if humans can’t easily understand them, and they can do real harm when people misunderstand them. So, MIT researchers created a mathematical framework, called ExSum (short for explanation summary), to formally quantify and evaluate the understandability of explanations for machine-learning models. This can help pinpoint insights about model behavior that would be missed if a researcher evaluated only a handful of individual explanations in an attempt to understand the entire model.
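In the spirit of the framework, the paper scores candidate rules about model behavior on metrics such as how much of the model's behavior a rule covers and how often it actually holds. The toy sketch below illustrates that idea; the rule, data, and metric names are assumptions for illustration, not the paper's exact formalism.

```python
# Toy sketch of ExSum-style evaluation: instead of eyeballing a few local
# explanations, state a general rule about the model and measure how often
# it applies (coverage) and how often it holds (validity). All data and
# thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class LocalExplanation:
    word: str      # a feature in one input
    weight: float  # its attributed importance in that one prediction

# Pretend these word weights were collected from many movie reviews.
explanations = [
    LocalExplanation("wonderful", 0.42),
    LocalExplanation("boring", -0.38),
    LocalExplanation("not", -0.05),
    LocalExplanation("terrible", -0.51),
    LocalExplanation("great", 0.33),
    LocalExplanation("the", 0.01),
]

# Candidate rule: "negative words receive negative weight."
NEGATIVE_WORDS = {"boring", "terrible", "not", "awful"}
applicable = [e for e in explanations if e.word in NEGATIVE_WORDS]
valid = [e for e in applicable if e.weight < 0]

coverage = len(applicable) / len(explanations)  # share of behavior the rule covers
validity = len(valid) / len(applicable)         # how often the rule is right
print(f"coverage = {coverage:.2f}, validity = {validity:.2f}")
```

A rule with high validity but low coverage tells you very little about the model as a whole, which is exactly the gap between local explanations and global understanding that the framework makes explicit.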
“With this framework, we can have a very clear picture of not only what we know about the model from these local explanations, but more importantly what we don’t know about it,” says Yilun Zhou, an electrical engineering and computer science graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of a paper presenting this framework.
Zhou’s co-authors include Marco Tulio Ribeiro, a senior researcher at Microsoft Research, and senior author Julie Shah, a professor of aeronautics and astronautics and the director of the Interactive Robotics Group in CSAIL. The research will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).