Interpreting AI Is More Than Black and White
Updated: Jul 8, 2019
Article by forbes.com
In the world of software development, there are well-defined testing paradigms and use cases. Many teams develop a suite of tests that treats a program's internals as unknown, asserting only that the data coming out of the system matches expectations given the data fed in -- i.e., black-box testing. Conversely, white-box tests are written with full knowledge of the program's structure, design, and implementation.
In the world of artificial intelligence & machine learning (AI & ML), black- and white-box categorization of models and algorithms refers to their interpretability. That is, given a model trained to map data inputs to outputs (e.g. emails classified as spam or not), if the mechanisms underlying predictions cannot be looked at or understood, that model is black-box. And just as the software testing dichotomy is high-level behavior vs low-level logic, only white-box AI methods can be readily interpreted to see the logic behind models' predictions.
In recent years, as machine learning has spread into new industries and applications -- where users far outnumber the experts who grok the models and algorithms -- the conversation around interpretability has become an important one. Exponentially so with the rise of deep learning, a class of machine learning methods known for powerful performance yet an elusive, black-box nature. This is a conversation with implications up and down the stack, from AI scientists to CEOs.
What is interpretable AI?
The definition of interpretable AI isn't exactly black and white. To have a productive conversation it's essential to be clear what model interpretability means to different stakeholders:
- The ability to explain a model's behavior, answering for an ML engineer, "why did the model predict that?" For example: the prior on variable alpha must not be Gaussian, as we can see in the misaligned posterior predictive check.
- The ability to translate a model to business objectives, answering in natural language, "why did the model predict that?" For example: the predicted spike in insulin levels is correlated with the recent prolonged inactivity picked up by the fitness watch.
Both definitions are clearly useful. The low-level notion of interpretability lends itself to the engineer's ability to develop and debug models and algorithms. High-level transparency and explainability is just as necessary, for humans to understand and trust predictions in areas like financial markets and medicine [4,5].
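To make the engineer-level notion concrete, here is a minimal sketch of a white-box spam classifier: a logistic regression trained by plain gradient descent on a handful of hypothetical features (the feature names and toy data are illustrative, not drawn from any real spam corpus). The learned weights can be read off directly to answer "why did the model predict that?"

```python
import numpy as np

features = ["contains_free", "num_links", "sender_known", "all_caps_subject"]

# toy email data: one row per email, one column per feature above (hypothetical)
X = np.array([
    [1, 5, 0, 1],
    [1, 3, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 4, 0, 0],
    [0, 0, 1, 0],
], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)  # 1 = spam, 0 = not spam

# plain gradient descent on the logistic loss
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted spam probabilities
    w -= 0.5 * (X.T @ (p - y)) / len(y)      # gradient step on weights
    b -= 0.5 * np.mean(p - y)                # gradient step on bias

# the "why" is simply the signed weight on each feature
for name, weight in sorted(zip(features, w), key=lambda t: -abs(t[1])):
    print(f"{name:18s} {weight:+.2f}")
```

Positive weights push an email toward the spam label and negative weights away from it, so each prediction traces directly back to a readable sum of feature contributions.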
No matter the definition, developing an AI system to be interpretable is typically challenging and ambiguous. It is often the case that a model or algorithm is too complex to understand or describe because its purpose is to model a complex hypothesis or navigate a high-dimensional space, a catch-22. Not to mention what is interpretable in one application may be useless in another.
Peering inside the box
Consider the visualizations below of deep Gaussian Processes, a probabilistic variety of deep neural networks. The intent of this experiment by Duvenaud et al. is to elucidate a subtle architectural flaw that emerges in neural networks as more and more layers are added; the specific insights are beyond the scope of this article. Even with the paper's mathematical details of the model architecture and its description of the subsequent adverse effects, the "pathology" is unintuitive to understand, let alone debug and fix. The visualizations are quite effective in communicating modeling insights that would otherwise go unnoticed.
A figure from "Avoiding pathologies in very deep networks" by Duvenaud et al., showing function warpings through deep Gaussian Process models where the networks are successively deeper, with 1, 4, and 6 layers. The visualization illustrates that as more layers are added, the density concentrates along one-dimensional filaments and zero-dimensional manifolds -- i.e., the representational capacity captures fewer and fewer degrees of freedom, a severe limitation in the model.
Post-hoc interpretation methods can provide insights into otherwise black-box AI systems. Nonetheless, the underlying models and algorithms may still be unexplainable. What then is a white-box model? The counterpart to post-hoc is model-based interpretability, where the model itself readily provides insights into the relationships and structures it learns from data.
For example, a Gaussian Process is a flexible Bayesian model that enables feature engineering and the incorporation of domain expertise, and its predictions are intuitive to trace back to the underlying logic -- all properties of white-box ML. A Gaussian Process model can be powerful for many real-world machine learning tasks, from financial markets to autonomous robotics [8, 10]. Sometimes stacking Gaussian Processes into a complex network yields a significant performance boost, but it also shifts the resulting model toward the black-box end of the spectrum. Developing an interpretable machine learning system can thus become an exercise in building models for descriptive accuracy at the cost of predictive accuracy.
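As a sketch of what makes a Gaussian Process white-box, the following minimal numpy implementation of GP regression (RBF kernel, toy 1-D data -- all values hypothetical) shows how the model's few hyperparameters map directly onto interpretable quantities: the kernel's length scale literally states how far apart two inputs can be and still remain correlated.

```python
import numpy as np

def rbf(a, b, length_scale=1.0, variance=1.0):
    """RBF kernel. Both hyperparameters are directly interpretable:
    length_scale sets how quickly correlation decays with input distance,
    variance sets the overall scale of the function values."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

# toy 1-D regression data (hypothetical)
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
Xs = np.linspace(-2, 2, 9)   # test inputs

noise = 1e-4                                  # observation noise variance
K = rbf(X, X) + noise * np.eye(len(X))        # kernel over training inputs
Ks = rbf(Xs, X)                               # kernel between test and train
mean = Ks @ np.linalg.solve(K, y)             # GP posterior mean

print(np.round(mean, 3))
```

Changing `length_scale` has a meaning a practitioner can state in plain language -- "wiggles faster" or "varies more smoothly" -- which is exactly the kind of traceability the black-box end of the spectrum gives up.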
Example outputs from a state-of-the-art deep learning algorithm that can detect pneumonia from chest X-rays: CheXNet localizes the pathologies it identifies using Class Activation Maps (Zhou et al. '16), which highlight the regions most important to a particular pathology classification.
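A Class Activation Map itself is simple to compute: the heatmap is the sum of the network's final convolutional feature maps, weighted by the classifier weights for the predicted class (the approach of Zhou et al. '16). The sketch below uses random arrays as stand-ins for those feature maps and weights; in a real system such as CheXNet they would come from the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical final conv-layer output: K feature maps of size H x W
K, H, W = 8, 7, 7
feature_maps = rng.random((K, H, W))

# hypothetical classifier weights for the predicted class, learned on
# globally average-pooled features (per Zhou et al. '16)
w_class = rng.normal(size=K)

# class activation map: class-weighted sum over the feature maps
cam = np.tensordot(w_class, feature_maps, axes=1)         # shape (H, W)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-9)  # normalize to [0, 1]

print(cam.shape)  # a coarse heatmap, upsampled over the input image in practice
```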
Do visualization methods make black-box models interpretable? Not quite. Even with advanced interpretation techniques such as Google's "Inceptionism", deep neural networks remain prohibitively complex to understand: the explanations underlying predictions (the "why") are unknown. Consider the remarkable sensitivity of deep neural networks to adversarial attacks, where slightly perturbed inputs (often imperceptible to humans, such as a piece of duct tape on a stop sign) can completely throw off predictions, raising questions about what the models are actually learning inside that black box.
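The mechanics of such an attack can be sketched in a few lines. Below, a fast-gradient-sign-style perturbation (in the spirit of Goodfellow et al.'s FGSM) flips the decision of a hypothetical linear classifier; every weight, input value, and the budget `eps` is a made-up number for illustration.

```python
import numpy as np

# hypothetical trained linear classifier: predict class 1 when w @ x + b > 0
w = np.array([0.9, -0.4, 0.6, 0.2])
b = -0.1
x = np.array([0.3, 0.5, 0.2, 0.4])   # an input the model classifies as class 1

# small per-feature budget; in image terms, a perturbation this size
# is often imperceptible to a human viewer
eps = 0.25
x_adv = x - eps * np.sign(w)         # step each feature against the score

print(w @ x + b, w @ x_adv + b)      # the decision score changes sign
```

For a linear model the attack is transparent; the unsettling part is that deep networks, whose decision surfaces nobody can inspect, are just as vulnerable.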
The real issues with AI interpretability
Even with improved methods and algorithms for explaining AI models and predictions, two core issues must first be addressed in order to make legitimate progress towards interpretable, transparent AI: underspecification and misalignment.
The notion of model or algorithmic interpretability is underspecified -- that is, the AI field lacks precise metrics of interpretability. How can one argue that a given model or algorithm is more interpretable than another, or benchmark improvements in explainability? One method could provide beautifully detailed visualizations, while another provides a coherent natural-language rationale behind each prediction. Comparing models on account of their interpretability can be apples and oranges.
For example, a heuristic for model-based interpretability is "simulatability": whether or not a human is able to internally simulate and reason about the model's entire decision-making process. Yet there is no way to consistently quantify this heuristic, or any of the potentially dozens of other interpretability measures. Without metrics to guide and measure progress, the development of interpretable AI is on a random walk.
Yet more confounding is having to choose between model performance and interpretability, particularly in medical applications where transparency and trust are critical. Herein lies the second core issue: misalignment. Perhaps, when faced with a performance-vs-interpretability tradeoff, solutions can be found in interpretation interfaces and modules: use powerful black-box models, and layer on an interpretation module for post-hoc inspection and explainability. This can work well in medical practice, as shown above and elsewhere. Yet the misalignment cuts deeper:
"We want good models. We also want interpretable models. Thus the human wants something the performance metric does not." -- Zack Lipton, Asst. Professor, Carnegie Mellon
In general, a machine learning system is built and trained to optimize a specified target objective: classification accuracy in a spam filter or tumor diagnostic, efficiency in route planning or Amazon box packing. Unlike these precise performance metrics, criteria such as safety, trust, and nondiscrimination often cannot be completely quantified. How then can models be trained towards these auxiliary objectives? Case in point: Google's embarrassing episode with a racist image classifier.
Underspecified and misaligned notions of interpretation impede progress towards the rigorous development of understandable, transparent, trusted AI systems.
The path forward
There is strong activity in developing interpretable AI models and algorithms [11,18], largely driven by the implicit motivations of AI scientists and engineers to both advance theoretical foundations and build real-world AI systems. Thought leaders such as Zack Lipton, Suchi Saria, and Mihaela van der Schaar are instrumental in moving the field forward.
External forces are also pushing for the development of explainable AI, notably in the arenas where it matters most. The Defense Advanced Research Projects Agency (DARPA) is putting $70 million towards a new "Explainable AI" program, with a focus on interpreting the deep learning systems used in drones and intelligence mining. Perhaps more influential is a set of guidelines from the EU on the ethical development and deployment of AI by companies and governments. A main requirement put forth is transparency: AI systems must include explanations of their models' internal logic, such that data and decisions can be traced and understood by humans.
One hope is that, with more rigorous science [16,18], improved methods for model interpretability can lead to a more fundamental understanding and theory of deep learning. One example is inspecting deep neural networks with counterfactual perturbations to discover which features matter most to the resulting predictions, a method that could bear fruit in the development of deep causal inference. Nonetheless deep learning, interpretable or not, is useful across industries and applications, and is not slowing down.
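Counterfactual probing of a black box can be as simple as changing one input at a time and watching how far the prediction moves. A toy sketch, where the "black-box" model and its inputs are entirely hypothetical and we pretend we can only query predictions:

```python
import numpy as np

# hypothetical black-box model: all we can do is query it for predictions
def model(x):
    return 1.0 / (1.0 + np.exp(-(2.0 * x[0] - 0.5 * x[1] + 0.1 * x[2])))

x = np.array([1.0, 1.0, 1.0])   # the input we want to explain
base = model(x)

# counterfactual-style probe: zero out one feature at a time and
# record how much the prediction moves
importance = []
for i in range(len(x)):
    x_cf = x.copy()
    x_cf[i] = 0.0
    importance.append(abs(model(x_cf) - base))

print(np.round(importance, 3))  # larger values = more influential features
```

Rankings like this only scratch the surface of causal reasoning, but they illustrate why counterfactual methods are a promising bridge between black-box performance and human-auditable explanations.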