Introduction
Not so long ago, enterprise leadership relied solely on experience and personal judgment in making critical business decisions. Once automated reasoning systems were introduced to support decision-making, these systems relied on rules handcrafted by those leaders. This made both interpreting and modifying their behavior easy tasks. The problem: The systems were not scalable.
Machine learning models arrived to address that scalability problem. They did not require humans to spell out the rules; instead, they could be trained from data, and the more data, the better. Yet these models were not as easy to interpret or modify as their rules-based predecessors. Finally, sometime around 2009, deep learning models arrived on the commercial scene, bringing greatly advanced modeling capabilities at the cost of an even greater loss of clarity and flexibility.
Today, artificial intelligence based on deep learning permeates every field of activity, touching and shaping our everyday lives, so much so that the current era is often referred to as the Age of AI. In 2017, a PwC study estimated the potential contribution of AI to the global economy at around $15.7 trillion by 20301. Although unsurpassed in their modeling capacity and scope of applicability, deep learning models are, for the most part, mysterious “black boxes,” which raises disturbing questions about their veracity, trustworthiness and biases, particularly in the context of their widespread use.
There is an urgent need to introduce interpretability into the very fabric of AI modeling. Although this need relates to a broader subject, the ethics of AI, the focus here is specifically on explainability. The following is an overview of what’s known as explainable AI, including the driving needs, approaches and technologies involved, as well as the approach we are adopting for our customer solutions.
Why black-box models result from deep learning
Deep learning (formerly known as neural networks) originates from the connectionist approach to AI, in which models comprise layers of nodes that use inner products and nonlinear functions to perform basic operations. Inside a neural network there are no separate or discernible logical entities, just an indistinguishable mass of numerical values. Inputs are fed into the network as vectors: fixed-size lists of numbers derived from the original data sources; the outputs, too, are vectors. So there is no direct way to trace the reasoning the network implicitly uses to reach its conclusions. Despite early attempts to extract rules from neural networks2, networks have become too large (commonly millions of nodes) and too diverse for tractable rule extraction to be pursued today. The term “black box” comes from this lack of visibility into the internal workings of such systems.
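To make the point concrete, here is a minimal NumPy sketch of a two-layer network; the weights, sizes and input are illustrative, not taken from any real model. All of the model's "knowledge" sits in undifferentiated arrays of numbers, which is exactly why there is nothing symbolic to inspect:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network: all "knowledge" lives in these weight matrices,
# which are just undifferentiated arrays of numbers.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)   # hidden layer -> output layer

def forward(x):
    """Fixed-size input vector in, fixed-size output vector out."""
    h = np.maximum(0, x @ W1 + b1)   # inner product followed by a ReLU nonlinearity
    return h @ W2 + b2               # output scores

x = rng.normal(size=4)               # an input vector derived from raw data
print(forward(x))                    # two output scores; no rule or symbol to inspect
```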
The black-box nature of neural networks hindered their adoption throughout the 1990s. Only when they started outperforming conventional classification and regression models by wide margins, around 2006, were they adopted by industry. Even today, a neural network model that merely matches its conventional counterpart (or even outperforms it by only a narrow margin) is unlikely to be deployed.
Notions of explainability
Although there is broad consensus that industry needs to understand how an AI model comes to its conclusions, there are different flavors of explainability, and specific terms have come to be associated with particular interpretations. Some of the most prominent are as follows:
- Intelligibility: An understanding of the working of the AI system, in terms that humans can interpret.
- Explainability: The kind of information available at the man-machine interface that enables informed use of an AI system’s outputs.
- Transparency: Complete interpretability of AI model internals.
- Confidence: The measure of certainty that the model associates with any given decision. Generally, uncertainty comes in two forms: aleatory, where the uncertainty stems from the inherent variability of the data, and epistemic, where it originates from the system’s inability to apply prior learning to the input data; in other words, because the system does not know.
For our purposes here, explainability and confidence are the prime objectives for our explainable AI models.
Why Explanations Are Essential
AI models have a whole range of stakeholders, from model designers to decision-makers to society as a whole (after all, who isn’t affected by those decisions?). Programs for explainable AI system development have been launched worldwide with these various constituencies in mind; one of the most notable is the DARPA Explainable Artificial Intelligence program3. In 2019, a high-level expert group set up by the European Commission published guidelines for trustworthy AI systems4. To date, these guidelines are nonbinding, but there is a distinct possibility that such rules will be legally enforced in the near future. To ensure compliance with criteria imposed by regulators, AI solution providers must monitor requirements regarding explainability and trustworthiness as they materialize.
Explanations for system designers
For designers of AI models, understanding the models’ scope of operation, underlying issues and ways they might malfunction is crucial. AI systems must at no point flout the domain principles of the problem at hand or any other hard constraints imposed by the context, including safety and security. For AI incorporated into autonomous vehicles, for example, the impact of such malfunction directly translates into road accidents. Likewise, criminals can use imperfections in face recognition technologies to fool identification systems, resulting in confusion among law enforcement agencies and harassment of ordinary people. The list goes on and on, so before such technology can be deployed, system designers need to identify what AI systems have internalized. In addition to exhaustive system testing, explanations are essential.
Explanations for decision-makers
In the corporate world, the impact of decisions prompted by an AI system can be far-reaching. Unearthing the factors that might have contributed to a decision and determining their relevance is critical. A telling illustration is the parable5 of military AI software that was supposed to recognize camouflaged tanks in photographs. Instead, the final version of the software learned to report whether a photograph was taken on a sunny day or a cloudy one. Tracing contributing factors in older, logic-driven AI systems was simpler, but with recent AI based on deep learning, the black-box nature of the model provides no direct way to trace decisions back through the process.
Another aspect of explainability many decision-makers are interested in is the impact of changing certain input factors for any given situation. This is particularly relevant where the decision-maker must not only understand the outcome under the current scenario but also consider the scope of intervention. What is the use of predicting whether a customer will churn if your business has no insight into how to avert the situation?
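One simple way to probe the impact of changing input factors is to perturb one factor at a time and run the model forward again. The sketch below illustrates the idea on a synthetic stand-in for a churn model using scikit-learn; the feature names and data are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a churn dataset (feature names are hypothetical).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = ["tenure", "monthly_charges", "support_calls", "discount"]
model = LogisticRegression().fit(X, y)

customer = X[0].copy()
baseline = model.predict_proba([customer])[0, 1]   # predicted churn probability

# What-if analysis: nudge one factor at a time and rerun the model forward.
for i, name in enumerate(feature_names):
    tweaked = customer.copy()
    tweaked[i] += X[:, i].std()                    # a one-standard-deviation change
    delta = model.predict_proba([tweaked])[0, 1] - baseline
    print(f"{name:16s} +1 std -> churn probability changes by {delta:+.3f}")
```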
Moving into another area of explainability, the question of confidence in the output of the AI system is often fundamental to decision-makers. AI systems interpolate solutions based on the data presented to them during training and thus might fail miserably when operating on unfamiliar kinds of inputs. A neural network that has been trained only to distinguish between dogs and cats will still try to assign one of those labels when presented with images of humans or chairs. In such cases, it would be far better for the model to say “I don’t know” than to provide a “solution” based on one of the known alternatives. An ideal AI model would include a certainty score, or degree of confidence, associated with each output, covering both the epistemic and aleatory uncertainty perspectives.
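The sketch below illustrates only the interface of abstention: a classifier that declines to answer when its highest class probability falls below a threshold. A raw softmax score is a naive proxy for confidence and is not a reliable out-of-distribution detector; the Bayesian techniques discussed later give a sounder footing. The logit values here are purely illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decide(logits, threshold=0.8):
    """Abstain instead of forcing a choice when the model is not confident enough."""
    probs = softmax(logits)
    labels = ["dog", "cat"]
    if probs.max() < threshold:
        return "I don't know", probs
    return labels[int(probs.argmax())], probs

# Logits a dog/cat classifier might emit (values are illustrative only).
print(decide(np.array([4.2, 0.3])))   # clear dog -> confident answer
print(decide(np.array([0.9, 1.1])))   # e.g., a photo of a chair -> abstain
```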
Explanations for end users
In the majority of cases where an AI-based solution directly interacts with end users, explanations focus on trustworthiness, fairness and impartiality. The first of these shares aspects with explanations for decision-makers. Users of an AI-based health coach, for example, want to know which factors the model takes into account and the degree of confidence the model has in prescribing the solution, because their personal health is at stake. Trust is an extremely subjective notion, however, and many users demand complete transparency before they place their faith in an AI system. We aren’t there yet, unfortunately; such transparency is unachievable in the short term. But exhaustive and transparent testing is certainly a viable alternative, and business consulting and IT services firms have adopted this stance in practice with reasonable success.
Fairness is an aspect closely tied to trustworthiness, and it has more to do with whether the AI system has been designed with the user’s interests in mind than with the transparency of the technology itself. An AI-based evaluation of personal loans, for instance, might weigh the lender’s profitability more heavily than the applicant’s creditworthiness. What’s more, the explanations and responses provided in such a setting might be, by design, falsified by the AI system6.
The requirement of fairness has become central to AI implementations. The objective study of fairness involves translating our subjective notions into statistical measures that can be applied to datasets on which AI models are trained7. These measures can then be used to evaluate datasets for fairness8, following which resampling and calibration methods can be employed as corrective measures9.
Part of fairness is tied to the notion of impartiality, and for AI systems to be impartial, biases10 must be removed from the datasets used to train them. Thus, the focus here turns to the data rather than the AI models at hand. The past few years have seen big strides in the formalization of bias identification and the removal of partiality from data. Various methods and tools now systematically analyze datasets for bias and apply corrective measures wherever required.
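As a minimal illustration of such dataset-level checks, the sketch below computes two common measures, the statistical parity difference and the disparate impact ratio, on a hypothetical loan-decision dataset; the group labels and counts are invented for illustration. Toolkits such as AI Fairness 360 (see reference 9) package these and many related metrics in ready-made form.

```python
import pandas as pd

# Hypothetical loan-decision data: `approved` is the model's decision,
# `group` is a protected attribute (names and values are illustrative).
df = pd.DataFrame({
    "group":    ["A"] * 60 + ["B"] * 40,
    "approved": [1] * 45 + [0] * 15 + [1] * 20 + [0] * 20,
})

rates = df.groupby("group")["approved"].mean()   # approval rate per group
print(rates)

# Two common dataset-level checks drawn from the fairness literature.
parity_difference = rates["B"] - rates["A"]
disparate_impact = rates["B"] / rates["A"]
print(f"statistical parity difference: {parity_difference:+.2f}")
print(f"disparate impact ratio:        {disparate_impact:.2f}")   # < 0.8 is a common warning flag
```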
A Practical Approach to Explainable AI
As business and IT organizations devise AI solutions, the general principle should be to keep “humans in the loop” at their core. This ensures that responsibility for decisions lies with a human decision-maker, while also building in scope for scrutiny of the AI system’s recommendations.
Based on our work with clients and within our own company, Infosys, we recommend stating formal requirements for AI explainability in plain language.
The following table summarizes the primary explainability requirements for AI solutions at Infosys, as well as the approaches the company has adopted to realize them:
| AI explainability requirement | Approach to address the requirement |
| --- | --- |
| Explainability for model developers | Visualization of model internals and testing that verifies operation at the model boundary and beyond |
| Grounds for automated decision | Computation of the relative importance of factors, often portrayed through vertical charts or highlighting of factors in the source stream |
| Impact of modifying input factors | Combination of sensitivity analysis and forward model operation |
| Confidence associated with decisions | Computation of uncertainty associated with a given decision |
| Fairness and impartiality | Analysis and modification of the dataset used for model training, using tools for bias evaluation |
Why do these AI solutions aim for explainability and confidence assessment rather than for complete transparency of AI models? Given the brisk pace of technological advancement, the unsolved problem of transparency might not persist for long. But what if complete transparency for AI solutions were not desirable after all? Protecting the intellectual property associated with a fully transparent solution would be difficult, if not impossible. Even more important, transparency would open up AI-based applications to endless sequences of adversarial attacks11, eventually nullifying the benefits of the technology.
General Methods for Explainability
Broadly speaking, there are two primary categories of technological methods that can address explainability in machine learning models, and which to use depends on the nature of the model. The first category applies to so-called transparent models (linear regression, logistic regression, decision trees, random forests and the like), where the model’s structure makes it straightforward to compute the relative importance of various factors, as well as the uncertainty both at the model level and for individual pieces of data.
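For these transparent models, the importances and a rough per-instance uncertainty can be read almost directly off the fitted model. The sketch below uses scikit-learn on synthetic data purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Linear model: the coefficients themselves are the (global) factor importances.
linear = LogisticRegression().fit(X, y)
print("logistic regression coefficients:", np.round(linear.coef_[0], 2))

# Random forest: impurity-based importances come for free with the model...
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("random forest feature importances:", np.round(forest.feature_importances_, 2))

# ...and the spread of votes across trees gives a rough per-instance uncertainty.
per_tree = np.array([tree.predict_proba(X[:1])[0, 1] for tree in forest.estimators_])
print(f"prediction {per_tree.mean():.2f} +/- {per_tree.std():.2f} across trees")
```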
With black-box machine learning models, post hoc explainability is the primary route: explanations are derived from the behavior of the model and the properties of the outputs it generates. One significant explainability mechanism in this category is LIME (Local Interpretable Model-Agnostic Explanations)12, which builds locally linear models around individual predictions of an opaque model and can be applied generically across model types. Another is SHAP (SHapley Additive exPlanations)13, which reveals the relative contribution of input factors through a mechanism of additive feature attributions. It employs the game-theoretic idea of sharing a reward among cooperative participants to unify inputs from multiple explanatory mechanisms.
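A minimal usage sketch of both libraries follows, assuming the `lime` and `shap` Python packages are installed and a tree-based classifier trained on synthetic data; exact call signatures vary somewhat across library versions:

```python
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
names = [f"f{i}" for i in range(5)]
model = RandomForestClassifier(random_state=0).fit(X, y)

# LIME: fit a small local linear model around one prediction of the black box.
lime_explainer = LimeTabularExplainer(X, feature_names=names, mode="classification")
lime_exp = lime_explainer.explain_instance(X[0], model.predict_proba, num_features=3)
print(lime_exp.as_list())          # [(feature condition, local weight), ...]

# SHAP: additive feature attributions based on Shapley values from game theory.
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X[:1])
print(shap_values)                 # per-feature contributions for this prediction
```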
Explainability of Deep Learning Models
Several methodologies are available for computing feature relevance in deep learning models, each with its respective advantages and disadvantages in terms of applicability, accuracy and computational requirements. None has yet emerged as the clear winner, but the leading alternatives are as follows:
- DeepLIFT14, an approach for computing importance scores in a multilayer neural network, by comparing activation of neurons to reference activations
- Layer-wise Relevance Propagation15, a technique that relies on a Taylor series approximation close to the prediction point for local sensitivity modeling as well as relevance back-propagation
- Integrated Gradients16, a technique that uses axioms on sensitivity and implementation invariance to derive a robust mechanism for feature relevance computation
- RETAIN17, which interprets specially designed recursive neural networks used to model sequences of multifactor events, based on computing an “attention” component at both a step level and a factor level
The problem of computing feature relevance, however, is far from being resolved. The area is rife with activity: older techniques are patched to compensate for their shortcomings, and newer techniques are researched and reported every so often. Most cloud-based AI toolkits have begun to bundle one or more of the algorithms described above as an integral part of their offerings.
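To give a flavor of these attribution methods, here is a from-scratch sketch of Integrated Gradients applied to a toy logistic model whose gradient is available in closed form; the weights and input are illustrative. The attributions approximately satisfy the method's completeness property, summing to the difference between the prediction and the baseline prediction:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    return sigmoid(x @ w + b)

def grad_wrt_input(x, w, b):
    p = predict(x, w, b)
    return p * (1 - p) * w            # d sigmoid(w.x + b) / dx

def integrated_gradients(x, baseline, w, b, steps=50):
    """Approximate IG: (x - baseline) times the mean gradient along the straight path."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([grad_wrt_input(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy model and input (weights are illustrative, not learned).
w, b = np.array([2.0, -1.0, 0.5]), 0.1
x, baseline = np.array([1.0, 0.5, 2.0]), np.zeros(3)

attributions = integrated_gradients(x, baseline, w, b)
print("attributions:", np.round(attributions, 3))
# Completeness check: attributions should sum (approximately) to F(x) - F(baseline).
print("sum:", attributions.sum().round(3), "vs", (predict(x, w, b) - predict(baseline, w, b)).round(3))
```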
Similarly, deep learning models offer a host of techniques for assessing the uncertainty associated with computed output, with Bayesian model-building as the common underlying thread. In Bayesian modeling, every model parameter is treated as a random variable with a probability distribution. This allows practitioners to estimate the variance of the model’s output values and use it as a direct measure of uncertainty. Full Bayesian model training requires more parameters and programming modifications, as well as higher training complexity, so current techniques pursue indirect mechanisms to achieve the same end18.
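Monte Carlo dropout (see reference 18) is one such indirect mechanism: dropout is left active at inference time, and the spread across repeated stochastic forward passes serves as an uncertainty estimate. A minimal PyTorch sketch, using an illustrative untrained model, might look as follows:

```python
import torch
import torch.nn as nn

# A small classifier with a dropout layer (architecture is illustrative).
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)

def mc_dropout_predict(model, x, samples=100):
    """Keep dropout active at inference and average over stochastic forward passes."""
    model.train()                       # leaves dropout switched on
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(samples)])
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(1, 10)                  # one input vector, for illustration only
mean, std = mc_dropout_predict(model, x)
print("mean class probabilities:", mean)
print("uncertainty (std across passes):", std)
```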
Incorporating Explainable AI Into Solutions
Despite common perceptions in the media, AI solutions do not stand apart as a distinctive technology offering. Rather, the majority fall on the continuum of data science driven by big data, which is dominated by models for regression and classification. A traditional, transparent data science model built on structured data typically provides a target response as well as the following:
- A global assessment of feature importance for the overall model
- A local assessment of feature importance for the particular classification/regression instance
- An assessment of uncertainty derived from the output score
AI extends this to unstructured data inputs in the form of text, images, video and so forth. A deep learning model can then provide inputs for decision support based on, for example, the classification of reports on issues with a particular company product. In that context, the AI model offers the following benefits to a human agent:
- The recommendation based on its classification of the problem/report
- A certainty measure that reflects the system’s confidence in the particular classification
- Color-coded highlighting of words in the report that reflect textual evidence for various solution alternatives
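One way to produce such highlighting is to attach a text explainer to the classifier and translate per-word weights into colors. The sketch below uses LIME's text explainer on a tiny, invented set of issue reports; the reports, labels and pipeline are hypothetical, and exact library signatures may vary by version:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical issue reports labelled with a recommended resolution path.
reports = [
    "battery drains quickly and the device gets hot",
    "screen cracked after the phone was dropped",
    "device overheats while charging overnight",
    "display has dead pixels near the corner",
]
labels = ["battery_team", "hardware_team", "battery_team", "hardware_team"]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(reports, labels)

explainer = LimeTextExplainer(class_names=pipeline.classes_)
explanation = explainer.explain_instance(
    "phone gets very hot and the battery dies fast",
    pipeline.predict_proba, num_features=4)
print(explanation.as_list())   # (word, weight) pairs to drive color-coded highlighting
```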
Conclusion
This article charts the general landscape of explainable AI and outlines the direction and scope of explainability that Infosys incorporates into its AI solutions. In our opinion, the focus belongs specifically on the assessment of explainability and confidence. These aspects are certain to remain a necessary part of every AI solution that business IT consultants provide to customers, becoming more refined as the technology advances.
References
- “Sizing the prize: What’s the real value of AI for your business and how can you capitalise?” PwC, 2017.
- Fu, LiMin. “Knowledge discovery based on neural networks.” Communications of the ACM 42.11 (1999): 47-50.
- “Broad Agency Announcement: Explainable Artificial Intelligence (XAI).” DARPA-BAA-16-53, August 10, 2016.
- “Ethics Guidelines for Trustworthy AI.” Independent High-Level Expert Group on Artificial Intelligence, European Commission, April 2019.
- Yudkowsky, Eliezer. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks, eds. Nick Bostrom and Milan M. Ćirković. New York: Oxford University Press, 2008, 308–345.
- Le Merrer, Erwan, and Gilles Trédan. The Bouncer Problem: Challenges to Remote Explainability. arXiv preprint. arXiv:1910.01432 (2019).
- Verma, Sahil, and Julia Rubin. “Fairness definitions explained.” IEEE/ACM International Workshop on Software Fairness (FairWare). IEEE, 2018.
- Žliobaitė, Indrė. “Measuring discrimination in algorithmic decision making.” Data Mining and Knowledge Discovery 31.4 (2017): 1060-1089.
- Bellamy, Rachel K.E., et al. “AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias.” arXiv preprint. arXiv:1810.01943 (2018).
- Barocas, Solon, and Andrew D. Selbst. “Big Data’s Disparate Impact.” Calif. L. Rev. 104 (2016): 671
- Yuan, Xiaoyong, et al. “Adversarial Examples: Attacks and Defenses for Deep Learning.” IEEE Transactions on Neural Networks and Learning Systems 30.9 (2019): 2805-2824.
- Ribeiro, Marco T., Sameer Singh, and Carlos Guestrin, “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, 1135–1144.
- Lundberg, Scott M., and Su-In Lee. “A Unified Approach to Interpreting Model Predictions.” 31st Conference on Neural Information Processing Systems, Long Beach, California, 2017.
- Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. “Learning Important Features Through Propagating Activation Differences.” Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
- Bach, Sebastian, et al. “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.” PLOS ONE 10.7 (2015).
- Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. “Axiomatic Attribution for Deep Networks.” Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
- Choi, Edward, et al. “RETAIN: An Interpretable Predictive Model for Healthcare Using Reverse Time Attention Mechanism.” 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016.
- Gal, Yarin, and Zoubin Ghahramani. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” International Conference on Machine Learning, New York, 2016.