<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deep Learning &#8211; Deep Forgetting</title>
	<atom:link href="https://www.deepforgetting.com/category/deep-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.deepforgetting.com</link>
	<description>Artificial Intelligence Conference</description>
	<lastBuildDate>Fri, 22 Oct 2021 07:45:45 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.8.6</generator>
	<item>
		<title>Where does our initial learning come from?</title>
		<link>https://www.deepforgetting.com/transfer-learning-in-newborns/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Thu, 21 Oct 2021 00:21:04 +0000</pubDate>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">https://www.deepforgetting.com/?p=1744</guid>

					<description><![CDATA[When a baby emerges from the womb, the baby's brain already has all the learning essential for the correct functioning of the body. Isn't that interesting when our whole premise is learning based on "DATA"?]]></description>
										<content:encoded><![CDATA[
<p>When a baby emerges from the womb, the baby&#8217;s brain already has all the learning essential for the correct functioning of the body. The neural circuits for breathing, heartbeat, kicking, sleeping, blood circulation, etc., are all ready! The baby can track a moving object, orient towards Mom or Dad&#8217;s face, feed, or even has the desire to walk when you hold it up with feet touching a flat surface. Isn&#8217;t that interesting when our whole premise is learning based on &#8220;DATA&#8221;?<br></p>



<p>Let&#8217;s start from the <strong>beginning</strong>.</p>



<div class="wp-block-image"><figure class="alignleft size-large is-resized"><img loading="lazy" src="https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-1024x699.jpg" alt="" class="wp-image-1751" width="393" height="268" srcset="https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-1024x699.jpg 1024w, https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-300x205.jpg 300w, https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-768x524.jpg 768w, https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-1536x1048.jpg 1536w, https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04-600x410.jpg 600w, https://www.deepforgetting.com/wp-content/uploads/2021/10/Ivf2_04.jpg 1600w" sizes="(max-width: 393px) 100vw, 393px" /></figure></div>



<p>We start from a zygote, a cell formed by the fusion of the male and female gametes (the reproductive cells, which carry only 23 chromosomes, unlike regular cells with 46). After the fusion, the zygote has a complete set of 46 chromosomes. In case you forgot, genes are segments of DNA, and chromosomes are the structures in our cells that contain the genes &#8211; hundreds to thousands of genes each. So the zygote contains the DNA &#8211; the complete DNA that defines us. DNA is the most fascinating multidimensional code known to humankind. It is not your typical software function. It carries the encoded instruction set for complete development.<br></p>



<p>From one cell, the zygote will soon become multicellular, as each cell will undergo mitosis. The cells divide, and the new cells have identical copies of the DNA as the original cells. The cells rearrange themselves spatially to form three layers that differentiate into different organ systems. The cells multiply and change. They produce cells that become white blood cells, red blood cells, nerve cells, eyes, liver, fingers, toes, hair, heart, skin, etc. A cell becomes a palm, an adjacent cell becomes a finger, and another becomes a nail. Not only does a cell know what to do, but it also knows what neighboring cells should do. The DNA instruction set is controlling this.<br></p>



<p>But to me, the most fascinating thing is the learning that gets developed in the brain. The cells give rise to all the neurons in the brain stem so that a newborn knows what to do &#8211; kicking, grasping, crying, sleeping, rooting, feeding, etc. The correct synaptic connections, the correct synaptic weights, the correct inhibitors, the correct circuits! Nothing is random &#8211; it is a well-planned execution of the DNA code.</p>



<blockquote class="wp-block-quote"><p>The code in DNA builds these <strong>initial learning/inference models</strong> in the brain.</p></blockquote>



<p>Initial learning comes from DNA …&nbsp;transfer learning. But kicking, grasping, crying, sleeping, rooting, and feeding are not the only elements of learning that a newborn is born with. There is a lot more learning in the brain &#8211; the unconscious bias &#8211; the things that we will discover about the baby as the baby grows.</p>



<p><strong>Let me ask you this. What is something that you know today but you have never learned?</strong></p>



<p>When we use the term DNA, it may sound as though learning is influenced only by the parental genomes, but it is a lot more interesting. Along with the genome, we also have an epigenome.</p>






<div class="wp-block-image"><figure class="alignleft size-full is-resized"><img loading="lazy" src="https://www.deepforgetting.com/wp-content/uploads/2021/10/epigenome.jpg" alt="" class="wp-image-1756" width="257" height="257" srcset="https://www.deepforgetting.com/wp-content/uploads/2021/10/epigenome.jpg 450w, https://www.deepforgetting.com/wp-content/uploads/2021/10/epigenome-300x300.jpg 300w, https://www.deepforgetting.com/wp-content/uploads/2021/10/epigenome-150x150.jpg 150w, https://www.deepforgetting.com/wp-content/uploads/2021/10/epigenome-100x100.jpg 100w" sizes="(max-width: 257px) 100vw, 257px" /></figure></div>



<p>The epigenome is a series of chemical tags (called epigenetic markers) that promote or repress the expression of genes without altering the genome. These epigenetic modifications are added in response to the physiological or environmental stimuli our cells receive. Diet, stress, habits, learning, and exercise are a few examples of influences that can permanently alter the epigenetic profile of an individual. Researchers use the term “imprinted genes” to refer to genes influenced by epigenetic markers. The DNA is not modified, but these epigenetic modifications influence the instruction code contained in the DNA.</p>






<p><strong>The epigenome is the software layer on top of the DNA code.</strong></p>






<p>The learning in the newborn is not just coming from the DNA, i.e., defined by the parental genome &#8211; the default learning of a baby is influenced by the epigenetic modifications too. These markers fully influence the zygote&#8217;s development.&nbsp; Our learnings, karma, habits, environment, and lifestyle influence what our newborns will know by default. A child may have the same habits as the father. A child may know things the mother spent years learning. All these sentences make sense biologically. The genome and the epigenome both take part in that initial learning. It is not just the kicking, grasping, crying, sleeping, rooting, and feeding; unconscious bias, aptitude, habits, and a lot of other learning is transferred from generation to generation.</p>



<blockquote class="wp-block-quote"><p>Are people responsible for the (unconscious) actions based on the learning/behaviors they are born with? That is a debate we can discuss another time.</p></blockquote>



<p><strong>Now, let me ask you this. What are the epigenetic learnings that you think came from your parents?</strong></p>



<p>Regardless, this is very interesting for AI/ML engineers, as brain development from the genome and epigenome offers cues for model storage, transfer learning, and incremental model-change mechanisms. The brain is fascinating, and to build real intelligence, we need to dig a lot deeper &#8211; not just into neurons but also into the genome and epigenetics. <strong>The answers are in nature; we just need to look closer.</strong></p>
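<p>For AI/ML readers, the inheritance-then-fine-tune pattern can be made concrete in a few lines. The following is a toy NumPy sketch, not a biological model: all names, shapes, and data are invented for illustration. A &#8220;parent&#8221; feature extractor is copied into a &#8220;child&#8221; model unchanged, and the child trains only a small output head on minimal data of its own.</p>

```python
import numpy as np

# Toy sketch of "DNA as transfer learning" (all names/shapes illustrative):
# the parent's feature extractor is inherited frozen; the child trains
# only a small task-specific head on minimal data of its own.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def features(x, W):
    # Frozen feature extractor inherited from the parent.
    return relu(x @ W)

# "Parent" weights: stand-ins for what a lifetime of learning produced.
parent_W = rng.normal(size=(8, 16))    # input dim 8 -> feature dim 16

# "Child" inherits the extractor unchanged and learns only its head.
child_W = parent_W.copy()              # inherited, never updated below
head = np.zeros(16)                    # the trainable output layer

def predict(x):
    return features(x, child_W) @ head

# Fine-tune only the head with a few gradient steps on a tiny dataset.
X = rng.normal(size=(20, 8))
y = rng.normal(size=20)
lr = 0.01
for _ in range(200):
    F = features(X, child_W)           # child_W stays frozen
    head -= lr * (F.T @ (F @ head - y)) / len(y)
```

<p>The point of the sketch is the frozen copy: the inherited weights never change, just as the newborn&#8217;s default circuits come pre-wired.</p>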
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>There are two kinds of AI: BIG Data or MIN Data</title>
		<link>https://www.deepforgetting.com/two-kinds-of-ai/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Thu, 01 Jul 2021 02:47:00 +0000</pubDate>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">http://demo.ovatheme.com/imevent/?p=774</guid>

					<description><![CDATA[In next 3 years, AI will rely less on BIG DATA and more on MIN DATA. The new MIN Data approach of AI will enable a whole new set of use cases that seemed unsuited before.]]></description>
										<content:encoded><![CDATA[


<p>There are two kinds of people in the world — those who divide the world into groups of two and those who don’t. And here, now, is my take on AI. There are two kinds of AIs — those that are built using BIG DATA and those that are NOT.</p>
<div class="slate-resizable-image-embed slate-image-embed__resize-full-width"><img loading="lazy" class="aligncenter" src="https://media-exp1.licdn.com/dms/image/C5612AQFSODpKwl5EZQ/article-inline_image-shrink_1500_2232/0/1631354341203?e=1637798400&amp;v=beta&amp;t=K3hDSlooj8dcnM1wQ3aAjzu_IyLzZlAFCNQusILF_14" alt="Two types of AI - SECO MIND" width="645" height="736" data-media-urn="" data-li-src="https://media-exp1.licdn.com/dms/image/C5612AQFSODpKwl5EZQ/article-inline_image-shrink_1500_2232/0/1631354341203?e=1637798400&amp;v=beta&amp;t=K3hDSlooj8dcnM1wQ3aAjzu_IyLzZlAFCNQusILF_14" /></div>
<h2>The BIG DATA AI</h2>
<p>In simplistic terms, BIG-DATA-powered AI models “learn” without being explicitly programmed to do so. The learning is basically a process that examines a large set of samples and creates a formula using a large number of parameters, many of which are indistinguishable to the naked eye. It is a time-consuming trial-and-error process where the parameters are continually adjusted until training data with the same labels consistently yield similar outputs. A classic example is using a data set of 50,000 images of digits (0 to 9), from which the AI model learns to recognize each digit.</p>
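<p>The trial-and-error loop described above can be sketched in a few lines of code. This toy stands in two synthetic Gaussian clusters for the 50,000 digit images &#8211; the data, model size, and learning rate are all illustrative &#8211; but the structure, repeatedly adjusting parameters until same-label inputs yield similar outputs, is the same.</p>

```python
import numpy as np

# Toy version of the "trial and error" training loop: two synthetic
# Gaussian clusters stand in for the digit images; a logistic model's
# parameters are adjusted until same-label inputs yield similar outputs.
rng = np.random.default_rng(1)

X0 = rng.normal(loc=-1.0, size=(100, 4))   # class 0 "images"
X1 = rng.normal(loc=+1.0, size=(100, 4))   # class 1 "images"
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

w = np.zeros(4)
b = 0.0
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):                        # the trial-and-error loop
    p = sigmoid(X @ w + b)                  # current outputs
    w -= lr * (X.T @ (p - y)) / len(y)      # nudge parameters so that
    b -= lr * float(np.mean(p - y))         # labels and outputs agree

accuracy = float(np.mean((sigmoid(X @ w + b) > 0.5) == y))
```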
<blockquote>
<p>The BIG DATA AI is all about &#8220;mathematically recognizing&#8221; incredibly subtle patterns within the mountains of data.</p>
</blockquote>
<p>Using this &#8220;pattern recognition technique&#8221; you can now &#8220;provide&#8221; an answer, a decision, or a prediction for input data you have never seen before. It is a great tool, but it is unlike the human brain in how it learns. These algorithms require mountains of data and high CPU/GPU power to train. They have no common sense, conceptual learning, creativity, planning, human-like intuition, imagination, cross-domain thinking, self-awareness, emotions, etc. They have trouble at the extreme edges of the data, where they had only limited samples.</p>
<blockquote>
<p>BIG DATA AI is basically an optimizer based on a large volume of data for a very specific vertical single task.</p>
</blockquote>
<h2>The MIN DATA AI</h2>
<div class="slate-resizable-image-embed slate-image-embed__resize-right"><img loading="lazy" class="alignright" src="https://media-exp1.licdn.com/dms/image/C5612AQHCDXlaf1BHKw/article-inline_image-shrink_1000_1488/0/1630274959257?e=1637798400&amp;v=beta&amp;t=cACTtP6Vgy-Th6ivUErVfiNEJHPD2QVyL902_EUHHRA" alt="No alt text provided for this image" width="501" height="230" data-media-urn="" data-li-src="https://media-exp1.licdn.com/dms/image/C5612AQHCDXlaf1BHKw/article-inline_image-shrink_1000_1488/0/1630274959257?e=1637798400&amp;v=beta&amp;t=cACTtP6Vgy-Th6ivUErVfiNEJHPD2QVyL902_EUHHRA" /></div>
<p>In MIN DATA AI, the AI models &#8220;learn&#8221; from the minimal data needed to learn. A formula is still created, but from far less data. A classic example is how some people say they never forget a face even if they met someone only in passing, or how you recognize a taste even if you have experienced it only once before.</p>
<p>MIN DATA AI is closer to how we learn and solve problems. I don&#8217;t need to drink 10,000 Old Fashioneds (the cocktail) to tell that the drink in my hand is an Old Fashioned (although maybe in my case I already have), or to recognize whether the Old Fashioned is from <a href="https://www.haberdashersj.com/" target="_blank" rel="nofollow noopener">HaberDasher San Jose</a> or NOT (and, yes, HaberDasher is a great place). With this AI, you will be able to make bets even when the data says something else. Your learning can be super fast.</p>
<p>Here is how you can start your journey for MIN DATA AI:</p>
<ol>
<li>Stop looking for BIG DATA. Like any other addiction, it will be difficult to cope with BIG DATA addiction but you have to do it.</li>
<li>Focus more on algorithms.</li>
<li>Innovate how to create BIG DATA from MIN DATA.</li>
</ol>
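<p>One minimal, hypothetical sketch of the MIN DATA idea is one-shot recognition by nearest neighbor in an embedding space: store a single embedded example per class and match new inputs against it &#8211; you taste the Old Fashioned once and recognize it forever after. Everything below (the classes, the vectors, the <code>embed</code> stand-in) is invented for illustration.</p>

```python
import numpy as np

# Toy sketch of MIN DATA learning: one-shot recognition by nearest
# neighbor in an embedding space. One stored example per class is the
# entire "training set". The embed() function is a stand-in for any
# fixed feature extractor; classes and vectors are invented.
rng = np.random.default_rng(2)

def embed(x):
    return x / np.linalg.norm(x)       # stand-in feature extractor

# One example per class -- each seen exactly once.
prototypes = {
    "old_fashioned": embed(rng.normal(size=16)),
    "negroni": embed(rng.normal(size=16)),
}

def classify(x):
    z = embed(x)
    # Pick the class whose single stored example is most similar.
    return max(prototypes, key=lambda name: float(z @ prototypes[name]))

# A noisy new sighting of the class we encountered once before.
query = prototypes["old_fashioned"] + 0.1 * rng.normal(size=16)
label = classify(query)
```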
<p>In the next 3 years, AI will rely less on BIG DATA and more on MIN DATA. This new MIN DATA approach to AI will enable a whole new set of use cases that seemed unsuited to AI before.</p>
<blockquote>
<p>MIND is perhaps MIN D(ata)</p>
</blockquote>
<p>The future of AI making devices, processes, and automation intelligent (and not just high-confidence decision making based on BIG DATA) is not far.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep learning to avoid real time computation</title>
		<link>https://www.deepforgetting.com/deep-learning-to-avoid-real-time-computation/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Sat, 19 Jun 2021 16:16:45 +0000</pubDate>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">https://www.deepforgetting.com/?p=1472</guid>

					<description><![CDATA[“The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated]]></description>
										<content:encoded><![CDATA[
<p>“The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble. It therefore becomes desirable that approximate practical methods of applying quantum mechanics should be developed, which can lead to an explanation of the main features of complex atomic systems without too much computation” — Paul Dirac, 1929. For example:</p>



<ul><li>Navier–Stokes equations are the fundamental basis of almost all Computational Fluid Dynamics (CFD) problems. These are extremely useful to model the weather, airflow around an airplane wing, ocean currents, water flow in a pipe, and the analysis of pollution. But these cannot be used in real-time due to the time they need for computation.</li><li>Fully describing an arbitrary many-body state in quantum mechanics requires an exponential amount of information. While simulating a quantum system with 30 qubits requires just tens of gigabytes, simulating 300 qubits requires more bytes than the number of atoms in the observable universe! Even the state-of-the-art approximation methods for quantum mechanics such as Hartree-Fock Theory (HF) and Density Functional Theory (DFT) can take a long time: hours, days, or even weeks to compute.&nbsp;<a href="https://www.linkedin.com/pulse/quantum-computing-opens-new-front-cloud-ajay-malik/" target="_blank" rel="noreferrer noopener">Learn more about QuBits.</a></li><li>Even with the computing power available today, simulations or real-time analysis for acoustic transmission or absorptions in buildings can take a long time.</li><li>Computer simulation of electrons in the potential of atomic nuclei is the workhorse of modeling material properties such as phase stability, mechanical behavior, and thermal conductivity. However, these simulations are limited by their computational cost.</li></ul>



<figure class="wp-block-image"><img src="https://media-exp1.licdn.com/dms/image/C5612AQF96vX_0cUaog/article-inline_image-shrink_1500_2232/0/1572821921045?e=1637798400&amp;v=beta&amp;t=bL1tHZ7UveHlr_2sZKQieM6SYLQ_qhpcdr4Fi0VwOlM" alt="Deep Learning has the potential"/></figure>



<p>Deep learning has the potential to break through this limitation.</p>



<p>Imagine creating a deep learning model by constructing a dataset that covers the entire physically relevant set of configurations for the problem and then just using the model to completely bypass costly calculations in the future.</p>



<blockquote class="wp-block-quote"><p>You can use the predictive power of deep neural networks to cut the computation time down to a&nbsp;couple of seconds.</p></blockquote>



<p>Yes, you will do complex simulations once, but only once! Once you have built your dataset and decided on a fitting methodology such as a simple FFN (feed-forward neural network), an RBM (restricted Boltzmann machine), or any other neural network architecture, your model can serve as a template for all future work!</p>
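<p>As a minimal sketch of that workflow (with a cheap analytic function standing in for the expensive solver, and an illustrative network size, learning rate, and step count): run the costly computation once to build a dataset, fit a small feed-forward network to it, and answer all future queries from the network.</p>

```python
import numpy as np

# Surrogate-model sketch: run the "expensive" computation once to build
# a dataset, fit a small feed-forward net, then bypass the solver.
# expensive_simulation is a cheap analytic stand-in for hours of compute.
rng = np.random.default_rng(3)

def expensive_simulation(x):
    return np.sin(3 * x) + 0.5 * x         # placeholder for CFD/DFT

# 1. Build the dataset once.
X = rng.uniform(-1, 1, size=(200, 1))
y = expensive_simulation(X[:, 0])

# 2. Fit a tiny one-hidden-layer network by full-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=32);      b2 = 0.0

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H @ W2 + b2, H

lr = 0.05
for _ in range(2000):
    pred, H = forward(X)
    err = (pred - y) / len(y)              # gradient of 0.5 * MSE
    gW2 = H.T @ err
    gb2 = err.sum()
    dH = np.outer(err, W2) * (1 - H ** 2)  # backprop through tanh
    gW1 = X.T @ dH
    gb1 = dH.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# 3. Future queries skip the solver entirely.
surrogate_pred, _ = forward(np.array([[0.3]]))
```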



<p>To me, this is one of the best uses of deep learning to build an explanation without too much computation!</p>



<p>People often think of AI as boosting growth by substituting for humans, but actually, huge value is going to come from how humans use AI. This is yet another perfect example of how deep learning will help us advance further.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Explainable Intelligence Part 3 &#8211; The Strategy for XAI</title>
		<link>https://www.deepforgetting.com/explainable-intelligence-part-3-the-strategy-for-xai/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Mon, 15 Jul 2019 04:35:00 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">https://www.deepforgetting.com/?p=1610</guid>

					<description><![CDATA[It is not enough to say that something is true just because &#8216;I know it’s true!&#8217; – we have to have some evidence or argument that gives a justification for our belief. Explanations, justifications, and more broadly epistemology have been the focus]]></description>
										<content:encoded><![CDATA[
<p>It is not enough to say that something is true just because &#8216;I know it’s true!&#8217; – we have to have some evidence or argument that gives a justification for our belief. Explanations, justifications, and more broadly epistemology have been the focus of philosophy for thousands of years. For Plato, being puzzled and therefore wanting to be unpuzzled is the origin of philosophy.</p>



<blockquote class="wp-block-quote"><p>Theaetetus: Yes, Socrates, and I am amazed when I think of them; by the Gods I am! And I want to know what on earth they mean; and there are times when my head quite swims with the contemplation of them.</p></blockquote>



<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C5612AQHzTPlOjpPT7Q/article-inline_image-shrink_1000_1488/0/1562446090116?e=1638403200&amp;v=beta&amp;t=MHrIUU8g4D1aFYoPBYjKtmBCu97TaiCpec5PdBQlPgw" alt="Explainable Intelligence: Plato"/></figure></div>



<p>And, here we are, 2500 years later, longing for clarity,&nbsp;in something we have created &#8211; artificial intelligence.</p>



<p>The &#8220;Explanation&#8221; in AI aims to enable human users to understand the reasons behind the predictions. Comprehending the reasons behind predictions is fundamental if one plans to take action based on a prediction. There are three essential needs for explainability:</p>



<ul><li><strong>Explain to Justify</strong>&nbsp;&#8211; For users to understand and trust. Why did the AI system do that? Why didn&#8217;t the AI system do something else?</li><li><strong>Explain to Control</strong>&nbsp;&#8211; To comply and sustain. In fields like medicine, defense, the judiciary, education, etc., models must be strictly answerable for their predictions. You need to be able to explain in order to comply with accountability and regulatory requirements. When you can control, you can sustain performance and prevent things from going wrong. Bias is potentially present in any dataset, and explainability helps you identify and mitigate it. It also helps with debugging and troubleshooting.</li><li><strong>Explain to Improve</strong>&nbsp;&#8211; To iterate and improve performance. When you understand the underlying mechanics of a technique, you will know its potential pitfalls and how to improve it. Explanations enable continuous optimization for better decision making.</li></ul>



<p>Explanations make AI &#8211; fair, robust, certifiable, ethical, privacy-preserving, and human interpretable. Explainability enables human users to effectively manage the AI systems.</p>



<p>However, <em>what makes machine learning algorithms excellent predictors also makes them difficult to understand</em>. They look like <em>&#8220;black boxes&#8221;</em> to most users.</p>



<figure class="wp-block-image"><img src="https://media-exp1.licdn.com/dms/image/C5612AQHJt5mJVbkktw/article-inline_image-shrink_1500_2232/0/1563197952546?e=1638403200&amp;v=beta&amp;t=-NEswwBzhL-uUWQzSKdiLHInNdSa--r1tTsHVVd6X9g" alt="Accuracy vs. Explainability"/></figure>



<p>XAI aims to &#8220;produce more explainable models while maintaining a high level of learning performance (prediction accuracy); and enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners&#8221; &#8211; DARPA. It aims to answer the following questions:</p>



<ol><li>Why a particular prediction was made, as opposed to others?</li><li>Will the model always do that?</li><li>When does it fail &amp; why? When to disregard the model output or how to fix the error?</li></ol>



<figure class="wp-block-image"><img src="https://media-exp1.licdn.com/dms/image/C5612AQEGL8CIYyvCfw/article-inline_image-shrink_1500_2232/0/1563197935654?e=1638403200&amp;v=beta&amp;t=D-IEIXzJ-PQa5EBmwhXLgI6G0f3iNcckVLiK0M94PFU" alt="No alt text provided for this image"/></figure>



<p>Before we go into how XAI is achieved, let&#8217;s discuss how it is delivered, in what forms, and what the scope of explainability is.</p>



<h2>Type of Explanations</h2>



<p>The purpose of the explanations is to allow affected parties, regulators, and other non-insiders to understand, discuss, and potentially contest decisions made by black-box algorithmic models. An explanation system can provide explanations in several forms:</p>



<ul><li><em>Why-type explanations</em>&nbsp;describe why a result was generated for a particular input. Such explanations aim to communicate what features in the input data or what logic in the model resulted in a given machine output.</li><li><em>Contrastive explanations</em>&nbsp;highlight not only the pertinent positives but also the pertinent negatives.&nbsp;For example, in medicine, a patient showing symptoms of cough, cold, and fever, but no sputum or chills, will most likely be diagnosed as having flu rather than pneumonia.</li><li><em>Explanations by example</em>&nbsp;explain the decisions of a model by reporting other examples the model considers most similar.</li><li><em>What-if explanations</em>&nbsp;are an interactive approach to help the user explore what the system would do if a particular input were different. In this approach, the user can simulate a few changes to explore the change in output &#8211; for example, checking whether one&#8217;s mortgage request would still be declined if the requester were male.</li><li><em>How-to explanations</em>&nbsp;are another interactive approach, helping the user explore how to get the system to produce a chosen output value. The goal is to provide the user with the hypothetical input conditions that produce that output &#8211; for example, finding out what annual income would have gotten the mortgage request approved.</li></ul>
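<p>The two interactive forms reduce to a simple pattern: hold the model fixed, change one input, and re-run it. The sketch below uses a made-up mortgage scorer &#8211; its weights and threshold are invented purely to illustrate the probing pattern, not taken from any real system.</p>

```python
# Hypothetical mortgage scorer -- its weights and threshold are invented
# purely to illustrate the probing pattern.
def mortgage_model(income, debt, is_male):
    score = 0.00004 * income - 0.0001 * debt + 0.0 * is_male
    return "approved" if score > 1.0 else "declined"

applicant = {"income": 20_000, "debt": 5_000, "is_male": 0}

def what_if(model, inputs, feature, new_value):
    # "What if" probe: re-run the model with one feature changed.
    changed = dict(inputs, **{feature: new_value})
    return model(**inputs), model(**changed)

def how_to(model, inputs, feature, candidates):
    # "How to" probe: search for an input value that flips the outcome.
    for value in candidates:
        if model(**dict(inputs, **{feature: value})) == "approved":
            return value
    return None

base, flipped = what_if(mortgage_model, applicant, "is_male", 1)
needed_income = how_to(mortgage_model, applicant, "income",
                       range(20_000, 100_001, 5_000))
```

<p>The same two helpers work against any black box that exposes a predict function, which is what makes these explanation styles model-agnostic.</p>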



<p><em>Why-type explanations</em>,&nbsp;<em>contrastive explanations</em>, and&nbsp;<em>explanations by example</em>&nbsp;address evaluation (how a system works), while&nbsp;<em>what-if</em>&nbsp;and&nbsp;<em>how-to explanations</em>&nbsp;address curiosity (building a mental model of the system).</p>



<h2>How Explanations are Delivered</h2>



<p>An explanation system can deliver explanations to humans in various forms:</p>



<h3>Verbal/Textual Explanations</h3>



<p>Humans often justify decisions verbally. Verbal explanations describe the machine learning model and reasoning with words, text, or natural language. Verbal explanations are popular in applications like question answering explanations, decision lists, and explanation interfaces.</p>



<p>This form of explanation has also been implemented in recommendation systems and robotics. An example technique of building verbal explanation is by training one model to generate predictions and a separate model, such as a recurrent neural network language model, to generate an explanation.</p>



<h3>Visual Explanations</h3>



<p>Visual explanations use visual elements to describe the reasoning behind the machine learning models. Visual explanations include visualizations of learned model parameters, evaluation metrics, computational graphs, data-flow graphs etc.</p>



<p>Visualizations can take the form of scatter plots, line charts, heat-maps, node-link diagrams, hierarchies (decision trees), etc. For example: line charts for temporal metrics; heat-maps overlaid on images to highlight the regions that contribute to a classification and their sensitivity; synthetic images representative of what the model has learned about the chosen features; visual back-propagation to show which parts of an image contributed to the classification; visualizations of CNN filters; prediction difference analysis to highlight features in an image that provide evidence for or against a certain class; and more.</p>
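<p>The occlusion flavor of those heat-maps is easy to sketch: mask one region of the input at a time and record how much the model&#8217;s score drops. The &#8220;model&#8221; below is a made-up scorer, not a real CNN, so the numbers are purely illustrative.</p>

```python
import numpy as np

# Toy occlusion saliency: mask one pixel at a time and record how much
# the model's score drops. The "model" is a made-up scorer that only
# looks at a 3x3 patch, so the heat-map should light up exactly there.
img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0                    # the "object"

def model_score(x):
    return float(np.sum(x[2:5, 2:5]))  # toy classifier score

base = model_score(img)
heatmap = np.zeros((8, 8))
for i in range(8):
    for j in range(8):
        occluded = img.copy()
        occluded[i, j] = 0.0           # mask one pixel
        heatmap[i, j] = base - model_score(occluded)
```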



<h2>Scope of Explainability</h2>



<p>Does the interpretation method explain the entire model behavior or an individual prediction? Or is the&nbsp;scope&nbsp;somewhere in between? An explanation can be given either at a global level or at a local level.</p>



<h3><strong>Global level</strong></h3>



<p>This is all about trying to understand&nbsp;<em>&#8220;How does the model as a whole make predictions?&#8221;&nbsp;</em>It is all about being able to explain the entire reasoning leading to all the different possible outcomes.</p>



<p>Understanding the conditional interactions between the response variable(s) and the predictor features on the complete dataset provides the first measure for assessing trust in any model. Trust is established when the list of important variables is consistent with domain expectations and also stays stable under slight data variations.</p>



<p>Even though a multitude of techniques (such as decision trees, rule lists, feature importance, etc.) can be used to enable global interpretability, analyzing and explaining the feature interactions (for model decisions) when there are more than three or four features is quite difficult.</p>



<p>This class of methods is helpful for explaining population-level decisions, such as alcohol consumption trends or ML course enrollment trends.</p>



<h3><strong>Local level</strong></h3>



<p>This is all about providing justification for a single prediction. It is about understanding&nbsp;<em>&#8220;Why did the model make specific decisions for a single instance?&#8221;&nbsp;</em>and&nbsp;<em>&#8220;Why did the model make specific decisions for a group of instances?&#8221;</em>.</p>



<p>Local explanations identify the specific variables that contributed to an individual decision &#8211; for example, explaining why a machine learning algorithm declined a mortgage to a particular individual.</p>



<p>Although global interpretability allows you to verify hypotheses and check whether the model is overfitting to noise, it makes it hard to diagnose specific model predictions. Local interpretability, on the other hand, tries to answer: why was this prediction made, and which variables caused it?</p>



<p>Several techniques, such as LIME, LOCO, Anchors, saliency maps, Local Explanation Vectors, etc., can be used to enable local interpretability.</p>
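<p>To make one of these concrete, here is a bare-bones version of the local-surrogate idea behind LIME (a hand-rolled sketch, not the actual <code>lime</code> library): perturb the instance, query the black box, and fit a proximity-weighted linear model whose coefficients serve as the local explanation.</p>

```python
import numpy as np

# Bare-bones local surrogate in the spirit of LIME (not the lime library):
# sample around one instance, query the black box, and fit a weighted
# linear model whose coefficients explain that single prediction.
rng = np.random.default_rng(4)

def black_box(X):
    # Stand-in nonlinear model; near our instance, feature 0 dominates.
    return np.sin(X[:, 0]) + 0.01 * X[:, 1] ** 3

x0 = np.array([0.5, 0.2])              # the single prediction to explain

# 1. Sample perturbations around the instance and query the black box.
Z = x0 + 0.1 * rng.normal(size=(500, 2))
f = black_box(Z)

# 2. Weight samples by proximity to the instance (Gaussian kernel).
sw = np.sqrt(np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.01))

# 3. Weighted least squares: f ~ coef[:2] @ (z - x0) + coef[2].
A = np.hstack([Z - x0, np.ones((500, 1))])
coef, *_ = np.linalg.lstsq(A * sw[:, None], f * sw, rcond=None)

# coef[0] and coef[1] are the local attributions for the two features;
# coef[2] approximates the black box's output at x0 itself.
```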



<p>As you can see, XAI is not an AI that can explain itself; it is a design decision in the implementation. In the next posts, I will go over the approach to designing XAI, the model dependency of XAI, the inner workings of various techniques, and how explainability is evaluated.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Explainable Intelligence Part 2 &#8211; Illusion of the Free Will</title>
		<link>https://www.deepforgetting.com/explainable-intelligence-part-2-illusion-of-the-free-will/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Sat, 29 Jun 2019 04:30:00 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">https://www.deepforgetting.com/?p=1603</guid>

					<description><![CDATA[Explainable Artificial Intelligence&#160;(XAI) is getting a lot of attention these days, and like most people, you&#8217;re drawn to it because the very nature of neural networks &#8211; opacity induces the feeling of deprivation that arises from the perception of a gap in]]></description>
										<content:encoded><![CDATA[
<p><a href="https://www.linkedin.com/pulse/explainable-intelligence-part-1-xai-third-wave-ai-ajay-malik/" target="_blank" rel="noreferrer noopener">Explainable Artificial Intelligence</a>&nbsp;(XAI) is getting a lot of attention these days, and like most people, you&#8217;re drawn to it because of the very nature of neural networks &#8211; their opacity induces the feeling of deprivation that arises from the perception of a gap in knowledge and understanding.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C5612AQGylfebfPHAqg/article-inline_image-shrink_1000_1488/0/1561384742661?e=1638403200&amp;v=beta&amp;t=F2dSUL1vFSb3CtUg3LihM6ip_qSDx3WVuovW_qxtino" alt="Right to explanation"/></figure></div>



<p>Many of us are even willing to refuse the use of artificial intelligence because we cannot explain&nbsp;how it reaches its decisions. Governments are creating laws around it. For example, GDPR regulations prohibit any automated decision that &#8216;significantly affects&#8217; EU citizens. These new rules give citizens the&nbsp;<strong>right to review</strong>&nbsp;how digital services made specific algorithmic choices affecting them.</p>



<p>And that does make a compelling case for investing in Explainable Artificial Intelligence. But before we demand that machines explain why they did what they did, let us look at one question &#8211; can humans explain their choices?</p>



<p>I know I am opening the old and heated debate about the existence of free will in human behavior.&nbsp;But it is crucial to dive into the secrets of the brain (artificial intelligence models are modeled after the real brain, after all) to understand the mechanics of making Artificial Intelligence explainable.</p>



<p>Every day, humans face many decisions, some trivial, others more complex.&nbsp;How to dress for the day, what ice-cream flavor to eat, where to eat lunch today – the choices we make in life have a significant impact on our own lives and the lives of others.</p>



<p>Although we think that it is the &#8220;conscious&#8221; mind that is making a decision, it is the &#8220;brain&#8221; that&#8217;s running the show. I am sure you have seen visual illusions. It is incredible how many of us see the same illusion. It&#8217;s almost as if the same &#8220;program&#8221; in our brains is constructing the image and reporting it to us. The &#8220;reality&#8221; we experience is the result of what our brain tells us.</p>



<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C4E12AQFenwwoDDtRPQ/article-inline_image-shrink_1000_1488/0/1561474264124?e=1638403200&amp;v=beta&amp;t=gT3gQX3ZgCddI5udfwdBfQnNi1qS1-6l6A1MQnLDn-4" alt="No alt text provided for this image" width="249" height="166"/></figure></div>



<p><em>The Hermann grid is one of the most classic examples of an&nbsp;optical illusion,&nbsp;where your mind is being tricked into seeing something that&#8217;s not there.&nbsp;I bet you see gray blobs appearing at the intersections of the white lines between the black squares. There are none!</em></p>



<p>This is true not only of visual experiences but of all sensory perceptions. Whether we are experiencing the feeling of &#8220;hate&#8221;, the appearance of &#8220;gray blobs&#8221; or hearing things, these are the result of the &#8220;decisions&#8221; made in our brain.</p>






<p>The brain is a hugely complex, highly recurrent, and nonlinear network made up of billions of nerve cells, or neurons. Everything you do, from breathing, walking, and talking to thinking, reading, and problem-solving is the result of neurons communicating with each other.</p>



<p>For neurons to communicate, they need to transmit information both within the neuron and from one neuron to the next. This process utilizes both electrical signals as well as chemical messengers.</p>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C5612AQFC3IXnsNTEgw/article-inline_image-shrink_1500_2232/0/1561727174044?e=1638403200&amp;v=beta&amp;t=bK0rs3kEoTvhkg3TzUqHSx_evp4C96AOgJkAT9_QBWE" alt="neuron" width="365" height="229"/></figure></div>



<p>A typical neuron possesses a&nbsp;soma&nbsp;(the bulbous cell body which contains the cell nucleus),&nbsp;dendrites&nbsp;(long, feathery filaments attached to the cell body in a complex branching &#8220;dendritic tree&#8221;), a single&nbsp;axon&nbsp;(an extra-long, branched cellular filament, which may be thousands of times the length of the soma) and maintains a&nbsp;voltage gradient&nbsp;across its membrane. The tips of the branches of the axon, called axon terminals or boutons, impinge on other neurons or effectors.</p>



<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C5612AQHJWlPl35PZJg/article-inline_image-shrink_1000_1488/0/1561727687535?e=1638403200&amp;v=beta&amp;t=9ZFify5Q4T38gHu1Iv2SMpiQi1owx_6-_tNzN3RXw5Q" alt="synapse" width="325" height="186"/></figure></div>



<p>No two neurons actually touch; there is a space between them. This locus of interaction between a bouton and the cell on which it impinges is called a synapse (the actual gap, also known as the&nbsp;synaptic cleft, is on the order of 20 nanometers), and we say that the cell with the bouton synapses upon the cell with which the connection is made. Each neuron can be connected to tens of thousands of other neurons.</p>



<p>The dendrites of neurons receive information from sensory receptors or other neurons. This information is then passed down to the cell body and on to the axon. Once the information has arrived at the axon, it travels down the length of the axon in the form of&nbsp;an electrical signal known as an&nbsp;<em>action potential</em>. The action potential travels down the axon until it reaches the synapses, where it causes the release of&nbsp;<em>neurotransmitters</em>&nbsp;of various types. The neurotransmitters are released into the synaptic gap between the neurons and diffuse across it. These chemicals then bind to chemical&nbsp;receptors&nbsp;in the dendrites of the receiving (<em>postsynaptic</em>) neuron. In the process, they decrease or increase the membrane potential of the postsynaptic<em>&nbsp;</em>neuron. If this causes the&nbsp;<em>membrane potential</em>&nbsp;to pass the firing threshold, it activates an&nbsp;<em>action potential</em>&nbsp;in the postsynaptic neuron, which travels down its axon. And so it goes on.</p>
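<p>The accumulate-until-threshold behavior described above can be sketched in a few lines of Python. This is a deliberately crude illustration with made-up parameters, not a biologically accurate model:</p>

```python
# A toy "leaky integrate-and-fire" neuron: incoming stimuli raise the
# membrane potential; crossing the firing threshold triggers a spike
# (an action potential) and the potential resets. All parameters here
# are illustrative, not real biological values.

def simulate_neuron(stimuli, threshold=1.0, leak=0.9):
    potential = 0.0
    spike_times = []
    for t, stimulus in enumerate(stimuli):
        potential = potential * leak + stimulus  # leaky integration
        if potential >= threshold:
            spike_times.append(t)  # the neuron "fires"
            potential = 0.0        # reset after the action potential
    return spike_times

# Several sub-threshold stimuli accumulate until the neuron fires once.
print(simulate_neuron([0.5, 0.4, 0.4, 0.0, 0.2]))  # [2]
```

<p>A single weak stimulus never fires the neuron; it is the accumulated effect of many inputs that pushes the potential past the threshold, which is exactly the property artificial neural networks borrow.</p>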



<p>The number of neurotransmitter receptors at a synapse is a key element determining synaptic transmission efficacy. Changes in the number of receptors at synapses are among the main molecular events underlying the numerous transient interactions between neurons. Neuronal activity modifies not only synaptic strength but also the morphology of the dendrites, in response to the stimuli received during a learning process. Neurons grow new branches and retract old ones, attaching (synapsing) to other neurons in new places and removing old connections.</p>



<p>Learning continually rewires the brain. However, the vast majority of our neurons are produced in the womb &#8211; during early pregnancy &#8211; and most of the rest during a short period after birth.&nbsp;Babies are not born as blank slates: we inherit much of this wiring &#8211; neural connections, synaptic weights, thresholds &#8211; encoded in our DNA, and much more is wired into our brains in our most formative years, as we are born into a world over which we have no choice.</p>



<p>Yes, there is a real world out there, and we perceive events that occur around us; however, we&#8217;re not perceiving what&#8217;s out there. We&#8217;re perceiving whatever our brain tells us. Our decisions, even when we think we are involving our conscious self, are based on the &#8220;weights&#8221;, &#8220;thresholds&#8221; and &#8220;connections&#8221; that have been burned deep into the brain&#8217;s circuitry. To quote David Eagleman, &#8220;Our decisions are not free-will choices but result of the hands of cards we&#8217;re dealt.&#8221;</p>



<figure class="wp-block-pullquote"><blockquote><p>What we have in form of DNA and brain&#8217;s circuitry is not a blueprint, but a fortune teller.</p></blockquote></figure>



<p>We are not at the center of ourselves, and we cannot explain our choices: the level of subjective wellbeing we feel, our problem-solving skills, how we ideate, and so on. With a different hand of cards, you might watch less TV, drink less alcohol, might have invented the next &#8216;as seen on TV&#8217; gadget &#8211; and your wife might not have left you! Recent reviews of neuroscientific work confirm that many of Freud&#8217;s original observations, not least the pervasive influence of non-conscious processes and the organizing function of emotions for thinking, have found confirmation in laboratory studies.</p>



<p>However, I will stop right here while you ponder the question of free will.</p>



<p>AI needs to be explainable, and in the next articles, I will share my learnings of the strategy, approach, scope, and techniques for XAI.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Explainable Intelligence Part 1 &#8211; XAI, the third wave of AI</title>
		<link>https://www.deepforgetting.com/explainable-intelligence-part-1-xai-the-third-wave-of-ai/</link>
		
		<dc:creator><![CDATA[deep_2022]]></dc:creator>
		<pubDate>Sun, 23 Jun 2019 16:20:00 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<guid isPermaLink="false">https://www.deepforgetting.com/?p=1598</guid>

					<description><![CDATA[Artificial Intelligence (AI) is democratized in our everyday life. Tractica forecasts the global artificial intelligence software market revenues will grow from around 9.5 billion US dollars in 2018 to an expected 118.6 billion by 2025. Gartner predicts&#160;that by 2020, AI technologies will be virtually pervasive in almost]]></description>
										<content:encoded><![CDATA[
<p>Artificial Intelligence (AI) is democratized in our everyday life. <a rel="noreferrer noopener" href="https://www.tractica.com/" target="_blank">Tractica</a> forecasts the global artificial intelligence software market revenues will grow from around 9.5 billion US dollars in 2018 to an expected <a rel="noreferrer noopener" href="https://www.tractica.com/newsroom/press-releases/artificial-intelligence-software-market-to-reach-118-6-billion-in-annual-worldwide-revenue-by-2025/" target="_blank">118.6 billion by 2025</a>.</p>



<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C4E12AQGDx48NfHw16A/article-inline_image-shrink_1000_1488/0/1561127359508?e=1638403200&amp;v=beta&amp;t=vSjMK4fkaZHsl6BoX7AarlfBD-NZ9Tpi4xAUBFj8nnU" alt="AI Software Revenue 2018 - 2025" width="365" height="319"/></figure></div>



<p><a href="https://www.gartner.com/en/newsroom/press-releases/2017-07-18-gartner-says-ai-technologies-will-be-in-almost-every-new-software-product-by-2020" target="_blank" rel="noreferrer noopener">Gartner predicts</a>&nbsp;that by 2020, AI technologies will be virtually pervasive in almost every new software product and service. It also predicts that the business value created by AI will reach $3.9T in 2022.&nbsp;</p>



<p>We are becoming accustomed to AI making decisions for us in our daily life, from product and movie recommendations on Netflix and Amazon to friend suggestions on Facebook and tailored advertisements on Google search result pages.</p>



<p>Netflix uses AI for personalization of movie recommendations, auto-generation and personalization of thumbnails/artwork, location scouting for movie production (pre-production), movie editing (post-production), streaming quality, and a lot more (<a href="https://www.businessinsider.com/netflix-recommendation-engine-worth-1-billion-per-year-2016-6" target="_blank" rel="noreferrer noopener">Business Insider</a>).</p>



<p>AI plays a huge role in Amazon&#8217;s recommendation engine, which&nbsp;<a href="https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-up-with-consumers" target="_blank" rel="noreferrer noopener">generates 35%</a>&nbsp;of the company&#8217;s revenue as well as in Amazon&#8217;s GO hardware, which includes color and depth cameras, as well as weight sensors and algorithms.</p>



<p>23% of North American enterprises had machine learning embedded in at least one company function as of last year. 19% of enterprises in developing markets, including China, and 21% in Europe have also successfully integrated machine learning into their operations.&nbsp;Most businesses are considering adding AI capabilities to their systems for specific business value propositions.</p>



<figure class="wp-block-pullquote"><blockquote><p>By 2021, the term AI will no longer be considered a differentiator in marketing tech provider solutions.</p></blockquote></figure>



<p>Although AI is powerful in terms of results and predictions, AI algorithms suffer from opacity: it is difficult to get insight into their internal workings. And there are times when we need to know why they made a specific decision.</p>



<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C4E12AQHnbc4m2BrdDA/article-inline_image-shrink_1000_1488/0/1561127172739?e=1638403200&amp;v=beta&amp;t=DcnFcnXEeuPLbb3hwxrh6UDPa7xmqvmVmZPzi0BSgQc" alt="Self-driving car kills a PEDESTRIAN" width="419" height="342"/></figure></div>



<p>In March 2018, an autonomous vehicle operated by Uber hit and killed a woman in Tempe, Ariz., as she was walking her bicycle across the street.</p>



<p>So what happened? The Uber car&#8217;s computer system had spotted Ms. Herzberg six seconds before impact, but classified her &#8211; she was not in a crosswalk &#8211; first as an unrecognized object, then as another vehicle, and finally as a bicycle. The self-driving software decided not to take any action after the car&#8217;s sensors detected the pedestrian. Uber&#8217;s autonomous mode disables Volvo&#8217;s factory-installed automatic emergency braking system, according to the US National Transportation Safety Board <a rel="noreferrer noopener" href="https://www.ntsb.gov/news/press-releases/Pages/NR20180524.aspx" target="_blank">preliminary report</a> on the accident. Uber suspended testing of self-driving vehicles after the crash. In December, the vehicles returned to public roads, though at reduced speeds and in less challenging environments.</p>






<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C4E12AQFFH0RvxNfjwg/article-inline_image-shrink_1000_1488/0/1561127235975?e=1638403200&amp;v=beta&amp;t=inRoRnGIscD_bbaLN37lNP8-BVPZ2p-f_FeFr4cUdNY" alt="IBM Watson Oncology Failure" width="361" height="281"/></figure></div>






<p>&#8220;This product is a piece of sh**&#8221; wrote a doctor at Florida&#8217;s Jupiter Hospital regarding IBM&#8217;s flagship AI program Watson, according to internal <a rel="noreferrer noopener" href="https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/" target="_blank">documents</a> obtained by Stat. It recommended &#8216;unsafe and incorrect&#8217; cancer treatments.</p>



<p>In 2013, IBM developed Watson&#8217;s first commercial application for cancer treatment recommendation, and the company secured a number of key partnerships with hospitals and research centers over the past five years. But Watson for Oncology has not impressed doctors. Some complained it gave wrong recommendations on cancer treatments that could have severe and even fatal consequences. In February 2017, <a rel="noreferrer noopener" href="https://www.forbes.com/sites/matthewherper/2017/02/19/md-anderson-benches-ibm-watson-in-setback-for-artificial-intelligence-in-medicine/" target="_blank">Forbes reported</a> that MD Anderson had &#8220;benched&#8221; the Watson for Oncology project.</p>






<div class="wp-block-image"><figure class="alignleft is-resized"><img loading="lazy" src="https://media-exp1.licdn.com/dms/image/C4E12AQG2GCNRugR_jA/article-inline_image-shrink_1000_1488/0/1561127478458?e=1638403200&amp;v=beta&amp;t=iaHh_vLZ6Sh4cGzEti_-_8vhtXQ8MWrvkJEvMjaUmrY" alt="Recruiting software downgraded resumes that implied the applicant was FEMALE" width="421" height="288"/></figure></div>






<p>Amazon HR reportedly used AI-enabled recruiting software between 2014 and 2017 to help review resumes and make recommendations. The software was, however, found to favor male applicants.</p>



<p>The software reportedly downgraded resumes that contained the word &#8220;women&#8221; or implied the applicant was female &#8211; for example, because they had attended a women&#8217;s college. Amazon has since abandoned the software.</p>



<blockquote class="wp-block-quote"><p>AI is a journey, and in no way do any of these failures suggest that we should do to Artificial Intelligence what the Luddites tried to do to machinery.</p></blockquote>



<p>Although Artificial Intelligence has captured the imagination of the world since its inception in 1956, at a historic conference at Dartmouth, it was long stuck in the discovery phase with hand-crafted&nbsp;and custom-engineered AI applications. Researchers worked out how to solve a particular problem and then wrote traditional code. A key characteristic these applications shared was no learning ability and poor handling of uncertainty. It is the Deep Learning systems, starting in the early 2010s, aided by huge amounts of training data and massive computational power, that have shown us substantive glimpses of the power and application of AI. The period in AI until 2010 essentially represents the&nbsp;<strong>&#8220;First Wave&#8221;</strong>, with deep learning starting the&nbsp;<strong>&#8220;Second Wave&#8221;</strong>.</p>



<p><em>Deep learning</em>&nbsp;is an architecture modeled loosely on the human brain. It makes use of neural networks that consist of thousands or even millions of nodes (neurons) that are densely interconnected and organized into multiple layers. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data. The essential concept is that a &#8220;weight&#8221; is assigned to each incoming connection of a node; the node computes the sum of the weighted inputs from all connected nodes beneath it and then uses a threshold to decide whether to pass the data on to its outgoing connections to the nodes in the layer above it (akin to the &#8220;firing&#8221; of a neuron).</p>
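<p>The weighted-sum-and-threshold idea can be written out directly. Here is a minimal sketch of a single node in plain Python; the weights and threshold are arbitrary illustrative numbers, not values from any real network:</p>

```python
# One artificial "node": sum the weighted inputs from the layer
# beneath, then fire (output 1) only if the sum clears the threshold.

def node_fires(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Both inputs active: 0.6 + 0.6 = 1.2 clears the threshold of 1.0.
print(node_fires([1, 1], [0.6, 0.6], 1.0))  # 1
# Only one input active: 0.6 falls short, so the node stays silent.
print(node_fires([1, 0], [0.6, 0.6], 1.0))  # 0
```

<p>A deep network is nothing more than many such nodes stacked in layers, with each layer&#8217;s outputs becoming the next layer&#8217;s inputs.</p>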



<p>For training, huge amounts of labeled data are fed into the neural network. An object recognition system, for instance, might be fed thousands of labeled images of cats, dogs, digits, and so on, and it would find visual patterns in the images that consistently correlate with particular labels. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.</p>
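<p>That adjust-until-the-labels-match loop can be illustrated with the classic perceptron learning rule on a toy labeled dataset (logical AND). Modern deep networks are trained with gradient descent over many layers; this single-node sketch only shows the core idea of weights and a bias (the threshold&#8217;s counterpart) being nudged by labeled examples:</p>

```python
# Train one threshold node on labeled data for logical AND.
# Integer arithmetic keeps the toy example exact and reproducible.

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0, 0]  # connection weights
b = 0       # bias (acts as a learned, negated threshold)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0

for epoch in range(10):             # repeated passes over the data
    for x, label in data:
        error = label - predict(x)  # +1, 0, or -1
        w = [wi + error * xi for wi, xi in zip(w, x)]
        b += error                  # nudge toward the correct label

preds = [predict(x) for x, _ in data]
print(preds)  # [0, 0, 0, 1] -- matches the labels after training
```

<p>After a handful of passes, the weights and bias settle on values that reproduce every label; that settling process, scaled up to millions of weights and images instead of four rows, is what &#8220;training&#8221; means in the paragraph above.</p>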



<figure class="wp-block-pullquote"><blockquote><p>You have &#8220;learned&#8221; to mathematically recognize incredibly subtle patterns within the mountains of data</p></blockquote></figure>



<p>Using this &#8220;pattern recognition,&#8221; you can now &#8220;provide&#8221; an answer, a decision, or a prediction for input data you have never seen before.</p>



<p>However, AI using deep learning still has a long way to go to approach human-level learning, thinking, and problem-solving ability. It is a great tool but is unlike the human brain in how it learns. These algorithms suffer from opacity and do not provide reasoning. They require mountains of data to train. Deep learning has no common sense, conceptual learning, creativity, planning, cross-domain thinking, self-awareness, emotions, etc. It&nbsp;is basically an<em>&nbsp;optimizer&nbsp;</em>based on data for a very specific, narrow task.</p>



<figure class="wp-block-pullquote"><blockquote><p>Deep Learning is still very weak in intelligence.</p></blockquote></figure>



<figure class="wp-block-image"><img src="https://media-exp1.licdn.com/dms/image/C5612AQGbGZ01pVS6sw/article-inline_image-shrink_1500_2232/0/1561310689901?e=1638403200&amp;v=beta&amp;t=25EuVPAb-3Z7-syH8S-R11Pj_UfAd7CMw9Tqv-7Pq24" alt="Four Waves of AI"/></figure>



<p>Until we figure out the engineering path to general intelligence capabilities, the&nbsp;<strong>&#8220;fourth wave&#8221;</strong>, the focus of AI is shifting to opening the black box.</p>



<p>We are now entering the&nbsp;<strong>&#8220;third wave&#8221;</strong>&nbsp;of AI, in which AI systems will become capable of explaining the reasoning behind every decision they make. The AI systems themselves will construct models that explain how they work.</p>



<p>XAI is all about improving trust in AI-based systems. On one end, it brings fairness, accountability, and transparency to the front and center of AI; on the other, it enables us to control and continuously improve our AI systems.</p>
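<p>To make that concrete, here is a hand-rolled sketch of one common XAI idea: feature attribution by perturbation. The <code>model</code> below is a hypothetical stand-in for a black box, and the attribution method is a crude local sensitivity measure; real XAI tools refine the same intuition considerably:</p>

```python
# Score each input feature by how much the model's output moves when
# that feature is replaced with a neutral baseline value.

def model(features):
    # Hypothetical black box: income dominates, zip code barely matters.
    income, age, zipcode = features
    return 0.8 * income + 0.15 * age + 0.05 * zipcode

def attribute(model, features, baseline=0.0):
    base_output = model(features)
    scores = []
    for i in range(len(features)):
        perturbed = list(features)
        perturbed[i] = baseline     # knock out one feature at a time
        scores.append(abs(base_output - model(perturbed)))
    return scores

scores = attribute(model, [1.0, 1.0, 1.0])
print(scores)  # income gets the largest attribution score
```

<p>An explanation like &#8220;this decision was driven mostly by income, barely by zip code&#8221; is exactly the kind of output that lets us audit a model for fairness and debug it when it goes wrong.</p>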



<figure class="wp-block-image"><img src="https://media-exp1.licdn.com/dms/image/C5612AQFe-kpEI4ddcg/article-inline_image-shrink_1500_2232/0/1561309413548?e=1638403200&amp;v=beta&amp;t=tcve8jiiVXdgbEbvvdU40qsnJOWO6IVOl29V9QKeTqA" alt="Benefits of XAI"/></figure>



<p>To me, this third wave, XAI, is the sine qua non for AI to continue making steady progress without disruption. Over the last year and a half, I have been spending a lot of time learning XAI.&nbsp;In this series of articles, I will share my learnings on the strategy, approach, scope, and techniques for XAI.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
