Explain Solutions
dcEmb is highly focused on explainability, especially in the challenging domain
of time series analysis.
What is Explainability?
Explainability in AI is the ability of an AI system to provide human-interpretable accounts
of the decisions and predictions it makes. Explainability is
crucial for building trust and transparency in AI systems, and for ensuring that
they make fair and unbiased decisions. It also allows developers and users to
identify and correct errors or biases in their models.
Case Study: Amazon Automated Hiring
Explainability is very important, even for the biggest tech companies in the
world. In 2018, Amazon suffered a public failure of an artificial intelligence
system intended to help automate the hiring process for job openings in their
company. This failure could have been mitigated with an explainable AI system.
The AI was designed to rank resumes and evaluate applicants on a scale of 1 to 5
based on their qualifications, skills, and experience. It was trained on 10
years of resumes submitted to Amazon, which contained information on educational
backgrounds, work history, and many other factors.
Despite the potential benefits such a system might have offered Amazon, it
quickly became clear that the system as realized was problematic. It
exhibited a range of biases, most prominently a strong bias against female
candidates, especially those applying for male-dominated tech roles.
The issue stemmed from the training dataset - Amazon’s hiring processes seemed
(like many others in the tech industry) to skew heavily toward male applicants.
Furthermore, it wasn’t enough to simply remove direct references to gender: when
those were removed, the AI found indirect references instead.
Ultimately, even with all the resources at Amazon’s disposal, the flawed system
was scrapped, and Amazon’s HR department continued to use human
recruiters to screen job applications. The incident highlighted the risks of
using AI for decisions like these, and the need for fairness and transparency in
AI systems (and hiring processes).
With a transparent and explainable system like dcEmb, problematic biases can be
traced back to their source, and alternatives put in place. Further, models
created by dcEmb can be modified and interrogated to explore and evaluate “what
if” scenarios.
Explore Uncertainty
dcEmb calculates not just an individual “best” solution, but the relative
likelihood of all outcomes.
What is Evaluating “All” Outcomes?
Almost all AI algorithms provide a single, fixed solution to the problems posed to
them. However, in many real-world situations it’s important to understand the
uncertainty and variability surrounding a proposed solution, in order to weigh the
risks and rewards of different outcomes. Algorithms like dcEmb
achieve this by calculating a “probability distribution” over outcomes,
effectively evaluating all possible solutions at once, along with their relative
likelihoods.
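To make the idea concrete, the sketch below is a minimal standalone C++ example (it does not use dcEmb’s own API; the grid of candidate outcomes, the observations, and the noise level are invented purely for illustration). It computes a posterior distribution over a set of candidate outcomes and contrasts the single “best” answer with questions that only the full distribution can answer, such as the probability that the outcome exceeds a threshold.

```cpp
// Minimal sketch (not dcEmb's API): contrast a single "best" answer with a
// full posterior distribution over candidate outcomes.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
  // Candidate outcomes: the quantity we are trying to infer, on a coarse grid.
  std::vector<double> candidates;
  for (double x = 0.0; x <= 10.0; x += 0.5) candidates.push_back(x);

  // Noisy observations of the quantity (illustrative values only).
  const std::vector<double> observations = {4.1, 5.3, 4.7};
  const double noise_sd = 1.0;  // assumed measurement noise

  // Unnormalised posterior: flat prior times Gaussian likelihood of the data.
  std::vector<double> posterior(candidates.size(), 1.0);
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    for (double y : observations) {
      const double z = (y - candidates[i]) / noise_sd;
      posterior[i] *= std::exp(-0.5 * z * z);
    }
  }

  // Normalise so the probabilities sum to one.
  double total = 0.0;
  for (double p : posterior) total += p;
  for (double& p : posterior) p /= total;

  // A conventional algorithm reports only the single most likely candidate...
  std::size_t best = 0;
  for (std::size_t i = 1; i < candidates.size(); ++i)
    if (posterior[i] > posterior[best]) best = i;
  std::printf("Point estimate: %.1f\n", candidates[best]);

  // ...whereas the full distribution also answers risk questions directly,
  // e.g. "how likely is the outcome to exceed 6?"
  double p_exceeds_6 = 0.0;
  for (std::size_t i = 0; i < candidates.size(); ++i)
    if (candidates[i] > 6.0) p_exceeds_6 += posterior[i];
  std::printf("P(outcome > 6) = %.3f\n", p_exceeds_6);
  return 0;
}
```

The point estimate alone says nothing about how confident the system is; the distribution carries that information directly, which is what makes risk-aware decisions possible.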
Case Study: Autonomous Vehicles
Evaluating all possible solutions is particularly important in situations in
which real-time, safety-critical decisions must be made. One prominent example
is autonomous vehicles, which have suffered several high-profile
incidents resulting in deaths in recent years.
Between 2018 and 2019, there were at least 3 notable incidents. In March 2018,
an Uber autonomous vehicle struck and killed a pedestrian in Tempe, Arizona. In
May 2018, a Tesla Model X crashed while on Autopilot mode, resulting in the
death of the driver. In March 2019, a Tesla Model 3 also crashed while on
Autopilot mode, resulting in the death of the driver.
While the specifics of individual incidents vary, a unifying theme of these
incidents is misidentification of stimuli by the vehicles - mistaken perception.
This has been a long-identified problem with these systems. Teams at McAfee
(2015), Microsoft and the University of Michigan (2020), and Georgia Tech and
UNC Chapel Hill (2021) have all identified adversarial attacks on autonomous
vehicles that allow an attacker to cause a vehicle to misperceive items in its
visual field; for example, to misclassify a stop sign as a speed limit sign, or to
force a lane change with a small piece of black tape.
Despite the best efforts of the automotive industry to counteract these
problems, they persist, in no small part due to the limitations of the neural
networks that these systems employ for perception. These networks struggle to
evaluate stimuli significantly different from those they have been trained on,
with sufficiently novel stimuli causing erratic and unpredictable behavior.
These systems can’t fully express uncertainty.
With technology like dcEmb that evaluates the relative likelihood of all
solutions, you get a picture that captures the uncertainty and complexity
inherent in real-world problems, enabling nuanced and detailed decision
making.
Use Existing Data
dcEmb can pre-encode knowledge of problems into models, to prevent them from
having to learn a solution from scratch.
What is Prior Knowledge Encoding?
Prior knowledge encoding in AI refers to incorporating domain-specific
information or expert knowledge into an AI system to help guide its
decision-making and improve accuracy. It is important because it can lead to
more efficient learning, reduce the amount of data required for training, and
improve the transparency and interpretability of AI systems, ultimately building
trust and confidence in their predictions and decisions.
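As a rough sketch of the mechanics (again standalone C++ rather than dcEmb’s own interface; the prior values, observations, and noise variance are invented for illustration), the example below encodes prior knowledge as a Gaussian belief about an unknown parameter and updates it with just two observations. An informative prior yields a tighter estimate from the same small dataset, which is exactly the effect that reduces the amount of training data required.

```cpp
// Minimal sketch (not dcEmb's API): encoding prior knowledge as a Gaussian
// prior over an unknown parameter, then updating it with a handful of
// observations (conjugate Gaussian update with known observation noise).
#include <cmath>
#include <cstdio>
#include <vector>

struct Gaussian {
  double mean;
  double variance;
};

// Combine a Gaussian prior with Gaussian-noise observations of the parameter.
Gaussian update(const Gaussian& prior, const std::vector<double>& data,
                double noise_variance) {
  double precision = 1.0 / prior.variance;      // prior precision
  double weighted_mean = prior.mean / prior.variance;
  for (double y : data) {
    precision += 1.0 / noise_variance;          // each datum adds precision
    weighted_mean += y / noise_variance;
  }
  return {weighted_mean / precision, 1.0 / precision};
}

int main() {
  const std::vector<double> data = {2.9, 3.2};  // only two observations
  const double noise_variance = 1.0;            // assumed measurement noise

  // A vague prior: we claim to know almost nothing before seeing data.
  Gaussian vague = update({0.0, 100.0}, data, noise_variance);

  // An informative prior: domain knowledge says the value is near 3.
  Gaussian informed = update({3.0, 0.25}, data, noise_variance);

  std::printf("Vague prior    -> mean %.2f, sd %.2f\n", vague.mean,
              std::sqrt(vague.variance));
  std::printf("Informed prior -> mean %.2f, sd %.2f\n", informed.mean,
              std::sqrt(informed.variance));
  // The informed posterior is tighter despite the tiny dataset: the prior
  // has done part of the work that extra data would otherwise have to do.
  return 0;
}
```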
Case Study: Learning to Play Games
The ability to encode prior knowledge is especially useful when training data is
limited. In the original application area of Dynamic Causal Modeling,
neuroimaging, it’s desirable to use as much prior information as possible due to
the expense and ethical challenges of collecting more data. However, the
benefits of prior knowledge encoding are more ubiquitous than this. Most
recently, the importance of prior knowledge encoding was demonstrated with Large
Language Models.
In 2013, DeepMind published research on an AI that learned to play Atari games
from raw pixel inputs. The technology that drove this was called deep
reinforcement learning, and represented, at the time, a huge leap forward in the
capability of AI systems to ingest and reason about image data.
One of the selling points of the original DeepMind system was that it had
learned how to play these games entirely from scratch, just by observing the
pixels on the screen. This took a cumulative 50 million frames of training
data and an enormous amount of computing power, quantities that, as far as
possible, we’d like to reduce.
Another advance of the last 10 years that has been very formative in
the field of AI is the Large Language Model. These models are trained on huge
corpora of natural language text and, over the last few years especially, have
achieved outstanding feats of natural language processing.
Recent research (2023) demonstrated that, by integrating a large language model
into the deep reinforcement learning system and providing it with the
instructions for the original Atari games in question, training could be sped up
almost 6000-fold.
To give an idea of how valuable this could be, one of the recent Large Language
Models, GPT-3, has been cited as producing almost 200,000 kg of CO2 during training. A
similar gain in efficiency realized here would represent a large, tangible
benefit, both economically and environmentally.
dcEmb provides a coherent and complete system for accounting for any and all
pre-existing knowledge that you would want to encode into your models. This
allows you to get the best possible solution to your problem with the smallest
amount of training data.