The AI Black Box

UX-driven research project dedicated to bridging the "trust gap" in industrial Artificial Intelligence. Unlike traditional AI development which focuses solely on algorithms, this project treats User Experience as the missing link in high-stakes environments like supply chains and manufacturing.


Read Time: 70 Minutes

Timeline: May 2025 - Present

Types of Machine Learning

Unsupervised

The algorithm analyzes the internal structure and mathematical distribution of the data.

Supervised

The algorithm learns a direct mapping function between the input and the result.

Project Focus

In Machine Learning, supervised models offer a clear path to results because they rely on predefined answers. Unsupervised learning, however, remains a "Black Box": it is notoriously difficult for users to understand why a specific anomaly was flagged.

My project embraces this challenge by focusing on Unsupervised Learning, specifically aiming to pull back the curtain on its complexity. I am developing a framework that translates these "Black Box" outputs into transparent, intuitive insights that even a non-technical stakeholder can trust and understand.

Unsupervised

Anomaly Detection

The Topic

Why

Reducing unplanned downtime in the supply chain directly impacts the bottom line by preventing production bottlenecks and delivery delays. It means moving away from "fix it when it breaks" (which is costly and disruptive) to "fix it before it fails".

What

Anomalies are data points that stand out from the rest of the dataset and do not conform to its normal behavior. These points or observations deviate from the dataset's normal behavior patterns.
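The statistical idea above can be sketched with a simple z-score rule; this is an illustration only, not the project's actual detection method, which is more sophisticated.

```python
# Minimal illustration of the idea behind anomaly detection: a point is
# "anomalous" if it deviates strongly from the dataset's own distribution
# (here, more than 2.5 standard deviations from the mean).
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.5):
    """Return (index, value) pairs that deviate strongly from the rest."""
    mu = mean(values)
    sigma = stdev(values)
    return [(i, v) for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]

# Steady sensor readings with one outlier at index 6
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 19.8, 20.2]
print(zscore_anomalies(readings))  # → [(6, 35.0)]
```

Note that no labels are involved: the rule is learned purely from the data's own distribution, which is exactly what makes the result hard to explain to a non-expert.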

Predictiveness

  • Monitoring / Anomaly Detection (Proactive): 60%

  • Anticipatory / Recommendation / Personalization: 30%

  • Automating Actions: 10%

Project Focus

The Stakes

Bhopal Disaster

Background Examples

Warnings were ignored, safety systems were disabled, and catastrophic delays ensued, not due to a lack of data, but to a tragic failure of the system.

  1. The Trigger Anomaly: High pressure and temperature inside tank E610, far exceeding safe limits

  2. Alarm Fatigue: The primary pressure alarms were known to frequently sound false warnings, leading operators to ignore or silence them routinely

Motivation

Why UX is the Missing Link

  1. Trust and transparency are critical safety enablers in high-stakes environments.

  2. Operators often override or ignore poorly explained alerts, leading to delayed or incorrect decisions.

  3. User experience has been neglected in the "last mile" intelligence delivery layer.

Overall Problem Statement

Lack of Control

Users cannot easily choose which data to include or define specific timeframes for analysis (CRUD and Configuration).

The "Data Mystery"

There is no guidance or "hints" to help non-experts select the right sensor data to train an effective model.

In modern industrial settings, Anomaly Detection is vital for keeping assets running. However, most systems fail because their AI models are "Black Boxes": they tell a maintenance engineer that something is wrong, but never why.

When engineers don't understand the "WHY," they experience alarm fatigue and lose trust in the system. This leads to ignored alerts, delayed repairs, and increased operational risk.

The Trust Gap

Engineers have no clear way to validate results, making it impossible to distinguish between a real machine failure and simple data "noise"

Static Models

Systems often fail to update as machine conditions change, making the AI's insights outdated almost immediately.

Broken Loops

Insights often stay stuck in a dashboard rather than automatically triggering a "Closed Loop" action (like a Work Order) in the system.
Pain Points
Based on the Discovery workshop on the existing features

Goal and Scope

The core goal is to define, prototype, and validate a foundational Human-AI Interaction Framework that can build user trust in automated systems, pioneering a human-centered design approach that makes AI Explainable, Controllable, and Actionable.

Universal Impact

This research is designed to go beyond the supply chain. The resulting AI Experience (AIX) patterns are domain-agnostic, providing a blueprint for transparency and trust in any high-stakes field, from industrial assets to medical diagnostics and scientific data analysis.

Critically, this research is designed to cover the entire operational journey

From the initial design and creation of the model > review the model > feedback > retrain > use the model > feedback > retrain > continuous use of feedback. This establishes a truly end-to-end, closed-loop system.

Testbed

We use APM as the rich, high-stakes testing ground, working with multi-modal data (sensor, visual, reports) and real-world operational experts.

The Human-Centered AI Trust Loop

The project is organized around the three critical phases of human interaction with an anomaly detection system:

Control in Model Creation
and Configuration
Trust in Decision Review
Automation & Continuous Learning

Control in Model Creation & Configuration

Explainable Setup

Key Mechanism

Guided visual setup allowing non-data scientists to configure the model.

Output

A model whose initial assumptions are visible and approved by the expert.

Why it matters

Users trust what they help build and understand.

  • Input data normalization strategies.

  • Feature inclusion/exclusion (e.g., exclude sensor X on weekends)

  • Anomaly sensitivity thresholds.
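A sketch of how these three controls might surface to the expert as an editable configuration. All field names and defaults here are hypothetical illustrations, not the product's actual schema.

```python
# Hypothetical model configuration exposing the three knobs listed above:
# normalization strategy, feature exclusions, and anomaly sensitivity.
# Every field name is an assumption made for illustration.
from dataclasses import dataclass, field

@dataclass
class AnomalyModelConfig:
    normalization: str = "z-score"                         # input normalization strategy
    excluded_sensors: list = field(default_factory=list)   # e.g. exclude sensor X
    excluded_windows: list = field(default_factory=list)   # e.g. weekends, downtime
    sensitivity: float = 0.95                              # anomaly sensitivity threshold

# The expert edits the assumptions instead of inheriting invisible defaults.
config = AnomalyModelConfig(
    excluded_sensors=["sensor_X"],
    excluded_windows=["weekends"],
    sensitivity=0.90,
)
print(config.sensitivity)  # → 0.9
```

The point of the structure is visibility: because the configuration is explicit and reviewable, the model's initial assumptions can be approved by the expert before training starts.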

Problem Statement

The maintenance expert understands the machine’s "physical reality," but the system lacks the flexibility to let them translate that expertise into the AI model.

This creates a fundamental "Configuration Gap": if an engineer cannot easily control what goes into the model, they will never trust the anomalies that come out of it.

Pain Points

The Data Prep "Mountain":

Data preparation is a HUGE and exhausting task. Manually cleaning sensor data, aligning timestamps, and filtering out "noise" (like when a machine is turned off for maintenance) can take weeks of manual effort.

The Selection Paradox:

Engineers often don't know which specific sensors (temperature, vibration, pressure) the AI needs to see to be effective. Without "hints," they are just guessing.

Lack of CRUD Flexibility:

Users often cannot easily Create, Update, or Delete specific data inputs. If a sensor breaks or is replaced, the model becomes obsolete because the user lacks the control to adjust the configuration on the fly.

Fixed Timeframes:

Industrial assets change over time. Being unable to easily define when a machine was "healthy" vs. when it was "unstable" leads to models trained on bad data.

User Flow

Wireframe and concept for validation

Research interview and Insights

Goal:

To validate the effectiveness and efficiency of the designs for the APM Manage Anomaly Detection Models app, and to evaluate its ease of use and user satisfaction.

Expectation:

The results of this research will clarify the task flow(s), pain points, and ideas the end user expects for the app. With this information we will be able to better improve the solution we’re planning to build.

Scenario:

As a Reliability Engineer / Maintenance Planner, I want to set up, train, and review a model for anomaly detection. Since I am not a data scientist, the setup should be simple, yet with an accurate output.

Tasks:

All related to creating, training, and reviewing a model, from start to finish through all of its aspects.

Feedback Prioritization

We received positive feedback on our new approach; the freedom the end user gains was very well received, and was in fact what users expected.

Segregating insights into short-term, mid-term, and long-term.

Model Failed


Trust in Decision Review

Actionable Transparency


Designing a Multi-Modal Decision Review UI that fuses all relevant data inputs instantly. The UI must clearly communicate three pieces of critical information: the Anomaly Score, the Prediction Certainty (AI's confidence), and the Context/Impact (why it matters).
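The three pieces of critical information could travel together as a single alert record. The structure and field names below are illustrative assumptions, not the actual UI model.

```python
# Illustrative record for a decision-review alert carrying the three pieces
# of information the UI must communicate: score, certainty, and context.
from dataclasses import dataclass

@dataclass
class AnomalyAlert:
    anomaly_score: float   # how far behavior deviates from the learned normal
    certainty: float       # AI's confidence in the prediction (0..1)
    context: str           # why it matters to the maintenance engineer

# Example data is invented for illustration.
alert = AnomalyAlert(
    anomaly_score=0.87,
    certainty=0.62,
    context="Bearing vibration trending above baseline on an example pump",
)

# A low-certainty alert should invite investigation rather than blind action.
needs_review = alert.certainty < 0.7
print(needs_review)  # → True
```

Separating the score from the certainty is the key design choice: it lets the UI tell the engineer not only "something is wrong" but also how much the system itself trusts that claim.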


Quantifying Trust Calibration, which means users correctly rely on high-certainty predictions and investigate low-certainty ones.


Simulation

The AI is designed to detect statistical anomalies, but the maintenance engineer needs to make operational decisions. Even after a model is trained, it remains a "Black Box" of uncertainty. Without the ability to validate the model's logic or test its limits, the engineer cannot trust the system to act as a reliable guardian of the asset.


Problem Statement

Pain Points

The "Optimal Data" Mystery:

Once training is complete, there is no way for the user to verify if they used the right data. Users are left wondering: "Is this data sufficient, or am I missing the specific sensor that actually signals a failure?"

The "Utility" Doubt:

A trained model doesn't prove its worth until it sees an anomaly. This creates a dangerous period of doubt: "Is this model actually useful, or will it miss the very anomalies it was built to find?"

The Threshold Tug of War:

Defining the "optimal" alert threshold is a guessing game. If it's too sensitive, it creates alarm fatigue; if it's too loose, it misses critical failures. Users lack the tools to simulate and balance these thresholds before going into production.

Inability to Validate:

There is no "glass box" view to see which sensors contributed to an alert. Without this, engineers cannot verify the AI's findings against their physical expertise.

Whiteboarding



Hypothesis A

After training is completed, the user needs to validate the training data details to see whether the Anomaly Detection model has been trained with the right/optimal data to detect anomalies.

What can go wrong

The user defines training period(s) with too little data (but more than the minimal sample size) => the AD model could perform insufficiently on new data

  • Exclusions by indicator value (unknown effect) leading to too little data

  • Exclusions by time (semi-known effect) leading to too little data

  • Exclusions by data quality issues (e.g., caused by irregular data ingestions)

The user defines training period(s) with too much data, but the system can only process some of it. Some training periods are not, or only partly, represented in the training data

  • Data from relevant TOs (e.g., TOs the user can use to validate AD model performance)

  • Data contains no, or too little, contaminated data (data with notifications)
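One way to surface the gap Hypothesis A describes is a per-period coverage report shown after training. The MIN_SAMPLES value, period names, and counts below are made-up illustration data, not actual system values.

```python
# Sketch for Hypothesis A: after training, report per-period sample counts
# so the user can see whether each requested training period is actually
# represented in the processed data. All numbers are illustrative.

MIN_SAMPLES = 100  # assumed minimal sample size per training period

def coverage_report(requested_periods, processed_counts):
    """Flag training periods that are missing or under-represented."""
    report = {}
    for period in requested_periods:
        n = processed_counts.get(period, 0)
        report[period] = "ok" if n >= MIN_SAMPLES else "under-represented"
    return report

requested = ["2024-Q1", "2024-Q2", "2024-Q3"]
processed = {"2024-Q1": 4300, "2024-Q2": 37}  # Q3 yielded no usable data
print(coverage_report(requested, processed))
```

A report like this turns the silent failure ("some periods are only partly represented") into something the user can see and act on before trusting the model.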

Hypothesis B

  • The user can validate the Anomaly Detection model on historical data (training data) by comparing simulated alerts with notifications or knowledge of past issues (by date and technical object).

  • The user can assess alerts within a certain date range (of the notification date) and categorize them as useful/good or useless/false alerts.

  • The user can tune the anomaly score and alert thresholds to adjust the alerts to their expectations.
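The validation flow of Hypothesis B can be sketched as a replay: simulate alerts at a chosen threshold on historical scores, then compare them by date against known notifications. All dates, scores, and thresholds below are invented for illustration.

```python
# Sketch of Hypothesis B: replay historical anomaly scores, simulate alerts
# at a given threshold, and compare against known issue dates (notifications).

def simulate_alerts(scores_by_date, threshold):
    """Dates whose anomaly score crosses the alert threshold."""
    return {d for d, s in scores_by_date.items() if s >= threshold}

def categorize(alerts, known_issues):
    """Split simulated alerts into useful (match a known issue) vs false,
    and also report known issues the model missed entirely."""
    useful = alerts & known_issues
    false_alerts = alerts - known_issues
    missed = known_issues - alerts
    return useful, false_alerts, missed

scores = {"2024-03-01": 0.2, "2024-03-02": 0.9,
          "2024-03-03": 0.4, "2024-03-04": 0.8}
known_issues = {"2024-03-02"}  # date of a past notification

# A loose threshold catches the real issue but also raises a false alert.
useful, false_alerts, missed = categorize(
    simulate_alerts(scores, threshold=0.7), known_issues)
print(useful, false_alerts, missed)
```

Sweeping the threshold and watching these three buckets shift is exactly the tuning loop the hypothesis describes: tighten it and false alerts drop but misses grow, loosen it and the reverse happens.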

Research interview and Insights

To validate the Problem Statements and initial wireframe mockups for the next phases (review and simulation) of managing anomaly detection models, to ensure ease of use and user satisfaction while using the app.

Goal
Expectation

The results of this workshop will clarify the Problem Statements and whether the mockups fulfill the initial expectations of the users. With this information we will be able to better improve the solution we’re planning to build.

Feedback Prioritization


Final Design



Automation &
Continuous Learning

Human-in-the-Loop


Defining simple Human-in-the-Loop actions (e.g., a single "Accept/Reject" button) for real-time feedback that seamlessly trains the model. This theme also standardizes the experience into a clear, three-part hierarchy for all alerts: Prediction, Explanation, and Action.
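A minimal sketch of the Accept/Reject loop. The threshold-nudging rule here is a deliberately naive stand-in for real retraining, included only to show how a single-click verdict can feed back into the model.

```python
# Human-in-the-loop sketch: each Accept/Reject verdict is recorded and
# nudges the alert threshold. The adjustment rule is an illustrative
# assumption, not the project's actual retraining mechanism.

class FeedbackLoop:
    def __init__(self, threshold=0.7, step=0.02):
        self.threshold = threshold
        self.step = step
        self.log = []  # captured verdicts, available for later retraining

    def verdict(self, alert_score, accepted):
        """Record the expert's Accept/Reject and nudge the threshold."""
        self.log.append((alert_score, accepted))
        if accepted:
            # Real anomaly confirmed: become slightly more sensitive.
            self.threshold = max(0.0, self.threshold - self.step)
        else:
            # False alarm: raise the bar to reduce alarm fatigue.
            self.threshold = min(1.0, self.threshold + self.step)

loop = FeedbackLoop()
loop.verdict(0.72, accepted=False)  # engineer rejects a false alarm
loop.verdict(0.95, accepted=True)   # engineer confirms a real anomaly
print(round(loop.threshold, 2))     # → 0.7
```

The important property is that the expert's knowledge is captured rather than lost: every verdict lands in a log the system can learn from, closing the loop the pain points below describe as missing.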


Simulation

Even the most accurate AI model is a liability if it cannot be translated into a business decision. Currently, there is a "Last Mile" disconnect where sophisticated mathematical predictions fail to become productive maintenance actions. Without a standardized hierarchy to interpret alerts or a mechanism for experts to "talk back" to the system, the AI remains a static laboratory experiment rather than a resilient operational partner.


Problem Statement 3

Pain Points

The "Static Model" Trap:

Because retraining is often a manual, technical hurdle involving complex data re-selection, models are rarely updated. They quickly become obsolete as the asset ages or operational conditions change.

Information Overload:

Alerts are often presented as raw data or complex scores without context. This forces engineers to spend valuable time "investigating the AI" rather than fixing the machine.

The Missing Feedback Loop:

There is no simple way for a user to provide a "Human-in-the-Loop" verdict. If an engineer identifies a false alarm, that knowledge isn't captured, forcing the system to repeat the same error indefinitely.

Simulation

Most Predictive Maintenance projects fail because they are "high-maintenance." To keep a model accurate, a human usually has to manually intervene: re-selecting hundreds of sensors and identifying specific historical dates for "healthy" data. This creates a bottleneck where the AI cannot scale because it requires constant manual reconfiguration.


Problem Statement 4

Pain Points

The "Data Janitor" Burden:

Manually identifying and grouping relevant Sensor Objects for a specific asset is a tedious, error-prone task that consumes hours of engineering time.

The Training Window Guessing Game:

There is no automated way to identify "Gold Standard" historical periods. Engineers must manually scan logs to find stable Training Timeframes, leading to inconsistent model quality.

The "Frozen" Model Problem:

Because retraining is a manual chore, models are rarely updated. They become "static snapshots" that eventually lose accuracy as the physical asset undergoes repairs or environmental changes.