Artificial intelligence (AI) is edging closer to the bioreactor. But its progress is being slowed by an unglamorous obstacle: the disarray of bioprocessing data. According to Phil Mounteney, vice president of science & technology at Dotmatics, the industry’s biggest hurdle is not the sophistication of algorithms but the fractured information ecosystem they must learn from.
Bioprocessing data is often scattered across electronic laboratory notebooks (ELNs), laboratory information management systems (LIMS), instruments, spreadsheets, and legacy systems that don’t talk to each other, Mounteney says. The most problematic gaps arise from real-time analytics in bioreactors—high-frequency time-series signals such as pH, dissolved oxygen, feeding rates, agitation, and inline spectroscopy. These streams are typically captured in supervisory control and data acquisition (SCADA) systems or in time-series databases that log equipment and sensor data, yet they remain siloed from batch records and offline assays. “When all of this data is fragmented and inconsistent, AI models cannot learn reliable patterns,” Mounteney explains.
Equally limiting is the absence of a continuous digital lineage. “Real-time bioreactor traces and equipment logs are often not cleanly linked to the corresponding batches, cell lines, raw materials, or analytical results,” Mounteney says. Without this end-to-end context, even advanced algorithms struggle to differentiate a well-behaved run from one drifting toward failure.
Mounteney argues that the remedy requires treating bioprocess information as “a first-class, shared asset rather than a byproduct of running experiments.” He envisions a unified data layer capable of ingesting signals from ELNs, LIMS, SCADA systems, process analytical technology (PAT) tools, and downstream analytics, then shaping them into a harmonized, AI-ready model.
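To make the idea concrete, the sketch below shows one way such a harmonized record might be structured in code. The schema is purely illustrative—an assumption for this article, not Dotmatics’ actual data model—but it captures the point: readings from SCADA and PAT systems, offline assay results from a LIMS, and ELN lineage all hang off a single batch identifier.

```python
# Illustrative only: a hypothetical schema for a harmonized, AI-ready
# bioprocess record. Field names are assumptions, not a vendor data model.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SensorPoint:
    timestamp: datetime   # when the reading was taken
    parameter: str        # e.g. "pH", "dissolved_oxygen", "glucose_feed"
    value: float
    unit: str             # standardized unit, e.g. "g/L", "%"
    source: str           # originating system, e.g. "SCADA", "Raman"

@dataclass
class AssayResult:
    timestamp: datetime
    attribute: str        # e.g. "titer", "glycosylation"
    value: float
    unit: str
    instrument: str       # offline analytical system, typically LIMS-sourced

@dataclass
class BatchRecord:
    batch_id: str                    # unified identifier linking all sources
    cell_line: str
    eln_experiment_id: str           # link back to the ELN entry for lineage
    raw_material_lots: list[str] = field(default_factory=list)
    sensor_data: list[SensorPoint] = field(default_factory=list)    # high-frequency (SCADA/PAT)
    assay_results: list[AssayResult] = field(default_factory=list)  # low-frequency (LIMS)
```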
A crucial step is the integration of high-frequency sensor data with lower-frequency batch and assay measurements. This demands robust connectors to bioreactors and inline analytical systems such as Raman spectroscopy, near-infrared (NIR) spectroscopy, and capacitance probes, as well as metadata frameworks that capture setpoints, feed strategies, alarms, deviations, and manual interventions. “You need a contextualization layer that can time-align sensor traces with batch IDs, seed trains, unit operations, and sampling events,” Mounteney says. Automated lineage capture, master data management, and consistent ontologies—including harmonized naming conventions and standardized units—are required to ensure that terms such as “glucose feed,” “Glc feed,” and “C-feed” are treated as the same concept.
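A minimal sketch of such a contextualization step, assuming pandas is available, might look like the following. The column names, synonym map, and time tolerance are illustrative assumptions, not a standard ontology or any vendor’s implementation.

```python
import pandas as pd

# Map vendor- and scientist-specific labels onto one canonical term
# (illustrative synonym table, not a standard ontology).
SYNONYMS = {"glucose feed": "glucose_feed", "Glc feed": "glucose_feed",
            "C-feed": "glucose_feed", "DO": "dissolved_oxygen"}

def harmonize(label: str) -> str:
    """Return the canonical parameter name for a raw column label."""
    return SYNONYMS.get(label.strip(), label.strip().lower().replace(" ", "_"))

def contextualize(sensor_df: pd.DataFrame, events_df: pd.DataFrame) -> pd.DataFrame:
    """Time-align high-frequency sensor traces with offline sampling events.

    sensor_df: one row per timestamp, one column per sensor (SCADA/PAT export)
    events_df: batch_id, timestamp, and assay results for each sampling event
    """
    sensor_df = sensor_df.rename(columns=harmonize).sort_values("timestamp")
    events_df = events_df.sort_values("timestamp")
    # Pair each sampling event with the latest sensor reading at or before it,
    # so every offline assay carries its real-time process context.
    return pd.merge_asof(events_df, sensor_df, on="timestamp",
                         direction="backward", tolerance=pd.Timedelta("5min"))
```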
The payoff, Mounteney suggests, could reshape biologics development. “Once you have unified, high-quality, well-contextualized data, the character of AI in bioprocessing changes completely.” Model accuracy improves, critical process parameters (CPPs) are identified more reliably, and design-of-experiments (DoE) cycles shrink from brute-force matrices to targeted, information-rich studies. Scale-up becomes less perilous, as models trained across volumes—from 200-L pilot reactors to 2,000-L production vessels—better anticipate facility- or equipment-specific risks. Regulatory interactions also benefit from enhanced traceability: “You can present a coherent, data-driven narrative rather than stitching it together from PDFs and spreadsheets,” Mounteney explains.
In the near term, Mounteney foresees AI moving from offline analytics to embedded process intelligence. Soft sensors will infer critical quality attributes (CQAs), such as titer or glycosylation, from real-time signals, including off-gas composition and spectroscopy, enabling earlier detection of off-track batches. Hybrid models that combine mechanistic understanding—mass transfer, cell-growth kinetics—with machine learning will offer interpretable guidance for process optimization. “We’ll see AI suggesting which conditions to explore next, or proposing changes to feed strategies and control logic,” he predicts.
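As a rough illustration of the soft-sensor idea, the sketch below trains a regressor on stand-in data for historical runs and then estimates a quality attribute from live signals. The features, model choice, and synthetic data are assumptions for illustration only, not Mounteney’s or Dotmatics’ method.

```python
# A minimal soft-sensor sketch, assuming scikit-learn; synthetic data stands in
# for historical batches joined to offline assay results.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical real-time signals per timepoint: oxygen uptake rate, CO2
# evolution rate, capacitance (viable-biomass proxy), cumulative glucose fed,
# and culture age in hours.
n = 500
X = rng.normal(size=(n, 5))
# Stand-in relationship between online signals and the offline CQA (titer);
# in practice y would come from historical assay results linked to the traces.
y = 2.0 + 0.8 * X[:, 2] + 0.3 * X[:, 3] + 0.1 * rng.normal(size=n)

# Train on completed batches where offline titer measurements exist...
soft_sensor = GradientBoostingRegressor().fit(X[:400], y[:400])

# ...then estimate titer in real time for a running batch from its live
# signals, flagging it as off-track if the estimate drifts out of range.
live_signals = X[400:]
estimated_titer = soft_sensor.predict(live_signals)
print(estimated_titer[:5])
```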
As organizations invest in integrated data infrastructure—harmonized ontologies, unified identifiers, and robust digital threads—Mounteney expects adoption to accelerate. “That’s when we’ll see genuinely measurable impact,” he says.
