Unlocking the Hidden Value of Failed Chemical Reactions for Drug Discovery and Development: an Opportunity with Federated AI
- David Brooks
- Jul 16
In chemical R&D, failure is often the most valuable teacher, but only if you're able to learn from it.
At Qubigen, our latest pilot study showed that 'dark' reactions (real-world chemical synthesis reactions that failed) may be among the most underutilized assets in medicinal chemistry and drug development. For industry, these hidden data represent untapped insight, cost-saving potential, and a strategic opportunity for innovation potentially worth billions of dollars per year.
The Problem: AI Fails to Recognize Failure
Most AI models for retrosynthesis and reaction prediction today are built on datasets made up mainly of successful chemical reactions, with genuine "failed" reactions either absent or simulated artificially. This reflects the publication bias inherent in journals and patents: what gets published is almost always the successful outcome, not the failed attempts.
As a result, models of chemical synthesis pathways tend to overestimate reaction feasibility. They appear accurate at predicting positive examples but are fundamentally unreliable in real-world settings.
In the pilot study, the Qubigen AI team trained a neural network on a large dataset of 1.7 million chemical reactions to predict and classify reactions, similar to an industry model developed by AstraZeneca [1].
When the model was tested on a blind held-out dataset it had never seen before, the effects of training-set bias became apparent:
Across all available test data, overall accuracy reached up to 92.9%;
BUT on a sample of 800 genuine failed reactions, accuracy dropped as low as 0.13%, and the best baseline reached only 30.1%.
In other words, AI models trained without real failure data can’t reliably identify dead-end reactions, leading to costly false positives, misallocated resources, and wasted time for chemists.
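To see how a strong headline accuracy can coexist with near-zero accuracy on failures, here is a minimal Python sketch that scores overall accuracy and per-class accuracy separately. The labels and counts are invented for illustration and are not Qubigen's data, and `per_class_accuracy` is a hypothetical helper rather than part of any named library.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and the accuracy restricted to each true class."""
    totals = Counter(y_true)
    # Count correct predictions, grouped by the true label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    per_class = {label: correct[label] / totals[label] for label in totals}
    return overall, per_class


# Toy, made-up test set: 9,200 successful and 800 failed reactions.
# The hypothetical model labels every success correctly but calls only
# one of the 800 failures "failed", mimicking a model trained on successes.
y_true = ["success"] * 9200 + ["failed"] * 800
y_pred = ["success"] * 9200 + ["failed"] * 1 + ["success"] * 799

overall, per_class = per_class_accuracy(y_true, y_pred)
print(f"overall accuracy: {overall:.2%}")
for label, acc in per_class.items():
    print(f"accuracy on {label!r}: {acc:.2%}")
```

Because successes dominate the toy test set, the overall figure stays above 90% even though almost every failure is misclassified, which is why failure-class accuracy needs to be reported on its own.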
The Breakthrough: Train on Real Failures, Not Fakes
Qubigen introduced just 3,000 genuine failed reactions into the training set and saw a dramatic turnaround:
Accuracy on a held-out set of genuine failed reactions jumped to 97–100%;
Overall model accuracy on successful and failed reactions together improved to 96%;
Using upgraded, state-of-the-art AI architectures with enhanced translatability, performance rose to 97% overall, with 100% accuracy on the failed reactions: not a single failed example was misclassified.
These results demonstrate the value of integrating real-world failure data into reaction prediction pipelines. This approach can translate directly into better decision-making, more efficient use of resources, and less trial and error in medicinal chemistry workflows.
The Barrier: Sensitivity and Data Access
Within the pharma industry, the real challenge isn't a lack of failed-reaction data; it's that the insights in those data are inaccessible and so cannot guide real-world action. Much of the most valuable data remains buried in handwritten lab notebooks, scattered across spreadsheets, or siloed in electronic lab notebooks. Even when the data are centralized internally, there are valid concerns around IP sensitivity, confidentiality, and competitive risk.
This is where Federated AI becomes a critical enabler.
The Solution: Federated AI for Secure, Scalable Learning
Qubigen’s Federated AI framework allows an industry partner to train predictive models on their proprietary failure data, without ever moving or exposing it. By keeping data in place and training models in secure, decentralized environments, organizations retain full ownership and privacy while contributing to more powerful AI systems that they can then use internally to streamline their R&D pipeline and keep waste costs low.
Qubigen's cost-effective Federated AI, an arm's-length approach to machine learning, is ideally suited to environments where data sensitivity is non-negotiable, yet predictive accuracy is essential.
Qubigen's technology has been reported to be 187x more cost-effective [2] than traditional distributed federated learning, currently the norm in the field, thanks to its ultra-light network costs and patented design.
This allows organizations working with Qubigen to extract insights from their data without moving or directly accessing it.
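The post does not describe how Qubigen's patented framework works internally, so as a point of reference, here is a minimal Python sketch of conventional federated averaging (FedAvg), a well-known textbook scheme and not Qubigen's method. Each hypothetical partner site trains a logistic-regression classifier (1 = reaction succeeded, 0 = failed) on its own private toy data and shares only model weights; a coordinator averages those weights, weighted by each site's example count. The data, two-feature "descriptors", function names, and hyperparameters are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent for logistic regression on one site's private data.
    Only the updated weight vector leaves this function; `data` never does."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            # x ends with a constant 1.0, so the last weight acts as the bias.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def federated_average(updates):
    """Average (weights, n_examples) pairs, weighted by each site's example count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def train_federated(sites, dim, rounds=30):
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        global_w = federated_average(updates)
    return global_w


def make_site(n):
    """Hypothetical private dataset: 2 toy descriptors plus a bias term."""
    data = []
    for _ in range(n):
        a, b = random.uniform(-1, 1), random.uniform(-1, 1)
        data.append(([a, b, 1.0], 1 if a + b > 0 else 0))
    return data


random.seed(0)
sites = [make_site(40), make_site(60)]
w = train_federated(sites, dim=3)

# For demonstration only: a real coordinator would never see the pooled data.
all_data = [pt for site in sites for pt in site]
correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
    for x, y in all_data
)
print(f"pooled accuracy of the federated model: {correct / len(all_data):.2%}")
```

In this conventional scheme, raw reactions stay on each site, but model parameters still travel to a coordinator every round; the network cost of that exchange is the kind of overhead the cost comparison above refers to.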
The Opportunity: Turn Failure into Value
For industry, the case is clear:
You already have a wide range of confidential, failed reaction data; BUT
You haven't yet put it to work at a crucial step in the R&D journey; HOWEVER
With Qubigen’s Federated AI, you can do so without giving up control.
We believe failed reactions shouldn't be buried; they should be built upon. Let us help you turn hidden data into a competitive advantage.
Combining Federated AI, which trains on data at arm's length without moving or exposing it, with industry AI expertise in state-of-the-art architectures creates a clear opportunity to rapidly develop a highly accurate reaction-synthesis AI together with an industry partner seeking a competitive advantage. This could lead to significant cost savings in the R&D process.
Qubigen - accelerate drug design without exposing secrets
Whether you're advancing active programs, reviving dormant data, or starting from scratch, Qubigen's secure AI platform and virtual screening capabilities can help you identify, optimize, and accelerate the path to promising lead drug candidates. Get in touch to explore how we can support your next development program.
References
[1] Genheden, S., Thakkar, A., Chadimová, V. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J Cheminform 12, 70 (2020). https://doi.org/10.1186/s13321-020-00472-1
[2] Nguyen, T.V., Dakka, M.A., Diakiw, S.M. et al. A novel decentralized federated learning approach to train on globally distributed, poor quality, and protected private medical data. Sci Rep 12, 8888 (2022). https://doi.org/10.1038/s41598-022-12833-x