Using Zeptomics for hit discovery

Learn more about the performance of Zepto.Hit, the hit-prediction algorithm in Zeptomics

Finding safe and effective drugs is a fiendishly difficult problem. Traditionally, one of the early steps in drug discovery is the “high-throughput screening” (HTS) assay. HTS involves synthesizing or purchasing a vast number of diverse chemical compounds (from tens of thousands up to millions) and testing, in vitro, whether they interact with a protein thought to be involved in a particular disease (a “drug target”). Unfortunately, this process is slow, costly, and wasteful, and it is a major barrier to discovering promising compounds that can be developed into drug candidates. A screening campaign can easily take years to plan and execute, can cost anywhere from hundreds of thousands to millions of dollars, and uses large quantities of lab consumables that are usually hard to recycle.

There is therefore an urgent need to significantly improve this process, which is exactly what we set out to do with Zeptomics, our drug discovery technology based on artificial intelligence (AI). One of its sub-components, Zepto.Hit, is an algorithm that specializes in predicting which chemical compounds will bind to which drug targets.

Other algorithms aim to solve similar tasks, but Zepto.Hit is unique in three ways. First, although Zepto.Hit is AI-based, it does not require any new data about a target to make predictions for it, and, unlike current so-called “ligand-based” virtual screening algorithms, it is not restricted to finding hits that are chemically similar to other, known hits. Zepto.Hit achieves this by being pre-trained on an extremely large and varied dataset of hits and non-hits across a wide variety of drug targets. In this sense, Zepto.Hit resembles “structure-based” virtual screening algorithms, such as “protein-ligand docking” methods.
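To make the contrast with ligand-based screening concrete, here is a toy sketch of how a similarity-based screen ranks compounds. It is not Zepto.Hit's method. The bit-set fingerprints and compound names are hypothetical, and the Tanimoto coefficient is used as one common similarity measure. A compound that shares no features with any known hit scores zero, however well it might actually bind, which is the limitation described above.

```python
# Toy ligand-based virtual screening by chemical similarity (illustration only).
# Fingerprints are hypothetical sets of "on" bit indices; real tools derive
# them from molecular structure.


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(known_hits, library):
    """Score each library compound by its highest similarity to any known hit."""
    scores = {
        name: max(tanimoto(fp, hit) for hit in known_hits)
        for name, fp in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


known_hits = [{1, 4, 7, 9}, {2, 4, 8}]
library = {
    "cmpd_A": {1, 4, 7, 10},  # close analogue of the first known hit
    "cmpd_B": {3, 5, 6, 11},  # shares no bits with any known hit
    "cmpd_C": {2, 4, 8, 12},  # close analogue of the second known hit
}

for name, score in rank_by_similarity(known_hits, library):
    print(f"{name}: {score:.2f}")
```

Here cmpd_B ends up last with a score of zero purely because it looks unlike the known hits, not because of any evidence that it fails to bind.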

Second, Zepto.Hit is extremely fast. Current “structure-based” screening algorithms such as AutoDock Vina can predict the binding affinity of around 10 compounds against one target per hour on modern hardware. Progress has been made in speeding up these algorithms, but the speedups often come at the cost of predictive performance. For example, MIT’s EquiBind is reported to make 180,000 predictions per hour, but its recommended methodology for best predictive performance reduces this to just 24 predictions per hour. On similar hardware, Zepto.Hit makes around 5.3 million predictions per hour. This speed makes it feasible not only to evaluate very large parts of chemical space, but also to evaluate a drug’s possible off-target effects, which could be either beneficial or detrimental in a clinical setting.
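The throughput figures above translate into very different wall-clock times. This back-of-the-envelope sketch assumes a hypothetical library of one million compounds, a single target, and a single machine running at the quoted per-hour rates.

```python
# Wall-clock time to score one library against one target at the quoted rates.
LIBRARY_SIZE = 1_000_000  # hypothetical library size, chosen for illustration

throughput_per_hour = {
    "AutoDock Vina (~10/h)": 10,
    "EquiBind, best-performance setting (~24/h)": 24,
    "EquiBind, fast setting (~180,000/h)": 180_000,
    "Zepto.Hit (~5.3M/h)": 5_300_000,
}


def hours_to_screen(n_compounds: int, per_hour: float) -> float:
    """Hours needed to score n_compounds at a fixed predictions-per-hour rate."""
    return n_compounds / per_hour


for method, rate in throughput_per_hour.items():
    hours = hours_to_screen(LIBRARY_SIZE, rate)
    print(f"{method}: {hours:,.1f} h ({hours / 24 / 365:,.2f} years)")
```

At these rates, the Vina figure works out to roughly 11 years for the library, against roughly 11 minutes for Zepto.Hit. Screening the same library against many targets (for example, to look for off-target effects) multiplies each time by the number of targets.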

Finally, Zepto.Hit achieves state-of-the-art predictive results on both internal and publicly available benchmarks, such as the Directory of Useful Decoys. Because Zepto.Hit is constantly being improved, we evaluate its performance continuously. One way we do this is in silico: we train Zepto.Hit on one dataset (the “train split”) and test its performance on data the model has not yet seen (the “test split”). The split is made so that a subset of chemical compounds and a subset of targets are never seen by the model during training, which lets us measure how well it performs in both known and new chemical space, and on both seen and unseen targets.
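A minimal sketch of this kind of split, assuming records of the form (compound, target, label). It holds out a random fraction of compounds and of targets, sends every record that involves a held-out compound or target to the test split, and also keeps back a random fraction of the remaining pairs so that the seen-compound, seen-target case can be measured too. The function name `cold_split` and the 20% holdout fraction are illustrative assumptions, not a description of our actual pipeline.

```python
import random


def cold_split(records, holdout_frac=0.2, seed=0):
    """Split (compound, target, label) records so that a random subset of
    compounds and a random subset of targets never appear in training.

    Returns (train, test), where test maps (compound_seen, target_seen)
    to the list of records in that evaluation bucket.
    """
    rng = random.Random(seed)
    compounds = sorted({c for c, _, _ in records})
    targets = sorted({t for _, t, _ in records})
    held_compounds = set(rng.sample(compounds, max(1, int(len(compounds) * holdout_frac))))
    held_targets = set(rng.sample(targets, max(1, int(len(targets) * holdout_frac))))

    train, test = [], {}
    for compound, target, label in records:
        key = (compound not in held_compounds, target not in held_targets)
        # Seen/seen pairs mostly go to training; a random fraction is kept
        # back so that this bucket can be evaluated as well.
        if key == (True, True) and rng.random() >= holdout_frac:
            train.append((compound, target, label))
        else:
            test.setdefault(key, []).append((compound, target, label))
    return train, test


# Toy data: 50 hypothetical compounds x 10 hypothetical targets.
records = [(f"C{i}", f"T{j}", (i * j) % 7 == 0) for i in range(50) for j in range(10)]
train, test = cold_split(records)
print("train:", len(train))
for (compound_seen, target_seen), bucket in sorted(test.items()):
    print(f"compound seen={compound_seen}, target seen={target_seen}: {len(bucket)}")
```

Reporting each of the four buckets separately is what distinguishes performance on familiar chemistry and targets from performance on genuinely new ones.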

The chance of finding a hit by randomly selecting a compound varies by protein, but usually ranges from 0.01% down to 0.00001%.
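Converting those percentages into probabilities shows what random selection implies in practice: the expected number of hits among 100 randomly picked compounds, and roughly how many compounds must be screened per hit. This is simple arithmetic on the two quoted bounds.

```python
# The two quoted base rates, converted from percentages to probabilities.
base_rates = {"optimistic": 0.01 / 100, "pessimistic": 0.00001 / 100}

for label, p in base_rates.items():
    expected_hits_in_100 = 100 * p  # mean hits among 100 random picks
    compounds_per_hit = 1 / p       # mean compounds screened per hit
    print(f"{label}: {expected_hits_in_100:g} expected hits in 100 random picks, "
          f"~{compounds_per_hit:,.0f} compounds screened per hit")
```

Even at the optimistic end, about 10,000 random compounds must be tested per hit, and at the pessimistic end about 10 million.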

When evaluating drug targets that Zepto.Hit has some information about (both hits and non-hits are known), the top 100 compounds predicted by Zepto.Hit contain, on average, 75 hits (a 75% hit rate), depicted as the dark blue line in the graph below. On known targets, bottom-quintile performance is 49 hits out of 100 predictions (red line) and top-quintile performance is 94 hits out of 100 predictions (green line). For some of our best-performing known targets, such as the Isoform 1 of Serine/Threonine-Protein Kinase mTOR, 99 of Zepto.Hit’s top 100 predictions are hits. Furthermore, in our own internal benchmarks, Zepto.Hit consistently outperforms state-of-the-art ligand-based virtual screening methods in predictive performance by wide margins.
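The metrics in this paragraph can be computed along the following lines. This sketch assumes per-compound scores and binary hit labels for each target, computes the hit rate among the k top-scoring compounds (k = 100 in the text), and then summarizes across targets. We read “bottom quintile” and “top quintile” as the 20th and 80th percentiles of per-target hit rates; that reading, the function names, and the toy numbers are assumptions for illustration, not Zepto.Hit results.

```python
import statistics


def hit_rate_at_k(scores, labels, k=100):
    """Fraction of true hits (label 1) among the k highest-scoring compounds."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / len(top_k)


def summarize(per_target_rates):
    """Mean hit rate across targets plus the 20th and 80th percentiles,
    our assumed reading of 'bottom quintile' and 'top quintile'."""
    cuts = statistics.quantiles(per_target_rates, n=5)  # four cut points
    return {
        "mean": statistics.mean(per_target_rates),
        "bottom_quintile": cuts[0],
        "top_quintile": cuts[-1],
    }


# Toy example: one target with four scored compounds, evaluated at k=2.
print(hit_rate_at_k([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0], k=2))

# Toy per-target hit rates (hypothetical values).
print(summarize([0.30, 0.55, 0.70, 0.85, 0.95]))
```

Reporting percentiles alongside the mean shows how much performance varies from target to target, which a single average would hide.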

As mentioned before, Zepto.Hit does not need prior information about hits or non-hits to make predictions for a given target. On drug targets unknown to Zepto.Hit, on average 25 of the top 100 compounds it predicts are actual hits, with a bottom-quintile performance of 4 hits out of 100 and a top-quintile performance of 54 hits out of 100.
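One way to put these hit rates in context is the enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate expected from random selection. The sketch below applies it to the 75% (known targets) and 25% (unseen targets) figures reported above and to the range of random base rates quoted earlier. It assumes those base rates apply to the evaluated targets, which may not hold for a benchmark test split, so the numbers are illustrative only.

```python
def enrichment_factor(hit_rate_top_k: float, base_rate: float) -> float:
    """How many times more hits the ranked list contains than random picking."""
    return hit_rate_top_k / base_rate


# 0.01% and 0.00001%, the range of random base rates quoted earlier.
BASE_RATES = (1e-4, 1e-7)
REPORTED = {"known targets (mean)": 0.75, "unseen targets": 0.25}

for setting, rate in REPORTED.items():
    low = enrichment_factor(rate, BASE_RATES[0])
    high = enrichment_factor(rate, BASE_RATES[1])
    print(f"{setting}: {low:,.0f}x to {high:,.0f}x over random selection")
```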

To our knowledge, these results demonstrate that Zepto.Hit is the first algorithm capable of outperforming both structure-based and ligand-based virtual screening algorithms in capability, speed, and predictive performance.

Get in touch!