Best practices for AI and ML use across the drug discovery and development lifecycle

As AI becomes part of the fabric of our personal and professional lives, its use continues to reveal both its potential and limitations. The speed and accuracy of drug discovery and development are being transformed by AI and machine learning (ML) applications. AI and life science companies alike need to conduct due diligence to ensure optimal results and avoid the pitfalls of poor data sources and an over-reliance on technology’s capabilities.

At Clarivate, a dedicated AI/ML team has been delivering innovative AI solutions across the company’s portfolio for over five years. Leveraging billions of proprietary best-in-class, expertly curated data assets and cross-departmental collaborations, our solutions support multiple use cases, from the use of generative AI (GenAI) for natural language queries of life science data to predictive tools identifying the likelihood of success for deals, clinical trials, drug approvals and more.

Interviews with experts across Clarivate surfaced a list of best practices for the use of AI/ML in drug discovery and development. These best practices were briefly covered in our Companies to Watch 2023 report, which spotlights seven innovators changing the drug discovery and development paradigm. Here, we delve deeper into each of them.

Data quality is paramount: garbage in, garbage out

The overwhelming task of curating, connecting and gleaning intelligence from many disparate data assets remains out of reach for many life science companies. Although AI promises to streamline this process, the outputs are only as reliable as the incoming data. Poor-quality data can lead to an incorrectly chosen target, to biased information that has limited relevance for many populations or to decisions based on outdated information.

Anyone using a data set, whether developed in-house, in-licensed or obtained through a partnership, bears some responsibility for the quality of the incoming data. That responsibility spans establishing good data governance practices, knowing the data provenance, understanding how data are cleaned and harmonized, continuously checking models for bias and performing quality checks, especially as new data come in.
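As an illustration of the kind of routine quality check described above, here is a minimal sketch in Python. It is not a Clarivate implementation: the record layout, the field names (`compound_id`, `source`, `last_updated`) and the one-year staleness threshold are all assumptions chosen for the example. It flags records with missing provenance or identifiers and records that have not been updated recently.

```python
from datetime import date

# Hypothetical record layout: every field name here is an illustrative assumption.
REQUIRED_FIELDS = ("compound_id", "source", "last_updated")


def check_record(record, today, max_age_days=365):
    """Return a list of human-readable quality issues for one record."""
    issues = []
    # Flag any required field that is absent or empty (for example, no provenance).
    for field_name in REQUIRED_FIELDS:
        if record.get(field_name) in (None, ""):
            issues.append(f"missing {field_name}")
    # Flag records whose last update is older than the allowed age.
    updated = record.get("last_updated")
    if updated is not None and (today - updated).days > max_age_days:
        issues.append("stale record")
    return issues


def audit(records, today, max_age_days=365):
    """Map each problematic record index to its issues; clean records are omitted."""
    report = {}
    for index, record in enumerate(records):
        issues = check_record(record, today, max_age_days)
        if issues:
            report[index] = issues
    return report


records = [
    {"compound_id": "C-001", "source": "in-house assay", "last_updated": date(2024, 1, 10)},
    {"compound_id": "C-002", "source": "", "last_updated": date(2024, 2, 1)},
    {"compound_id": "C-003", "source": "partner feed", "last_updated": date(2020, 5, 5)},
]
print(audit(records, today=date(2024, 6, 1)))
```

Running a check like this each time new data arrive gives reviewers a short list of records to inspect, rather than relying on the model to absorb the problems silently.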

“High-quality data for AI models is key to achieving high-quality insights. At Clarivate, we have rigorous quality control procedures, and every data point is ultimately overseen by a human expert, even when an AI technology has curated or cleaned the data.”

Ketan Patel, Vice President, Cortellis Product Platform, Clarivate

Transparency around AI/ML outputs can shift perceptions around the technology

Many users of AI systems view the technology as a black box. To instill greater confidence in the outputs, AI developers can provide additional context around the decision-making processes. Although it remains nearly impossible to describe the specific calculations and permutations undertaken by a system to reach a decision, the following information is useful for users to judge the results:

  • Strengths and limitations of the data sets
  • Weighting of the data
  • Specific purpose of the model
  • Constraints placed on the model
  • Assumptions inherent in the model
  • Validation processes
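One way to make the items above concrete is to record them in a structured disclosure that ships with each model. The following is a minimal sketch, assuming a hypothetical `ModelDisclosure` class whose field names simply mirror the list; it is not an established standard or a Clarivate format, and the example values are invented placeholders.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelDisclosure:
    """Hypothetical container for the context a user needs to judge a model's output."""
    purpose: str = ""
    data_strengths_and_limitations: str = ""
    data_weighting: str = ""
    constraints: list = field(default_factory=list)
    assumptions: list = field(default_factory=list)
    validation: str = ""

    def missing_items(self):
        """Return the names of disclosure items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative values only; they do not describe any real model.
disclosure = ModelDisclosure(
    purpose="Estimate the likelihood of a drug launch",
    data_strengths_and_limitations="Curated historical records; sparse for rare diseases",
    assumptions=["Past approval patterns remain informative"],
)
print(disclosure.missing_items())
```

A check such as `missing_items()` can be used as a release gate so that a model is not published until every item has been documented.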

For example, Cortellis Drug Timeline and Success Rates enhances the transparency around its predictions using success indicators. Drug Timeline and Success Rates predicts the likelihood and timing of competing drug launches in the United States, Europe and Asia using historical data, statistical modeling and ML-based predictive analytics. Its predictions can be used to validate internal life science company predictions about an asset’s success and determine if assets are being under- or oversold. The predictions also inform merger and acquisition decisions, by providing an unbiased assessment of which drugs are likely to make it to market.

The success indicators organize the tool’s more than 100 traits, which predict both timeline and success, into 12 groups. These traits include whether the mechanism of action is known, the use of biomarkers to select the molecule, the company’s history with successful drug launches, the history of clinical trials for that molecule and more. The tool then visualizes which indicators are positively, negatively or neutrally affecting a prediction, providing visibility into how the forecast was obtained.
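To show the general idea of rolling many trait-level effects up into a few labeled indicators, here is a minimal sketch. It assumes each trait already has a signed numeric contribution to the prediction, which is a simplification; the trait names, group names, values and tolerance are hypothetical, and this is not how Drug Timeline and Success Rates computes its indicators.

```python
def summarize_indicators(contributions, groups, tolerance=0.01):
    """Label each indicator group by the sign of its summed trait contributions."""
    summary = {}
    for group, traits in groups.items():
        # Traits without a recorded contribution are treated as having no effect.
        total = sum(contributions.get(trait, 0.0) for trait in traits)
        if total > tolerance:
            summary[group] = "positive"
        elif total < -tolerance:
            summary[group] = "negative"
        else:
            summary[group] = "neutral"
    return summary


# Hypothetical signed contributions of individual traits to one prediction.
contributions = {
    "known_mechanism": 0.25,
    "biomarker_selection": 0.125,
    "prior_launches": -0.5,
    "trial_history": 0.0,
}
# Hypothetical grouping of traits into indicator groups.
groups = {
    "Biology": ["known_mechanism", "biomarker_selection"],
    "Company": ["prior_launches"],
    "Clinical": ["trial_history"],
}
print(summarize_indicators(contributions, groups))
```

Presenting a handful of labeled groups instead of 100 raw numbers is what lets a non-specialist see at a glance which factors are helping or hurting a forecast.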



“We have taken a very deliberate approach to our AI technologies, by designing safety, security and truth at its center. Not only do we use information that we know to be accurate and true but we also provide perfect traceability of that data so our customers are not burdened with verifying that an answer is correct.”

Hassan Malik, Senior Vice President, Advanced Analytics and Search, Clarivate

Close collaboration between data scientists, therapeutic area and compliance experts provides the full view needed to develop effective and reliable models

Generalist platforms relying solely on data scientists often lack the granular life science insights needed to take an asset from discovery to market. Combining domain expertise with technical know-how to design AI/ML models and inform algorithm decisions produces meaningful outputs, such as those provided by the new Clarivate GenAI-driven search platform for life sciences.

A large-scale, company-wide knowledge graph established and refined by our dedicated AI/ML team of data scientists, industry experts, therapeutic area experts and clients underpins the search platform. Expert input incorporated during the design phase enables the platform to place complex, natural language questions within the correct context and return appropriate, understandable GenAI-derived summaries. Suitable for drug discovery, preclinical, clinical, regulatory affairs and portfolio strategy teams, the platform draws from a wide range of validated, traceable Clarivate data sets, as well as our people’s expertise and understanding of what matters to our customers, partners and investors.

“At Clarivate, we have uniquely approached AI development with our dedicated team of data scientists and scientific and industry experts regularly meeting over the last six years to collectively develop our AI solutions. This convergence of a deep understanding of the complexities of AI algorithms and statistical modeling with extensive life science knowledge allows us to rapidly implement AI tools that solve real problems for our customers.”

Romeo Radman, Vice President, Life Sciences & Healthcare, Product and Strategy, Clarivate

“Real” intelligence fills the artificial intelligence gaps

Rather than completely replacing conventional methods or wisdom, AI/ML tools create new efficiencies and accelerate understanding and decisions. A human-machine partnership takes advantage of the strengths of both. Many of the Clarivate AI-powered solutions, including Cortellis Deals Intelligence, follow a workflow in which humans curate AI-generated data to ensure accurate, high-quality outputs.

The image below shows our pipeline for target identification for drug repurposing. The disease-centric approach uses algorithms to extract knowledge from our Cortellis Drug Discovery Intelligence™ and MetaBase™ databases, as well as related content across the Cortellis discovery and preclinical tools and key data from publicly available sources. ML processes identify a list of prioritized targets, which Clarivate experts review to produce a refined list of recommended targets. Manual mechanism reconstruction by our experts contributes to the final list of prioritized drug target candidates for the indication of interest and the relevant drugs for repositioning opportunities. The AI-human expert team produces a report for each prioritized target that outlines the supporting evidence and details about the drugs that modulate its activity: mechanism of action, pathways in which the target is involved, highest clinical trial phase the drug has ever reached and known adverse effects.


Figure: Pipeline for target identification for drug repurposing. Source: Clarivate
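The two-stage pattern described above, in which a model proposes and experts decide, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Clarivate pipeline: the target names, scores, the `top_n` cutoff and the "approve"/"reject" decision labels are all hypothetical.

```python
def prioritize(scores, top_n):
    """Rank candidate targets by model score, highest first, and keep the top_n."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


def apply_expert_review(candidates, decisions):
    """Keep only candidates an expert explicitly approved.

    Candidates with no recorded decision are held back rather than passed through.
    """
    return [name for name in candidates if decisions.get(name) == "approve"]


# Hypothetical model scores for four candidate targets.
scores = {"TARGET_A": 0.91, "TARGET_B": 0.42, "TARGET_C": 0.77, "TARGET_D": 0.65}
# Hypothetical expert decisions; TARGET_D has not been reviewed yet.
decisions = {"TARGET_A": "approve", "TARGET_C": "reject"}

shortlist = prioritize(scores, top_n=3)
recommended = apply_expert_review(shortlist, decisions)
print(shortlist)
print(recommended)
```

The design choice worth noting is the default: an unreviewed candidate is excluded, so nothing reaches the final list on the strength of a model score alone.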

“Domain expertise provides the wherewithal to validate the outputs of AI, and I believe this is what gives Clarivate the advantage over more generalist AI companies. Our AI solutions, paired with our robust, extensive data sets, accomplish 80% of the intelligence gathering, leaving our subject matter experts free to add value to the remaining 20%.”

Scott Tatro, VP, Management Partner, Clarivate Consulting Technology



Read our recent report on seven innovative AI/ML companies to watch. The top deal makers and the top innovators across pharma, biotech and medtech trust our intelligence to inform their portfolio and investment strategies. Learn more about how Clarivate supports life science companies around the world with the development and commercialization of life-saving treatments using AI-powered solutions across the Clarivate portfolio.