TL;DR: Target Identification & Validation
- Only a fraction of human genes are considered druggable with current modalities (Hopkins & Groom, 2002)
- Genetic support for drug targets improves odds of clinical success (Nelson et al., 2015)
- Industry pipelines still show high attrition despite more data (Scannell et al., 2012; Cook et al., 2014)
- Literature maps gene-disease-drug links; Open Targets and ChEMBL add curated context
- Motif extracts associations with PMIDs; genetic validation and chemistry sit outside the platform
From the Motif team: Target hypotheses often start in published evidence. Motif extracts gene-disease-drug associations from PubMed, PMC, and Europe PMC, maps 69 biomedical entity types across 41 relationship types, and cross-references UniProt, Open Targets, ChEMBL, and others. In silico modeling and lab validation sit outside the platform.
Target identification still determines whether a drug program succeeds or stalls. Hopkins and Groom (2002) estimated that only a subset of the human proteome is druggable with conventional small-molecule approaches. Most teams combine genetic evidence, pathway databases, and published pharmacology before committing to a lead series.
What "Validated Target" Actually Means
Nelson et al. (2015) estimated that selecting genetically supported targets could double clinical development success rates. Genetic evidence and published pharmacology should point in the same direction before portfolio investment.
Cook et al. (2014) analyzed AstraZeneca's pipeline and found that target validation quality, exposure at the site of action, and safety margins were among factors linked to project outcomes. A compelling abstract is not the same as a validated target.
Scannell et al. (2012) documented rising R&D costs and falling productivity in drug discovery despite technological advances. Better literature triage does not remove attrition; it helps teams kill weak hypotheses earlier.
Literature-First Target Triage
Before ordering CRISPR screens or chemistry, teams need a cited map of what published studies already report about a gene-disease-drug triangle.
- Search: Plain-language questions become MeSH-aware queries against PubMed, PMC, and Europe PMC. Search provenance records which papers were screened out at the title and abstract stage.
- Extract: Gene-disease and gene-drug associations include effect direction, study design, and PMIDs. Failed clinical programs appear when papers report negative outcomes.
- Cross-reference: Genes resolve to UniProt and Open Targets; compounds to ChEMBL and PharmGKB. Mismatched IDs are a common source of false confidence.
- Compare cohorts: Discovery and validation cohorts are labeled separately when authors report independent replication.
- Export: Cited evidence tables feed target review meetings, TPP drafts, or grant preliminary data sections.
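As a rough illustration of the extract and export steps above, here is a minimal Python sketch of what one cited evidence row might look like and how a set of rows could be written out as a table for a target review meeting. The `Association` class, its field names, the `to_csv` helper, and all example values (including the placeholder PMIDs) are hypothetical and chosen for illustration; this is not Motif's data model or API.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Association:
    """One extracted gene-disease-drug claim (hypothetical schema)."""
    gene: str          # gene symbol as reported in the paper
    disease: str
    drug: str          # empty string when the paper names no compound
    direction: str     # "positive", "negative", or "null"
    study_design: str  # e.g. "case-control", "phase 2 trial"
    pmid: str          # citation backing the claim
    cohort: str        # "discovery" or "validation"


def to_csv(rows):
    """Serialize associations to CSV text, one cited claim per line."""
    names = [f.name for f in fields(Association)]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Placeholder rows; the gene, disease, drug, and PMID values are not real.
rows = [
    Association("GENE_A", "disease X", "", "positive",
                "case-control", "PMID_PLACEHOLDER_1", "discovery"),
    Association("GENE_A", "disease X", "compound 1", "negative",
                "phase 2 trial", "PMID_PLACEHOLDER_2", "validation"),
]

print(to_csv(rows))
```

Keeping effect direction, study design, and cohort label as explicit columns means a failed clinical program shows up in the same table as the supporting evidence instead of being lost in free text.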
Failure modes we see:
- Promoting a target from one underpowered study without checking conflicting literature
- Ignoring drug-target associations that papers report as failed in clinic
- Skipping cross-reference panels because the abstract sounds compelling
- Conflating biomarker association with druggability of the same gene product
- Missing recent PubMed papers that have not yet entered curated databases
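The first two failure modes can be caught with a simple check over extracted records. The sketch below is one possible approach, assuming each record is a `(gene, disease, direction, pmid)` tuple; the function name `triage_flags`, the flag labels, and the example records are hypothetical.

```python
from collections import defaultdict


def triage_flags(records):
    """Map each (gene, disease) pair to a list of warning flags.

    records: iterable of (gene, disease, direction, pmid) tuples, where
    direction is "positive", "negative", or "null" (assumed labels).
    """
    by_pair = defaultdict(list)
    for gene, disease, direction, pmid in records:
        by_pair[(gene, disease)].append((direction, pmid))

    flags = {}
    for pair, items in by_pair.items():
        pmids = {pmid for _, pmid in items}
        directions = {direction for direction, _ in items}
        pair_flags = []
        # Only one distinct paper supports the pair.
        if len(pmids) == 1:
            pair_flags.append("single-study")
        # Papers disagree on the direction of effect.
        if "positive" in directions and "negative" in directions:
            pair_flags.append("conflicting-evidence")
        flags[pair] = pair_flags
    return flags


# Placeholder records; identifiers are not real.
records = [
    ("GENE_A", "disease X", "positive", "PMID_PLACEHOLDER_1"),
    ("GENE_A", "disease X", "negative", "PMID_PLACEHOLDER_2"),
    ("GENE_B", "disease Y", "positive", "PMID_PLACEHOLDER_3"),
]

for pair, pair_flags in triage_flags(records).items():
    print(pair, pair_flags)
```

A flag is a prompt to read the cited papers, not a verdict. Study power and design still need human judgment.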
Open Targets and similar resources aggregate genetic and functional data, but publication lag means recent bench findings may appear in PubMed before they reach curated panels. Motif is strongest at closing that gap from papers to structured associations.
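One way to see the publication lag for a given gene list is to compare the PMIDs found by literature search with the PMIDs already cited in a curated resource. The sketch below assumes both sides have been reduced to plain dictionaries mapping a gene symbol to a collection of PMIDs; the function name `curation_gap` and the example data are hypothetical, and fetching either input is left out.

```python
def curation_gap(literature, curated):
    """Return, per gene, the PMIDs seen in the literature but not in curated data.

    literature, curated: dicts mapping gene symbol -> iterable of PMID strings.
    Genes with no uncurated papers are omitted from the result.
    """
    gap = {}
    for gene, pmids in literature.items():
        missing = set(pmids) - set(curated.get(gene, ()))
        if missing:
            gap[gene] = sorted(missing)
    return gap


# Placeholder inputs; identifiers are not real.
literature = {
    "GENE_A": ["PMID_1", "PMID_2", "PMID_3"],
    "GENE_B": ["PMID_4"],
}
curated = {
    "GENE_A": ["PMID_1"],
    "GENE_B": ["PMID_4"],
}

print(curation_gap(literature, curated))
```

Papers that appear only on the literature side are the ones most likely to be missed when a review relies on curated panels alone.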
Validation After Literature
Literature evidence is hypothesis-generating. Typical next steps sit outside any search platform:
- Genetic validation (Mendelian disease, GWAS colocalization, or perturbation in relevant cell types)
- Tool compound or biologic proof-of-mechanism in disease-relevant models
- Developability and safety screens before lead optimization
- Biomarker strategy aligned to mechanism and patient selection
Ioannidis et al. (2009) attempted to reproduce published microarray analyses and found that data and methods were often unavailable. Treat literature target claims as hypotheses until independent experiments confirm them.
Target hypotheses often start in the literature. Motif's discovery pipeline extracts gene-disease-drug associations with PMIDs and cross-references against UniProt, Open Targets, ChEMBL, and other sources. Read our blog on literature review automation to learn more about scoping the evidence base.
References
- Hopkins, A.L., & Groom, C.R. (2002). The druggable genome. Nature Reviews Drug Discovery, 1(9), 727-730. PMID: 12209152
- Nelson, M.R., et al. (2015). The support of human genetic evidence for approved drug indications. Nature Genetics, 47(8), 856-860. PMID: 26121088
- Cook, D., et al. (2014). Lessons learned from the fate of AstraZeneca's drug pipeline. Nature Reviews Drug Discovery, 13(6), 419-431. PMID: 24833294
- Scannell, J.W., et al. (2012). Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews Drug Discovery, 11(3), 191-200. PMID: 22378269
- Ioannidis, J.P., et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149-155. PMID: 19174838



