TL;DR: What Actually Works
- Most biomarker candidates never reach routine clinical use; the bottleneck is validation, not discovery (Poste, 2011)
- Successful programs separate analytical validity, clinical validity, and clinical utility (FDA-NIH BEST, 2016)
- Literature mining can scope candidates and spot replication gaps before wet-lab work; validation still happens in the lab and clinic
- Pre-specified analysis plans and external validation cohorts reduce false discovery (Ou et al., 2021)
- Misclassified biomarker status in trials can bias survival analyses; statistical adjustment matters (Chen et al., 2024)
- REMARK and PRoBE frameworks separate discovery claims from prospective validation design (McShane et al., 2005; Pepe et al., 2008)
- Predictive enrichment trials need pre-specified cutoffs and control arms (Simon, 2013)
Note: Performance requirements and regulatory standards vary by biomarker type, indication, and jurisdiction. Always consult current FDA guidance documents and relevant regulatory authorities for your specific use case.
From the Motif team: Motif handles the literature-discovery stage: PubMed/PMC/Europe PMC search, association extraction across 69 biomedical entity types, cross-referencing against 50+ databases, and GRADE-adapted evidence scoring. Analytical validation, clinical validation, and regulatory submission remain your responsibility.
Most Candidates Stall in Validation, Not Discovery
Omics screens produce hundreds to thousands of candidate markers, but only a small fraction ever influence patient care (Poste, 2011). The limiting step is not finding interesting biology. It is proving that a marker measures reliably, predicts the right outcome, and changes decisions in a way that helps patients.
Teams that succeed treat literature evidence, analytical work, and clinical studies as linked stages with different evidence standards. Teams that fail often treat a promising discovery cohort as if it were validation.
Where Literature Mining Fits (and Where Motif Stops)
Before ordering assays or opening a trial, you need a scoped map of what published studies already report: which markers, which populations, which comparators, and whether discovery and validation cohorts are independent.
In Motif, that workflow typically runs like this:
- Search: A plain-language objective becomes MeSH-aware boolean queries against PubMed, PMC, and Europe PMC. Search provenance records per-database counts and what was screened out at title and abstract.
- Extract: Full text becomes structured association sentences with effect sizes, study design, and GRADE-adapted certainty tiers. Discovery and validation cohorts appear as separate associations when papers report them.
- Cross-reference: Genes, variants, and drugs resolve to external databases (UniProt, ClinVar, gnomAD, ChEMBL, Open Targets, and others) so you can see whether a literature claim aligns with curated records.
- Score gaps: When three or more studies report comparable effect sizes, pooled estimates and forest plots help you see whether evidence converges or conflicts.
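The sketch below is not Motif's implementation. It is a generic illustration of what "pooled estimate" means here: DerSimonian-Laird random-effects pooling of log hazard ratios. It assumes each study contributes a log hazard ratio and its standard error; the `Study` record and `pool_random_effects` function are hypothetical names. A large I² or τ² is the signal that cohorts disagree and that population modifiers deserve a look before you trust the pooled number.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Study:
    label: str     # hypothetical tag, e.g. a PMID or "Author 2020"
    log_hr: float  # log hazard ratio extracted for the association
    se: float      # standard error of log_hr


def pool_random_effects(studies: List[Study], z: float = 1.96) -> Dict[str, float]:
    """DerSimonian-Laird random-effects pooling of log hazard ratios."""
    if len(studies) < 3:
        raise ValueError("need three or more comparable studies to pool")
    w = [1.0 / s.se ** 2 for s in studies]              # inverse-variance weights
    fixed = sum(wi * s.log_hr for wi, s in zip(w, studies)) / sum(w)
    q = sum(wi * (s.log_hr - fixed) ** 2 for wi, s in zip(w, studies))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_re = [1.0 / (s.se ** 2 + tau2) for s in studies]  # random-effects weights
    pooled = sum(wi * s.log_hr for wi, s in zip(w_re, studies)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0       # share of variability beyond chance
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se_pooled),
        "ci_high": math.exp(pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }
```

Pooling happens on the log scale because hazard ratios are multiplicative; the result is exponentiated back only at the end.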
Common failure modes we see in user workflows:
- Treating a single discovery paper as validation because the abstract sounds confident
- Ignoring population modifiers (stage, line of therapy, molecular subtype) that explain conflicting cohort results
- Skipping comparator fields on predictive claims, then wondering why enrichment criteria do not replicate
- Exporting a narrative without checking whether PMIDs in the Word file match the associations you plan to cite in a protocol
Motif does not run assays, enroll patients, or submit INDs. It compresses the evidence-scoping phase so wet-lab and clinical teams start with a cited baseline instead of a blank PubMed search.
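If you want to spot-check a search by hand, NCBI's public E-utilities `esearch` endpoint accepts the same field-tagged boolean syntax PubMed uses. The helper names and the example marker below are hypothetical, and this is a minimal sketch of one manual query, not how Motif builds its searches. It only constructs the request URL, so nothing is sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_term(marker, mesh_heading, extra_clauses=()):
    """Combine a free-text marker name, a MeSH heading and optional filters with AND."""
    clauses = [f'"{marker}"[Title/Abstract]', f'"{mesh_heading}"[MeSH Terms]']
    clauses.extend(extra_clauses)
    return " AND ".join(f"({c})" for c in clauses)


def esearch_url(term, retmax=200):
    """Build (but do not send) an esearch request returning PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_pubmed_term(
    "CA-125",                                 # hypothetical example marker
    "Ovarian Neoplasms",
    ['"Validation Study"[Publication Type]'],  # restrict to validation-type records
)
url = esearch_url(term)
```

Fetching `url` with `urllib.request.urlopen` returns JSON whose `esearchresult` object holds `count` and `idlist`. NCBI asks unauthenticated clients to stay at or below three requests per second.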
Frameworks That Separate Discovery From Validation
McShane et al. (2005) published REMARK reporting recommendations so tumor-marker prognostic studies state cohort, assay, and analysis plans clearly enough to compare across papers.1 Pepe et al. (2008) introduced the PRoBE design for prospective specimen collection before outcome ascertainment, reducing retrospective bias in diagnostic development.2
Pepe et al. (2001) outline phased biomarker development from preclinical promise through population impact.3 Teams that treat a retrospective omics screen as phase 3 evidence routinely stall at regulatory or clinical adoption gates.
Simon (2013) reviews adaptive and biomarker-driven trial designs for predictive markers, emphasizing pre-specified cutoffs and control-group comparators.4 Literature mining can surface whether a marker was validated with a locked threshold or tuned on the same cohort that reported success.
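To show why a threshold tuned on the reporting cohort is weak evidence, here is a small simulation under stated assumptions. The marker is pure noise, and the cutoff is chosen to maximize Youden's J (sensitivity + specificity − 1) on a discovery set, then locked and applied to a second, independent set. Youden's J is one common tuning criterion, not one the cited papers prescribe, and all function names are hypothetical.

```python
import random


def youden_j(values, labels, cutoff):
    """Sensitivity + specificity - 1 for the rule 'call positive if value >= cutoff'."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    sens = sum(v >= cutoff for v in pos) / len(pos)
    spec = sum(v < cutoff for v in neg) / len(neg)
    return sens + spec - 1


def tune_cutoff(values, labels):
    """Choose the observed value that maximizes J on the same cohort (the optimistic step)."""
    return max(sorted(set(values)), key=lambda c: youden_j(values, labels, c))


rng = random.Random(7)


def noise_cohort(n):
    """Hypothetical marker with no real signal: values ignore case/control status."""
    labels = [i % 2 for i in range(n)]
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return values, labels


disc_v, disc_y = noise_cohort(200)
val_v, val_y = noise_cohort(200)
cut = tune_cutoff(disc_v, disc_y)  # tuned on discovery, then locked
print("discovery J:", round(youden_j(disc_v, disc_y, cut), 3))
print("validation J:", round(youden_j(val_v, val_y, cut), 3))
```

Because the cutoff is the maximum over many candidates, the discovery J will typically look respectable even with no signal, while the validation J hovers near zero. That gap is what a locked, pre-specified threshold protects against.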
Phase 1: Analytical Validity
Analytical validity asks whether the assay measures what you claim, with acceptable precision and reproducibility. CLSI precision studies (EP05-A3) remain the reference framework for quantitative assays (CLSI, 2014).
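As we understand EP05-A3, its single-site evaluation is commonly run as 20 days × 2 runs per day × 2 replicates per run, and a nested ANOVA splits the variance into day, run, and replicate components. The sketch below computes repeatability and within-laboratory SD for any balanced design of that shape. It is an illustration only; consult the guideline itself for acceptance criteria, multi-site designs, and degrees-of-freedom handling. The function name is hypothetical.

```python
import math


def precision_nested(data):
    """data[day][run][replicate] (balanced) -> repeatability and within-lab SD."""
    D, R, N = len(data), len(data[0]), len(data[0][0])
    run_means = [[sum(run) / N for run in day] for day in data]
    day_means = [sum(rm) / R for rm in run_means]
    grand = sum(day_means) / D

    # Sums of squares for each nesting level.
    ss_err = sum((x - run_means[d][r]) ** 2
                 for d, day in enumerate(data)
                 for r, run in enumerate(day)
                 for x in run)
    ss_run = N * sum((run_means[d][r] - day_means[d]) ** 2
                     for d in range(D) for r in range(R))
    ss_day = R * N * sum((m - grand) ** 2 for m in day_means)

    ms_err = ss_err / (D * R * (N - 1))
    ms_run = ss_run / (D * (R - 1))
    ms_day = ss_day / (D - 1)

    # Variance components; negative estimates are truncated at zero.
    var_err = ms_err
    var_run = max(0.0, (ms_run - ms_err) / N)
    var_day = max(0.0, (ms_day - ms_run) / (R * N))
    return {
        "grand_mean": grand,
        "repeatability_sd": math.sqrt(var_err),
        "within_lab_sd": math.sqrt(var_day + var_run + var_err),
    }
```

Dividing `within_lab_sd` by `grand_mean` gives the CV that many precision specifications are written against.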
Ou et al. (2021) emphasize that analytical plans should be written before data arrive: outcomes, success criteria, and handling of batch effects should be fixed in advance.5 Assays that perform well in one lab but fail at a second site are a common reason programs stall here.
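One lightweight way to show that a plan was fixed in advance is to record a cryptographic fingerprint of it, with a date, before any validation data are unblinded. The plan fields and values below are hypothetical and do not follow a standard schema. This is a sketch of the idea, not a substitute for registering the plan with a protocol or statistical analysis plan.

```python
import hashlib
import json

# Hypothetical pre-specified plan; keys and values are illustrative, not a standard schema.
analysis_plan = {
    "primary_endpoint": "recurrence within 3 years",
    "success_criterion": "lower 95% CI bound of AUC >= 0.70",
    "cutoff": "locked from discovery cohort before validation samples are assayed",
    "batch_handling": "assay plate modeled as a covariate; no re-normalization after unblinding",
    "missing_data": "complete-case primary analysis; multiple imputation as sensitivity",
}


def plan_fingerprint(plan):
    """SHA-256 of a canonical JSON encoding, so key order does not change the digest."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


locked_digest = plan_fingerprint(analysis_plan)  # record this (with a date) before data arrive
```

Any later edit to the plan, such as loosening the success criterion, produces a different digest, which makes post hoc changes visible.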
Phase 2: Clinical Validity
Clinical validity links the marker to the clinical endpoint you care about (diagnosis, prognosis, or treatment response). The FDA-NIH BEST glossary separates analytical validity, clinical validity, and clinical utility so teams do not conflate them (FDA-NIH, 2016).6
For diagnostic tests, FDA statistical guidance for reporting study results describes how sensitivity, specificity, and study design should be presented (FDA, 2007). Thresholds depend on indication and intended use; there is no universal cutoff that applies to every biomarker type.
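The sketch below reports accuracy estimates with confidence intervals rather than bare point estimates. It uses the Wilson score interval, one common choice for binomial proportions; we are not asserting it is the interval the FDA guidance requires. The 2×2 counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def accuracy_summary(tp, fp, fn, tn):
    """Point estimates and 95% Wilson intervals from a 2x2 table against the reference standard."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        # PPV and NPV depend on prevalence in the study sample and may not transfer.
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }


summary = accuracy_summary(tp=90, fp=20, fn=10, tn=80)  # hypothetical counts
```

With 100 diseased participants, a sensitivity of 0.90 carries an interval of roughly 0.83 to 0.94. That width is often the more useful number when comparing studies.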
Davis et al. (2020) outline how difficult clinical validation remains even when biology is plausible, using pain biomarkers as an example of endpoint and cohort challenges.7
External validation sample sizes need their own planning. Riley et al. (2024) provide methods for calculating how many participants an external validation study requires.8
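The sketch below covers only one precision criterion of the kind used in this line of work: estimating the observed/expected (O/E) outcome ratio of a binary-outcome model precisely enough. It assumes the approximation var(ln O/E) ≈ (1 − φ)/(nφ), where φ is the outcome proportion, and a confidence interval symmetric on the log scale. This is our reading, not a reproduction of the paper; the published approach combines several criteria (for example calibration slope and discrimination), so check the paper before relying on any number. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_oe_precision(outcome_prop, ci_width=0.2, expected_oe=1.0, level=0.95):
    """Participants needed so the O/E confidence interval has roughly the target width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # On the log scale the CI is ln(O/E) +/- z*SE; on the ratio scale its width is
    # O/E * (exp(z*SE) - exp(-z*SE)) = 2 * O/E * sinh(z*SE). Solve for SE.
    se_target = math.asinh(ci_width / (2 * expected_oe)) / z
    # Assumed approximation: var(ln(O/E)) ~= (1 - phi) / (n * phi).
    n = (1 - outcome_prop) / (outcome_prop * se_target ** 2)
    return math.ceil(n)
```

With a 10% outcome proportion and a target O/E interval width of 0.2, this gives roughly 3,500 participants. Rarer outcomes push the requirement up quickly, which is why event counts, not headline sample sizes, usually drive validation feasibility.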
Replication and Reproducibility
Published biomarker associations often fail to replicate in independent datasets, and some cannot even be reproduced from the original data. Ioannidis et al. (2009) tried to reproduce one analysis from each of 18 published microarray gene-expression papers: two could be reproduced in principle, six only partially or with some discrepancies, and ten not at all, often because data were unavailable or methods were incompletely specified.9 Treat literature claims as hypotheses until independent cohorts confirm them.
Chen et al. (2024) address another underappreciated issue: misclassification of biomarker status in stratified trials can bias treatment-effect estimates for survival endpoints.10 Literature review alone will not catch assay misclassification in your own cohort, but it can show how often published studies acknowledge imperfect classification.
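For intuition about the direction of the bias, here is a deliberately simpler setting than Chen et al.'s survival designs. It assumes a binary outcome, randomized treatment, and an additive risk difference. In that setting, the effect observed in assay-positive patients is a PPV-weighted mix of the effects in truly positive and truly negative patients. Hazard ratios are non-collapsible, so this linear mixing does not carry over to survival endpoints, which is part of why dedicated methods are needed there. Function names and numbers are hypothetical.

```python
def ppv(prevalence, sens, spec):
    """Probability that an assay-positive patient is truly marker-positive."""
    tp = prevalence * sens
    fp = (1 - prevalence) * (1 - spec)
    return tp / (tp + fp)


def observed_effect_in_assay_positive(prevalence, sens, spec,
                                      effect_true_pos, effect_true_neg=0.0):
    """Risk difference seen in the assay-positive stratum under randomization."""
    p = ppv(prevalence, sens, spec)
    return p * effect_true_pos + (1 - p) * effect_true_neg


# Hypothetical: 30% prevalence, 90%/90% assay, 20-point benefit only in true positives.
diluted = observed_effect_in_assay_positive(0.3, 0.9, 0.9, 0.20)
```

Here roughly one in five assay-positive patients is misclassified, so the apparent benefit shrinks from 20 to about 16 percentage points. That is enough to undermine a trial powered on the undiluted effect.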
Phase 3: Clinical Utility
Clinical utility asks whether using the marker improves outcomes or decisions compared with usual care. A marker can be analytically sound and clinically valid yet fail utility if it does not change management or survival.
FDA-NIH BEST defines clinical utility separately from validity (FDA-NIH, 2016).6 Health-technology assessments often require utility evidence beyond diagnostic accuracy; literature review should note whether papers report decision impact or only AUC and hazard ratios.
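AUC and hazard ratios do not say whether acting on the marker helps. One decision-oriented quantity, not named in the sources above, is net benefit, the measure plotted in decision curve analysis. It credits true positives and penalizes false positives by the odds of the risk threshold at which a clinician would act. Below is a minimal sketch with hypothetical counts, compared against the "treat everyone" default.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of acting on a positive marker result at risk threshold pt."""
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating everyone, the usual-care comparator here."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical cohort of 1,000 with 10% prevalence; marker flags 80 true and 100 false positives.
nb_marker = net_benefit(tp=80, fp=100, n=1000, threshold=0.10)
nb_all = net_benefit_treat_all(prevalence=0.10, threshold=0.10)
```

A marker only earns its place if its net benefit beats both treat-all and treat-none (zero) across the thresholds clinicians actually use. Two markers with similar AUCs can differ here.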
Issa et al. (2017) reviewed coverage and reimbursement challenges for genomic and companion diagnostic tests, noting that payer evidence standards often require utility and economic data beyond analytical validity.11 Validation planning should name the decision maker (clinician, trialist, payer) your evidence must convince.
Qualification vs. Validation
Scientific validation (peer-reviewed evidence in defined populations) is not the same as regulatory qualification (FDA acceptance for a specific context of use). You can have strong publications without a qualification letter, and qualification without full cross-population validation.
Wong et al. (2019) estimated a 3.4% probability of success from phase 1 to approval for oncology drug-development programs, using clinical-trial data from 2000 to 2015.12 Biomarkers can improve trial design, but they do not remove the need for well-powered studies and appropriate endpoints.
What to Do Next
For the literature-discovery stage, Motif searches PubMed and related sources, extracts structured associations, cross-references biomedical entities against 50+ databases, and scores evidence. Analytical and clinical validation remain your responsibility in the lab and clinic. Read our posts on FDA biomarker validation and on AI in biomarker discovery to learn more.
Validation Checklist Before Wet-Lab Spend
- Map discovery versus validation cohorts per PMID; flag single-site retrospective series
- Record assay platform, specimen type, and LOD when papers report them
- Cross-reference genes and variants to ClinVar and gnomAD before assuming rarity implies tumor specificity
- Check whether predictive claims include comparator treatment and pre-specified cutoff (Simon, 2013)
- Plan external validation sample size with published methods (Riley et al., 2024)
- Separate analytical validity, clinical validity, and clinical utility evidence in your protocol sections
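If you track the checklist in code or a spreadsheet export, a per-PMID record with automatic flags keeps the review consistent across reviewers. The field names, flag wording, and example records below are hypothetical. This is a sketch of one possible structure, not a Motif export format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvidenceRecord:
    pmid: str
    cohort_role: str                       # "discovery" or "validation"
    sites: int
    retrospective: bool
    assay_platform: Optional[str] = None
    specimen_type: Optional[str] = None
    lod: Optional[str] = None              # as reported, with units; optional by design
    variant_claim: bool = False
    population_db_checked: bool = False    # ClinVar/gnomAD reviewed for the variant
    predictive_claim: bool = False
    comparator_arm: Optional[str] = None
    cutoff_prespecified: Optional[bool] = None


def review_flags(rec: EvidenceRecord) -> List[str]:
    """Return checklist problems for one PMID-level record."""
    out = []
    if rec.sites == 1 and rec.retrospective:
        out.append("single-site retrospective series")
    if rec.cohort_role == "discovery":
        out.append("discovery cohort only; needs independent validation")
    if rec.assay_platform is None or rec.specimen_type is None:
        out.append("assay platform or specimen type not recorded")
    if rec.variant_claim and not rec.population_db_checked:
        out.append("variant not cross-referenced to ClinVar/gnomAD")
    if rec.predictive_claim and rec.comparator_arm is None:
        out.append("predictive claim without comparator arm")
    if rec.predictive_claim and rec.cutoff_prespecified is not True:
        out.append("cutoff not shown to be pre-specified")
    return out


weak = EvidenceRecord(pmid="example-1", cohort_role="discovery", sites=1,
                      retrospective=True, predictive_claim=True)
```

LOD is stored but never flagged, matching the checklist's "when papers report them." Everything else that is missing surfaces as a named gap.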
References
- McShane, L.M., et al. (2005). Reporting recommendations for tumor marker prognostic studies (REMARK). Journal of the National Cancer Institute, 97(16), 1180-1184. PMID: 16106022
- Pepe, M.S., et al. (2008). Pivotal evaluation of the accuracy of a biomarker used for classification or prediction: standards for study design. Journal of the National Cancer Institute, 100(20), 1432-1438. PMID: 18840817
- Pepe, M.S., et al. (2001). Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute, 93(14), 1054-1061. PMID: 11459867
- Simon, R.M. (2013). Genomic biomarkers in predictive medicine: an interim analysis. EMBO Molecular Medicine, 5(6), 813-818. PMID: 23818349
- Ou, F.S., et al. (2021). Biomarker Discovery and Validation: Statistical Considerations. Journal of Thoracic Oncology, 16(4), 537-545. PMID: 33545385
- FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. PMID: 27010052
- Davis, K.D., et al. (2020). Discovery and validation of biomarkers to aid the development of safe and effective pain therapeutics. Nature Reviews Neurology, 16(7), 381-400. PMID: 32541893
- Riley, R.D., et al. (2024). Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ, 384, e074821. PMID: 38253388
- Ioannidis, J.P., et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149-155. PMID: 19174838
- Chen, Y., et al. (2024). Two-stage stratified designs with survival outcomes and adjustment for misclassification in predictive biomarkers. Statistics in Medicine, 43(10), 1048-1063. PMID: 38634277
- Issa, A.M., et al. (2017). Coverage and reimbursement of genomic tests. Journal of Managed Care & Specialty Pharmacy, 23(3), 294-300. PMID: 28472596
- Wong, C.H., et al. (2019). Estimation of clinical trial success rates and related parameters. Biostatistics, 20(2), 273-286. PMID: 29394327
- CLSI. (2014). EP05-A3: Evaluation of Precision of Quantitative Measurement Procedures; Approved Guideline, Third Edition. Clinical and Laboratory Standards Institute.
- FDA. (2007). Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests. FDA-2007-D-0369.
- Poste, G. (2011). Bring on the biomarkers. Nature, 469(7329), 156-157. DOI: 10.1038/469156a



