What Is Patient Stratification in Clinical Trials?
Patient stratification divides trial populations by baseline biomarkers or characteristics so treatment effects in responsive subgroups are measurable rather than diluted. Predictive biomarkers justify enrichment designs; prognostic biomarkers forecast outcomes regardless of treatment. Motif surfaces published predictive and prognostic evidence with PMIDs, population modifiers, and conflicting associations before protocols lock enrollment criteria.
TL;DR: Patient Stratification in Clinical Trials
- Predictive biomarkers require trial designs that validate treatment benefit in biomarker-defined subgroups (Mandrekar & Sargent, 2009)
- Enrichment strategies must match biomarker credentials; not every marker justifies a selected population (Freidlin & Korn, 2014)
- Prognostic and predictive biomarkers answer different questions; mixing them breaks enrichment logic (Buyse et al., 2011)
- Simon (2005) and Simon (2013) outline phased development for genomic classifiers before they influence enrollment
- Master protocols can test multiple therapies under shared stratification rules (Woodcock & LaVange, 2017; Liu et al., 2021)
- Motif surfaces literature evidence with PMIDs; enrollment and biostatistical planning remain yours
From the Motif team: Last reviewed June 2026. Stratification criteria should be locked from cited predictive evidence, not from a single discovery abstract. Motif extracts predictive and prognostic associations with interaction statistics, population modifiers, and conflicting PMIDs before your protocol finalizes inclusion rules.
Patient stratification in clinical trials divides enrolled populations by baseline characteristics, often biomarkers, so treatment effects in responsive subgroups are measurable rather than diluted across non-responders. Before writing enrichment criteria, teams need published evidence on which biomarkers predict response or prognosis in their indication, not a generic list of marker names.
Clinical trials fail for many reasons, but heterogeneous populations hide true treatment effects when only a subgroup benefits. Simon (2005) outlined rigorous development and validation pathways for genomic classifiers intended to guide treatment selection [1]. Stratification without that evidence risks underpowered subgroup analyses and non-reproducible labels.
Why Heterogeneity Dilutes Treatment Effects
When trials enroll broad populations, true treatment effects in biomarker-positive patients can be statistically invisible among non-responders. Mandrekar and Sargent (2009) review clinical trial designs for predictive biomarker validation, including enrichment, stratified, and hybrid strategies [2]. The design must match the biomarker claim you intend to make at the end of the study.
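The dilution argument can be made concrete with a little arithmetic. The sketch below assumes an additive effect scale (for example, a difference in response rates) and uses made-up numbers; the function name `diluted_effect` is ours, not something from the cited papers.

```python
def diluted_effect(prevalence, effect_pos, effect_neg=0.0):
    """Average treatment effect in an all-comers population.

    prevalence: fraction of enrolled patients who are biomarker-positive (0..1)
    effect_pos / effect_neg: absolute treatment effect in each subgroup,
    e.g. difference in response rate (treated minus control).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    # The overall effect is the prevalence-weighted mean of the subgroup effects.
    return prevalence * effect_pos + (1.0 - prevalence) * effect_neg


# Illustrative numbers only: 25% of patients are biomarker-positive and gain
# 20 percentage points in response rate; biomarker-negative patients gain nothing.
overall = diluted_effect(prevalence=0.25, effect_pos=0.20, effect_neg=0.0)
print(f"all-comers effect: {overall:.3f}")  # 0.050
```

A 20-point benefit in a quarter of patients shows up as a 5-point average in the full population, which is the quantity an all-comers primary analysis is powered to detect.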
Freidlin and Korn (2014) argue that enrichment strategies must align with how strong the biomarker evidence is at the start of the program [3]. Weak discovery evidence should not automatically become a hard inclusion criterion without a validation plan and prespecified interaction analysis.
FDA-NIH BEST defines predictive biomarkers as those that identify individuals more likely to benefit from a specific treatment (FDA-NIH, 2016) [4]. Prognostic biomarkers describe outcome regardless of treatment. Enrichment for a predictive claim requires evidence of treatment-by-biomarker interaction, not merely correlation with prognosis.
Core idea: Stratification design follows the biomarker category. Prognostic enrichment and predictive enrichment answer different statistical questions.
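To illustrate the difference between a prognostic signal and a treatment-by-biomarker interaction, here is a minimal sketch for a binary response endpoint. It compares the treatment odds ratio in biomarker-positive and biomarker-negative patients with a Wald test on the log scale. The counts are invented, and a real analysis would normally use a regression model specified in the SAP.

```python
import math
from statistics import NormalDist


def log_odds_ratio(resp_trt, n_trt, resp_ctl, n_ctl):
    """Log odds ratio (treatment vs control) with its Woolf standard error."""
    a, b = resp_trt, n_trt - resp_trt      # treated: responders, non-responders
    c, d = resp_ctl, n_ctl - resp_ctl      # control: responders, non-responders
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def interaction_test(pos, neg):
    """Wald test that the treatment odds ratio differs by biomarker status.

    pos / neg: tuples (resp_trt, n_trt, resp_ctl, n_ctl) for each stratum.
    """
    lor_pos, se_pos = log_odds_ratio(*pos)
    lor_neg, se_neg = log_odds_ratio(*neg)
    diff = lor_pos - lor_neg
    z = diff / math.sqrt(se_pos ** 2 + se_neg ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"ratio_of_odds_ratios": math.exp(diff), "z": z, "p_value": p_value}


# Hypothetical predictive marker: treatment helps only in biomarker-positive patients.
predictive = interaction_test(pos=(30, 50, 15, 50), neg=(20, 50, 20, 50))

# Hypothetical prognostic marker: response rates are higher in biomarker-positive
# patients in both arms, but the treatment odds ratio is 4 in each stratum.
prognostic = interaction_test(pos=(40, 50, 25, 50), neg=(20, 50, 10, 70))

print(predictive)
print(prognostic)
```

In the second example the marker clearly tracks outcome, yet the ratio of odds ratios is 1: there is nothing for an enrichment design to exploit.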
Predictive vs. Prognostic: Do Not Mix Them Up
Buyse et al. (2011) explain why biomarker validation is statistically hard: prognostic markers describe outcome regardless of treatment, while predictive markers modify treatment benefit [5]. A marker that predicts poor survival in all patients is not automatically a companion diagnostic for a specific drug.
Simon (2013) stresses that predictive biomarkers used for trial enrichment need pre-specified cutoffs validated in independent cohorts before they drive registration strategy [6]. Tuning a threshold on the same dataset that estimates the treatment interaction overfits: the chosen cutoff partly reflects noise, so the apparent interaction is inflated and may not replicate.
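A small simulation shows why cutoff tuning on the analysis dataset is a problem. The sketch below generates patients under a pure null (the marker has no relationship to response or treatment), then compares the interaction-like gap at a pre-specified cutoff with the largest gap found by scanning several cutoffs. Everything here is simulated; the cutoffs, sample size, and response rate are arbitrary assumptions.

```python
import random


def subgroup_effect(patients):
    """Response-rate difference (treated minus control), or None if an arm is empty."""
    trt = [p["resp"] for p in patients if p["arm"] == "trt"]
    ctl = [p["resp"] for p in patients if p["arm"] == "ctl"]
    if not trt or not ctl:
        return None
    return sum(trt) / len(trt) - sum(ctl) / len(ctl)


def effect_gap(patients, cutoff):
    """Treatment effect above the cutoff minus treatment effect below it."""
    high = subgroup_effect([p for p in patients if p["marker"] >= cutoff])
    low = subgroup_effect([p for p in patients if p["marker"] < cutoff])
    if high is None or low is None:
        return None
    return high - low


rng = random.Random(0)  # fixed seed so the run is repeatable
# Null data: marker, arm, and response are generated independently.
patients = [
    {"marker": rng.random(), "arm": "trt" if i % 2 else "ctl", "resp": int(rng.random() < 0.3)}
    for i in range(400)
]

candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
gaps = {c: effect_gap(patients, c) for c in candidates}
gaps = {c: g for c, g in gaps.items() if g is not None}

prespecified = abs(gaps[0.5])                       # cutoff fixed in advance
tuned_cutoff = max(gaps, key=lambda c: abs(gaps[c]))  # cutoff chosen after looking
tuned = abs(gaps[tuned_cutoff])

print(f"pre-specified cutoff 0.5: |gap| = {prespecified:.3f}")
print(f"data-tuned cutoff {tuned_cutoff}: |gap| = {tuned:.3f}")
```

Because the tuned value is a maximum over candidates that include the pre-specified cutoff, it can never be smaller, even though no real interaction exists in the simulated data.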
Read our blog on personalized medicine biomarker analysis for the broader precision-medicine landscape and our blog on immunotherapy biomarkers for checkpoint inhibitor predictors.
Enrichment and Stratified Designs
Enrichment trials restrict enrollment to biomarker-positive patients when prior evidence supports larger treatment effects in that subgroup. The required sample size can fall because the expected effect in the enrolled population is larger, but generalizability to biomarker-negative patients is intentionally limited.
Stratified designs randomize within biomarker-defined strata and test interaction between biomarker status and treatment. Mandrekar and Sargent (2009) compare when each approach is appropriate [2]. Stratified designs preserve information about biomarker-negative patients but require larger total enrollment.
Hybrid designs enroll all comers but pre-specify subgroup analyses with multiplicity control. They suit markers with moderate predictive evidence that still needs prospective confirmation.
Chen et al. (2024) address misclassification of biomarker status in stratified trials with survival endpoints, showing that assay error biases treatment-effect estimates [7]. Literature review should note how often papers report imperfect classification; your trial still needs analytical validation on the locked assay.
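A back-of-the-envelope calculation shows the direction of the bias. The sketch below is not the method of Chen et al., which handles survival outcomes; it assumes an additive effect scale and simply asks what effect would be observed among assay-positive patients when the assay has imperfect sensitivity and specificity. All inputs are illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of assay-positive patients who are truly biomarker-positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def observed_effect_in_assay_positive(prevalence, sensitivity, specificity,
                                      effect_pos, effect_neg=0.0):
    """Expected treatment effect among patients the assay calls positive."""
    ppv = positive_predictive_value(prevalence, sensitivity, specificity)
    # Assay-positive patients are a mix of true positives and false positives.
    return ppv * effect_pos + (1 - ppv) * effect_neg


# Illustrative inputs: 20% true prevalence, 90% sensitivity, 90% specificity,
# and a 30-point response-rate benefit confined to truly positive patients.
ppv = positive_predictive_value(0.20, 0.90, 0.90)
observed = observed_effect_in_assay_positive(0.20, 0.90, 0.90, effect_pos=0.30)
print(f"PPV = {ppv:.2f}, observed effect = {observed:.3f} (true effect 0.300)")
```

Even a seemingly accurate assay leaves roughly three in ten assay-positive patients misclassified at this prevalence, which attenuates the effect the enriched trial can observe.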
Sample Size and Statistical Planning
Enrichment reduces required sample size only when the biomarker truly identifies a subgroup with larger treatment benefit and prevalence is known with reasonable precision. Underpowered subgroup analyses are a common post-hoc failure mode.
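The trade-off between randomized sample size and screening burden can be sketched with the standard normal-approximation formula for comparing two means. This is a planning illustration under simplifying assumptions (continuous endpoint, common standard deviation, no dropout, a perfect assay), not a substitute for the calculation your biostatistician will run; the function names are ours.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd=1.0, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample z-test on a mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def compare_designs(prevalence, delta_pos, delta_neg=0.0, **kwargs):
    """Total randomized patients for enriched vs all-comers designs."""
    # The all-comers effect is the prevalence-weighted mean of subgroup effects.
    delta_all = prevalence * delta_pos + (1 - prevalence) * delta_neg
    enriched_total = 2 * n_per_arm(delta_pos, **kwargs)
    return {
        "enriched_total": enriched_total,
        "all_comers_total": 2 * n_per_arm(delta_all, **kwargs),
        # Enrichment shifts cost to screening: only a fraction of screened
        # patients are biomarker-positive and eligible.
        "screened_for_enrichment": math.ceil(enriched_total / prevalence),
    }


# Illustrative inputs: 25% prevalence, 0.5 SD benefit in positives, none in negatives.
plan = compare_designs(prevalence=0.25, delta_pos=0.5, delta_neg=0.0)
print(plan)
```

If the prevalence estimate is wrong, the screening figure moves in inverse proportion, which is why prevalence needs to be known with reasonable precision before the design is fixed.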
Pre-specify the primary analysis population (intent-to-treat vs. biomarker-positive only), the interaction test, and the multiplicity strategy before the first patient is enrolled. Adaptive designs may allow sample size re-estimation but should not be used to rescue a biomarker hypothesis invented after unblinded looks.
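As one example of a multiplicity strategy, the Holm step-down procedure controls the family-wise error rate across a small set of pre-specified hypotheses. The text does not prescribe a particular method, so treat this as one option among several (fixed alpha splits, fallback and gatekeeping procedures are common alternatives). The hypothesis names and p-values are invented.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    p_values: dict mapping hypothesis name -> p-value.
    Returns a dict mapping hypothesis name -> True if rejected.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {}
    still_rejecting = True
    for i, (name, p) in enumerate(ordered):
        # The i-th smallest p-value is compared with alpha / (m - i);
        # once one test fails, all larger p-values fail too.
        still_rejecting = still_rejecting and p <= alpha / (m - i)
        rejected[name] = still_rejecting
    return rejected


# Hypothetical pre-specified family for a hybrid all-comers design.
decisions = holm({
    "all_comers": 0.040,
    "biomarker_positive": 0.012,
    "interaction": 0.030,
})
print(decisions)
```

Here only the biomarker-positive comparison survives adjustment, even though all three p-values are below 0.05 when read in isolation.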
For regulatory context on predictive claims, read our blog on FDA biomarker validation and our blog on biomarker discovery and validation.
Adaptive and Platform Trials
Berry (2012) reviewed adaptive clinical trial methods that allow pre-specified design changes based on accumulating data [8]. Adaptive enrichment, where enrollment focuses on a responding subgroup after an interim look, requires careful control of type I error and transparent pre-registration.
Woodcock and LaVange (2017) describe master protocols that study multiple therapies or diseases under shared infrastructure [9]. Lung-MAP is a published biomarker-driven platform in thoracic oncology (Liu et al., 2021) [10]. Platform trials share screening assays and stratification rules across substudies, which demands locked analytical methods early.
Master protocols accelerate biomarker-defined substudies but concentrate operational risk: a flawed central assay affects every arm.
Historical Examples of Biomarker-Guided Trials
HER2 amplification testing became standard enrichment for trastuzumab development in breast cancer. EGFR mutation testing selects non-small cell lung cancer patients for EGFR tyrosine kinase inhibitors. BRCA mutation status guides PARP inhibitor trials in ovarian and breast cancer. Each example required locked assays, pre-specified cutoffs, and treatment interaction evidence before enrichment drove registration labels.
Immunotherapy programs added PD-L1 expression and tumor mutational burden as exploratory then confirmatory stratifiers, with platform-specific scoring differences that still complicate cross-trial comparison. Literature review must tag assay clone, scoring algorithm, and tumor proportion score rules before pooling PMIDs for protocol background.
Negative examples matter: biomarkers that failed prospective validation after promising retrospective signals slowed programs and wasted enrollment. Surfacing null interaction tests in Motif helps teams avoid repeating discredited enrichment criteria.
Companion Diagnostics and Stratification
When stratification determines access to a targeted therapy, the assay often becomes a companion diagnostic co-developed with the drug. Analytical validity, clinical validity linked to treatment benefit, and labeling alignment are required in addition to literature support (Amur et al., 2011) [11].
PD-L1 immunohistochemistry, EGFR mutation testing, and HER2 amplification illustrate indication-specific cutoffs and platforms. Pooling literature that used incompatible assay versions or scoring rules produces misleading enrichment assumptions.
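One way to guard against incompatible pooling is to group studies by assay, scoring rule, and cutoff before combining anything. The sketch below uses placeholder records with hypothetical field names and source identifiers; it is not a Motif export format or real literature data.

```python
from collections import defaultdict


def poolable_groups(studies):
    """Group study identifiers by (biomarker, assay, scoring rule, cutoff)."""
    groups = defaultdict(list)
    for study in studies:
        key = (study["biomarker"], study["assay"], study["scoring"], study["cutoff"])
        groups[key].append(study["source_id"])
    return dict(groups)


# Placeholder records: identifiers, assay names, and cutoffs are invented.
studies = [
    {"source_id": "src-001", "biomarker": "PD-L1", "assay": "clone-A",
     "scoring": "tumor proportion score", "cutoff": ">=50%"},
    {"source_id": "src-002", "biomarker": "PD-L1", "assay": "clone-A",
     "scoring": "tumor proportion score", "cutoff": ">=50%"},
    {"source_id": "src-003", "biomarker": "PD-L1", "assay": "clone-B",
     "scoring": "immune cell score", "cutoff": ">=1%"},
]

groups = poolable_groups(studies)
for key, sources in groups.items():
    print(key, "->", sources)
```

Only studies that land in the same group share a definition of "biomarker-positive"; anything pooled across groups needs an explicit bridging argument.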
For protein and genomic assay evidence, see our blog on protein biomarkers and our blog on genomic biomarkers in cancer therapy.
Literature Evidence Before the Protocol Locks
Motif is built for the evidence-scoping step, not patient enrollment:
- Ask a stratification question in plain language (e.g., predictive biomarkers for checkpoint inhibitors in a tumor type)
- Search PubMed, PMC, and Europe PMC; audit title-and-abstract screening in search provenance
- Extract predictive and prognostic associations with effect sizes, comparators, and interaction p-values
- Inspect modifiers (stage, line of therapy, molecular subtype, cohort identifier)
- Detect flips when the same biomarker shows opposing effects in different strata
- Export cited associations for protocol background or statistical analysis plan sections
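The flip-detection step in the list above can be pictured as a simple grouping exercise. This sketch is our own illustration with hypothetical record fields and placeholder identifiers; it does not describe how Motif implements the check, and it compares point estimates only, ignoring confidence intervals.

```python
from collections import defaultdict


def find_flips(associations):
    """Return biomarkers whose hazard ratios point in opposite directions.

    Each association is a dict with keys: biomarker, stratum, hazard_ratio,
    source_id. Hazard ratios are treatment vs control, so HR < 1 favors treatment.
    """
    by_marker = defaultdict(lambda: defaultdict(list))
    for a in associations:
        hr = a["hazard_ratio"]
        if hr == 1:
            continue  # exactly neutral: no direction to compare
        direction = "favors_treatment" if hr < 1 else "favors_control"
        by_marker[a["biomarker"]][direction].append((a["stratum"], a["source_id"]))
    # A flip means both directions were observed for the same biomarker.
    return {m: dict(d) for m, d in by_marker.items() if len(d) == 2}


# Placeholder records, not real literature values.
associations = [
    {"biomarker": "marker_A", "stratum": "stage III", "hazard_ratio": 0.6, "source_id": "src-101"},
    {"biomarker": "marker_A", "stratum": "stage IV", "hazard_ratio": 1.4, "source_id": "src-102"},
    {"biomarker": "marker_B", "stratum": "first line", "hazard_ratio": 0.7, "source_id": "src-103"},
    {"biomarker": "marker_B", "stratum": "second line", "hazard_ratio": 0.8, "source_id": "src-104"},
]

flips = find_flips(associations)
print(flips)
```

A flagged biomarker is a prompt to read the underlying papers and check the modifiers, not a conclusion in itself.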
Failure modes we see in stratification workflows:
- Using a prognostic literature base to justify a predictive enrichment criterion
- Ignoring papers that report null interaction tests
- Pooling studies that used incompatible assay cutoffs for PD-L1 or tumor mutational burden
- Assuming Motif output replaces sample-size calculation with your biostatistician
- Locking inclusion criteria from a single discovery cohort without external replication PMIDs
- Treating cross-validation AUC in a training set as predictive validation for trial design
Motif surfaces stratification evidence from literature with PMIDs; it does not enroll patients or run clinical decision support.
Regulatory and Diagnostic Coordination
Companion diagnostics and biomarker qualification programs require analytical and clinical validity evidence beyond literature review. FDA-NIH BEST definitions separate analytical validity, clinical validity, and clinical utility (FDA-NIH, 2016) [4]. Literature mining maps what is already published; it does not replace assay validation or IDE/IVD strategy.
Clinical utility for a predictive enrichment strategy typically requires demonstrating that biomarker-guided treatment improves outcomes versus unselected treatment or standard care. Utility trials are often larger and slower than analytical validation; early literature scoping prevents investing in the wrong marker.
Surrogate endpoints can support accelerated approval in some oncology settings, but surrogacy is not the same as predictive enrichment. Buyse et al. (2011) distinguish statistical validation of surrogates from predictive biomarker validation for treatment selection [5]. Protocol teams should not substitute progression-free hazard ratios for predictive interaction evidence when the label claim is companion diagnostic selection.
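When papers report subgroup hazard ratios with confidence intervals but no interaction test, an approximate one can be reconstructed by comparing the two estimates on the log scale. The sketch below assumes the subgroups are independent and the intervals are symmetric on the log scale; the hazard ratios are invented for illustration.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of a log hazard ratio recovered from its confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def hr_interaction(hr_pos, ci_pos, hr_neg, ci_neg):
    """Ratio of hazard ratios (positive / negative) with a two-sided Wald p-value."""
    diff = math.log(hr_pos) - math.log(hr_neg)
    se = math.sqrt(se_from_ci(*ci_pos) ** 2 + se_from_ci(*ci_neg) ** 2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), p_value


# Hypothetical subgroup results: HR 0.50 (0.35 to 0.71) in biomarker-positive
# patients and HR 1.00 (0.75 to 1.33) in biomarker-negative patients.
ratio, p_value = hr_interaction(0.50, (0.35, 0.71), 1.00, (0.75, 1.33))
print(f"ratio of hazard ratios = {ratio:.2f}, interaction p = {p_value:.4f}")
```

A single pooled hazard ratio cannot answer this question, because it carries no information about whether the effect differs by biomarker status.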
Scoping Stratification Evidence with Motif
Before biostatisticians finalize sample size, literature should answer:
- How many independent cohorts report treatment-by-biomarker interaction in the intended line of therapy?
- What assay platforms and cutoffs did those cohorts use?
- What is biomarker prevalence in the target enrollment population?
- Which PMIDs report null or negative interaction tests?
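The four questions above map naturally onto a summary over extracted evidence records. The sketch below is a hypothetical illustration: the record fields, identifiers, and the use of p < 0.05 as a crude "supportive" flag are all assumptions, not a Motif schema or real literature data.

```python
def scoping_summary(records, line_of_therapy, alpha=0.05):
    """Summarize interaction evidence for one line of therapy."""
    relevant = [r for r in records if r["line_of_therapy"] == line_of_therapy]
    tested = [r for r in relevant if r.get("interaction_p") is not None]
    # Count cohorts, not papers: several reports can describe the same cohort.
    supportive = {r["cohort_id"] for r in tested if r["interaction_p"] < alpha}
    prevalences = [r["prevalence"] for r in relevant if r.get("prevalence") is not None]
    return {
        "independent_cohorts_with_interaction": len(supportive),
        "assay_cutoff_pairs": sorted({(r["assay"], r["cutoff"]) for r in relevant}),
        "prevalence_range": (min(prevalences), max(prevalences)) if prevalences else None,
        "null_interaction_sources": sorted(
            r["source_id"] for r in tested if r["interaction_p"] >= alpha
        ),
    }


# Placeholder records, invented for illustration.
records = [
    {"source_id": "src-001", "cohort_id": "cohort-1", "line_of_therapy": "first_line",
     "assay": "assay-A", "cutoff": ">=50%", "interaction_p": 0.01, "prevalence": 0.30},
    {"source_id": "src-002", "cohort_id": "cohort-1", "line_of_therapy": "first_line",
     "assay": "assay-A", "cutoff": ">=50%", "interaction_p": 0.02, "prevalence": 0.30},
    {"source_id": "src-003", "cohort_id": "cohort-2", "line_of_therapy": "first_line",
     "assay": "assay-B", "cutoff": ">=1%", "interaction_p": 0.40, "prevalence": 0.55},
    {"source_id": "src-004", "cohort_id": "cohort-3", "line_of_therapy": "second_line",
     "assay": "assay-A", "cutoff": ">=50%", "interaction_p": 0.03, "prevalence": 0.28},
]

summary = scoping_summary(records, "first_line")
print(summary)
```

In this toy example two papers support an interaction, but they describe one cohort, so the count of independent replications is one.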
Motif exports cited answers for protocol and SAP appendices. Enrollment, assay lock, and regulatory strategy remain your team's responsibility.
Related Articles
- FDA biomarker validation: qualification vs companion diagnostic pathways
- Machine learning in biomarker validation: algorithmic scores and external validation before enrichment
- Target identification and validation: mechanism and biomarker alignment before stratification
Frequently Asked Questions
What is patient stratification in clinical trials?
Patient stratification divides trial populations by baseline characteristics, often biomarkers, so treatment effects in defined subgroups can be measured reliably. Designs include enrichment (biomarker-positive only), stratified randomization, and hybrid all-comer trials with pre-specified subgroup analyses (Mandrekar & Sargent, 2009).
What is the difference between prognostic and predictive biomarkers in trials?
Prognostic biomarkers predict outcome regardless of treatment. Predictive biomarkers identify patients more likely to benefit from a specific therapy. Enrichment for a predictive claim requires treatment-by-biomarker interaction evidence, not prognosis alone (Buyse et al., 2011; FDA-NIH, 2016).
When should trials use biomarker enrichment?
Enrichment suits situations where prior evidence supports larger treatment effects in a biomarker-defined subgroup and the development goal focuses on that subgroup's label. Freidlin and Korn (2014) warn that enrichment must match biomarker credentials; weak discovery signals need validation plans, not automatic hard cutoffs.
What are master protocol trials?
Master protocols test multiple therapies or diseases under shared infrastructure with common stratification and screening rules. Lung-MAP in thoracic oncology is a biomarker-driven example (Woodcock & LaVange, 2017; Liu et al., 2021). They accelerate substudies but require early assay lock.
How does literature review support stratification design?
Before locking inclusion criteria, teams should map predictive and prognostic PMIDs, interaction statistics, assay platforms, and conflicting cohort results. Motif extracts cited associations with population modifiers for protocol and statistical analysis plan background sections.
Does patient stratification require a companion diagnostic?
When therapy access depends on biomarker status, a companion diagnostic with co-developed analytical and clinical validity evidence is typically required. Literature support alone does not satisfy IVD or CDx regulatory requirements (Amur et al., 2011).
References
1. Simon, R. (2005). Roadmap for developing and validating therapeutically relevant genomic classifiers. Journal of Clinical Oncology, 23(29), 7332-7341. PMID: 16145063
2. Mandrekar, S.J., & Sargent, D.J. (2009). Clinical trial designs for predictive biomarker validation. Journal of Clinical Oncology, 27(24), 4027-4034. PMID: 19597023
3. Freidlin, B., & Korn, E.L. (2014). Biomarker enrichment strategies: matching trial design to biomarker credentials. Nature Reviews Clinical Oncology, 11(2), 81-90. PMID: 24281059
4. FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. PMID: 27010052
5. Buyse, M., et al. (2011). Biomarkers and surrogate end points: the challenge of statistical validation. Nature Reviews Clinical Oncology, 7(6), 309-317. PMID: 20368571
6. Simon, R.M. (2013). Genomic biomarkers in predictive medicine: an interim analysis. EMBO Molecular Medicine, 5(6), 813-818. PMID: 23818349
7. Chen, Y., et al. (2024). Two-stage stratified designs with survival outcomes and adjustment for misclassification in predictive biomarkers. Statistics in Medicine, 43(10), 1048-1063. PMID: 38634277
8. Berry, D.A. (2012). Adaptive clinical trials in oncology. Nature Reviews Clinical Oncology, 9(4), 199-207. PMID: 22064461
9. Woodcock, J., & LaVange, L.M. (2017). Master protocols to study multiple therapies, multiple diseases, or both. New England Journal of Medicine, 377(1), 62-70. PMID: 28679092
10. Liu, S.V., et al. (2021). The National Cancer Institute thoracic malignancies steering committee lung master protocol (Lung-MAP study). Clinical Cancer Research, 27(1), 4-11. PMID: 33037066
11. Amur, S., et al. (2011). Biomarker qualification: toward a multiple stakeholder framework. Clinical Pharmacology & Therapeutics, 89(3), 393-401. PMID: 21270794



