UK Biobank
Study design and population
For primary discovery, we conducted an observational cohort study leveraging the UK Biobank (Fig. 1a)—a population-based cohort of over 500,000 participants aged 37–73 years between 2006 and 2010, with ongoing follow-up. Of these, 194,072 were under age 55 years at baseline and received a physical examination and a questionnaire about sociodemographic, lifestyle and health information, and provided blood samples46. Blood samples and other biospecimens were collected across assessment centers using standardized protocols and processed centrally with uniform quality control procedures47,48. Details on UK Biobank data collection are available online (www.ukbiobank.ac.uk).
Assessment of systemic aging
PhenoAge15, validated across multi-ethnic cohorts49, is a mortality- and morbidity-trained composite measure of biological aging based on chronological age and nine blood biochemistry measurements (albumin, alkaline phosphatase, creatinine, C-reactive protein, glucose, mean cell volume, erythrocyte distribution width, leukocyte count and lymphocyte ratio). PhenoAge was derived in three steps. First, using data from 9,926 adults in the National Health and Nutrition Examination Survey III, a Cox penalized regression model was applied to select a parsimonious set of nine measures plus chronological age from 42 candidate clinical measures that jointly predicted all-cause mortality. Second, these selected variables were entered into a parametric proportional hazards model assuming a Gompertz mortality distribution to estimate each participant’s 10-year mortality risk. Third, this predicted risk was translated into units of years by identifying the chronological age in the reference population that corresponds to the same 10-year mortality risk. This equivalent age is defined as PhenoAge, representing the biological age implied by a person’s mortality risk profile rather than their actual years lived. Although trained on mortality, PhenoAge has been shown to correlate with morbidity49, and is therefore used here as a systemic indicator of biological aging and a mortality- and morbidity-based aging measure. In previous comparative analyses of blood chemistry-based aging measures, PhenoAge demonstrated stronger concordance with DNA methylation-based clocks than alternative clinical measurement-derived metrics, including Klemera–Doubal method (KDM) biological age and homeostatic dysregulation50. For example, correlations with the GrimAge methylation clock were higher for PhenoAge (r = 0.35) than for KDM (r = 0.25) or homeostatic dysregulation (r = 0.26)50.
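The Gompertz step and the translation of risk into years can be sketched as follows. This is a minimal illustration under stated assumptions, not the published PhenoAge parameterization: the function names and every numeric value (intercept, age slope, Gompertz shape, horizon) are hypothetical. Under a Gompertz proportional hazards model with linear predictor xb, the cumulative risk over a horizon t is 1 − exp(−exp(xb)(exp(γt) − 1)/γ), and the equivalent age is obtained by inverting an age-only reference model of the same form.

```python
import math


def gompertz_risk(xb: float, gamma: float, horizon: float) -> float:
    """Cumulative mortality risk over `horizon` under a Gompertz PH model."""
    cumulative_hazard = math.exp(xb) * (math.exp(gamma * horizon) - 1.0) / gamma
    return 1.0 - math.exp(-cumulative_hazard)


def equivalent_age(risk: float, intercept: float, age_slope: float,
                   gamma: float, horizon: float) -> float:
    """Age at which an age-only reference model (xb = intercept + age_slope * age)
    yields the same cumulative risk."""
    xb_ref = math.log(-math.log(1.0 - risk) * gamma
                      / (math.exp(gamma * horizon) - 1.0))
    return (xb_ref - intercept) / age_slope


# Hypothetical parameters, for illustration only.
GAMMA, HORIZON = 0.008, 120.0          # shape; horizon of 120 months = 10 years
INTERCEPT, AGE_SLOPE = -12.0, 0.09     # age-only reference model

# A participant whose multi-marker linear predictor is -7.2.
risk = gompertz_risk(-7.2, GAMMA, HORIZON)
print(round(equivalent_age(risk, INTERCEPT, AGE_SLOPE, GAMMA, HORIZON), 2))
```

When the linear predictor comes from the age-only model itself, the inversion returns the participant's chronological age, which is a convenient sanity check.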
To assess the within-person variability of PhenoAge measurement, we leveraged repeated assessments among 3,809 participants with baseline and repeat blood draws (mean interval 4.4 years, s.d. 0.8). Among these participants, 60% of those in the lowest tertile and 60% of those in the highest tertile at baseline remained in the same tertile at follow-up (Extended Data Fig. 5).
KDM biological age19, in contrast, is trained to predict chronological age, integrating chronological age with eight biochemistry measurements (albumin, alkaline phosphatase, blood urea nitrogen, creatinine, C-reactive protein, glucose, HbA1c and total cholesterol) and two clinical measurements (systolic blood pressure and forced expiratory volume). KDM biological age was constructed using the Klemera–Doubal method, a multivariate regression framework that estimates biological age as a weighted linear combination of measurements based on their individual associations with chronological age and their measurement variance.
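The weighting described above can be sketched with the commonly cited form of the Klemera-Doubal estimator. That form is assumed here rather than taken from the BioAge package internals. For each measurement j, q_j, k_j and s_j denote the intercept, slope and residual s.d. from regressing that measurement on chronological age, and s_BA is a scale parameter for the chronological age term. The function name and all parameter values are hypothetical.

```python
from typing import Sequence


def kdm_biological_age(values: Sequence[float], intercepts: Sequence[float],
                       slopes: Sequence[float], resid_sds: Sequence[float],
                       chron_age: float, s_ba: float) -> float:
    """Weighted combination of measurements and chronological age.

    Each measurement contributes (x_j - q_j) * k_j / s_j^2, so measurements
    that track age closely (large slope, small residual s.d.) weigh more.
    """
    numerator = sum((x - q) * k / s ** 2
                    for x, q, k, s in zip(values, intercepts, slopes, resid_sds))
    denominator = sum((k / s) ** 2 for k, s in zip(slopes, resid_sds))
    return (numerator + chron_age / s_ba ** 2) / (denominator + 1.0 / s_ba ** 2)


# Hypothetical regression parameters for two measurements.
q = [100.0, 4.8]     # intercepts
k = [0.5, -0.01]     # slopes per year of age
s = [12.0, 0.3]      # residual s.d.

# Values lying exactly on the age regressions at age 50 reproduce age 50.
expected_at_50 = [qi + ki * 50.0 for qi, ki in zip(q, k)]
print(kdm_biological_age(expected_at_50, q, k, s, chron_age=50.0, s_ba=8.0))
```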
In brief, both PhenoAge and KDM models were trained in the National Health and Nutrition Examination Survey III. In our analyses, we applied trained PhenoAge and KDM models to estimate biological age in the UK Biobank using the R package ‘BioAge.’ We further restricted the participants to those without missing or extreme values (>5 s.d.) for each measure. The level of age gap was determined by residuals from a linear regression of biological age against chronological age and was standardized and divided into tertiles. A total of 154,169 participants under age 55 years at baseline were included in the primary analyses.
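The age gap definition above (residuals of biological age regressed on chronological age, standardized and split into tertiles) can be sketched in a few lines. This is a minimal illustration: the function names are hypothetical, the rank-based tertile split is an assumption about how ties and group sizes are handled, and the example ages are invented.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardized_age_gap(bio_age: Sequence[float],
                         chron_age: Sequence[float]) -> List[float]:
    """Residuals from a simple linear regression of biological age on
    chronological age, scaled to unit s.d."""
    mx, my = mean(chron_age), mean(bio_age)
    sxx = sum((x - mx) ** 2 for x in chron_age)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chron_age, bio_age))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(chron_age, bio_age)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]


def tertiles(values: Sequence[float]) -> List[int]:
    """Rank-based labels: 0 = lowest third, 1 = middle, 2 = highest third."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 3 // len(values)
    return labels


chron = [40, 42, 45, 47, 50, 52, 54, 41, 48]   # invented example data
bio = [38, 45, 44, 50, 49, 57, 52, 40, 51]
gap = standardized_age_gap(bio, chron)
print(tertiles(gap))
```

A positive gap indicates a biological age older than expected for the participant's chronological age.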
Assessment of metabolomic aging
Metabolomic aging was estimated using a metabolomic aging score derived from nuclear magnetic resonance (NMR) metabolomics data generated by Nightingale Health for 249,616 UK Biobank participants20. In brief, Zhang and colleagues20 fitted a least absolute shrinkage and selection operator (LASSO) Cox regression model to 325 NMR metabolomic measurements among 234,553 participants from England and Wales, with all-cause mortality as the endpoint, and selected 54 representative aging-related NMR measurements. A linear combination of these 54 measurements, weighted by the estimated model coefficients, was used to produce a metabolomic aging score for each participant. The metabolomic age gap was defined as the residual of the metabolomic aging score regressed on chronological age, following an approach analogous to that used for the PhenoAge and KDM aging clocks. Higher values represent a metabolically older profile relative to a participant’s chronological age. A total of 140,373 participants under age 55 years at baseline were included in the primary analyses.
Assessment of organ-specific aging
Organ-specific aging14 was estimated on 44,952 UK Biobank participants with plasma proteomic data available. Plasma proteomics profiling was performed using the Olink Explore 3072 platform, measuring 2,923 proteins across eight panels (cardiometabolic, cardiometabolic II, inflammation, inflammation II, neurology, neurology II, oncology and oncology II). Protein expression values were generated, normalized, and batch-corrected using the standardized UK Biobank preprocessing pipeline, including internal controls and quality control procedures, to minimize technical and center-related variation48. Organ-specific aging scores were calculated using previously published protein coefficients from Goeminne and colleagues14. In brief, Goeminne and colleagues analyzed plasma proteomic data from 53,014 UK Biobank participants and retained 44,952 participants and 2,916 proteins after filtering on missingness. Fivefold cross-validation with k-nearest neighbors imputation (k = 10) was used to train Cox proportional hazards elastic-net models to predict time-to-death, where elastic net linearly combines L1 and L2 penalties to allow simultaneous feature selection and shrinkage. Models were optimized to minimize the mean absolute error of the residuals, and detailed model performance metrics were reported in Supplementary Table 2 of ref. 14. Organ-enriched proteins were defined based on genotype-tissue expression data as those with expression levels at least fourfold higher in one organ compared with all others, and organ-specific clocks were trained using protein subsets corresponding to each organ51. Organ-specific aging was defined as the residual from an ordinary linear regression of the organ-specific predicted age (or predicted log mortality hazard) on chronological age, similar to PhenoAge. Each organ-specific aging score was estimated independently and analyzed in separate models rather than entered simultaneously into the same regression model. 
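The fourfold enrichment rule can be sketched as a comparison between the highest and second-highest organ expression values. This is a minimal illustration that assumes "all others" means every other organ individually; the function name, gene labels and expression values are hypothetical.

```python
from typing import Dict, Optional


def enriched_organ(expression: Dict[str, float], fold: float = 4.0) -> Optional[str]:
    """Return the organ whose expression is at least `fold` times that of
    every other organ, or None if no organ qualifies.

    Requires expression values for at least two organs.
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_organ, top_value), (_, runner_up) = ranked[0], ranked[1]
    return top_organ if top_value >= fold * runner_up else None


# Hypothetical expression levels (arbitrary units).
print(enriched_organ({"liver": 120.0, "kidney": 20.0, "brain": 5.0}))   # liver
print(enriched_organ({"liver": 30.0, "kidney": 20.0, "brain": 25.0}))   # None
```

Comparing against the runner-up is sufficient because any organ that clears the threshold against the second-highest value clears it against all lower values.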
In addition to brain, pituitary, salivary, thyroid, esophagus, lung, heart, artery, liver, stomach, pancreas, kidney, intestine, adrenal, immune, skin, muscle and adipose aging, we included additional models for organismal aging (based on proteins expressed across several organs), multi-organ aging (based on proteins from all organs) and conventional aging (based on all available plasma proteins to predict chronological age). A total of 19,874 participants under age 55 years at baseline were included in this exploratory analysis.
Ascertainment of early-onset solid cancers
Incident cancer cases were identified through linkage to cancer registries and death records provided by the National Health Service (NHS) Information Centre and the NHS Central Register, National Records of Scotland, and were defined using International Classification of Diseases, Tenth Revision (ICD-10) codes. We included cancers of the central nervous system (C70–C72), head and neck (C00, C10–C14, C30–C32), thyroid (C73), lung (C33–C34), breast (C50), melanoma (C43), gastrointestinal (GI) tract (C15–C26), colorectal (C18–C20), other GI (C15–C17, C21–C26), uterine (C54–C55), ovarian (C56) and prostate (C61). Complete cancer registry follow-up for all participants was available up to 31 December 2020 for England, 31 December 2016 for Wales and 30 November 2021 for Scotland. The mean (s.d.) follow-up was 6.4 (3.8) years.
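The site definitions above can be expressed as a lookup from an ICD-10 code to the cancer groupings it belongs to. The sketch below uses the code ranges listed in the text; the function name, the matching on the three-character category, and the handling of input formatting are assumptions. A GI code maps to both the overall GI group and its subgroup.

```python
from typing import Dict, List, Tuple

# Inclusive ranges of three-character ICD-10 categories, as listed in the text.
SITE_RANGES: Dict[str, List[Tuple[str, str]]] = {
    "central nervous system": [("C70", "C72")],
    "head and neck": [("C00", "C00"), ("C10", "C14"), ("C30", "C32")],
    "thyroid": [("C73", "C73")],
    "lung": [("C33", "C34")],
    "breast": [("C50", "C50")],
    "melanoma": [("C43", "C43")],
    "gastrointestinal": [("C15", "C26")],
    "colorectal": [("C18", "C20")],
    "other gastrointestinal": [("C15", "C17"), ("C21", "C26")],
    "uterine": [("C54", "C55")],
    "ovarian": [("C56", "C56")],
    "prostate": [("C61", "C61")],
}


def cancer_sites(icd10_code: str) -> List[str]:
    """Return every site grouping whose range contains the code's
    three-character category (e.g. 'C18.7' -> category 'C18')."""
    stem = icd10_code.strip().upper().replace(".", "")[:3]
    return [site for site, ranges in SITE_RANGES.items()
            if any(low <= stem <= high for low, high in ranges)]


print(cancer_sites("C18.7"))   # ['gastrointestinal', 'colorectal']
print(cancer_sites("C91.0"))   # [] (not one of the listed solid cancers)
```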
Our primary outcome was incident early-onset solid cancers diagnosed between ages 18 and 55 years; later-onset cancers diagnosed after age 55 were examined secondarily. We selected 55 years as the primary cutoff for two reasons. (1) It is consistent with recent epidemiologic evidence2,3 that successive birth cohorts experiencing higher cancer incidence before age 50 seem to be carrying this excess risk into ages 50–54 years; correspondingly, incidence rates among individuals younger than 55 have increased since the mid-1990s2,3,52. (2) It maximizes statistical power across cancer sites and enables analyses of subsites and organ-specific aging. Because measurements such as leukocyte count and lymphocyte ratio can vary substantially in hematological malignancies, we focused on solid tumors. For type-specific analyses, we restricted to cancer types with at least ten cases in each tertile of age gap. To assess the association between age gap and early-onset cancers, we excluded participants with any cancer diagnosis (except nonmelanoma skin cancer) before or within 6 months of baseline, those who were underweight (body mass index (BMI) <18.5 kg m−2), and those with missing data on genetic ancestry.
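The outcome definition and exclusion criteria can be written as two small predicates. This is a minimal sketch: the function names are hypothetical, and the boundary handling (age 55 counted as later-onset, a diagnosis at exactly 6 months counted as excluded) is an assumption, since the text does not specify it.

```python
from typing import Optional


def onset_category(age_at_diagnosis: float, cutoff: float = 55.0) -> Optional[str]:
    """Classify a solid cancer diagnosis by age; ages under 18 are out of scope."""
    if age_at_diagnosis < 18.0:
        return None
    return "early-onset" if age_at_diagnosis < cutoff else "later-onset"


def is_excluded(years_to_first_cancer: Optional[float], bmi: float,
                ancestry_missing: bool) -> bool:
    """Apply the exclusions described in the text.

    years_to_first_cancer: time from baseline to the first cancer other than
    nonmelanoma skin cancer (negative if before baseline), or None if no cancer.
    """
    prevalent_or_early = (years_to_first_cancer is not None
                          and years_to_first_cancer <= 0.5)
    return prevalent_or_early or bmi < 18.5 or ancestry_missing


print(onset_category(52.0))                # early-onset under the primary cutoff
print(onset_category(52.0, cutoff=50.0))   # later-onset under a cutoff of 50 years
```

Passing the cutoff as a parameter makes it straightforward to rerun the classification under an alternative age threshold.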
Assessment of covariates
At baseline, participants self-reported age, sex, race, education, smoking status and pack-years, alcohol intake, 24-h recall and food frequency questionnaire (FFQ), family history of principal cancers (lung, breast, colorectal, prostate), age at menarche, oral contraceptive use, age at menopause or hysterectomy, parity and personal history of chronic obstructive pulmonary disease, cardiovascular disease (heart attack, stroke, heart failure, coronary heart disease, atrial fibrillation) and diabetes. Height and weight were measured to calculate BMI. The Townsend Deprivation Index was used to assess socioeconomic status, with higher values indicating greater deprivation. Total physical activity (metabolic equivalent of task) was measured using the International Physical Activity Questionnaire. A healthy diet indicator was defined as meeting at least four of seven recommended dietary components for cardiometabolic health, based on available 24-h recall and FFQ data (fruit and vegetable intake, whole grains, red and processed meat, fish and alcohol intake)53. We also retrieved information on the first ten principal components (PCs) of genetic ancestry. Leukocyte telomere length was measured from peripheral blood at baseline using quantitative polymerase chain reaction.
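The healthy diet indicator reduces to counting how many of the seven components are met. In the sketch below, the function name and the split of the listed items into seven named components are assumptions, and the intake thresholds that decide whether each component is met are not given in the text, so each component is passed in as an already evaluated true/false value.

```python
from typing import Dict

# Assumed split of the listed items into seven components.
COMPONENTS = ("fruit", "vegetables", "whole_grains", "red_meat",
              "processed_meat", "fish", "alcohol")


def healthy_diet(components_met: Dict[str, bool], minimum: int = 4) -> bool:
    """True when at least `minimum` of the seven components are met."""
    missing = [name for name in COMPONENTS if name not in components_met]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sum(bool(components_met[name]) for name in COMPONENTS) >= minimum


example = {"fruit": True, "vegetables": True, "whole_grains": False,
           "red_meat": True, "processed_meat": False, "fish": True,
           "alcohol": False}
print(healthy_diet(example))   # True: four of seven components met
```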
Assessment of genetic predisposition of aging and cancer
To assess genetic predisposition to both aging and cancer, we calculated polygenic risk scores (PRS) derived from relevant single-nucleotide polymorphisms. Two sets of aging PRS were included: one for longevity33 and one for lifespan modified by nongenetic risk factors (that is, 13 diseases and 12 mortality risk factors)34. Cancer PRS were based on single-nucleotide polymorphisms identified in meta-analyses of genome-wide association studies of lung35, colorectal36 and endometrial37 cancers. For analyses that also adjusted for PRS, we restricted the analyses to 105,628 participants after excluding those with low quality or abnormal heterozygosity, more than 5% missing heterozygosity, sex chromosome aneuploidy and a kinship coefficient greater than or equal to 0.0442 (ref. 54).
All of Us Research Program
Study design and population
To validate our findings from the UK Biobank in electronic health records (EHRs), we conducted an independent observational cohort study using the All of Us Research Program (Fig. 1a), a diverse US biomedical cohort of over 450,000 adults, with enrollment and follow-up ongoing since 2018. At enrollment, participants completed The Basics, Overall Health, Lifestyle and Personal and Family Health History surveys, consented to EHR retrieval, and provided physical measurements and biospecimen samples. We included 14,851 participants aged under 55 years at enrollment. We then applied the same additional exclusions as in the UK Biobank analysis. All participants provided written informed consent to share EHRs, surveys and other study data with qualified investigators for broad-based research.
Assessment of systemic aging
To estimate PhenoAge, we retrieved the nine measurements from each participant’s EHR before enrollment, selecting measurements that were closest in date to each other and recorded within a 5-year window, and calculated the mean chronological age between the first and the last measurement. We then applied the same methods as in the UK Biobank to estimate the degree of age gap. For the analyses of trends of age gap by birth cohort, a total of 10,262 participants were included.
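One way to select one record per measurement so that the chosen dates are as close together as possible is the classic smallest-range search over sorted lists. The sketch below is an assumed reading of the selection step, not the study's code: the function name, the use of day numbers as dates and the example records (three measurements rather than nine) are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def tightest_window(dates_by_lab: Dict[str, List[int]]) -> Tuple[int, Dict[str, int]]:
    """Pick one date per measurement minimizing (latest - earliest).

    dates_by_lab maps each measurement to a non-empty, ascending list of
    day numbers. Returns (span_in_days, chosen_date_per_measurement).
    """
    labs = list(dates_by_lab)
    heap = [(dates_by_lab[lab][0], k, 0) for k, lab in enumerate(labs)]
    heapq.heapify(heap)
    current_max = max(day for day, _, _ in heap)
    best_span, best_choice = None, {}
    while True:
        earliest, k, pos = heap[0]
        span = current_max - earliest
        if best_span is None or span < best_span:
            best_span = span
            best_choice = {labs[j]: dates_by_lab[labs[j]][p] for _, j, p in heap}
        if pos + 1 == len(dates_by_lab[labs[k]]):
            return best_span, best_choice   # earliest list exhausted: done
        next_day = dates_by_lab[labs[k]][pos + 1]
        current_max = max(current_max, next_day)
        heapq.heapreplace(heap, (next_day, k, pos + 1))


records = {"albumin": [100, 900, 2000], "glucose": [950, 3000],
           "creatinine": [50, 1000]}
span, chosen = tightest_window(records)
within_five_years = span <= 5 * 365.25
print(span, chosen, within_five_years)
```

The age assigned to the resulting PhenoAge estimate would then be the mean of the participant's ages at the earliest and latest chosen dates.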
Ascertainment of early-onset solid cancers
Incident solid cancer cases were identified through EHR after enrollment, defined using ICD-9-CM and ICD-10-CM codes in the All of Us Research Program release R2022Q4R9. All participants were followed up until 1 July 2022. The mean (s.d.) follow-up was 2.4 (1.4) years. We assessed the association between age gap and early-onset cancers in 8,935 participants, after excluding those with a cancer diagnosis (except nonmelanoma skin cancer) before or within 6 months of the first PhenoAge measurement and those who were underweight.
Assessment of covariates
At enrollment, participants self-reported age, sex, race/ethnicity, education, smoking status, alcohol intake status, personal history of chronic obstructive pulmonary disease, cardiovascular disease (heart attack, stroke, heart failure, coronary heart disease, atrial fibrillation) and diabetes, family history of lung, breast, colorectal or prostate cancer, and ZIP code. The Social Deprivation Index, an area-level census-based measure assigned using the first three digits of the ZIP code, served as a proxy for socioeconomic status, with higher values indicating greater deprivation. BMI was obtained directly from the EHR or estimated using weight and height data closest to enrollment.
Statistical analysis
A similar analytical approach was applied to the UK Biobank and the All of Us Research Program, where appropriate. In brief, we first assessed age gap levels by birth cohorts using linear regressions and evaluated sex differences by including an interaction term between birth year and sex, adjusted for age and age squared. Pairwise comparisons within age groups were conducted using two-tailed t-tests with Bonferroni-adjusted P values. Generalized additive models (GAMs) were used to visualize birth year trends in age gap, adjusted for age and age squared, separately by sex. Formal model comparison demonstrated better fit for the nonlinear specification than for linear models (likelihood ratio test P < 0.001). To further examine differences in trends according to sex, we compared a model with sex-specific smooth terms to a model with a common smooth term using a likelihood ratio test, adjusting for age and age squared.
To assess the association between age gap and risk of early-onset cancers, multivariable Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs), using chronological age as the time scale. The proportional hazards assumption was evaluated using Schoenfeld residuals for each main model; no violations were detected (global test P values > 0.10). Age gap was analyzed by tertiles and as a continuous variable. To account for multiple comparisons across biological aging metrics and cancer outcomes, false discovery rate (FDR)-adjusted P values were calculated using the Benjamini–Hochberg procedure across all cancer site-specific tests presented in each table or figure. All models were adjusted for sex (male, female), race (White, other), Townsend Deprivation Index/Social Deprivation Index (continuous), education (pre-, postcollege), BMI (continuous), smoking status and intensity (never smoker, past smoker 1–19 pack-years, past smoker >19 pack-years, past smoker unknown pack-years, current smoker 1–19 pack-years, current smoker >19 pack-years, current smoker unknown pack-years), alcohol intake (never drinker, previous drinker, current drinker 0.1–14.9 g day−1, current drinker 15–29.9 g day−1, current drinker 30+ g day−1, current drinker but unknown g day−1), healthy diet (yes, no), physical activity (metabolic equivalent of task hours per week, in quartiles), personal history of chronic obstructive pulmonary disease (yes, no), cardiovascular disease (yes, no) and diabetes (yes, no), and the first ten PCs of genetic ancestry (UK Biobank only). For cancer subtype analyses, we also adjusted for family history of lung, breast, colorectal and prostate cancer (yes, no), respectively. For female-specific cancers, we also adjusted for age at menarche (continuous), oral contraceptive use (never, past, current), menopause (yes, no, hysterectomy) and parity (continuous).
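The Benjamini–Hochberg adjustment mentioned above can be sketched in plain Python. Each P value is multiplied by the number of tests divided by its rank, and a running minimum from the largest P value downward keeps the adjusted values monotone. The function name and example P values are illustrative, and the sketch is not the study's analysis code.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return FDR-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative P values for four site-specific tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In this example the second and third tests receive the same adjusted value, because the running minimum carries the smaller adjusted value of the third-ranked test down to the second-ranked one.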
In the UK Biobank, we also assessed whether the association between age gap and early-onset cancers was independent of well-established markers of aging and cancer, by additionally adjusting for leukocyte telomere length, aging PRS33,34, and cancer-specific PRS (lung35, colorectal36 and endometrial37 cancers). For sensitivity analyses, we excluded participants with follow-up <2 years, and redefined early-onset cancers using age at diagnosis before 50 years. For secondary analysis, we assessed the association with late-onset cancers using similar approaches.
Associations between each organ-specific aging measure and early-onset solid cancers were assessed using Cox regression models with the same set of covariates as above, and FDR-adjusted P values were calculated using the Benjamini–Hochberg procedure. We also adjusted for PhenoAge-defined age gap.
For the All of Us Research Program, we conducted stratified analyses for participants with (1) all nine measurements taken within 3 years and (2) less than 4 years between the first measurement and enrollment. We also performed a sensitivity analysis excluding participants with less than 2 years of follow-up to minimize reverse causation.
This study followed the Strengthening the Reporting of Observational Studies in Epidemiology–Molecular Epidemiology (STROBE-ME) reporting guideline. All analyses were performed using R (version 4.3.3) in RStudio (version 2026.01.2 + 418.pro1) with packages including tidyverse (v2.0.0), survival (v3.5-7), Publish (v2023.1.17), mgcv (v1.9-1), ggplot2 (v3.5.2) and BioAge (v0.1.0). Two-sided P < 0.05 was considered statistically significant. Figures were generated in RStudio and subsequently refined for presentation using BioRender (https://www.biorender.com) and Adobe Illustrator 2026 (version 30.3).
Ethics
The UK Biobank obtained ethical approval from the Northwest Multicenter Research Ethics Committee (REC reference 11/NW/0382), the National Information Governance Board for Health and Social Care in England and Wales, and the Community Health Index Advisory Group in Scotland. All participants provided written informed consent. This study was conducted under UK Biobank Application No. 55288.
The All of Us Research Program obtained institutional review board (IRB) approval for all study procedures, and all participants provided informed consent at enrollment. To protect participant privacy, data used in this study were accessed by approved researchers only after registration in accordance with All of Us Research Program policies, completion of required ethics training, and agreement to data use terms through the All of Us Research Workbench (https://workbench.researchallofus.org/login).
This study used deidentified data from both resources. According to the Washington University School of Medicine in St. Louis Institutional Review Board, this study was deemed exempt from review as a secondary analysis of existing, deidentified data.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
