In this Resource paper, we highlight the value of the All of Us Research Program’s expanded wearables dataset. We examined how multiple DHT outcomes aligned with expected trends previously published in the literature. Specifically, we calculated baseline cohort activity and sleep outcomes in large cohorts of more than 30,000 participants, observed seasonal variations in physical activity and sleep, and presented a case study of the activity trajectory of participants following a lower limb fracture. Together, these analyses demonstrate the value and unique nature of the All of Us Fitbit data resource in terms of its scale, longitudinal observation period and integration with clinical outcomes, including those recorded in the EHR data.
The longitudinal nature of this dataset enables examination of temporal patterns. Although seasonal variation in activity and sleep is relatively well established, few studies have measured these oscillations directly with continuous activity monitoring over several years13. A potential advantage of commercial wearable device data (for example, Fitbit) in large cohort studies is higher compliance and more continuous data collection.
Analysis of the All of Us dataset revealed expected seasonal variation in physical activity, as measured by median daily steps, including a deviation from this pattern in 2020 owing to the COVID-19 pandemic. This deviation was observed in an earlier analysis of this dataset, but at that time the sample size was much smaller (n = 5,443) and less demographically varied15. Interestingly, our analysis shows that median daily steps never fully recovered to pre-pandemic levels (Fig. 2a, years 2021–2023). This incomplete recovery likely reflects two factors. First, 2021 marks the first year that WEAR participants’ step data were incorporated into the seasonal average. As shown in Table 2, WEAR participants have significantly lower step counts than their BYOD counterparts, suggesting that this compositional shift in the All of Us cohort contributed to lower step counts beginning in 2021. Second, lingering pandemic-related behavioral changes, such as extended remote work policies, may also have reduced baseline activity. Future work is needed to disentangle these contributing factors. This expanded dataset will strengthen researchers’ ability to study typical physical activity patterns and the factors that influence deviations from these patterns17.
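The seasonal summaries and the compositional effect described above can be illustrated with a minimal sketch. It assumes daily step records arrive as hypothetical (date, steps) pairs with one row per participant-day and uses meteorological Northern Hemisphere seasons; the toy values are illustrative only and are not drawn from All of Us. Adding a lower-stepping group to the pool lowers the seasonal median even though no individual's activity changes.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Assumed month-to-season mapping (meteorological seasons, Northern Hemisphere).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def seasonal_median_steps(records):
    """Median daily steps per (year, season) from (day, steps) pairs.

    Assumes one row per participant-day. December is grouped with the
    calendar year it falls in, which is a simplification."""
    groups = defaultdict(list)
    for day, steps in records:
        groups[(day.year, SEASON_BY_MONTH[day.month])].append(steps)
    return {key: median(values) for key, values in groups.items()}

# Toy example: a lower-stepping group joins the pool in the same season.
byod = [(date(2021, 7, 1), 8000), (date(2021, 7, 2), 9000), (date(2021, 7, 3), 7000)]
wear = [(date(2021, 7, 1), 4000), (date(2021, 7, 2), 5000)]
print(seasonal_median_steps(byod))         # {(2021, 'summer'): 8000}
print(seasonal_median_steps(byod + wear))  # {(2021, 'summer'): 7000}
```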
We also observed seasonal variation in sleep duration, consistent with other self-reported and objective measures in the literature, which generally show longer sleep in winter and shorter sleep in spring and summer18,19,20,21. Notably, we observed increased sleep durations starting in winter 2020 that gradually returned to baseline by winter 2023 (Fig. 2b). Self-reported data have documented similar increases in sleep duration during this period22,23, as have several studies using objective measures of sleep early in the COVID-19 pandemic24. Our data provide additional confirmation of this pattern and extend the observation period through winter 2023, demonstrating a gradual return to baseline.
We observed a median baseline of 6,454 steps per day in our general activity cohort (Table 2). Published estimates from comparable cohorts (for example, UK Biobank and National Health and Nutrition Examination Survey (NHANES)) often report ~9,000–9,600 steps per day25,26,27. However, these comparisons are sensitive to the step-count algorithm utilized28.
In addition, All of Us ingests wearable data via the Fitbit Application Programming Interface, which provides summary tables and metrics derived from Fitbit’s proprietary algorithms. As a result, raw accelerometry data are not available to researchers. Although this standardization may improve comparability across All of Us studies, it complicates comparisons with cohorts that do provide raw accelerometry data (for example, UK Biobank, NHANES). Furthermore, whereas NHANES and UK Biobank participants wore study-issued devices for 1 week, All of Us participants donated data over extended periods. Finally, All of Us is a broad convenience sample and is not representative of the US population. Despite the WEAR study’s success in increasing the number of participants from certain demographic groups (for example, lower income, less access to healthcare), the All of Us dataset is still older, more female and more highly educated than the general US population. Researchers making detailed comparisons with other cohorts or the general population should apply post-stratification or other weighting methods to account for sampling and demographic differences.
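Post-stratification can be sketched in its simplest single-variable form, cell weighting: each participant is weighted by the ratio of their stratum's population share to its sample share. The function names, strata and shares below are hypothetical, and real analyses would typically weight on several variables at once (for example, by raking to census margins).

```python
from collections import Counter

def poststratification_weights(strata, population_shares):
    """Cell weights: population share / sample share of each participant's stratum.

    strata: one hypothetical stratum label per participant.
    population_shares: stratum -> share of the target population (assumed to sum to 1)."""
    n = len(strata)
    sample_shares = {s: count / n for s, count in Counter(strata).items()}
    return [population_shares[s] / sample_shares[s] for s in strata]

def weighted_mean(values, weights):
    """Mean of values using the supplied per-participant weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Toy sample that over-represents one stratum (illustrative values only).
strata = ["female", "female", "female", "male"]
steps = [6000, 6000, 6000, 10000]
weights = poststratification_weights(strata, {"female": 0.5, "male": 0.5})
print(sum(steps) / len(steps))               # unweighted: 7000.0
print(round(weighted_mean(steps, weights)))  # weighted: 8000
```

Here the over-represented stratum is down-weighted, moving the estimate toward what a sample matching the assumed population shares would give.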
The recommended daily sleep duration for adults is 7–9 h, and self-reported estimates from US adults typically range from 6.5 to 7.5 h29,30. We were interested in how these subjective sleep durations, from surveys such as NHANES, would compare with the objectively measured sleep durations in our cohort. In our cohort, the median (IQR) daily sleep duration was 6.8 h (6.2–7.2 h) (Table 2), which is comparable to the NHANES estimates. However, whereas NHANES data suggest that ~32% of US adults experience ‘short sleep’ (<7 h), we found a much larger percentage of participants (62.5%, n = 21,612) whose median main sleep duration was classified as short or very short (<7 h). Although these differences are notable, our cohort is not nationally representative and uses device-measured rather than self-reported sleep, complicating direct comparisons. In addition, although research suggests that self-reports can overestimate sleep duration31, the magnitude of the difference (32% versus 62.5%) suggests that additional factors may be involved.
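The short-sleep proportion reported above can be sketched as follows: take each participant's median nightly main sleep duration, classify it and count participants below 7 h. The 7 h cutoff comes from the text; the 6 h ‘very short’ cutoff and the input structure (a hypothetical dict of nightly hours per participant) are assumptions for illustration.

```python
from statistics import median

def classify_sleep(median_hours, short_cutoff=7.0, very_short_cutoff=6.0):
    """Classify a participant's median main sleep duration (hours).

    The 7 h 'short sleep' cutoff follows the text; the 6 h 'very short'
    cutoff is an illustrative assumption."""
    if median_hours < very_short_cutoff:
        return "very short"
    if median_hours < short_cutoff:
        return "short"
    return "7 h or more"

def share_short_sleepers(nightly_hours):
    """Fraction of participants whose median nightly sleep is < 7 h.

    nightly_hours: hypothetical dict of participant ID -> list of nightly hours."""
    categories = [classify_sleep(median(nights)) for nights in nightly_hours.values()]
    short = sum(c in ("short", "very short") for c in categories)
    return short / len(categories)

# Toy example (not All of Us data).
toy = {
    "p1": [6.5, 6.9, 7.1],  # median 6.9 h -> short
    "p2": [7.5, 7.2, 7.8],  # median 7.5 h -> 7 h or more
    "p3": [5.4, 5.9, 6.1],  # median 5.9 h -> very short
    "p4": [7.0, 7.0, 6.8],  # median 7.0 h -> 7 h or more
}
print(share_short_sleepers(toy))  # 0.5
```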
Recent studies of large cohorts using consumer sleep trackers have generated estimates of global sleep patterns, perhaps providing more appropriate comparisons to our device-measured data. One such study reported a slightly longer average sleep duration in its US subset32: 6.9 h versus 6.8 h in our cohort. That study measured sleep in ~50,000 Oura ring users who donated an average of ~242 nights of data from January 2021 to January 2022. By contrast, participants in our cohort donated a median of 159 nights of valid sleep data over a median data donation window of 464 days, spanning from 2009 to 2023 (Supplementary Table 1). The cohorts were similar in age and sex, but socioeconomic status—known to influence sleep duration33—was not reported32. The WEAR program successfully enrolled individuals from lower socioeconomic statuses who are less likely to be included in wearables datasets that rely on independent device purchases (Table 1). The likely difference in socioeconomic composition between the studies may partially explain the lower sleep durations we observed.
Another study using an under-mattress sleep device, the Withings Sleep Analyzer, reported a substantially longer average sleep duration of ~7.5 h for US device users34. Validation studies suggest that the Withings device significantly overestimates sleep duration compared with polysomnography and may do so to a greater extent than Fitbit devices35,36. In addition, the Withings Sleep Analyzer study assessed sleep over 9 months in adults who registered to use the device between July 2020 and March 2021, a period that overlapped substantially with the COVID-19 pandemic. Several reports suggest population-level sleep abnormalities during this time, including increased time in bed and total sleep duration23,37. Although our dataset includes this pandemic period, it also includes data from many years before and after, which likely mitigated the impact of pandemic-related changes on our longitudinal median sleep duration.
A key strength of the All of Us data is the ability to examine individual-level changes in relation to clinical events. To demonstrate the value of integrating wearable outcomes with clinical events documented in the EHR, we examined daily step counts in participants who experienced a lower limb fracture. Among the 61 participants in this case study, we observed considerable variability in average daily steps both before and after injury. Nevertheless, the cohort showed a rapid decline in steps relative to baseline immediately following the injury, with recovery taking on average 90 days after injury (seen at 120 rolling-average days in Fig. 4). Even at 180 days (6 months) after injury, the cohort had not fully returned to pre-injury activity levels. Given the range of injury severity represented in the ‘Fracture of Lower Limb’ concept ID (Supplementary Table 2) and published reports indicating that several of these injuries require recovery times exceeding 6 months, particularly in older adults, this incomplete recovery was expected38,39. Our primary purpose in conducting this case study was to demonstrate how wearable data can be integrated with clinical outcomes to understand relationships between health events and changes in activity patterns. Although we chose a relatively straightforward case study with an expected result, future researchers can leverage these integrated data types to identify novel biomarkers and associations between health, physical activity and sleep outcomes.
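One way to express post-injury trajectories like those in Fig. 4 is as a rolling average of daily steps relative to each participant's pre-injury baseline. The sketch below assumes a gap-free list of daily steps and hypothetical defaults (a 90-day baseline, a 30-day trailing window, a 90% recovery threshold); it is not the procedure used to produce Fig. 4. The trailing window uses only post-injury days, so pre-injury activity does not inflate early recovery values.

```python
from statistics import mean

def percent_of_baseline(daily_steps, injury_index, baseline_days=90, window=30):
    """Trailing rolling mean of post-injury daily steps as a % of pre-injury baseline.

    daily_steps: gap-free daily step counts in date order (assumed).
    injury_index: position of the injury date in daily_steps (must be > 0).
    Defaults are hypothetical, not the settings behind Fig. 4."""
    baseline = mean(daily_steps[max(0, injury_index - baseline_days):injury_index])
    series = []
    for i in range(injury_index, len(daily_steps)):
        start = max(injury_index, i - window + 1)  # only post-injury days in the window
        series.append(100 * mean(daily_steps[start:i + 1]) / baseline)
    return series

def days_to_recover(series, threshold=90.0):
    """First post-injury day whose rolling value reaches threshold % of baseline, else None."""
    for day, pct in enumerate(series):
        if pct >= threshold:
            return day
    return None

# Toy trajectory: 10 baseline days, 5 low days after injury, then full activity.
daily = [10000] * 10 + [2000] * 5 + [10000] * 20
pct = percent_of_baseline(daily, injury_index=10, baseline_days=10, window=5)
print(pct[0], days_to_recover(pct))  # 20.0 9
```

During a rising recovery, a trailing average lags the underlying daily values, so a rolling-average curve reaches a given threshold later than the raw daily counts do.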
Realizing the full potential of this dataset requires continued methodological advancement in several areas. For example, the impact of device type on wearable outcomes remains poorly understood, and there are currently no consensus methods for addressing the use of multiple Fitbit devices in a single study or by a single participant40. Similarly, approaches for handling missing data in DHT datasets are not standardized. Missing data in these datasets are unlikely to be random and may reflect conscious or subconscious decisions to remove a device, which can correlate with participant characteristics or health states (for example, mood) and introduce bias41.
Although Fitbits have demonstrated reasonable reliability compared with gold-standard devices for certain activity and sleep metrics42,43,44, their reliability varies across specific measures (for example, sleep stages, heart rate), populations and device types40,45,46. For example, research suggests that Fitbits measure heart rate less reliably in people with darker skin tones because their optical sensors estimate heart rate from light absorption, which differs with skin pigmentation40. In addition, Fitbit step estimation accuracy may be reduced in people with irregular gait patterns due to neurological conditions such as Parkinson’s disease, with inaccuracies varying by device type47. The effect of these limitations on study findings will depend on the specific research question, outcome measures and population being studied. Researchers should carefully consider these device-specific and population-specific reliability limitations when designing analyses and interpreting results from this dataset.
Another important consideration is that wearables data may be subject to measurement reactivity, in which participants temporarily alter their behavior when first provided with activity and sleep trackers. However, this effect is likely short-lived, and its duration depends on the health-related behavior of interest (for example, daily steps versus exercise minutes)48. Researchers should consider their research question and observation period carefully and may wish to exclude the first few days or weeks of data donated by participants to avoid bias49,50. Given the large-scale and longitudinal nature of the analyses in this manuscript, we chose not to exclude any days of data.
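Excluding an initial burn-in period, as suggested above, is a simple per-participant filter. In this sketch the records are hypothetical (participant_id, day, value) tuples and the 7-day default is an assumption; the appropriate exclusion window depends on the outcome and research question.

```python
from datetime import date, timedelta

def drop_initial_days(records, n_days=7):
    """Drop each participant's first n_days of donated data.

    records: hypothetical (participant_id, day, value) tuples.
    n_days=7 is an illustrative default, not a recommendation."""
    first_day = {}
    for pid, day, _ in records:
        if pid not in first_day or day < first_day[pid]:
            first_day[pid] = day
    return [
        (pid, day, value)
        for pid, day, value in records
        if day >= first_day[pid] + timedelta(days=n_days)
    ]

# Participant "a" donates 10 days; participant "b" donates a single day.
start = date(2022, 1, 1)
records = [("a", start + timedelta(days=i), 8000) for i in range(10)]
records.append(("b", date(2022, 3, 1), 5000))
kept = drop_initial_days(records)
print(len(kept), {pid for pid, _, _ in kept})  # 3 {'a'}
```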
Future research using the All of Us Fitbit dataset will benefit from methodological advancements that address current limitations; however, developing such approaches was beyond the scope of this paper. Instead, our goal was to present the dataset at a high level, with the expectation that the broader research community will leverage it for methodological developments. Encouragingly, the research community has already begun this work, including several reports that specifically evaluate and provide considerations for using the All of Us Fitbit dataset40,50,51. Future All of Us wearables data users are encouraged to reference the program’s user support hub (https://support.researchallofus.org), which contains additional information and guidance, including multiple ‘featured workspaces’ with example code and support articles, such as one titled ‘Considerations while using Fitbit data in the All of Us Research Program’.
Finally, analyses relating demographic variables to DHT outcomes (for example, daily steps and sleep duration) require careful consideration to avoid misleading conclusions, because apparent group differences may reflect underlying social and environmental factors. A strength of the All of Us dataset is that it integrates many data types, including EHR, genomics and extensive self-reported survey data. Specifically, 82% (48,487 out of 59,018) of participants with Fitbit data also responded to the program’s SDOH survey, which asks about social factors such as neighborhood, social life and perceived stress. We urge researchers to plan their analyses carefully, consult experts and community members in their research design, and consider all the data the program collects to study factors underlying sleep and activity differences.
An important consideration for all real-world datasets, including the data presented here, is that many factors of data collection are beyond experimenter control, and some of these uncontrolled factors may introduce sources of error or bias. For example, participants in our cohort used 41 different Fitbit device models with various sensors and technologies (Supplementary Table 3). This device heterogeneity may affect measurement accuracy owing to device-specific limitations or user-selected settings. In addition, although 59,018 participants donated Fitbit data to All of Us, only 52,860 (89.6%) had device information available in the device table, and a small fraction of participants showed evidence of using five or more devices during their data donation window (Supplementary Fig. 2). Such data characteristics reflect the real-world nature of this dataset, in which participants use their own devices over extended periods under free-living conditions.
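A per-participant device count, of the kind summarized in Supplementary Fig. 2, can be sketched from device-table rows. The (participant_id, device_id) row layout is a hypothetical simplification rather than the actual All of Us device table schema, and participants without device records simply do not appear in the counts.

```python
from collections import defaultdict

def devices_per_participant(device_rows):
    """Count distinct devices per participant from (participant_id, device_id) rows.

    The row layout is a hypothetical simplification of a device table."""
    devices = defaultdict(set)
    for pid, device_id in device_rows:
        devices[pid].add(device_id)
    return {pid: len(ids) for pid, ids in devices.items()}

def share_with_many_devices(counts, min_devices=5):
    """Fraction of participants (with device records) using >= min_devices devices."""
    return sum(n >= min_devices for n in counts.values()) / len(counts)

# Toy rows: repeated rows for the same device are counted once.
rows = [("p1", "d1"), ("p1", "d2"), ("p1", "d1"),
        ("p2", "d3"), ("p2", "d4"), ("p2", "d5"), ("p2", "d6"), ("p2", "d7")]
counts = devices_per_participant(rows)
print(counts, share_with_many_devices(counts))  # {'p1': 2, 'p2': 5} 0.5
```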
Because our goal was to provide a high-level overview of Fitbit data availability and trends as a Resource paper, and because there are currently no consensus methods in the field for addressing device heterogeneity in consumer-grade wearables research, we did not prescribe specific analytical approaches for handling these factors. Establishing such methods is an active area of research that extends beyond the scope of a resource description paper. As the field continues to evolve, researchers should carefully consider potential sources of error or bias when analyzing real-world data, and important findings obtained in observational real-world datasets should ideally be followed up with controlled interventional studies when feasible.
In sum, although potential errors and biases present challenges when working with real-world data, there are also crucial benefits that make real-world datasets a valuable resource for the research community. These include their massive scale, richness of longitudinal data, integration with multiple data types (for example, EHR, surveys, genomics), and ability to support a wide range of research objectives—benefits that are often difficult to obtain in more controlled, small-scale datasets. The All of Us Fitbit dataset, with its extended observation periods, large and diverse participant population, and linkage to clinical outcomes, offers opportunities for discovery that complement findings from traditional research-grade accelerometry and plethysmography studies.
The WEAR study was a strategic and innovative effort by the All of Us Research Program to increase the number of individuals donating DHT data, and the representativeness of this group, by distributing Fitbit devices to participants at no cost. WEAR’s success is evidenced by the larger proportion of participants from varied backgrounds donating activity data through the WEAR study relative to the BYOD program (Table 1). The All of Us Research Program is accelerating research in precision medicine, a field that initially focused on the potential for human genetics to enable individually tailored treatments and improve health outcomes, but that over time has broadened its scope to appreciate the role of additional data types, including DHT. By substantially increasing the amount of DHT data available from a broader range of individuals across the US population, the expanded All of Us Fitbit dataset offers a valuable resource to advance biomedical research. This resource can help researchers better understand the contributions of sleep, heart rate and physical activity to important health outcomes, and inform the development of more precise treatments and interventions.
