In the midst of a health crisis a few years ago, doctors repeatedly checked my thyroid-stimulating hormone (TSH), revealing an odd pattern: My results were consistently much higher at the local hospital than at my outpatient doctor’s office.
One week, the clinic test showed my TSH was 5.19 milliunits per liter, slightly above normal. Four days later, the hospital measured it at 14.99 milliunits per liter, so high that it prompted a new diagnosis from my doctor. No one could explain the discrepancy in results.
Was my TSH fluctuating so dramatically? The answer mattered because the hormone, produced by the pituitary gland, stimulates the thyroid to release its own hormones. These chemicals regulate the body’s metabolism, including heart rate, energy production, and metabolic stability — the same systems in which my symptoms were worsening. If the TSH elevations were accurate, they would signal a progressing thyroid condition. If not, doctors would look for other causes.
Although TSH fluctuates naturally, biology alone cannot explain a near-tripling within four days. By the standard measure of how much this hormone realistically varies from test to test, changes of up to about 50% are plausible; larger shifts are statistically improbable. My health hadn’t changed in the days between tests.
I started researching and made a startling discovery: The TSH test, one of the most commonly ordered laboratory tests in the United States, is not standardized. Depending on which company’s analyzer processes the blood, the same sample can yield results differing by 20% to 40%. Studies comparing the same blood samples tested on different lab machines have consistently found clinically significant differences in results, pointing to variations in the machines. My results exceeded even these documented variations, prompting deeper investigation.
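For readers who want to check the arithmetic, here is a minimal sketch in Python using only the figures quoted above. The function and constant names are my own labels for illustration, not terms from any laboratory standard, and treating "about 50%" and "40%" as hard cutoffs is a simplifying assumption.

```python
def percent_change(earlier: float, later: float) -> float:
    """Percent change from the earlier reading to the later one."""
    return (later - earlier) / earlier * 100.0


# Readings quoted in the text, in milliunits per liter.
clinic_tsh = 5.19     # outpatient clinic result
hospital_tsh = 14.99  # hospital result four days later

# Illustrative cutoffs taken from the figures in the text (assumed hard limits).
TEST_TO_TEST_LIMIT = 50.0   # percent: the "about 50%" test-to-test figure
PLATFORM_SPREAD_MAX = 40.0  # percent: upper end of the 20% to 40% platform range

ratio = hospital_tsh / clinic_tsh
change = percent_change(clinic_tsh, hospital_tsh)

print(f"ratio between readings: {ratio:.2f}x")
print(f"percent increase: {change:.0f}%")
print("exceeds test-to-test variation:", change > TEST_TO_TEST_LIMIT)
print("exceeds documented platform spread:", change > PLATFORM_SPREAD_MAX)
```

On these numbers, the jump between my two results is several times larger than either figure, which is why neither ordinary fluctuation nor the documented machine-to-machine spread fully accounts for it.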
These differences have been documented in the laboratory medicine literature for more than a decade. Each major manufacturer — Abbott, Siemens, Roche, and Beckman Coulter — uses proprietary antibodies and calibration systems protected as intellectual property. Yet many clinicians, their patients, and even some labs don’t know that different brands of machines produce different results.
What makes this failure especially frustrating is that the scientific solution already exists. After more than 15 years of collaborative work, the Centers for Disease Control and Prevention and the International Federation of Clinical Chemistry (IFCC) established and validated a universal TSH harmonization protocol. This protocol allows manufacturers to align assays to the same biological target so, for instance, a TSH of 4 means the same thing, regardless of platform.
The problem is not science. The problem is adoption. Following the protocol remains voluntary. The Food and Drug Administration has never required manufacturers to recalibrate to the harmonized standard. Some manufacturers have moved toward alignment, but without a regulatory requirement, progress remains inconsistent.
A common counterargument is the concern that recalibration could complicate long-term patient follow-up. In practice, laboratories already manage shifts from reagent lot changes and platform upgrades, which routinely alter patient results.
An estimated 20 million Americans have thyroid disease, with women — particularly postmenopausal women — disproportionately affected. TSH values guide diagnosis, medication dosing, and long-term monitoring. A large enough difference between lab machines can determine whether a patient gets diagnosed, receives treatment, or is told their symptoms are unrelated to thyroid function.
The consequences are not abstract. Patients whose results read high on one platform may be started on thyroid hormone unnecessarily, risking overtreatment complications that include atrial fibrillation and bone loss. Patients whose results appear normal on another may go untreated — and suffer from prolonged hypothyroidism, which is associated with elevated cholesterol, cognitive symptoms, and increased cardiovascular risk. A TSH of 3.5 mIU/L on one platform could register as 4.9 mIU/L on another, pushing treatment decisions into contested territory.
This failure stems from fragmented oversight. The CDC lacks enforcement power. CLIA proficiency testing compounds the problem by rewarding agreement within manufacturer “peer groups,” grading labs on how well they match other machines of the same brand rather than how well they match biological truth. Meanwhile, the FDA approves assays based on “substantial equivalence” to older tests. This creates a regulatory trap: Recalibrating to the CDC standard would invalidate that equivalence, triggering costly new submissions and effectively penalizing manufacturers for fixing the bias.
The fix is straightforward: The FDA could create a facilitated pathway for recalibration to harmonized targets, while CMS could require proficiency testing using those universal standards, rather than brand-specific peer groups, as a benchmark. This would close the gap without creating new science or infrastructure.
Other high-impact laboratory tests, such as glucose and cholesterol, benefit from established reference measurement systems that ensure cross-platform comparability. There is no scientific reason thyroid testing should lack equivalent standardization. The tools exist. The methodology has been validated. What remains missing is regulatory will.
I am not a scientist or an endocrinologist, but a patient who was forced to learn — through experience and extensive research — that my diagnosis depended on which company’s machine processed my blood. The hospital test that showed my TSH level was 14.99 milliunits per liter changed my treatment — and my health began to improve.
But most patients, practitioners, and even some laboratories remain unaware of this variability. Sometimes they see symptoms persist even as the test results guiding treatment say everything is fine.
That should not be acceptable in modern medicine. The science is finished. The protocol exists. What remains is a regulatory system that has abdicated its mandate — leaving patients to reconcile test results that their own doctors cannot explain.
Samantha Bonsack is a patient advocate based in Moab, Utah.
