Predicting fractures and pain using chest x-rays: A Nightingale Open Science dataset

Authors: Matthew Lungren¹, Johanna Kim¹, Stephanie Bogdan¹, William Lane², Josh Risley², Katy Haynes², Ziad Obermeyer²,³

¹ Stanford Center for Artificial Intelligence in Medicine and Imaging
² Nightingale Open Science
³ University of California, Berkeley

Lead Nightingale analyst: William Lane

When using this resource, please cite:
Matthew Lungren, Johanna Kim, Stephanie Bogdan, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset. DOI:https://doi.org/10.48815/N5RP44

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:https://doi.org/10.1038/s41591-022-01804-4

BibTeX

@dataset{fracture-aimi-xray,
  author = {Lungren, Matthew and Kim, Johanna and Bogdan, Stephanie and Lane, William and Risley, Josh and Haynes, Katy and Obermeyer, Ziad},
  title = {Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset},
  publisher = {Nightingale Open Science},
  year = {2021},
  doi = {10.48815/N5RP44},
  url = {https://doi.org/10.48815/N5RP44}
}

ACM

Matthew Lungren, Johanna Kim, Stephanie Bogdan, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset. DOI:https://doi.org/10.48815/N5RP44

APA

Lungren, M., Kim, J., Bogdan, S., Lane, W., Risley, J., Haynes, K., & Obermeyer, Z. (2021). Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset [Data set]. Nightingale Open Science. https://doi.org/10.48815/N5RP44

Chicago

Lungren, Matthew, Johanna Kim, Stephanie Bogdan, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. “Predicting Fractures and Pain Using Chest X-Rays: A Nightingale Open Science Dataset.” Nightingale Open Science. https://doi.org/10.48815/N5RP44.

Vancouver

Lungren M, Kim J, Bogdan S, Lane W, Risley J, Haynes K, et al. Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset [Internet]. Nightingale Open Science; 2021. Available at: https://doi.org/10.48815/N5RP44

BibTeX

@article{nightingale2022,
  author = {Mullainathan, Sendhil and Obermeyer, Ziad},
  title = {Solving medicine's data bottleneck: Nightingale Open Science},
  journal = {Nature Medicine},
  year = {2022},
  month = may,
  day = {01},
  volume = {28},
  number = {5},
  pages = {897-899},
  issn = {1546-170X},
  doi = {10.1038/s41591-022-01804-4},
  url = {https://doi.org/10.1038/s41591-022-01804-4}
}

ACM

Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:https://doi.org/10.1038/s41591-022-01804-4

APA

Mullainathan, S., & Obermeyer, Z. (2022). Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine, 28(5), 897–899. https://doi.org/10.1038/s41591-022-01804-4

Chicago

Mullainathan, Sendhil, and Ziad Obermeyer. 2022. “Solving Medicine’s Data Bottleneck: Nightingale Open Science.” Nature Medicine 28 (5): 897–99. https://doi.org/10.1038/s41591-022-01804-4.

Vancouver

Mullainathan S, Obermeyer Z. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine [Internet]. May 1, 2022;28(5):897–9. Available at: https://doi.org/10.1038/s41591-022-01804-4

The problem

For many older patients, and some younger ones, a fracture marks the beginning of the end. The fracture itself is seldom fatal, but it sets off a downward spiral of pain, decreased mobility, physical deconditioning, debility, and ultimately death. This is why screening for osteoporosis, recommended today for women starting at age 65, is so critical: the appearance of bones on a special type of x-ray (called a DEXA scan) shows us who is at high risk of fractures, and lets us start treatments to prevent them before they happen.

Given the massive costs of fractures to both patients and the health care system (a recent report put the figure at nearly $60 billion for fractures in US Medicare patients alone), it’s clear that our current screening strategies are not adequate. For one thing, despite established guidelines calling for universal screening over age 65, the vast majority of women don’t get it. In addition, many fractures occur in men and younger people, for whom guidelines don’t recommend screening. So it would be very useful to find another way to predict fractures at scale, using routinely available data.

The chest x-ray is, by far, the most commonly performed radiological study in the world, done when patients see their doctor for a cough, chest or back pain, before surgery, in the ER, on admission to the hospital, and in a variety of other settings. An interesting fact about the ‘chest’ x-ray is that it also gets a very clear view of the spine, from the neck to the upper lumbar area. And the spine is an excellent place to assess the quality and quantity of bone, which may hold signal for predicting future fractures.

Dataset overview

This dataset starts with the Stanford Artificial Intelligence in Medical Imaging CheXpert dataset, which contains x-rays from across the Stanford Medicine system: in outpatient clinics, in the ER, or in the hospital. As part of that initial dataset, the chest x-rays were linked to the radiologist’s interpretation of the image, which we also provide here.

But this dataset goes further, adding labels on both health outcomes and patient experiences. First, we link each x-ray to the occurrence of past and future fractures, not just in the spine but all over the body, and to diagnoses of osteopenia and osteoporosis, so that researchers can compare algorithmic predictions to what doctors already know about patient risk. (Because the CheXpert labels are included, researchers can also observe whether the radiologist saw a fracture in the chest x-ray itself.) Second, we link the x-rays to diagnoses of musculoskeletal problems (joints, tendons, pain, etc.), again past and future and all over the body, to test the hypothesis that subtle features of the chest x-ray might yield insights into a range of musculoskeletal issues (as other recent articles have suggested). Finally, we add other relevant data elements describing the patients, including height, weight, and selected vital signs.

A few notes to keep in mind. All labels (on fractures, pain, etc.) will only be present if the patient received care for that condition in some part of the Stanford Medicine system. This creates bias in who is labeled, since some patients who have fractures will not present for care, or will go elsewhere. Note also that many, but not all, x-ray studies contain two orthogonal images: the PA [postero-anterior] view taken from back to front, and the lateral view from the side. (Some patients, particularly those who are too frail or sick to stand up, receive only the AP [antero-posterior] view from front to back, while lying down in bed.) Finally, note that there can be multiple chest x-ray studies per patient, on different days.
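Because a patient can contribute several studies, and a study can contribute more than one image, a random split at the image or study level could place the same patient in both training and evaluation data. One way to avoid this is to split by patient, as in the minimal sketch below. The field names `patient_id` and `study_id`, the sample records, and the 20% test fraction are illustrative assumptions, not names or values taken from the dataset.

```python
import hashlib


def assign_split(patient_id: str, test_fraction: float = 0.2) -> str:
    """Assign a patient to 'train' or 'test' from a hash of the patient ID.

    Hashing the patient ID (rather than the study ID) keeps every study
    from the same patient on the same side of the split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # roughly uniform value in [0, 1)
    return "test" if bucket < test_fraction else "train"


# Hypothetical records: two studies for patient "p001", one for "p002".
studies = [
    {"study_id": "s1", "patient_id": "p001"},
    {"study_id": "s2", "patient_id": "p001"},
    {"study_id": "s3", "patient_id": "p002"},
]

split_by_study = {s["study_id"]: assign_split(s["patient_id"]) for s in studies}
```

The assignment is deterministic, so rerunning the split on an updated file keeps existing patients where they were.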

Our partners

The Stanford Artificial Intelligence in Medical Imaging (AIMI) Center supports the development, evaluation and dissemination of new artificial intelligence methods applied across the medical imaging life cycle, in order to solve clinically important problems in medicine using AI. Their mission is to develop and support transformative medical AI applications and the latest in applied computational and biomedical imaging research to advance patient health. Building on their trailblazing work to release imaging datasets like CheXpert, this Nightingale dataset holds the promise to predict future fractures and frailty in patients, which could lead to the creation of tools for triage and diagnosis, and optimize over-burdened hospitals. This dataset was conceived of and created by Dr. Matthew Lungren and Johanna Kim, Co-Directors of the Stanford AIMI Center, as well as Stephanie Bogdan, Project Manager for the Stanford AIMI Center. We are deeply grateful for their help, as well as their inspirational work to make data available as a public good.

Dataset details

Versions

This dataset v1: Each observation in the dataset corresponds to one of 224,316 chest x-ray studies, from 65,240 unique patients between October 2002 and July 2017. The x-rays were then linked to electronic health record data from the Stanford Medicine system using patient MRN. We queried ICD diagnosis tables to obtain codes on fractures and pain over one year before and after the date of the x-ray, and patient flowsheet data to obtain height, weight, and body temperature.
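The one-year windows around each x-ray can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used to build the dataset: "one year" is taken as 365 days, a diagnosis on the day of the x-ray is counted in the "after" window, and the function name `window_flags` is hypothetical. The flag names mirror the column headers used in the summary tables.

```python
from datetime import date, timedelta
from typing import Dict, Iterable

ONE_YEAR = timedelta(days=365)  # assumption: "one year" means 365 days


def window_flags(xray_date: date, dx_dates: Iterable[date]) -> Dict[str, bool]:
    """Flag whether any diagnosis falls in the year before or after an x-ray.

    Assumption: a diagnosis dated on the x-ray day counts as "after".
    """
    dx_dates = list(dx_dates)
    before = any(xray_date - ONE_YEAR <= d < xray_date for d in dx_dates)
    after = any(xray_date <= d <= xray_date + ONE_YEAR for d in dx_dates)
    return {
        "dx_year_before": before,
        "dx_year_after": after,
        "dx_year_before_or_after": before or after,
    }


# Hypothetical example: one diagnosis about five months before the x-ray.
flags = window_flags(date(2010, 6, 1), [date(2010, 1, 15)])
```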

What’s next v2 (target release date: March 2022): We will add diagnosis and procedure codes that capture pulmonary deterioration in the short-term after the x-ray was done, as well as the setting of the x-ray (e.g., the ER, inpatient, clinic). This will allow researchers to predict this important outcome, and align this dataset with other Nightingale Open Science datasets that also involve prediction of pulmonary deterioration with chest x-rays.

Dataset schema

[Diagram: dataset observations and their connection to key outcomes]

Dataset construction and key outcome variables are shown in the diagram above. A note on color choices: the burnt sienna (orange) indicates the node that corresponds to the observations (rows) in the dataset, and the grape (purple) indicates key patient outcomes.

Key variables

Fracture

We obtained data on ICD-9 codes 800–829 (fractures) over the year before and after the x-ray. In the summary table below, diagnoses were grouped by body region, but individual ICD-9 codes are available in the dataset.

fracture_location	icd9_code	total_with_dx	dx_year_before	dx_year_after	dx_year_before_or_after
skull and face	800,801,802,803,804	1303	77.44%	65.54%	83.73%
spine and ribs	805,806,807,809	4779	75.27%	68.07%	83.16%
pelvis and hip	808,820	2075	66.36%	57.64%	76.14%
scapula and clavicle	810,811	748	78.34%	72.33%	85.29%
arm	812,813,818,819	1512	56.08%	49.80%	64.75%
hand	814,815,816,817	927	46.60%	46.17%	61.06%
leg	821,822,823,824,827	1983	60.87%	57.19%	74.48%
foot	825,826	683	42.75%	40.85%	57.39%
other	828,829	492	50.00%	51.83%	70.53%

Note that fracture is one of the CheXpert labels that the radiologist can comment on in the chest x-ray interpretation, so you can also check whether a particular fracture that shows up as an ICD-9 code (in the electronic health/billing record) was visible and commented on in the x-ray itself.
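Since individual ICD-9 codes are available, the body-region grouping from the summary table above can be reproduced with a small lookup. The sketch below copies the groupings from that table; the function name `fracture_region` and the assumption that codes are stored as strings such as "805.2" are illustrative.

```python
# Three-digit ICD-9 categories per body region, copied from the summary table.
FRACTURE_REGIONS = {
    "skull and face": (800, 801, 802, 803, 804),
    "spine and ribs": (805, 806, 807, 809),
    "pelvis and hip": (808, 820),
    "scapula and clavicle": (810, 811),
    "arm": (812, 813, 818, 819),
    "hand": (814, 815, 816, 817),
    "leg": (821, 822, 823, 824, 827),
    "foot": (825, 826),
    "other": (828, 829),
}

# Invert the table so each three-digit category points at its region.
CODE_TO_REGION = {
    code: region for region, codes in FRACTURE_REGIONS.items() for code in codes
}


def fracture_region(icd9_code):
    """Return the body region for an ICD-9 code, or None if it is not 800-829."""
    try:
        category = int(str(icd9_code).strip().split(".")[0])
    except ValueError:
        return None  # e.g. E-codes or V-codes, which are not numeric
    return CODE_TO_REGION.get(category)
```

For example, `fracture_region("805.2")` falls under "spine and ribs", while a non-fracture code such as "733.00" maps to `None`.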

Osteoporosis and Osteopenia

We obtained data on ICD-9 codes 733.00–733.03 for osteoporosis; and 733.09 or 733.90 for osteopenia, based on prior research. Additionally, for osteopenia, we required the text flag accompanying codes 733.09 or 733.90 to mention osteopenia (e.g., in this dataset, some patients had code 733.90 accompanied by a text flag for osteodynia, which would not be included under our definition).

icd9_code	dx_name_corrected	total_with_dx	dx_year_before	dx_year_after	dx_year_before_or_after
733	Osteoporosis	3795	54.00%	42.00%	65.00%
733.9	Osteopenia	1422	26.00%	25.00%	41.00%
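The two definitions above can be expressed as a small rule. This is a sketch, assuming that codes are stored as strings with two decimal places and that the text flag is available as a free-text description; the function and argument names are hypothetical.

```python
OSTEOPOROSIS_CODES = {"733.00", "733.01", "733.02", "733.03"}
OSTEOPENIA_CODES = {"733.09", "733.90"}


def classify_bone_density_dx(icd9_code, dx_text):
    """Return 'Osteoporosis', 'Osteopenia', or None for one diagnosis row.

    Osteopenia requires both a qualifying code and a text flag that
    mentions osteopenia, so 733.90 with "osteodynia" is excluded.
    """
    code = str(icd9_code).strip()
    if code in OSTEOPOROSIS_CODES:
        return "Osteoporosis"
    if code in OSTEOPENIA_CODES and "osteopenia" in (dx_text or "").lower():
        return "Osteopenia"
    return None
```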

Musculoskeletal problems and pain

We obtained data on ICD-9 codes 710–739 (musculoskeletal diagnoses), many of which involve pain, over the year before and after the x-ray. Again, in the summary table below, diagnoses are grouped by clinical category, but individual ICD-9 codes are available in the dataset.

dx_name	icd9_code	total_with_dx	dx_year_before	dx_year_after	dx_year_before_or_after
Connective tissue disease	710	988	67.91%	52.02%	78.04%
Infected joint	711	510	67.84%	58.63%	82.35%
Gout	712	375	53.87%	54.93%	74.40%
Rheumatoid arthritis	714	1253	60.73%	45.89%	71.35%
Osteoarthritis	715	8389	62.08%	56.96%	75.79%
Other joint problem	716,713	2973	44.37%	39.86%	59.87%
Knee problem	717	664	26.81%	25.90%	41.72%
Joint problem	719,718	13486	52.44%	53.28%	69.75%
Ankylosing spondylitis	720	289	47.06%	32.18%	57.79%
Spondylosis	721	8221	65.14%	61.31%	80.46%
Intervertebral disc problem	722	4700	56.15%	49.98%	73.04%
Neck problem	723	4850	47.46%	44.25%	62.41%
Back problem	724	10966	59.96%	54.08%	73.05%
Tendon and bursa problem	726,727	6378	35.15%	35.97%	52.37%
Ligament problem	728	5701	51.80%	50.99%	68.85%
Other disorders of soft tissues	729	15664	60.60%	60.25%	76.60%
Bone infection	730	1461	65.43%	66.53%	80.42%
Bone and cartilage problem	733,732,731	10170	58.48%	52.87%	73.80%
Scoliosis	737	2298	65.36%	56.66%	78.42%
Limb deformity	738,735,736	4482	48.22%	46.94%	66.44%
Nonallopathic lesions not elsewhere classified	739	207	54.59%	52.66%	77.29%

Physiological measurements

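Finally, we obtained temperature, height, and weight from the flowsheet data collected in the course of medical encounters. Note that the data contain some outliers that are most likely data entry errors (for example, the minimum weight of 0.22 kg and the maximum BMI of 4835.96 kg/m^2 in the table below).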

SUMMARY	WEIGHT [kg]	HEIGHT [m]	TEMP [°F]	BMI [kg/m^2]
count	35450	34377	36500	34344
mean	86.60	1.71	99.37	30.18
std	24.62	0.11	0.84	50.40
min	0.22	0.13	36.70	0.083
25%	70.31	1.63	99.00	24.74
50%	83.20	1.70	99.70	28.21
75%	98.43	1.79	99.90	32.73
max	282.81	2.44	107.40	4835.96
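Given the outliers visible in the table above, it may help to screen measurements before computing BMI (weight in kg divided by the square of height in m). The sketch below is one possible approach; the plausibility ranges are illustrative assumptions for adult patients, not thresholds defined by the dataset, and the names `PLAUSIBLE_RANGES`, `is_plausible`, and `bmi` are hypothetical.

```python
# Illustrative plausibility ranges for adults; these thresholds are assumptions.
PLAUSIBLE_RANGES = {
    "weight_kg": (20.0, 350.0),
    "height_m": (1.0, 2.5),
    "temp_f": (90.0, 110.0),
}


def is_plausible(name, value):
    """Return True if the value is present and inside the assumed range."""
    low, high = PLAUSIBLE_RANGES[name]
    return value is not None and low <= value <= high


def bmi(weight_kg, height_m):
    """BMI in kg/m^2, or None when either input is missing or implausible."""
    if not (is_plausible("weight_kg", weight_kg) and is_plausible("height_m", height_m)):
        return None
    return weight_kg / height_m ** 2
```

Under these assumed ranges, the table's minimum weight (0.22 kg) and minimum height (0.13 m) would be excluded, as would the minimum temperature of 36.70, which may be a Celsius reading recorded in a Fahrenheit field.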

Data Dictionary
