dataset versions

v2	Documentation	Data Dictionary
v1	Documentation	Data Dictionary

Emergency triage of Covid-19 patients using chest X-rays: A Nightingale Open Science dataset

Authors: Ari Robicsek¹, Angelique Russell¹, Todd Czartoski¹, George Diaz¹, JB Minogue¹, Per E. Danielsson², Michael Pirri², Kristen Manning², Katie Lin³, William Lane³, Josh Risley³, Katy Haynes³, Ziad Obermeyer³,⁴

¹ Providence St. Joseph
² Swedish Health Services
³ Nightingale Open Science
⁴ University of California, Berkeley

Lead Nightingale analyst: William Lane

When using this resource, please cite:
Ari Robicsek, Angelique Russell, Todd Czartoski, George Diaz, JB Minogue, Per E. Danielsson, Michael Pirri, Kristen Manning, Katie Lin, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Emergency Triage of Covid-19 Patients Using Chest X-rays: A Nightingale Open Science Dataset. DOI:https://doi.org/10.48815/N5MW26

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:https://doi.org/10.1038/s41591-022-01804-4

BibTeX

@dataset{covid-psj-xray,
  author = {Robicsek, Ari and Russell, Angelique and Czartoski, Todd and Diaz, George and Minogue, JB and Danielsson, Per E. and Pirri, Michael and Manning, Kristen and Lin, Katie and Lane, William and Risley, Josh and Haynes, Katy and Obermeyer, Ziad},
  title = {Emergency Triage of Covid-19 Patients Using Chest X-rays: A Nightingale Open Science Dataset},
  publisher = {Nightingale Open Science},
  year = {2021},
  doi = {10.48815/N5MW26},
  url = {https://doi.org/10.48815/N5MW26}
}

APA

Robicsek, A., Russell, A., Czartoski, T., Diaz, G., Minogue, J. B., Danielsson, P. E., Pirri, M., Manning, K., Lin, K., Lane, W., Risley, J., Haynes, K., & Obermeyer, Z. (2021). Emergency Triage of Covid-19 Patients Using Chest X-rays: A Nightingale Open Science Dataset [Data set]. Nightingale Open Science. https://doi.org/10.48815/N5MW26

Chicago

Robicsek, Ari, Angelique Russell, Todd Czartoski, George Diaz, JB Minogue, Per E. Danielsson, Michael Pirri, et al. 2021. “Emergency Triage of Covid-19 Patients Using Chest X-Rays: A Nightingale Open Science Dataset.” Nightingale Open Science. https://doi.org/10.48815/N5MW26.

Vancouver

Robicsek A, Russell A, Czartoski T, Diaz G, Minogue JB, Danielsson PE, et al. Emergency Triage of Covid-19 Patients Using Chest X-rays: A Nightingale Open Science Dataset [Internet]. Nightingale Open Science; 2021. Available at: https://doi.org/10.48815/N5MW26

BibTeX

@article{nightingale2022,
  author = {Mullainathan, Sendhil and Obermeyer, Ziad},
  title = {Solving medicine's data bottleneck: Nightingale Open Science},
  journal = {Nature Medicine},
  year = {2022},
  month = may,
  day = {01},
  volume = {28},
  number = {5},
  pages = {897-899},
  issn = {1546-170X},
  doi = {10.1038/s41591-022-01804-4},
  url = {https://doi.org/10.1038/s41591-022-01804-4}
}

APA

Mullainathan, S., & Obermeyer, Z. (2022). Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine, 28(5), 897–899. https://doi.org/10.1038/s41591-022-01804-4

Chicago

Mullainathan, Sendhil, and Ziad Obermeyer. 2022. “Solving Medicine’s Data Bottleneck: Nightingale Open Science.” Nature Medicine 28 (5): 897–99. https://doi.org/10.1038/s41591-022-01804-4.

Vancouver

Mullainathan S, Obermeyer Z. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine [Internet]. May 1, 2022;28(5):897–9. Available at: https://doi.org/10.1038/s41591-022-01804-4

The problem

In emergency rooms across the world, doctors facing hospital bed shortages must make a difficult judgment call: is a patient with respiratory infection safe to go home? Or is close monitoring in the hospital, or even the ICU, needed? Getting this right is critical not just to save lives, but also to optimize scarce hospital resources.

Reports from the front lines of the Covid-19 pandemic indicate that the current state of medical knowledge is failing here. Empirically, many patients are admitted to the hospital, but ultimately do not require advanced care—a waste of beds. Other patients look well enough to be sent home, only to deteriorate rapidly, returning to the ER in profound respiratory distress—or not returning at all.

The key to solving this problem could lie in the chest x-ray, a rapid, cheap diagnostic that nearly all patients with respiratory complaints get in the ER. It’s clear to front-line doctors that there is a signal in the x-ray image for predicting impending pulmonary collapse. But this signal can be devilishly hard to find. Indeed, some health systems explicitly require senior physicians to personally review x-rays before a patient is sent home from the ER, in the hope that expending their scarcest resource—doctors’ time—can help catch high-risk patients in time.

Dataset overview

This dataset links chest x-rays to pulmonary outcomes, in order to fill an urgent need identified by clinicians: an algorithm that helps physicians make good triage decisions by predicting pulmonary collapse on the basis of x-rays done in the ER. This is directly motivated by the Covid-19 pandemic, but if it works, it could help a range of other patients whose conditions progress via the same “final common pathway” of acute respiratory distress syndrome (ARDS), including influenza, pneumonia, sepsis, and non-infectious inflammatory conditions.

The dataset consists of patients who were seen in the Emergency Department (ED) of a participating hospital and who received a chest X-ray and either a positive COVID-19 diagnosis (via physician) or a positive test (rapid, antibody, or PCR) within fourteen days of their ED visit date. We begin by identifying chest x-rays performed in the ER across the 51 hospitals at Providence St. Joseph, in patients diagnosed with Covid-19. We then determine whether the patient was admitted to the hospital or not, reflecting the emergency physician’s triage decision: did the patient need close monitoring as an inpatient, or were they safe to go home? Finally, we obtain two critical outcomes: whether the patient required mechanical ventilation in the 14 days after the initial visit, and whether they died over the same period.

Of note, we only observe mechanical ventilation if the patient returns to the same hospital (or another hospital within the Providence St. Joseph system). However, because we obtain mortality data from linkage to Social Security records, the mortality label does not depend on the patient seeking further care.
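As a minimal sketch of how the two 14-day outcome labels described above could be derived, consider the following. The function and field names (`within_window`, `outcome_labels`, `ventilated_14d`, `died_14d`) are hypothetical and do not come from the dataset, and treating day 14 as inside the window is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Follow-up window stated in the overview: 14 days after the initial ED visit.
WINDOW = timedelta(days=14)


def within_window(ed_visit: date, event: Optional[date]) -> bool:
    """Return True if the event falls on or after the ED visit and no later
    than 14 days after it (inclusive boundary is an assumption)."""
    if event is None:
        return False
    return ed_visit <= event <= ed_visit + WINDOW


def outcome_labels(ed_visit: date,
                   ventilation_date: Optional[date],
                   death_date: Optional[date]) -> dict:
    """Build the two outcome labels for one ED visit.

    A missing ventilation date only means no ventilation was observed within
    the health system; the patient may have been ventilated elsewhere.
    """
    return {
        "ventilated_14d": within_window(ed_visit, ventilation_date),
        "died_14d": within_window(ed_visit, death_date),
    }


example = outcome_labels(date(2020, 4, 1), date(2020, 4, 10), None)
print(example)
```

Because ventilation is only seen within the health system, a False value for the ventilation label is better read as "not observed" than as "did not occur".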

Our partner

Providence St. Joseph Health is a not-for-profit health care system that operates in seven states and serves as the parent organization for 100,000 caregivers. The combined system includes 51 hospitals, 829 physician clinics, and other health, education and social services across Washington, Oregon, California, Alaska, Montana, New Mexico, and Texas. This dataset was conceived and created by Ari Robicsek, Chief Medical Analytics Officer, along with colleagues in infectious disease and radiology. These clinicians are true heroes: in addition to working long hours in the hospital at the height of the pandemic, they also devoted time to in-depth discussions with our team on how algorithms could be used to improve their performance in trying times.

Versions

This dataset v1: The current v1 dataset contains 7,533 chest x-rays linked to the triage decision, mechanical ventilation, and mortality. Each observation in the dataset corresponds to an x-ray from a Covid-19 patient. We begin by querying the Ambra picture archiving and communication system (PACS) for chest x-rays performed in ERs. Because this involves a large volume of studies, we then take a 50% random sample of these studies and link them to diagnoses of Covid-19 from the Providence Covid-19 internal registry. Of note, this registry contains both patients diagnosed early in the pandemic on the basis of symptomatology, geography, and time, and patients diagnosed on the basis of PCR or antigen tests. We then link these x-rays to ICD-9 procedure codes on ventilation, as well as Social Security data on mortality.
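The sample-then-link step described above can be illustrated with a short sketch. This is not the authors' pipeline: the record fields `study_id` and `patient_id`, the function `sample_and_link`, and the choice of an exact-size sample with a fixed seed are all assumptions made for illustration.

```python
import random


def sample_and_link(studies, registry_patient_ids, fraction=0.5, seed=0):
    """Take a random sample of imaging studies, then keep only those whose
    patient appears in the Covid-19 registry.

    studies: list of dicts with hypothetical keys "study_id" and "patient_id".
    registry_patient_ids: set of patient ids with a Covid-19 diagnosis.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_size = round(len(studies) * fraction)
    sampled = rng.sample(studies, sample_size)
    # Linkage: an inner join of the sampled studies against the registry.
    return [s for s in sampled if s["patient_id"] in registry_patient_ids]


# Toy data: ten studies, of which the even-numbered patients are in the registry.
toy_studies = [{"study_id": i, "patient_id": f"p{i}"} for i in range(10)]
toy_registry = {f"p{i}" for i in range(0, 10, 2)}

linked = sample_and_link(toy_studies, toy_registry)
print(len(linked), "linked studies out of", len(toy_studies))
```

Sampling before linking, as the text describes, keeps the linkage workload small, at the cost of discarding about half of the eligible studies.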

What’s next for v1.1: We are in the process of transferring more data as they are queried from the underlying radiology system.

[Figure: number of chest x-rays by dataset version. v1: 7,533; v1.1: 15,000; a further unlabeled value of 25,000.]

Note that in v1 of the dataset we have relatively few observations with hospital admissions; in v1.1 we anticipate a far more balanced sample.

What’s next for v2 (released: April 2022): We plan to augment the dataset with additional data elements from the electronic health record, including other diagnoses, labs, and vital signs from the ER visit. We will also add data on the source of the diagnosis for Covid-19 patients (symptoms, PCR, antigen testing), as well as additional observations on Covid-19 negative patients.

Dataset schema

[Diagram: Dataset Observations Connection to Key Outcomes]

We summarize the dataset construction and key variables in a diagram that shows (i) where the data come from and (ii) the key outcomes (labels) relevant to the medical problem we are trying to solve. A note on color choices: the burnt sienna (orange) indicates the node that corresponds to the observations (rows) in the dataset, and the grape (purple) indicates key patient outcomes.

	Admit	Discharge
N	568	6301
Intubate	29.05%	3.86%
Mortality	27.46%	1.24%

Admission: Indicates admission to the hospital from the ED. We coded this variable by looking at the ‘Encounters’ table, where each ‘ED episode ID’ has a ‘Patient Class’ which contains the flag ‘Admitted’.

Intubation: Procedures for Covid-19 patients were entered in the line, drain, airway (LDA) table, drawn from the hospital electronic health record. We queried this table and coded intubation if the patient had a matching record in which the procedure was marked as ‘airway’.

Mortality: Indicates patient death, any time following ED visit. This variable is merged in from Social Security data via the electronic health record.
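To make the summary table easier to read in absolute terms, the sketch below converts its percentages into approximate patient counts. It assumes the percentages are rates within each column (Admit, Discharge); because the published percentages are rounded, the counts are approximations, and the variable names are illustrative only.

```python
# Values copied from the summary table above.
table = {
    "Admit": {"n": 568, "intubate_pct": 29.05, "mortality_pct": 27.46},
    "Discharge": {"n": 6301, "intubate_pct": 3.86, "mortality_pct": 1.24},
}


def implied_count(n: int, pct: float) -> int:
    """Approximate count implied by a group size and a rounded percentage."""
    return round(n * pct / 100)


counts = {
    group: {
        "intubated": implied_count(row["n"], row["intubate_pct"]),
        "died": implied_count(row["n"], row["mortality_pct"]),
    }
    for group, row in table.items()
}

total_n = sum(row["n"] for row in table.values())
total_intubated = sum(c["intubated"] for c in counts.values())
total_died = sum(c["died"] for c in counts.values())

for group, c in counts.items():
    print(group, c)
print(f"Overall intubation rate: {100 * total_intubated / total_n:.2f}%")
print(f"Overall mortality rate: {100 * total_died / total_n:.2f}%")
```

Under these assumptions, discharged patients account for more of the implied intubations than admitted patients do, even though their rate is far lower, simply because the discharged group is much larger.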
