Data dictionary v2

Table of contents
  1. File tree
  2. Slide biopsy mapping
  3. Outcomes
  4. Demographics
  5. Social determinants
  6. Comorbidities
  7. Treatments
  8. Pathology items
  9. Cancer diagnosis

Dataset Changes

Dataset Reorganization (Feb 2023)
  1. Demographic fields moved from outcomes.csv to demographics.csv.
  2. ndpi/ directory flattened. The year directories were removed.

File tree changes

.
└── brca-psj-path
   ├── ...    
   ├── v2
   │   ├── cancer-dx.csv
   │   ├── comorbidities.csv
+  │   ├── demographics.csv   
   │   ├── outcomes.csv
   │   ├── pathology-items.csv
   │   ├── slide-biopsy-map.csv
   │   ├── social-determinants.csv
   │   └── treatments.csv 
   └── ndpi
-      ├── 2016
       │   ├── 0035de3d-81ec-4945-a760-55518ba8b376.ndpi
       │   └── ...   
-      ├── 2017
       │   ├── 00a94273-e9ab-42f5-a47e-512a13e8603e.ndpi
       │   └── ...    
       ...   
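
Code written against the earlier layout may still point at the year subdirectories. The following is a minimal sketch of remapping such paths, assuming the old layout was ndpi/<year>/<slide_id>.ndpi as the tree above suggests. The function name `flatten_ndpi_path` is hypothetical and not part of the dataset tooling.

```python
from pathlib import PurePosixPath


def flatten_ndpi_path(old_path: str) -> str:
    """Drop a four-digit year directory that sits directly above the file.

    Assumes the pre-Feb-2023 layout was ndpi/<year>/<slide_id>.ndpi.
    Paths that are already flat are returned unchanged.
    """
    path = PurePosixPath(old_path)
    parent = path.parent
    if parent.name.isdigit() and len(parent.name) == 4:
        # Re-attach the file name to the grandparent (the ndpi/ directory).
        return str(parent.parent / path.name)
    return str(path)
```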

Note

The dates in this dataset have been shifted by a random number of days. All dates for any particular patient have been shifted by the same amount in order to preserve the time duration between events.
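
Because every date for a patient carries the same offset, differences between two dates of the same patient remain meaningful even though the absolute dates are not. The sketch below illustrates this with the sample biopsy_dt and death_dt values from the outcomes table, assuming dates are stored as ISO YYYY-MM-DD strings as the samples suggest. The helper names `days_between` and `shift` are hypothetical.

```python
from datetime import date, timedelta


def days_between(start_iso: str, end_iso: str) -> int:
    """Return the number of days from start to end (ISO YYYY-MM-DD strings)."""
    return (date.fromisoformat(end_iso) - date.fromisoformat(start_iso)).days


def shift(iso: str, days: int) -> str:
    """Shift an ISO date by a number of days (illustrates the de-identification idea)."""
    return (date.fromisoformat(iso) + timedelta(days=days)).isoformat()


# Sample values from the outcomes table: biopsy_dt and death_dt.
biopsy_dt, death_dt = "2152-01-01", "2155-08-24"
survival_days = days_between(biopsy_dt, death_dt)

# Applying one common offset to both dates leaves the interval unchanged.
same_interval = days_between(shift(biopsy_dt, 100), shift(death_dt, 100))
```

Comparing dates across different patients is not supported by this property, since each patient may have a different offset.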

Male patients made up less than 2% of biopsy patients and were excluded from the dataset.

File tree

.
└── brca-psj-path
    ├── ...    
    ├── v2
    │   ├── cancer-dx.csv
    │   ├── comorbidities.csv
    │   ├── demographics.csv   
    │   ├── outcomes.csv
    │   ├── pathology-items.csv
    │   ├── slide-biopsy-map.csv
    │   ├── social-determinants.csv
    │   └── treatments.csv 
    └── ndpi
        ├── 0035de3d-81ec-4945-a760-55518ba8b376.ndpi
        ├── 00a94273-e9ab-42f5-a47e-512a13e8603e.ndpi
        └── ...    

Slide biopsy mapping

slide-biopsy-map.csv

This table contains the mapping between the digital pathology image files and the corresponding biopsies. One or more slides are produced from the tissue samples of a biopsy procedure. The number of slides for each biopsy in the dataset can vary from 1 to 100.

Column Name | Description | Sample
slide_id | Unique identifier for each digital pathology image | c9cc2d38-a042-4883-9ab1-141e7b876678
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
slide_path | Filepath for the NDPI file of the slide | /path/to/{slide_id}.ndpi
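
Since one biopsy can map to many slides, a common first step is to group slide identifiers by biopsy. The following is a minimal sketch using only the standard library. It assumes the CSV is comma-delimited with a header row that uses the column names listed above. The function name `slides_by_biopsy` and the short IDs in the sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def slides_by_biopsy(csv_file) -> dict[str, list[str]]:
    """Group slide_id values under their biopsy_id."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["biopsy_id"]].append(row["slide_id"])
    return dict(groups)


# Tiny in-memory stand-in for slide-biopsy-map.csv (IDs are made up).
sample = io.StringIO(
    "slide_id,biopsy_id,slide_path\n"
    "s1,b1,/path/to/s1.ndpi\n"
    "s2,b1,/path/to/s2.ndpi\n"
    "s3,b2,/path/to/s3.ndpi\n"
)
groups = slides_by_biopsy(sample)
```

With the real file, the same function could be called on an open file handle, for example `slides_by_biopsy(open("slide-biopsy-map.csv", newline=""))`.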

Outcomes

outcomes.csv

This table contains outcomes for each biopsy case. Some patients have multiple biopsies.

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
patient_ngsci_id | Unique patient identifier | 821a6ba7-f5aa-49d3-a4c6-313ff649b715
case_year | Year of biopsy | 2018
biopsy_dt | Date of biopsy | 2152-01-01
mortality | If there is a record of patient death (0: no death record; 1: death) | 1
death_dt | Date of patient death | 2155-08-24
in_registry | Whether entries were found in the Providence cancer registry for the patient that match the time of the biopsy (0: not in cancer registry; 1: in cancer registry) | 1
stage | Cancer stage for patient in the year of the biopsy | IA
strict_metastatic_dx | Whether patient has a strict metastatic diagnosis as described in the documentation (0: no; 1: yes) | 0
strict_metastatic_dx_dt | Date of first strict metastatic disease diagnosis | 2154-01-30
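
Because a patient can appear under several biopsy_id values, it can help to know which patients have repeat biopsies, for example when results should be summarized per patient rather than per biopsy. The sketch below counts rows per patient_ngsci_id. It assumes each biopsy_id appears once in outcomes.csv, and the function name and sample rows are hypothetical.

```python
from collections import Counter


def patients_with_multiple_biopsies(rows):
    """Return {patient_ngsci_id: biopsy_count} for patients with more than one biopsy.

    `rows` is any iterable of dicts with a patient_ngsci_id key, for example
    the rows yielded by csv.DictReader over outcomes.csv.
    """
    counts = Counter(row["patient_ngsci_id"] for row in rows)
    return {pid: n for pid, n in counts.items() if n > 1}


# Made-up rows standing in for outcomes.csv.
rows = [
    {"biopsy_id": "b1", "patient_ngsci_id": "p1"},
    {"biopsy_id": "b2", "patient_ngsci_id": "p1"},
    {"biopsy_id": "b3", "patient_ngsci_id": "p2"},
]
repeat_patients = patients_with_multiple_biopsies(rows)
```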

Demographics

demographics.csv

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
sex | Sex of patient | F
race | Self-identified race (1: White or Caucasian; 2: Black or African American; 3: American Indian or Alaska Native; 4: Asian; 5: Native Hawaiian or Pacific Islander; 8: other; 9: unknown) | 1
ethnicity | Self-identified ethnicity (0: Not Hispanic or Latino; 1: Hispanic or Latino; 9: unknown) | 0
birth_dt | De-identified date of birth | 2041-03-15
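
The race and ethnicity columns are stored as numeric codes, so analyses usually start by mapping them to labels. The following sketch uses the code lists above. It assumes values may arrive as integers or as numeric strings (for example after CSV parsing), and the names `RACE_LABELS`, `ETHNICITY_LABELS`, and `decode` are hypothetical.

```python
RACE_LABELS = {
    1: "White or Caucasian",
    2: "Black or African American",
    3: "American Indian or Alaska Native",
    4: "Asian",
    5: "Native Hawaiian or Pacific Islander",
    8: "other",
    9: "unknown",
}
ETHNICITY_LABELS = {
    0: "Not Hispanic or Latino",
    1: "Hispanic or Latino",
    9: "unknown",
}


def decode(value, labels):
    """Map a coded value to its label; return None for blank or unlisted codes."""
    if value is None or str(value).strip() == "":
        return None
    try:
        code = int(float(value))
    except (TypeError, ValueError):
        return None
    return labels.get(code)
```

Returning None for unlisted codes, rather than raising, is a design choice for this sketch so that unexpected values can be counted and inspected afterwards.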

Social determinants

social-determinants.csv

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
bmi | The last recording of BMI at or before the date of biopsy [units: kg/m²] | 25
tobacco | 0: no documented smoking; 1: ICD10 codes F17.XX or Z72.0X | 0
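
The tobacco flag is defined by ICD10 code families. The exact matching rule used to build the column is not stated here, so the following is only an illustrative sketch that treats F17.XX and Z72.0X as prefix patterns. The function name `is_tobacco_code` is hypothetical.

```python
def is_tobacco_code(icd10: str) -> bool:
    """Return True if the code falls under F17.* or Z72.0* (prefix match).

    Assumption: "F17.XX" and "Z72.0X" in the data dictionary denote code
    families, so any code starting with "F17." or "Z72.0" qualifies.
    """
    code = icd10.strip().upper()
    return code.startswith("F17.") or code.startswith("Z72.0")
```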

Comorbidities

comorbidities.csv

Comorbidities are those included in the Charlson Comorbidity Index (CCI) and were obtained from patient charts using ICD-9 and ICD-10 codes. A comorbidity was included only if the patient was diagnosed in the two years before the biopsy date.
For each included comorbidity, 0: does not have diagnosis, 1: has diagnosis.
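
The two-year lookback can be expressed as a simple date-window check. The sketch below is one possible reading. It assumes the window is measured in calendar years and that both endpoints are inclusive, neither of which is specified in this document. The function name `in_lookback` is hypothetical.

```python
from datetime import date


def in_lookback(dx: date, biopsy: date, years: int = 2) -> bool:
    """Return True if dx falls within `years` calendar years before biopsy.

    Assumptions: calendar-year window, inclusive of both endpoints.
    """
    try:
        start = biopsy.replace(year=biopsy.year - years)
    except ValueError:
        # Biopsy on Feb 29 and the target year is not a leap year.
        start = biopsy.replace(year=biopsy.year - years, day=28)
    return start <= dx <= biopsy
```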

Treatments

treatments.csv

This table contains the treatments for patients in Providence’s cancer registry. Treatment at another health system would not be recorded in this table. A helpful resource is the SEER Program Coding and Staging Manual 2021.

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
cancer_registry_dx_dt | Cancer diagnosis date | 2156-01-01
most_definitive_surgical_procedure_cd | For codes and additional detail, see SEER 2021 Manual, Breast Surgery Codes | 22
most_definitive_radiation_modality_cd | For codes and additional detail, see SEER Program Coding and Staging Manual 2021, pg. 191 | 31
surgical_margin_cd | For codes and additional detail, see SEER Program Coding and Staging Manual 2021, pg. 166 | 8
radiation_summ_cd | For codes and additional detail, see SEER 2003 Code Manual, pg. 134a | 1
chemo_summ_cd | For codes and additional detail, see SEER 2003 Code Manual, pg. 137b | 87
immuno_therapy_cd | For codes and additional detail, see SEER 2003 Code Manual, pg. 139b | 1
hormone_summ_cd | For codes and additional detail, see SEER 2003 Code Manual, pg. 138b | 87
{therapeutic modality}_dt | Multiple data items with date of administered therapy (e.g. rx_chemo_dt, first_surgery_dt) | 2156-01-13
stg_dx_summ_cd | For codes and additional detail, see NAACCR archives | 2
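
The per-modality date columns can be combined with cancer_registry_dx_dt to get the time from diagnosis to each therapy. The sketch below selects columns by their _dt suffix. It assumes ISO date strings and that a missing date is stored as an empty string, which is not confirmed here. The function name `days_to_treatments` and the sample row are hypothetical.

```python
from datetime import date


def days_to_treatments(row: dict) -> dict:
    """Return {column: days from cancer_registry_dx_dt} for each non-empty *_dt column."""
    dx = date.fromisoformat(row["cancer_registry_dx_dt"])
    result = {}
    for name, value in row.items():
        # Skip the diagnosis date itself, non-date columns, and blanks.
        if name.endswith("_dt") and name != "cancer_registry_dx_dt" and value:
            result[name] = (date.fromisoformat(value) - dx).days
    return result


# Made-up row using the sample values from the table above.
row = {
    "biopsy_id": "b1",
    "cancer_registry_dx_dt": "2156-01-01",
    "rx_chemo_dt": "2156-01-13",
    "first_surgery_dt": "",
    "chemo_summ_cd": "87",
}
intervals = days_to_treatments(row)
```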

Pathology items

pathology-items.csv

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | b2423e0f-b92f-44ad-8d83-c45b0066a68a
grade_clinical | Grade before any treatment. For codes and additional detail, see NAACCR Site Specific Data Items, breast | 2
grade_pathological | Grade after resection. For codes and additional detail, see NAACCR Site Specific Data Items, breast | 2
er_summary | 0: ER negative; 1: ER positive. For codes and additional detail, see NAACCR Site Specific Data Items, breast | 1
pr_summary | 0: PR negative; 1: PR positive. For codes and additional detail, see NAACCR Site Specific Data Items, breast | 1
her2_summary | 0: HER2 negative; 1: HER2 positive. For codes and additional detail, see NAACCR Site Specific Data Items, breast | 0
multigene_signature_method | For codes and additional detail, see NAACCR Site Specific Data Items, breast | 1
multigene_signature_result | For codes and additional detail, see NAACCR Site Specific Data Items, breast | X4
response_neoadjuvant_therapy | For codes and additional detail, see SEER 2021 Manual, Neoadjuvant treatment effect, breast | 2
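
The three receptor summary flags are often read together. As an illustration only, the sketch below formats er_summary, pr_summary, and her2_summary into a single string. It assumes values are 0 or 1 (as integers or strings) and marks anything else, including missing values, with a question mark. The function name `receptor_profile` is hypothetical.

```python
def receptor_profile(er, pr, her2) -> str:
    """Format ER/PR/HER2 summary flags as a string such as 'ER+/PR+/HER2-'."""

    def mark(name, flag):
        if flag in (0, "0"):
            return f"{name}-"
        if flag in (1, "1"):
            return f"{name}+"
        # Missing or unexpected codes are flagged rather than guessed.
        return f"{name}?"

    return "/".join([mark("ER", er), mark("PR", pr), mark("HER2", her2)])
```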

Cancer diagnosis

cancer-dx.csv

This table contains the diagnoses for the patient cohort with whole number ICD9 codes 174, 175, 196, 197, 198. These codes are for breast cancer and metastatic cancer diagnoses. The ‘strict’ metastatic diagnosis in the outcomes table is derived from these codes as described in the documentation.

Column Name | Description | Sample
biopsy_id | Unique identifier for each biopsy case | 8dc5bacb-5904-45c4-9136-8aa8e16c3711
icd9 | ICD9 diagnosis codes for the patient with the whole number 174, 175, 196, 197, 198 | 174.1
dx_dt | Date of diagnosis | 2153-04-03
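
The "whole number" of an ICD9 code is the part before the decimal point, so the sample value 174.1 belongs to 174. The sketch below extracts that part and groups codes into breast (174, 175) and metastatic (196, 197, 198). This grouping follows the ICD-9 chapter layout and is not the 'strict' metastatic definition, which is described in separate documentation and not reproduced here. The function names are hypothetical.

```python
BREAST_CODES = {174, 175}
METASTATIC_CODES = {196, 197, 198}


def icd9_whole_number(code: str) -> int:
    """Return the integer part of an ICD9 code such as '174.1'."""
    return int(code.strip().split(".")[0])


def icd9_group(code: str) -> str:
    """Classify a code as 'breast', 'metastatic', or 'other' by its whole number."""
    whole = icd9_whole_number(code)
    if whole in BREAST_CODES:
        return "breast"
    if whole in METASTATIC_CODES:
        return "metastatic"
    return "other"
```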

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.