Data preprocessing steps
-
Retrospective Data (ABeICU)
- Create a mastertable with patient demographics and prediction outcomes
- Keep only the first 24 hours of data in Measurements, Interventions, Prescriptions
- Aggregate value_char and value_num in each table to have unique combination of Admission_ID and ITEM_ID
- Pivot table to have a dual index of Admission_ID and ITEM_ID, add ITEM_ID to column names
- Drop any column with at least 30% missing data (that is, drop an item if at least 30% of patients have no value for it)
- Join the 3 tables back to mastertable to create a dataframe
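
The steps above could be sketched in pandas roughly as follows. This is a minimal illustration on a toy Measurements table, not the project's actual code: the `hours_since_admission` column, the `meas_` column prefix, the use of the mean as the aggregate, and the toy values are all assumptions. Only `Admission_ID`, `ITEM_ID` and `value_num` come from the notes.

```python
import pandas as pd

# Toy long-format Measurements table. `hours_since_admission` is a
# hypothetical column used here to express the 24-hour cutoff.
measurements = pd.DataFrame({
    "Admission_ID": [1, 1, 1, 2, 2, 3],
    "ITEM_ID": [10, 10, 20, 10, 20, 10],
    "hours_since_admission": [2.0, 30.0, 5.0, 1.0, 3.0, 12.0],
    "value_num": [1.0, 9.0, 7.0, 3.0, 8.0, 5.0],
})
mastertable = pd.DataFrame({"Admission_ID": [1, 2, 3, 4], "AKI": [0, 1, 0, 0]})

# Keep only the first 24 hours of data.
first_day = measurements[measurements["hours_since_admission"] < 24]

# One value per (Admission_ID, ITEM_ID); the mean is an assumed aggregate.
agg = first_day.groupby(["Admission_ID", "ITEM_ID"], as_index=False)["value_num"].mean()

# Pivot to one row per admission and one column per item,
# then put the ITEM_ID into the column name.
wide = agg.pivot(index="Admission_ID", columns="ITEM_ID", values="value_num")
wide.columns = [f"meas_{item}" for item in wide.columns]

# Align to every admission so missingness is measured over all patients.
wide = wide.reindex(mastertable["Admission_ID"])

# Drop columns where at least 30% of admissions have no value.
missing_fraction = wide.isna().mean()
wide = wide.loc[:, missing_fraction < 0.30]

# Join back to the mastertable.
df = mastertable.merge(wide, left_on="Admission_ID", right_index=True, how="left")
print(df)
```

The same pattern would be repeated for Interventions and Prescriptions, with a different column prefix per table so that item columns do not collide after the join.
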
-
Data Cleaning Steps
- Generate a dataset with creatinine measurements for deriving AKI status
- Derive AKI status for <24hr, 24-48hr, >48hr after ICU admission based on creatinine measurements
- Derive Delirium status for <24hr, 24-48hr, >48hr after ICU admission based on delirium measurements
- In SQL, generate 4 tables (the mastertable and aggregated tables for measurements, prescriptions and interventions) for further processing in Python
- In Python, load the 4 tables, perform pivoting, filtering and imputation, and save the final dataframe as a CSV
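
As an illustration of the windowed AKI derivation, here is a hedged sketch. The notes do not state which AKI definition was used, so the rule below is an assumption: it is loosely modeled on the KDIGO creatinine criteria (a rise of at least 0.3 mg/dL, or at least 1.5 times baseline) and ignores the KDIGO time-window and urine-output conditions. The baseline choice (first recorded value) and all column names are also hypothetical.

```python
import pandas as pd

# Toy creatinine table in mg/dL; column names are assumptions.
creat = pd.DataFrame({
    "Admission_ID": [1, 1, 1, 2, 2],
    "hours_since_icu_admission": [1.0, 30.0, 60.0, 2.0, 20.0],
    "creatinine": [1.0, 1.4, 1.1, 0.8, 0.9],
})

# Assumed baseline: the first creatinine value recorded for each admission.
creat = creat.sort_values(["Admission_ID", "hours_since_icu_admission"])
baseline = creat.groupby("Admission_ID")["creatinine"].transform("first")

# Assumed rule: absolute rise >= 0.3 mg/dL or relative rise >= 1.5x baseline.
creat["aki"] = (creat["creatinine"] - baseline >= 0.3) | (
    creat["creatinine"] / baseline >= 1.5
)

# Assign each measurement to one of the three windows: [0, 24), [24, 48), [48, inf).
creat["window"] = pd.cut(
    creat["hours_since_icu_admission"],
    bins=[0, 24, 48, float("inf")],
    labels=["<24hr", "24-48hr", ">48hr"],
    right=False,
)

# One row per admission, one column per window. An admission is flagged in a
# window if any measurement there meets the rule; windows without any
# measurement stay missing rather than being set to False.
aki_status = (
    creat.groupby(["Admission_ID", "window"], observed=True)["aki"].any().unstack()
)
print(aki_status)
```

Delirium status could be derived with the same windowing step, replacing the creatinine rule with whatever positive-screen criterion the delirium measurements use.
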
-
Preprocessing
- Extract 1000 testing samples; Extract 50 common patients
- Set up X_train, y_train, X_test, y_test
- Set up Pipeline (Imblearn Pipeline)
- Impute missing values: numeric (median) and categorical (most frequent)
- Scale values
- Turn gender into binary
- Random forest (100 estimators) to select top 100 features
- SMOTE + RandomUnderSampler to mitigate class imbalance
-
Training
- Predict Outcomes: 'ICU_LOS', 'HOSP_LOS', 'ICU_EXPIRE_FLAG', 'HOSP_EXPIRE_FLAG', '30_DAYS_EXPIRE_FLAG', 'DELIRIUM_FLAG', 'AKI'
- Regression:
- 'ICU_LOS',
- 'HOSP_LOS'
- Classification:
- 'HOSP_EXPIRE_FLAG',
- '30_DAYS_EXPIRE_FLAG',
- 'DELIRIUM_FLAG',
- 'AKI'
- Models:
- classification: logistic regression, SVM, Random Forest, XGBoost, NN
- regression: Elastic net, SVM, Random Forest, XGBoost, NN
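
A comparison of candidate classifiers by AUC could look like the sketch below. It assumes the AUC is computed per cross-validation fold and summarized by its median, which the notes do not spell out. The synthetic data, the 5-fold setup and the default hyperparameters are illustrative, and XGBoost and the neural network are left out only to keep the example limited to scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic imbalanced binary outcome standing in for one classification target.
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.8, 0.2], random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Same stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Median of the per-fold ROC AUC values for each model.
median_auc = {
    name: float(np.median(cross_val_score(model, X, y, cv=cv, scoring="roc_auc")))
    for name, model in models.items()
}

best_model = max(median_auc, key=median_auc.get)
print(median_auc, best_model)
```

For the regression outcomes ('ICU_LOS', 'HOSP_LOS') the same loop would apply with regressors and a regression metric passed to `scoring`, since AUC is defined only for classification.
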
-
Select best model based on median AUC
-
Predicting on testing set
-
Compare with clinician performance