TY - JOUR
T1 - The HCUP SID Imputation Project
T2 - Improving Statistical Inferences for Health Disparities Research by Imputing Missing Race Data
AU - Ma, Yan
AU - Zhang, Wei
AU - Lyman, Stephen
AU - Huang, Yihe
N1 - Funding Information:
Figure 4: Evaluation of Post-Imputation Performance: Root Mean Square Difference (RMSD) of Coefficient Estimates for Linear Regression for Length of Stay.
Other efforts, such as imputation, aim to reduce the impact of existing missing data on disparities research (Mulugeta et al. 2012). This project was funded by AHRQ to impute missing data, including race, in the SID. Upon completion of the study, an imputed version of the SID will be available for public use.
Joint Acknowledgment/Disclosure Statement: This research was supported by grant funding from the Agency for Healthcare Research and Quality (R01HS21734). We sincerely thank Dr. Andrew Gelman at Columbia University for his substantial contributions to the success of the project. Disclosures: None. Disclaimers: None.
Publisher Copyright:
© Health Research and Educational Trust
PY - 2018/6
Y1 - 2018/6
N2 - Objective: To identify the most appropriate imputation method for missing data in the HCUP State Inpatient Databases (SID) and assess the impact of different missing data methods on racial disparities research. Data Sources/Study Setting: HCUP SID. Study Design: A novel simulation study compared four imputation methods (random draw, hot deck, joint multiple imputation [MI], conditional MI) for missing values for multiple variables, including race, gender, admission source, median household income, and total charges. The simulation was built on real data from the SID to retain their hierarchical data structures and missing data patterns. Additional predictive information from the U.S. Census and American Hospital Association (AHA) database was incorporated into the imputation. Principal Findings: Conditional MI prediction was equivalent or superior to the best performing alternatives for all missing data structures and substantially outperformed each of the alternatives in various scenarios. Conclusions: Conditional MI substantially improved statistical inferences for racial health disparities research with the SID.
UR - http://www.scopus.com/inward/record.url?scp=85018451831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018451831&partnerID=8YFLogxK
U2 - 10.1111/1475-6773.12704
DO - 10.1111/1475-6773.12704
M3 - Article
C2 - 28474359
AN - SCOPUS:85018451831
SN - 0017-9124
VL - 53
SP - 1870
EP - 1889
JO - Health Services Research
JF - Health Services Research
IS - 3
ER -