Data-driven prediction of prolonged air leak after video-assisted thoracoscopic surgery for lung cancer: Development and validation of machine-learning-based models using real-world data through the ePath system

Saori Tou, Koutarou Matsumoto, Asato Hashinokuchi, Fumihiko Kinoshita, Hideki Nakaguma, Yukio Kozuma, Rui Sugeta, Yasunobu Nohara, Takanori Yamashita, Yoshifumi Wakata, Tomoyoshi Takenaka, Kazunori Iwatani, Hidehisa Soejima, Tomoharu Yoshizumi, Naoki Nakashima, Masahiro Kamouchi

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: The reliability of data-driven predictions in real-world scenarios remains uncertain. This study aimed to develop and validate a machine-learning-based model for predicting clinical outcomes using real-world data from an electronic clinical pathway (ePath) system.
Methods: All available data were collected from patients with lung cancer who underwent video-assisted thoracoscopic surgery at two independent hospitals utilizing the ePath system. The primary clinical outcome of interest was prolonged air leak (PAL), defined as drainage removal more than 2 days post-surgery. Data-driven prediction models were developed in a cohort of 314 patients from a university hospital applying sparse linear regression models (least absolute shrinkage and selection operator, ridge, and elastic net) and decision tree ensemble models (random forest and extreme gradient boosting). Model performance was then validated in a cohort of 154 patients from a tertiary hospital using the area under the receiver operating characteristic curve (AUROC) and calibration plots.
Results: To mitigate bias, variables with missing data related to PAL or those with high rates of missing data were excluded from the dataset. Fivefold cross-validation indicated improved AUROCs when utilizing key variables, even post-imputation of missing data. Dichotomizing continuous variables enhanced performance, particularly when fewer variables were employed in the decision tree ensemble models. Consequently, regression models incorporating seven key variables in complete case analysis demonstrated superior discriminatory ability for both internal (AUROCs: 0.77–0.84) and external cohorts (AUROCs: 0.75–0.84). These models exhibited satisfactory calibration in both cohorts.
Conclusions: The data-driven prediction model implementing the ePath system exhibited adequate performance in predicting PAL post-video-assisted thoracoscopic surgery, optimizing variables and considering population characteristics in a real-world setting.
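The two validation measures named in the abstract, AUROC and calibration, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `auroc` and `calibration_bins`, the equal-width binning, and the toy labels and scores are all assumptions made for illustration, and none of the numbers are study data.

```python
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    y_true: Sequence[int], y_score: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predicted risks (assumed to lie in [0, 1]) into equal-width
    bins and return (mean predicted risk, observed event rate, count)
    for each non-empty bin. Plotting the first two values against each
    other gives a basic calibration plot."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    summary = []
    for b in bins:
        if b:
            mean_pred = sum(s for _, s in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            summary.append((mean_pred, observed, len(b)))
    return summary


# Hypothetical toy data: 1 = PAL, 0 = no PAL; scores are predicted risks.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

if __name__ == "__main__":
    print("AUROC:", auroc(toy_labels, toy_scores))
    for mean_pred, observed, count in calibration_bins(toy_labels, toy_scores, n_bins=2):
        print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

A well-calibrated model yields bins whose observed event rate is close to the mean predicted risk. In an external-validation setting such as the one described, both functions would be applied to the predictions for the second cohort without refitting the model.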

Original language: English
Journal: Learning Health Systems
DOI
Publication status: In press - 2024

All Science Journal Classification (ASJC) codes

  • Health Informatics
  • Public Health, Environmental and Occupational Health
  • Health Information Management
