TY - JOUR
T1 - Multi-condition machine learning models for understanding retention mechanisms and predicting retention time in supercritical fluid chromatography/mass spectrometry
AU - Heravizadeh, Omidreza
AU - Nakatani, Kohta
AU - Tomiyasu, Noriyuki
AU - Torigoe, Taihei
AU - Yamashita, Toshiyuki
AU - Takahashi, Masatomo
AU - Izumi, Yoshihiro
AU - Bamba, Takeshi
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2026/2/1
Y1 - 2026/2/1
N2 - Background: Modern supercritical fluid chromatography (SFC) enables fast and efficient separations owing to the low viscosity and high diffusivity of supercritical mobile phases. However, its retention mechanisms remain incompletely understood, limiting method development and confident compound identification in SFC/MS. In this study, the retention times (RTs) of 1217 compounds measured under 51 chromatographic conditions, covering 15 stationary phases, three modifier chemistries (neutral, acidic, and basic), and two gradient programs, were analyzed to develop RT prediction models and elucidate the underlying retention mechanisms. Results: Gradient boosting (GB) models were first trained separately for each condition using the measured RTs together with 2285 molecular descriptors. Then, for the first time, system descriptors encoding chromatographic conditions (i.e., stationary phase, modifier, and gradient type) were introduced to integrate these individual models into multi-condition models. These models achieved high predictive accuracy, with R² values of 0.951 and 0.923 and mean absolute errors (MAE) of 0.613 and 0.520 min for Gradients 1 (G1) and 2 (G2), respectively. To interpret retention mechanisms, GB-selected descriptors were quantified using partial least squares (PLS), classified into 10 physicochemical categories, and evaluated using the normalized combination effect (nCE) across conditions. Subsequently, RT shift analysis revealed the most pronounced differences between neutral and acidic media. Finally, heatmaps for each stationary phase summarized peak quality and detection percentages for functional group clusters. Significance: By introducing system descriptors, this study established multi-condition RT prediction models that accurately predict retention across diverse SFC conditions. Moreover, comprehensive descriptor-based analysis under 51 conditions elucidated the underlying retention mechanisms and provided a practical framework for selecting optimal analytical conditions.
AB - Background: Modern supercritical fluid chromatography (SFC) enables fast and efficient separations owing to the low viscosity and high diffusivity of supercritical mobile phases. However, its retention mechanisms remain incompletely understood, limiting method development and confident compound identification in SFC/MS. In this study, the retention times (RTs) of 1217 compounds measured under 51 chromatographic conditions, covering 15 stationary phases, three modifier chemistries (neutral, acidic, and basic), and two gradient programs, were analyzed to develop RT prediction models and elucidate the underlying retention mechanisms. Results: Gradient boosting (GB) models were first trained separately for each condition using the measured RTs together with 2285 molecular descriptors. Then, for the first time, system descriptors encoding chromatographic conditions (i.e., stationary phase, modifier, and gradient type) were introduced to integrate these individual models into multi-condition models. These models achieved high predictive accuracy, with R² values of 0.951 and 0.923 and mean absolute errors (MAE) of 0.613 and 0.520 min for Gradients 1 (G1) and 2 (G2), respectively. To interpret retention mechanisms, GB-selected descriptors were quantified using partial least squares (PLS), classified into 10 physicochemical categories, and evaluated using the normalized combination effect (nCE) across conditions. Subsequently, RT shift analysis revealed the most pronounced differences between neutral and acidic media. Finally, heatmaps for each stationary phase summarized peak quality and detection percentages for functional group clusters. Significance: By introducing system descriptors, this study established multi-condition RT prediction models that accurately predict retention across diverse SFC conditions. Moreover, comprehensive descriptor-based analysis under 51 conditions elucidated the underlying retention mechanisms and provided a practical framework for selecting optimal analytical conditions.
KW - Machine learning models
KW - Mass spectrometry
KW - Quantitative structure–retention relationship
KW - Retention mechanism
KW - Retention time prediction
KW - Supercritical fluid chromatography
UR - https://www.scopus.com/pages/publications/105025661136
UR - https://www.scopus.com/pages/publications/105025661136#tab=citedBy
U2 - 10.1016/j.aca.2025.345026
DO - 10.1016/j.aca.2025.345026
M3 - Article
AN - SCOPUS:105025661136
SN - 0003-2670
VL - 1385
JO - Analytica Chimica Acta
JF - Analytica Chimica Acta
M1 - 345026
ER -