TY - JOUR
T1 - Quantum chemical calculation dataset for representative protein folds by the fragment molecular orbital method
AU - Takaya, Daisuke
AU - Ohno, Shu
AU - Miyagishi, Toma
AU - Tanaka, Sota
AU - Okuwaki, Koji
AU - Watanabe, Chiduru
AU - Kato, Koichiro
AU - Tian, Yu Shi
AU - Fukuzawa, Kaori
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/12
Y1 - 2024/12
AB - The function of a biomacromolecule is determined not only by its three-dimensional structure but also by its electronic state. Quantum chemical calculations are promising non-empirical methods for determining the electronic state of a given structure. In this study, we used the fragment molecular orbital (FMO) method, which is applicable to biopolymers such as proteins, to provide physicochemical property values for representative structures in the SCOP2 database of protein families, a subset of the Protein Data Bank. Our dataset comprises over 5,000 protein structures and includes over 200 million inter-fragment interaction energies (IFIEs) and their energy components obtained by pair interaction energy decomposition analysis (PIEDA) using FMO-MP2/6-31G*. Moreover, three basis sets, 6-31G*, 6-31G**, and cc-pVDZ, were used for the FMO calculations of each structure, making it possible to compare the energies obtained with different basis sets for the same fragment pair. The total data size is approximately 6.7 GB. Our dataset will be useful for functional analyses and machine learning based on the physicochemical property values of proteins.
UR - http://www.scopus.com/inward/record.url?scp=85207493431&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85207493431&partnerID=8YFLogxK
U2 - 10.1038/s41597-024-03999-2
DO - 10.1038/s41597-024-03999-2
M3 - Article
C2 - 39443514
AN - SCOPUS:85207493431
SN - 2052-4463
VL - 11
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 1164
ER -