TY - JOUR
T1 - Educational Data Analysis using Generative AI
AU - Berr, Abdul
AU - Leelaluk, Sukrit
AU - Tang, Cheng
AU - Chen, Li
AU - Okubo, Fumiya
AU - Shimada, Atsushi
N1 - Publisher Copyright: © 2024 CEUR-WS. All rights reserved.
PY - 2024
Y1 - 2024
AB - With the advent of generative artificial intelligence (AI), the scope of learning analytics tasks such as data analysis, performance prediction, and real-time feedback has widened. This study explores the possibility of using generative AI to analyze educational data. The performance of two large language models (LLMs), GPT-4 and text-davinci-003, is compared across different types of analyses. A framework, LangChain, is integrated with the LLM to gain deeper insight into the analysis, which can benefit beginner data scientists. LangChain has a component called an agent, which makes it possible to study the analysis step by step as it is performed. The impact of the OpenLA library is also studied; it pre-processes the data by calculating the number of seconds each student spends reading, counting the number of operations each student performs, and deriving the page-wise behavior of each student. The analysis also identifies the factors with the most significant impact on students’ performance. The results show that GPT-4, when using data pre-processed by OpenLA, provides the best analysis in terms of both the accuracy of the final answer and the step-by-step insights provided by LangChain’s agent. The results also show the significance of reading time and of the interactions students use (add marker, bookmark, memo) in predicting grades.
KW - Data Analysis
KW - Generative AI
KW - LangChain
KW - Large Language Models
KW - OpenLA
UR - http://www.scopus.com/inward/record.url?scp=85191991256&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191991256&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85191991256
SN - 1613-0073
VL - 3667
SP - 47
EP - 55
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2024 Joint of International Conference on Learning Analytics and Knowledge Workshops, LAK-WS 2024
Y2 - 18 March 2024 through 22 March 2024
ER -