TY - GEN
T1 - Comparison of Large Language Models for Generating Contextually Relevant Questions
AU - Lodovico Molina, Ivo
AU - Svabensky, Valdemar
AU - Minematsu, Tsubasa
AU - Chen, Li
AU - Okubo, Fumiya
AU - Shimada, Atsushi
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - This study explores the effectiveness of Large Language Models (LLMs) for Automatic Question Generation in educational settings. Three LLMs are compared in their ability to create questions from university slide text without fine-tuning. Questions were obtained in a two-step pipeline: first, answer phrases were extracted from slides using Llama 2-Chat 13B; then, the three models generated questions for each answer. To analyze whether the questions would be suitable in educational applications for students, a survey was conducted with 46 students who evaluated a total of 246 questions across five metrics: clarity, relevance, difficulty, slide relation, and question-answer alignment. Results indicate that GPT-3.5 and Llama 2-Chat 13B outperform Flan T5 XXL by a small margin, particularly in terms of clarity and question-answer alignment. GPT-3.5 especially excels at tailoring questions to match the input answers. The contribution of this research is the analysis of the capacity of LLMs for Automatic Question Generation in education.
KW - AI in Education
KW - Generative AI
KW - Question Generation
UR - http://www.scopus.com/inward/record.url?scp=85205312201&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205312201&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72312-4_18
DO - 10.1007/978-3-031-72312-4_18
M3 - Conference contribution
AN - SCOPUS:85205312201
SN - 9783031723117
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 137
EP - 143
BT - Technology Enhanced Learning for Inclusive and Equitable Quality Education - 19th European Conference on Technology Enhanced Learning, EC-TEL 2024, Proceedings
A2 - Ferreira Mello, Rafael
A2 - Rummel, Nikol
A2 - Jivet, Ioana
A2 - Pishtari, Gerti
A2 - Ruipérez Valiente, José A.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th European Conference on Technology Enhanced Learning, EC-TEL 2024
Y2 - 16 September 2024 through 20 September 2024
ER -