
























Abstract: Worked examples are step-by-step solutions to problems in a specific domain, offered to students to help them acquire domain-specific problem-solving skills. The effectiveness of worked examples could be enhanced by combining them with self-explanations, which ask students to explain each problem-solving step rather than passively study it. The main challenge of this approach is assessing the correctness of the students' explanations. In the prevailing approach, student explanations are judged by their semantic similarity to an instructor's or domain expert's explanation. Given recent advances in LLM-based automated scoring, it remains unclear whether semantic similarity methods are still the most effective technique for automatically scoring textual student responses such as essays or code explanations. Comparing these methods also requires quality datasets that offer distinctive features such as balanced class distributions and domain-specific labeled data for automated scoring tasks. In this paper, we present a rigorous comparison between LLMs and semantic similarity used for automated scoring, framed as a binary classification task.
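To make the similarity-based framing concrete, the following is a minimal sketch of scoring as binary classification: a student explanation is labeled correct when its similarity to an expert explanation reaches a threshold. It is not the paper's method. Bag-of-words cosine similarity stands in here for a real semantic similarity measure (which would typically use sentence embeddings), and the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty explanation is similar to nothing
    return dot / (norm_a * norm_b)


def score_explanation(student: str, expert: str, threshold: float = 0.5) -> bool:
    """Binary label: True (correct) if similarity meets the assumed threshold."""
    similarity = cosine_similarity(bag_of_words(student), bag_of_words(expert))
    return similarity >= threshold


# Hypothetical example pair, written for this sketch only.
expert = "The loop adds each element of the array to the running total"
student = "the loop adds every element to the total"
label = score_explanation(student, expert)
```

In practice the threshold would be tuned on labeled data, and an LLM-based scorer would replace `score_explanation` with a prompt that returns the same binary label, which is what makes the two approaches directly comparable.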
| Subjects: | Human-Computer Interaction (cs.HC); Machine Learning (cs.LG) |
| Cite as: | arXiv:2605.21614 [cs.HC] |
| | (or arXiv:2605.21614v1 [cs.HC] for this version) |
| | https://doi.org/10.48550/arXiv.2605.21614 (arXiv-issued DOI via DataCite, pending registration) |
From: Arun Balajiee Lekshmi Narayanan
[v1] Wed, 20 May 2026 18:22:22 UTC (341 KB)