Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models
MacNeil, Stephen; Denny, Paul; Tran, Andrew; Leinonen, Juho; Bernstein, Seth; Hellas, Arto; Sarsa, Sami; Kim, Joanne
Genre
Conference proceeding
Date
2024-01-29
DOI
https://doi.org/10.1145/3636243.3636245
Abstract
Identifying and resolving logic errors can be one of the most frustrating challenges for novice programmers. Unlike syntax errors, for which a compiler or interpreter can issue a message, logic errors can be subtle. In certain conditions, buggy code may even exhibit correct behavior – in other cases, the issue might be about how a problem statement has been interpreted. Such errors can be hard to spot when reading the code, and they can also at times be missed by automated tests. There is great educational potential in automatically detecting logic errors, especially when paired with suitable feedback for novices. Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks, including generating and explaining code. These capabilities are closely linked to code syntax, which aligns with the next token prediction behavior of LLMs. On the other hand, logic errors relate to the runtime performance of code and thus may not be as well suited to analysis by LLMs. To explore this, we investigate the performance of two popular LLMs, GPT-3 and GPT-4, for detecting and providing a novice-friendly explanation of logic errors. We compare LLM performance with a large cohort of introductory computing students (n = 964) solving the same error detection task. Through a mixed-methods analysis of student and model responses, we observe significant improvement in logic error identification between the previous and current generation of LLMs, and find that both LLM generations significantly outperform students. We outline how such models could be integrated into computing education tools, and discuss their potential for supporting students when learning programming.
Citation
Stephen MacNeil, Paul Denny, Andrew Tran, Juho Leinonen, Seth Bernstein, Arto Hellas, Sami Sarsa, and Joanne Kim. 2024. Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models. In Australasian Computing Education Conference (ACE 2024), January 29–February 02, 2024, Sydney, NSW, Australia. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3636243.3636245
Citation to related work
ACM
Has part
Proceedings of the 26th Australasian Computing Education Conference
License
Attribution CC BY
