How Accurately Do Large Language Models Understand Code?
https://arxiv.org/html/2504.04372v1
"This paper presents the first large-scale empirical investigation into the ability of LLMs to understand code. Inspired by mutation testing, we use an LLM’s ability to find faults as a proxy for its deep understanding of code. This approach is based on the insight that a model capable of identifying subtle functional discrepancies must understand the code well."
It appears that coding LLMs are vulnerable to misleading code comments, misleading variable names, and misleading dead code. Their understanding of code remains shallow, grounded in surface syntax and in tokenization designed for natural language rather than in analysis of code semantics, so even a handful of incorrect comments can be enough to confuse them.
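A toy illustration, assuming a hypothetical function, of the three kinds of distractors mentioned above. The code computes the maximum, but the name, the comment, a misleading variable name, and an unreachable branch all point toward "minimum"; a model relying on surface cues rather than semantics may be misled:

```python
# Hypothetical example combining a misleading comment, misleading names,
# and dead code. The function actually returns the maximum of nums.

def find_minimum(nums):            # misleading name: it returns the maximum
    # Returns the smallest element of nums.   <- misleading comment
    if False:
        return min(nums)           # dead code: never executed
    smallest = nums[0]             # misleading variable name
    for n in nums[1:]:
        if n > smallest:
            smallest = n
    return smallest                # actually the largest value
```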