How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs
Assesses how well LLMs comprehend temporal aspect in narratives using an expert-in-the-loop probing pipeline. The study finds over-reliance on prototypicality, inconsistent aspect judgments, and weak causal reasoning, and it introduces a standardized framework for evaluating cognitive–linguistic capabilities.