Language is one of the hallmark characteristics of human cognition. For many scholars, it is also the critical capacity that distinguishes humans from other species. In recent years, considerable progress has been made in trying to determine whether this distinctly human capacity can be learned in silico. Efforts to teach language to computers span several decades, and they have recently bloomed thanks to the emergence of so-called Large Language Models (LLMs).
LLMs are trained on vast amounts of data and form the basis of several popular Artificial Intelligence (AI) applications, including search engines, machine translators, and audio-to-text converters. But what language skills do these models actually have? Do they handle language the way humans do?
To investigate this, we systematically compared the grammatical skills of humans to those of three of the most advanced LLMs currently available. Both groups were given a task that is straightforward, at least for humans: for a wide variety of sentences, they had to decide on the spot whether each one was grammatically well-formed, answering the simple question ‘Is this sentence grammatically correct?’ The results showed that humans largely answered correctly, while the LLMs gave many wrong answers. Interestingly, the models were also found to suffer from a yes-response bias: they adopted a default strategy of answering ‘yes’ most of the time, regardless of whether the sentence was grammatical or not.
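In practice, an evaluation of this kind can be set up by prompting each model with a forced yes/no question for every test sentence and comparing its answers against the expected judgments. The sketch below illustrates the general idea in Python; the example sentences, the prompt wording, and the query_model stub are illustrative assumptions, not the materials or code used in the study. On a balanced set of sentences, accuracy is the share of answers matching the expected judgment, and a yes-response rate well above 0.5 signals a yes-bias.

```python
# Minimal sketch of a yes/no grammaticality-judgment evaluation (illustrative only;
# the items, prompt, and query_model stub are assumptions, not the study's materials).

# Hypothetical test items: (sentence, is_grammatical)
ITEMS = [
    ("The key to the cabinets is on the table.", True),
    ("The key to the cabinets are on the table.", False),
    ("The child seems sleeping.", False),
]

PROMPT = "Is the following sentence grammatically correct? Answer yes or no.\n{}"

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; here it always answers 'yes',
    mimicking the yes-response bias reported in the study."""
    return "yes"

def evaluate(items):
    correct = 0
    yes_answers = 0
    for sentence, is_grammatical in items:
        said_yes = query_model(PROMPT.format(sentence)).strip().lower().startswith("yes")
        yes_answers += said_yes
        correct += (said_yes == is_grammatical)
    n = len(items)
    return correct / n, yes_answers / n  # accuracy, yes-response rate

if __name__ == "__main__":
    accuracy, yes_rate = evaluate(ITEMS)
    print(f"accuracy: {accuracy:.2f}, yes-response rate: {yes_rate:.2f}")
```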
AI tools are here to stay, and they can be helpful across a wide range of tasks, but as with all tools, their utility is constrained by their limitations. In this context, documenting such shortcomings and limitations is urgent. Since most AI applications depend on understanding commands given in natural language, determining the limits of their language understanding, as we have done in this study, is vitally important.