Gemma Boleda

Universitat Pompeu Fabra

Engineering Sciences

I am an ICREA Research Professor in the Department of Translation and Language Sciences of the Universitat Pompeu Fabra, where I head the Computational Linguistics and Linguistic Theory (COLT) research group. I previously held post-doctoral positions at the Department of Linguistics of The University of Texas at Austin and the CIMEC Center for Brain/Mind Sciences of the University of Trento; before that, I graduated in Spanish Philology at the Universitat Autònoma de Barcelona and obtained my PhD at the Universitat Pompeu Fabra. I am a member of the standing review committee of the TACL journal. I was an Information Officer of the SIGSEM Board (2013-2020), and acted as area co-chair of ACL 2016, program co-chair of *SEM 2015, and local co-chair of ESSLLI 2015. I am currently funded by an ERC Starting Grant.


Research interests

I want to understand how language works; in particular, how humans convey meaning through language. I address this research question with a cross-disciplinary approach that integrates methodologies from Linguistics, Artificial Intelligence, and Cognitive Science.

The focus of my research is the mechanisms by which people combine information stored in their linguistic system (such as the lexicon or the grammar) with information coming from the specific situation they are in when they use language. For instance, given a picture of a chihuahua seen from a distance, in a park, next to its owner, do people use “chihuahua”, “dog”, or some other word? To understand this behavior, my team and I carry out statistical modeling and computational experiments, often using Machine Learning techniques. Our approach requires large amounts of data, and part of our work involves gathering linguistic data on a large scale.

Selected publications

– Westera M, Gupta A, Boleda G & Pado S 2021, ‘Distributional Models of Category Concepts Based on Names of Category Members’, Cognitive Science, 45, 9, e13029.

– Aina L, Liao X, Boleda G & Westera M 2021, ‘Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution’, Proceedings of CoNLL 2021, 454-469.

– Sorodoc I, Boleda G & Baroni M 2021, ‘Controlled tasks for model analysis: Retrieving discrete information from sequences’, Proceedings of BlackBoxNLP 2021, 468-478.


Selected research activities

  • Invited talk at Abralin Ao Vivo – Linguists Online: “When do languages use the same word for different meanings? The Goldilocks Principle in the lexicon”.
  • Released v2 of ManyNames, a dataset of 25,000 objects in images, each associated with 36 human-provided names.

ICREA Memoir 2021