Gemma Boleda

Universitat Pompeu Fabra

Engineering Sciences

I am an ICREA Research Professor in the Department of Translation and Language Sciences of the Universitat Pompeu Fabra, where I head the Computational Linguistics and Linguistic Theory (COLT) research group. I previously held post-doctoral positions at the Department of Linguistics of The University of Texas at Austin and the CIMEC Center for Brain/Mind Sciences of the University of Trento; before that, I graduated in Spanish Philology at the Universitat Autònoma de Barcelona and obtained my PhD at the Universitat Pompeu Fabra. I am a member of the standing review committee of the TACL journal. I was an Information Officer of the SIGSEM Board (2013-2020), and acted as area co-chair of ACL 2016, program co-chair of *SEM 2015, and local co-chair of ESSLLI 2015. I am currently funded by an ERC Starting Grant.


Research interests

I want to understand how language works; in particular, how humans convey meaning through language. I address this research question with an approach that I call "Data Science for Linguistics", integrating methodologies from Linguistics and Artificial Intelligence.

My research focuses on the mechanisms by which people combine generic information stored in the linguistic system (such as the lexicon or the grammar) with information coming from the specific situations speakers are in when they use language. For instance, given a picture of a chihuahua seen from afar, in a park, next to its owner, do people use "chihuahua", or "dog", or some other word? To understand this behavior, my team and I carry out statistical modeling and computational experiments, often using Machine Learning techniques. Our approach requires large amounts of data, and part of our work involves gathering linguistic data on a large scale.

Selected publications

- Boleda G 2020, 'Distributional Semantics and Linguistic Theory', Annual Review of Linguistics, Vol 6, pp 213-234.

- Sorodoc I & Gulordava K & Boleda G 2020, 'Probing for Referential Information in Language Models', 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pp 4177-4189.

- Westera M & Boleda G 2020, 'A closer look at scalar diversity using contextualized semantic similarity', Proceedings of Sinn Und Bedeutung, 24(2), pp 439-454.

- Aina L & Brochhagen T & Boleda G 2020, 'Modeling word interpretation with deep language models: The interaction between expectations and lexical information', Proceedings of CogSci 2020, pp 1518-1524.

- Gulordava K & Brochhagen T & Boleda G 2020, 'Deep daxes: Mutual exclusivity arises through both learning biases and pragmatic strategies in neural networks', Proceedings of CogSci 2020, pp 2089-2095.


Selected research activities

Publicly released ManyNames, a dataset with 25,000 natural images with 36 names/image provided by human subjects, https://github.com/amore-upf/manynames.

Three invited talks at international centers (a further four talks had to be cancelled due to illness), among others one at the CLASP Seminar, Gothenburg, Sweden.