Iñigo Parra

About Me

I am currently a PhD student at UC Berkeley Linguistics (starting August 2024).

I earned a BA in Modern Languages with a focus on English Studies at the University of Deusto (Bilbao, Spain) and an MA in Romance Languages (Spanish Linguistics) at the University of Alabama.

My research currently focuses on computational linguistics and natural language processing. Within these areas, I study new methods to make low-resource languages (LRLs) more present in modern language technologies such as large language models (LLMs). I am also interested in the intersection of linguistics, cognition, and technology. Through these disciplines, I aim to understand how humans learn and process language and how we can use this knowledge to improve language modeling.

I am the founder of the Visibility Project, a project to map endangered languages and promote NLP research on them. I have also made significant contributions to the AYA project by Cohere for AI.



Research

  • Parra, I. (2024). Morphological Typology in BPE Subword Productivity and Language Modeling. NeurIPS 2024
  • Parra, I. (2024). Noise Be Gone: Does Speech Enhancement Distort Linguistic Nuances? ACL 2024
  • Parra, I. (2023). UnMASKed: Quantifying Gender Biases in Language Models through Linguistically Informed Job Market Queries. EACL 2024
  • Parra, I. (2023). Do You Speak Basquenglish? Assessing Low-resource Multilingual Proficiency of Pretrained Language Models. EMNLP 2023
  • Parra, I. (2023). The Turing Test Meets Dungeons & Dragons: A Comparative Study of Role-Playing and Standard Prompts in Language Models. (Under Review)
  • Koronkiewicz, B.; VanMeter, R.; Parra, I. (2023). Asymmetrical p-stranding: Acceptability Data from Spanish-English Code-switching. (In prep.)
  • Parra, I. (2020). Language and Opinion Online: Beliefs on COVID-19 and Socio-political Implications. (Undergraduate thesis)


Projects

The Visibility Project

The Visibility Project is an interactive map for locating endangered or low-resource languages (LRLs). It focuses on natural language processing: we aim to make LRLs more visible and draw attention to research on them. Each language is represented by a marker on the map. The larger the marker, the more speakers the language has, and the marker's color represents the NLP attention the language receives. Clicking on a marker displays a popup with information about the language. The popup also contains a link to the language's NLP papers, if available.

Visit the Visibility Project

AYA: Cohere For AI

I collaborated on the development of AYA, an initiative to accelerate multilingual AI progress. It is part of Cohere for AI, a project to make AI more accessible. I was the ambassador and main contributor for the Basque language. As such, I was in charge of social-media news and updates for this language and of translating the project's official documents. Apart from Basque, I also contributed to Spanish and English.

News on this! Aya is now an open-source SOTA multilingual dataset. It is now available on Hugging Face 🤗 Datasets! The new open-source model from Cohere and C4AI, ⌘R+, was trained with our work! The model is now available on Hugging Face 🤗 Models!

AYA



Other Interests

I like cooking, staying active, and photography. I am also a huge fan of house and Latin music. These are some moments I enjoyed in the past!



Music

In my free time, I also make music. Here are some of my songs!