About Me

I am currently a PhD student in Linguistics at UC Berkeley. My research lies at the intersection of computational linguistics, deep learning, and cognitive science.

I hold a BA in Modern Languages with a focus on English Studies from the University of Deusto (Bilbao, Spain) and an MA in Romance Languages (Spanish Linguistics) from the University of Alabama.

My research focuses on computational linguistics and natural language processing. I am deeply interested in the intersection of linguistics, cognition, and technology: understanding how humans process language and applying these insights to improve computational language models.

I am the founder of the Visibility Project, an initiative to map endangered languages and promote NLP research on them. Additionally, I have made significant contributions to the AYA project by Cohere for AI.



Research Interests

Computational Linguistics

Developing computational models of language to understand linguistic phenomena from a formal perspective.

Natural Language Processing

Creating NLP solutions for low-resource languages and improving multilingual capabilities of language models. Research on tokenization strategies, cross-lingual transfer learning, and data-efficient approaches.

Deep Learning for Language

Exploring new architectures and training methodologies for language modeling with attention to efficiency, adaptability, and linguistic accuracy across diverse languages.

Brain, Language & Computation

Investigating the neurological basis of language processing and modeling cognitive mechanisms in computational frameworks. Bridging psycholinguistics with neural network behavior.



Publications

  • Morphological Typology in BPE Subword Productivity and Language Modeling
    Parra, I. (2024)
    NeurIPS 2024
  • Noise Be Gone: Does Speech Enhancement Distort Linguistic Nuances?
    Parra, I. (2024)
    ACL 2024
  • UnMASKed: Quantifying Gender Biases in Language Models through Linguistically Informed Job Market Queries
    Parra, I. (2023)
    EACL 2024
  • Do You Speak Basquenglish? Assessing Low-resource Multilingual Proficiency of Pretrained Language Models
    Parra, I. (2023)
    EMNLP 2023
  • The Turing Test Meets Dungeons & Dragons: A Comparative Study of Role-Playing and Standard Prompts in Language Models
    Parra, I. (2023)
    (Under Review)
  • Asymmetrical p-stranding: Acceptability Data from Spanish-English Code-switching
    Koronkiewicz, B.; VanMeter, R.; Parra, I. (2023)
    (In preparation)
  • Language and Opinion Online: Beliefs on COVID-19 and Socio-political Implications
    Parra, I. (2020)
    Undergraduate thesis


Projects

The Visibility Project

The Visibility Project is an interactive map for locating endangered or low-resource languages (LRLs). It has a focus on natural language processing: we aim to make LRLs more visible and to draw attention to research on them. Each language is represented by a marker on the map; the larger the marker, the more speakers the language has, and its color represents the NLP attention the language receives. Clicking a marker displays a popup with information about the language, including a link to the language's NLP papers, if available.

Visit the Visibility Project

AYA: Cohere For AI

I collaborated on the development of AYA, an initiative to accelerate multilingual AI progress. It is part of Cohere For AI, a project to make AI more accessible. I was the ambassador and main contributor for the Basque language; as such, I was in charge of social-media news and updates for Basque and of translating the project's official documents. Apart from Basque, I also contributed to Spanish and English.

News on this! Aya is now an open-source, state-of-the-art multilingual dataset, available on Huggingface🤗 Datasets! Cohere and C4AI's new open-source model, ⌘R+, was trained with our work. The model is now available on Huggingface🤗 Models!

AYA



Visual Explanations

I create mathematical animations to visualize complex concepts in computational linguistics and NLP. These demonstrations use Manim (Mathematical Animation Engine) to illustrate key ideas in an intuitive way.

Sinusoidal Position Encodings

This animation demonstrates how sinusoidal positional encodings work in transformer models.
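The encodings animated here follow the standard transformer formulation (sine on even dimensions, cosine on odd ones, with frequencies decaying by powers of 10000). A minimal NumPy sketch, where the function name and the example dimensions are my own choices:

```python
import numpy as np

def sinusoidal_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
    return pe

pe = sinusoidal_encoding(max_len=50, d_model=16)
```

Because each dimension oscillates at a different frequency, every position gets a distinct fingerprint, and relative offsets correspond to fixed linear transformations of the encoding.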

Transformers Deep Learning

Query, Key, and Value

A visualization of the Query, Key, and Value matrices in transformer attention mechanisms.
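The operation behind these three matrices is scaled dot-product attention: queries are compared against keys, the similarities are softmax-normalized, and the resulting weights mix the values. A small NumPy sketch (shapes and seed are illustrative, not from the animation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, which is what the animation's weight arrows depict.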

Attention NLP

The GAN Objective

Breaking down the mathematical formulation of Generative Adversarial Networks.
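The formulation animated here is the standard minimax game from the original GAN paper: the discriminator D maximizes its ability to tell real data from samples produced by the generator G, while G minimizes it:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```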

GAN Machine Learning


Other Interests

Outside of my academic life, I enjoy cooking, photography, staying active, and video games. Below are some snapshots from my personal life.