About Me

I am currently a PhD student in Linguistics at UC Berkeley. My research lies at the intersection of computational linguistics, deep learning, and cognitive science.

I hold a BA in Modern Languages with a focus on English Studies from the University of Deusto (Bilbao, Spain) and an MA in Romance Languages (Spanish Linguistics) from the University of Alabama.

My research focuses on computational linguistics and natural language processing. I am deeply interested in the intersection of linguistics, cognition, and technology: understanding how humans process language and applying these insights to improve computational language models.

I am the founder of the Visibility Project, an initiative to map endangered languages and promote NLP research on them. Additionally, I have made significant contributions to the AYA project by Cohere for AI.



Research Interests

Computational Linguistics

Developing computational models of language to understand linguistic phenomena from a formal perspective.

Natural Language Processing

Creating NLP solutions for low-resource languages and improving multilingual capabilities of language models. Research on tokenization strategies, cross-lingual transfer learning, and data-efficient approaches.

Deep Learning for Language

Exploring new architectures and training methodologies for language modeling with attention to efficiency, adaptability, and linguistic accuracy across diverse languages.

Brain, Language & Computation

Investigating the neurological basis of language processing and modeling cognitive mechanisms in computational frameworks. Bridging psycholinguistics with neural network behavior.



Publications

  • Morphological Typology in BPE Subword Productivity and Language Modeling
    Parra, I. (2024)
    NeurIPS 2024
  • Noise Be Gone: Does Speech Enhancement Distort Linguistic Nuances?
    Parra, I. (2024)
    ACL 2024
  • UnMASKed: Quantifying Gender Biases in Language Models through Linguistically Informed Job Market Queries
    Parra, I. (2023)
    EACL 2024
  • Do You Speak Basquenglish? Assessing Low-resource Multilingual Proficiency of Pretrained Language Models
    Parra, I. (2023)
    EMNLP 2023
  • The Turing Test Meets Dungeons & Dragons: A Comparative Study of Role-Playing and Standard Prompts in Language Models
    Parra, I. (2023)
    (Under Review)
  • Asymmetrical p-stranding: Acceptability Data from Spanish-English Code-switching
    Koronkiewicz, B.; VanMeter, R.; Parra, I. (2023)
    (In preparation)
  • Language and Opinion Online: Beliefs on COVID-19 and Socio-political Implications
    Parra, I. (2020)
    Undergraduate thesis


Projects

The Visibility Project

The Visibility Project is an interactive map for locating endangered or low-resource languages (LRLs). It has a focus on natural language processing: we aim to make LRLs more visible and to draw attention to research on them. Each language is represented by a marker on the map; the larger the marker, the more speakers the language has, and its color represents the NLP attention the language receives. Clicking a marker displays a popup with information about the language, including a link to the language's NLP papers, if available.

Visit the Visibility Project

AYA: Cohere For AI

I collaborated on the development of AYA, an initiative to accelerate multilingual AI progress. It is part of Cohere For AI, a project to make AI more accessible. I was the ambassador and main contributor for the Basque language; as such, I was in charge of social-media news and updates for Basque and of translating the project's official documents. Apart from Basque, I also contributed to Spanish and English.

News on this! Aya is now an open-source, state-of-the-art multilingual dataset, available on Huggingface🤗 Datasets! Cohere and C4AI's new open-source model, ⌘R+, was trained with our work. The model is now available on Huggingface🤗 Models!

AYA



Visual Explanations

I create mathematical animations to visualize complex concepts in computational linguistics and NLP. These demonstrations use Manim (Mathematical Animation Engine) to illustrate key ideas in an intuitive way.

Sinusoidal Position Encodings

This animation demonstrates how sinusoidal positional encodings work in transformer models.
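The encodings animated here follow the standard transformer formulation (sine on even dimensions, cosine on odd ones, with frequencies decaying by powers of 10000). A minimal NumPy sketch, where the function name and the example dimensions are my own choices:

```python
import numpy as np

def sinusoidal_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
    return pe

pe = sinusoidal_encoding(max_len=50, d_model=16)
```

Because each dimension oscillates at a different frequency, every position gets a distinct fingerprint, and relative offsets correspond to fixed linear transformations of the encoding.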

Transformers Deep Learning

Query, Key, and Value

A visualization of the Query, Key, and Value matrices in transformer attention mechanisms.
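The operation behind these three matrices is scaled dot-product attention: queries are compared against keys, the similarities are softmax-normalized, and the resulting weights mix the values. A small NumPy sketch (shapes and seed are illustrative, not from the animation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, which is what the animation's weight arrows depict.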

Attention NLP

The GAN Objective

Breaking down the mathematical formulation of Generative Adversarial Networks.
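The formulation animated here is the standard minimax game from the original GAN paper: the discriminator D maximizes its ability to tell real data from samples produced by the generator G, while G minimizes it:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```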

GAN Machine Learning


Other Interests

Outside of my academic life, I enjoy cooking, photography, staying active, and video games. Below are some snapshots from my personal life.