Erin Pacquetet, Ph.D.

Senior Data Scientist & Linguist

Contact me

About me

Data scientist specializing in NLP, computational linguistics, and the implementation of LLM-based generative AI strategies. I am passionate about bringing my linguistic knowledge and computer science skills together to provide creative solutions to real-world problems. I have hands-on professional experience building AI-powered applications, such as RAG chatbots, from scratch. My work includes architecting AI systems, designing optimized LLM prompts, implementing guardrails, and developing evaluation datasets and methodologies to measure performance.

Before joining the private sector, I completed a Ph.D. in Linguistics at the University at Buffalo.

My dissertation focused on the linguistic analysis of typing patterns, with the aim of understanding the relationship between how someone types and what they are typing. I am interested in using computational techniques to investigate how production unfolds in time, in order to better understand the production processes at play and how they are influenced by the linguistic characteristics of the language being typed.

Alongside my technical work, I'm an experienced public speaker and educator. I've given talks and conference presentations across Europe and the United States on topics ranging from bias in speech models to evaluation frameworks for generative AI, and I regularly speak at tech and academic events to bridge the gap between research and practice.

CV

My full academic CV in .pdf format can be found here

Education:
2024 Ph.D. in Linguistics, University at Buffalo
2021 M.S. in Computational Linguistics, University at Buffalo
2018 M.A. in English Linguistics, Université Paris Diderot
2016 B.A. in Anglophone Studies, Université Paris Diderot

Publications (selected):
2026 Is Prosody Encoded in the Neural Audio Codec Mimi? - Ballier, Saloev, Pacquetet, Arnold. Speech Prosody 2026 Proceedings
2026 Building Trust in LLM Products: A Production-Ready Evaluation Framework - Pacquetet, Séjourné. LREC 2026 Industry Day
2025 Predicting CEFR levels for learners of English with keylogging metrics, an exploratory study - Al Sawar, Pacquetet, Mallart, Simpkin, Ballier. CORIA-TALN-RJCRI-RECITAL 2025
2024 Logging Keystrokes in Writing by English Learners - Velentzas, Caines, Borgo, Pacquetet, Hamilton, Arnold, Nicholls, Buttery, Gaillat, Yannakoudakis, Ballier. LREC-COLING 2024
2021 Proto: A Neural Cocktail for Generating Appealing Conversations - Saha, Das, Soper, Pacquetet, Srihari. In 4th Proceedings of the Alexa Prize

Skills:
Human Languages: French (native), English (bilingual), German (B1)
Programming Languages: Python, R, Perl, JavaScript
Libraries & Frameworks: NLTK, CoreNLP, spaCy, scikit-learn, Keras, TensorFlow, PyTorch, Pandas, NumPy, LangChain, LangGraph, Claude Code
Data & Visualization: SQL, Tableau, Power BI, Matplotlib, ggplot2
Tools: Git, GitHub, Jupyter, LaTeX, Visual Studio, Azure