Hello! I'm a recent UC Berkeley graduate based in Boston. Currently, I'm a Software Development Engineer at Amazon Web Services,
working on Data Services for Elastic Block Store.
Last year, I graduated from UC Berkeley with bachelor's degrees in Cognitive Science (Highest Honors) and Computer Science, and received the Glushko Prize for Outstanding Undergraduate Research in Cognitive Sciences. I also interned as a software development engineer at
Workday and The Factual, and as a data scientist at Applied Materials. I taught both introductory and advanced courses in data science and computational cognitive science.
I'm interested in how humans use, represent, and learn language from a computational perspective,
and in which aspects of linguistic knowledge are encoded by models used for natural language processing.
This work can improve both our understanding of language's role in human cognition and the ability of language technologies
to achieve more human-like capabilities. My research has used both behavioral experiments and corpus analyses.
Contextualized Word Embeddings Capture Human-Like Relations Between English Word Senses
Oral presentation at the Cognitive Aspects of the Lexicon workshop (CogALex VI) at the International Conference on Computational Linguistics (COLING), 2020
Undergraduate Honors Thesis, advised by Dr. Meylan, Prof. Srinivasan, and Prof. Steven Piantadosi
Code and Data
We investigate whether recent advances in NLP (specifically, the Transformer-based neural network model BERT) can capture human-like distinctions between
meanings of the same word, such as polysemy (related senses) and homonymy (unrelated meanings). We collect human judgments of the relatedness of selected WordNet senses for 32 English words
using a two-dimensional spatial arrangement task, and compare them with relatedness as measured by BERT embeddings of the corresponding senses in the SemCor corpus.
We show that participants' relatedness judgments are correlated with distances between senses in the BERT embedding space, and that BERT encodes
relations between homonymous senses more similarly to human judgments than relations between polysemous senses.
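To make this kind of comparison concrete, here is a minimal sketch in Python of correlating human and model distances between senses. It assumes, for illustration only, that each sense is represented by the mean of its BERT token vectors, that model distance is cosine distance, that human distance is the Euclidean distance between where a participant placed two senses, and that agreement is measured with Spearman's rank correlation. The function names, sense labels, vectors, and coordinates are hypothetical toy values, not data or code from the study.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Mean of a sense's token embeddings (assumed sense representation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def compare(sense_tokens, arrangement):
    """Correlate model distances with human on-screen distances.

    sense_tokens: sense label -> list of token embeddings (hypothetical)
    arrangement:  sense label -> (x, y) position chosen by a participant
    """
    centroids = {s: centroid(vs) for s, vs in sense_tokens.items()}
    model_d, human_d = [], []
    for a, b in combinations(sorted(sense_tokens), 2):
        model_d.append(cosine_distance(centroids[a], centroids[b]))
        human_d.append(math.dist(arrangement[a], arrangement[b]))
    return spearman(model_d, human_d)


# Toy, invented inputs for three senses of "bank".
tokens = {
    "bank_river": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "bank_finance": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]],
    "bank_building": [[0.1, 0.8, 0.6], [0.0, 0.9, 0.5]],
}
placement = {"bank_river": (0, 0), "bank_finance": (10, 0), "bank_building": (9, 2)}
rho = compare(tokens, placement)
```

In this toy setup the two financial senses are close both in the embedding space and on the screen, so the rank correlation comes out near 1. A rank correlation is used here because on-screen distances and embedding distances live on different scales.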
Evaluating Models of Robust Word Recognition with Serial Reproduction
Published in the May 2021 issue of the journal Cognition
Journal Article | Preprint (full text)
We compared how well several probabilistic generative language models, such as n-gram models, probabilistic context-free grammars (PCFGs), and neural networks,
capture human linguistic expectations in a web-based serial reproduction task, in which participants try to repeat sentences spoken by other participants,
similar to a game of "Telephone." We found that models that use preceding context, especially those with abstract representations of linguistic structure, best predict the changes participants
made when reproducing utterances in the experiment. I contributed to designing and implementing parts of the experimental interface,
extracting probabilities from PCFGs, modeling which words in utterances were most likely to change under each model, and revising the final paper.
Last year, I helped Ruthe Foushee (Language and Cognitive Development Lab, PI: Prof. Mahesh Srinivasan) with a corpus analysis comparing
child-directed and adult-directed speech. For this project, I extracted text from the target corpora (such as CHILDES and Santa Barbara),
and ran exploratory analyses and permutation tests comparing the surprisal of child-directed and adult-directed speech.
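As a rough illustration of this kind of analysis, here is a minimal two-sample permutation test in Python. It assumes per-utterance surprisal values (in bits, -log2 of a language model's probability) are already available for each register and tests the difference in mean surprisal. The function names and toy probabilities are hypothetical and do not come from the project; the actual models, corpora, and test statistic used there may differ.

```python
import math
import random


def surprisal(prob):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(prob)


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean surprisal.

    Returns (observed difference, p-value). Adding one to the numerator
    and denominator keeps the p-value from ever being exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values treats register labels as exchangeable,
        # which is exactly the null hypothesis being tested.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Invented per-utterance probabilities for the two registers.
child_directed = [surprisal(p) for p in [0.5, 0.25, 0.5, 0.125]]
adult_directed = [surprisal(p) for p in [1 / 16, 1 / 32, 1 / 8, 1 / 64]]
diff, p_value = permutation_test(child_directed, adult_directed)
```

Fixing the seed makes the p-value reproducible across runs. With real corpora, the surprisal values would come from a language model trained separately from the utterances being scored.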
I also participated in Berkeley's Linguistic Research Apprentices Practicum (LRAP), where I annotated lexical data for
FrameNet and explored expanding the database
with word embeddings. I did this work under Dmetri Hayes at the International Computer Science Institute (PI: Dr. Collin Baker).