UMass NLP Seminar

This webpage contains information about the Fall 2025 NLP seminar. See below for information about taking the seminar as a course.

Schedule: Fall 2025

The NLP seminar takes place Wednesdays, 12:30pm - 1:45pm, in room LGRC A104 and on Zoom.

  • Sept. 10: organizational meeting
  • Sept. 17: Katrin Erk, UMass CICS and Linguistics, “Analyzing word token embeddings to assess meaning in context.” (In-person talk.)
    • Abstract: Word token embeddings constitute a condensed record of utterances of many speakers. This makes them interesting as data for lexical semantics, for questions like: Can word embeddings tell us about the structure of polysemous words, and which properties distinguish different usage groups? Do word embeddings encode meaning distinctions that differ from dictionary senses, and if so, is this data or noise? Can word embeddings help us assess the subtle meaning changes that constructions impose on their components? And, importantly: How can we test the performance of methods for analyzing word token embeddings, and under what circumstances can we rely on them? In this talk I discuss methods that we have been developing for studying lexical semantics through embeddings, and analyses we’ve done so far.
    • Related papers: A method for studying semantic construal in grammatical constructions with interpretable contextual embedding spaces (Chronis et al., ACL 2023) and Adjusting interpretable dimensions in embedding space with human judgements (Erk & Apidianaki, NAACL 2024).
  • Sept. 24: Catherine Arnett, EleutherAI, “Why do language models perform worse for morphologically complex languages?”
    • Abstract: Language models perform differently across languages. It has been previously suggested that morphological typology may explain some of this variability (Cotterell et al., 2018). We replicate previous analyses and find new evidence for a performance gap between agglutinative and fusional languages, where fusional languages, such as English, tend to have better language modeling performance than morphologically more complex languages like Turkish. We then propose and test three possible causes for this performance gap: morphological alignment of tokenizers, tokenization quality, and disparities in dataset sizes and measurement. We find some evidence that tokenization quality explains the performance gap, but none for the role of morphological alignment. Instead, we find that the performance gap is most reduced when training datasets are of equivalent size across language types, but only when scaled according to the so-called “byte premium”: the different encoding efficiencies of different languages and orthographies. These results suggest that languages of particular morphological types are not intrinsically advantaged or disadvantaged in language modeling. These findings bear on ongoing efforts to improve performance for low-performing and under-resourced languages.
    • Related papers: Why do language models perform worse for morphologically complex languages? (Arnett & Bergen, COLING 2025), A bit of a problem: Measurement disparities in dataset sizes across languages (Arnett et al., SIGUL 2024), and Goldfish: Monolingual language models for 350 languages (Chang et al., arXiv 2024). A toy illustration of the byte-premium calculation appears below the schedule.
  • Oct. 1: Paper discussion: Large language model hacking: Quantifying the hidden risks of using LLMs for text annotation (Baumann et al., arXiv 2025).
  • Oct. 8: No session
  • Oct. 15: Alisa Liu, University of Washington, “Between Language and Models: Rethinking Algorithms for Tokenization.”
    • Abstract: Language models operate over real numbers, while users of language models interface with human-readable text. This is made possible by tokenization, which encodes text as a sequence of embeddings and decodes real-valued predictions back into generated text. Despite its foundational importance to language modeling, the algorithms for tokenization have remained largely unchanged in the era of LLMs. In this talk, I will discuss my recent work on improving algorithms for tokenization. The first half presents SuperBPE, a superword tokenizer that extends traditional subword tokenization to include tokens that span multiple words. We motivate superword tokens from a linguistic perspective, and demonstrate empirically that models pretrained from scratch with SuperBPE achieve stronger performance on downstream tasks while also being significantly more efficient at inference time. The second half revisits a fundamental limitation of tokenizer-based LMs: models trained over sequences of tokens cannot, out of the box, model the probability of arbitrary strings. I discuss the practical implications of this in domains such as Chinese and code, and then present an inference-time algorithm that converts LM-predicted probabilities over tokens into probabilities over characters, while preserving the sampling distribution at the text level. I will conclude by discussing open questions on the future of tokenization.
    • Related papers: SuperBPE: Space travel for language models (Liu et al., COLM 2025) and Sampling from your language model one byte at a time (Hayase et al., arXiv 2025). A toy sketch contrasting subword and superword tokenization appears below the schedule.
  • Oct. 22: Emma Pierson, University of California, Berkeley, “Sparse autoencoders for hypothesis generation.”
    • Abstract: I will describe HypotheSAEs, a general method to hypothesize interpretable relationships between text data (e.g., headlines) and a target variable (e.g., clicks) using sparse autoencoders. HypotheSAEs produces novel discoveries on well-studied tasks and, compared to baselines, better identifies reference hypotheses on synthetic datasets and produces more predictive hypotheses on real datasets. After introducing HypotheSAEs, I will describe how we have applied it to (1) learn the preferences encoded in human feedback datasets and (2) learn interpretable representations of free-text survey data.
    • Related papers: Sparse autoencoders for hypothesis generation (Movva et al., ICML 2025). A minimal sparse-autoencoder sketch appears below the schedule.
  • Oct. 29: Os Keyes, University of Massachusetts Lowell, “Injustices beyond models: Gender, disability and the political economy of Artificial Intelligence.” (In-person talk.)
    • Abstract: When gender and disability are discussed in AI, it’s often in the details: injustices or consequences in how systems are designed and deployed. But what about the political and cultural economies of AI: the shape, structure and narratives of the industry as a whole? Are injustices a matter of design choices, or of structure? In this talk, I will explore the gendered and disability-related implications of AI as a field and industry, arguing that efforts to address injustices require not simply a different way of developing AI, but fundamental changes in how we imagine the roles of the technology, and of the industry, in society.
    • Related papers: Automating autism: Disability, discourse, and artificial intelligence (Keyes, Journal of Sociotechnical Critique 2020) and The infopolitics of feeling: How race and disability are configured in emotion recognition technology (McInerney & Keyes, New Media & Society 2025).
  • Nov. 5: No session
  • Nov. 12: Emily Tseng, University of Washington
  • Nov. 19: Shira Wein, Amherst College. (In-person talk.)
  • Nov. 26: No session (Thanksgiving break)
  • Dec. 3: TBA
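
A rough sense of the byte-premium idea from the Sept. 24 talk can be had in a few lines of code. The sketch below is a hypothetical illustration, not code from the paper: it estimates a language’s byte premium as the ratio of UTF-8 bytes needed to encode meaning-matched (parallel) text in that language versus English, and then scales a training-data byte budget by that ratio. The sample sentences, helper names, and budget numbers are made up.

```python
# Illustrative sketch only (not the authors' code): estimate a "byte premium"
# as the ratio of UTF-8 bytes needed for parallel text in a language vs.
# English, then scale a dataset byte budget by that ratio.

def utf8_bytes(texts):
    """Total number of UTF-8 bytes needed to encode a list of strings."""
    return sum(len(t.encode("utf-8")) for t in texts)

def byte_premium(texts_lang, texts_eng):
    """Bytes needed for a language relative to English on parallel text."""
    return utf8_bytes(texts_lang) / utf8_bytes(texts_eng)

# Toy parallel sentences (illustrative only, not a real parallel corpus).
english = ["The cat sleeps on the mat.", "It is raining today."]
turkish = ["Kedi paspasın üzerinde uyuyor.", "Bugün yağmur yağıyor."]

premium = byte_premium(turkish, english)

# To give each language a comparable amount of *content*, scale the raw
# byte budget by the byte premium rather than matching raw byte counts.
english_budget_bytes = 1_000_000_000  # hypothetical 1 GB English budget
scaled_budget_bytes = english_budget_bytes * premium

print(f"estimated byte premium: {premium:.2f}")
print(f"scaled byte budget:     {scaled_budget_bytes:,.0f} bytes")
```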
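
For the Oct. 15 talk, the contrast between subword and superword tokenization can be illustrated with a toy BPE trainer. The sketch below is not the SuperBPE algorithm or its training schedule: it simply learns greedy BPE merges twice on the same tiny corpus, once with whitespace pre-tokenization (merges stay inside words) and once over whole lines (merges may cross spaces and yield multi-word tokens). The corpus and merge counts are arbitrary.

```python
# Toy BPE sketch (illustrative, not SuperBPE): with whitespace pre-tokenization
# merges stay inside words; without it, merges can span word boundaries.
from collections import Counter

def get_pair_counts(sequences):
    """Count adjacent symbol pairs across all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

def apply_merge(seq, pair, new_symbol):
    """Replace every occurrence of `pair` in `seq` with `new_symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_merges(sequences, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(sequences)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]
        new_symbol = "".join(pair)
        merges.append(new_symbol)
        sequences = [apply_merge(seq, pair, new_symbol) for seq in sequences]
    return merges, sequences

corpus = ["the cat sat on the mat", "the cat ate the rat"]

# Subword setting: split on whitespace first, so merges stay inside words.
word_seqs = [list(w) for line in corpus for w in line.split()]
subword_merges, _ = learn_merges(word_seqs, num_merges=8)

# Superword setting: keep spaces as symbols and treat each line as one
# sequence, so later merges can span several words (e.g., "the cat").
line_seqs = [list(line) for line in corpus]
superword_merges, tokenized = learn_merges(line_seqs, num_merges=25)

print("subword merges:  ", subword_merges)
print("superword merges:", superword_merges[-5:])
print("tokenized line:  ", tokenized[0])
```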
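
As background for the Oct. 22 talk, here is a minimal sparse-autoencoder sketch in the spirit of, but not taken from, HypotheSAEs: train an overcomplete autoencoder with an L1 sparsity penalty on text embeddings, then rank the learned features by how strongly their activations correlate with a target variable. It assumes PyTorch is available, uses random tensors as stand-ins for real embeddings and targets, picks made-up dimensions and hyperparameters, and omits the step of interpreting the top features.

```python
# Minimal sparse-autoencoder sketch (illustrative only, not the HypotheSAEs
# code): learn sparse, overcomplete features of text embeddings, then rank
# features by how strongly they correlate with a target variable.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed, d_features):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_features)
        self.decoder = nn.Linear(d_features, d_embed)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(z), z

# Stand-ins: random "text embeddings" and a random target (e.g., clicks).
torch.manual_seed(0)
n, d_embed, d_features = 512, 64, 256
embeddings = torch.randn(n, d_embed)
target = torch.randn(n)

model = SparseAutoencoder(d_embed, d_features)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_weight = 1e-3  # sparsity penalty strength (made-up value)

for step in range(200):
    recon, z = model(embeddings)
    loss = ((recon - embeddings) ** 2).mean() + l1_weight * z.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Score each learned feature by the correlation of its activation with the
# target; the top features are candidate (still-to-be-interpreted) hypotheses.
with torch.no_grad():
    _, z = model(embeddings)
zc = z - z.mean(0)
tc = target - target.mean()
corr = (zc * tc[:, None]).mean(0) / (zc.std(0) * tc.std() + 1e-8)
top = corr.abs().topk(5).indices
print("most target-associated features:", top.tolist())
```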

Course: COMPSCI 692L

The seminar can be taken as a 1-credit course, COMPSCI 692L. Enrollment in the course is not required to attend talks.

Course requirements: Students must read the posted papers before each class session, submit questions before the seminar, and be prepared to ask their questions (or others) during the speaker’s talk. In sessions without a speaker, students should come prepared to participate in the class discussion.

Assignment details are distributed through the seminar’s Slack channel.

Previous seminars

Speakers from some previous offerings of the NLP Seminar: Spring 2025, Fall 2024, Spring 2024, Fall 2023, Spring 2023.