Jonathan May

Research Associate Professor, University of Southern California Department of Computer Science
Principal Scientist, Information Sciences Institute
Director, Center for Useful Techniques Enhancing Language Applications Based on Natural And Meaningful Evidence

Teaching

CSCI 544: Applied Natural Language Processing

Fall 2017
Fall 2018 (with Nanyun Peng)

CSCI 662: Advanced Natural Language Processing

Join Us!

Hello there. This is an increasingly less text-positive environment, and is in a long tradition of bad CS professor web sites.

People want to know what I look like. Here is a head shot from 2023.

You'd think you could write one bio and it would be done, but every context needs a different spin. Here is a collection, which you may find useful, but if not, ask me for a new one and I'll add it to the list.

Sometimes I wish I could make a web page like Terry Koo's.

I should work on my CV more often than I do.

I was recently told that students tend to take on the style of their advisors. I have no idea what was meant by this.

This guy has it all figured out. In the past I wanted to create a Joint Dependency Language Model, or maybe the May Tagger, but I'm happy with CUTELABNAME.

Group Pics

2019
Summer Hike	Summer Lunch
Winter Hike	More Winter Hike
2018
Summer Group Pic	Another Summer Group Pic

Software

Justin Cho's BotEval and SPOLIN
Thamme Gowda's Reader-Translator-Generator RTG
Ulf Hermjakob's Universal Romanizer uroman
I did not release any specific PRO software, but you may find the following links to others' efforts useful:
- Chris Dyer's open-source implementation of PRO, released as part of cdec
- Philipp Koehn contributed an implementation of PRO to Moses (use --pairwise-ranked when running mert-moses.pl)
- MegaM, Hal Daumé III's wonderful optimizer, used in the PRO paper experiments. If he tells you what the "GA" stands for, please let me know!
Tiburon (Old binary version here)
All of ISI's NLP software should be accessible here but I can't be held responsible for most of it.
Max and Chunting's Mega implementation

Publications (Get BibTex)

2026

"GTA: Generating Long-horizon Tasks for Web Agents at Scale", T. Huang, K. Huang, P. Choubey, Y. Zhou, M. Chen, J. May, C. Wu. Proc. ACL, 2026 (To Appear). Preprint.
"Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants", J. Ranjit, H. Cho, C. Smerdon, Y. Nam, M. Phung, J. May, J. Blosnich, S. Swayamdipta. Proc. ACL, 2026 (To Appear). Preprint.
"A Representation Sharpening Framework for Zero Shot Dense Retrieval", D. Ashok, S. Nair, M. Al-Darabsah, C. Teo, T. Agarwal, J. May. Proc. EACL, 2026 (To Appear). Preprint.

2025

"Language Models Can Predict Their Own Behavior", D. Ashok, J. May. Proc. NeurIPS, 2025. Paper.
"Teaching Language Models To Gather Information Proactively", T. Huang, S. Chen, M. Chen, J. May, L. Yang, M. Wan, P. Zhou. Findings of EMNLP, 2025. Paper.
"Can VLMs Recall Factual Associations From Visual References?", D. Ashok, A. Chaubey, H. Arai, J. May, J. Thomason. Findings of EMNLP, 2025. Paper.
"FoodPuzzle: Toward Developing Large Language Models as Autonomous Flavor Scientists", T. Huang, D. Lee, J. Sweeney, J. Shi, E. Steliotes, M. Lange, J. May, M. Chen. Proc. KDD, 2025. Paper.
"R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory", T. Huang, K. Basu, I. Abdelaziz, P. Kapanipathi, J. May, M. Chen. Proc. ACL, 2025. Paper.
"A Little Human Data Goes A Long Way", D. Ashok, J. May. Proc. ACL, 2025. Paper.
"NewsInterview: a Dataset and a Playground to Evaluate LLMs' Grounding Gap via Informational Interviews", A. Spangher, M. Lu, S. Kalyan, H. Cho, T. Huang, W. Shi, J. May. Proc. ACL, 2025. Paper.
"Can Vision Language Models Understand Mimed Actions?", H. Cho, S. Lin, T. Srinivasan, M. Saxon, D. Kwon, N. Chavez, J. May. Findings of ACL, 2025. Paper.
"The Million Authors Corpus: A Cross-Lingual and Cross-Domain Wikipedia Dataset for Authorship Verification", A. Israeli, S. Liu, J. May, D. Jurgens. Findings of ACL, 2025. Paper.
"Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL", W. Wongkamjan, Y. Wang, F. Gu, D. Peskoff, J. Kummerfeld, J. May, J. Boyd-Graber. Findings of ACL, 2025. Paper.
"Personalized Help for Optimizing Low-Skilled Users' Strategy", F. Gu, W. Wongkamjan, J. Boyd-Graber, J. Kummerfeld, D. Peskoff, J. May. Proc. NAACL, 2025. Paper.
"Style Transfer with Multi-iteration Preference Optimization", S. Liu and J. May. Proc. NAACL, 2025. Paper.
"Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning", H. Cho, K. Sharma, N. Jedema, L. Ribeiro, J. May, A. Findings of NAACL, 2025. Paper.
"Token Pruning Optimization for Efficient Dense Retrieval with Multi-Vector Representations", S. He, M. Al-Darabsah, S. Nair, J. May, T. Agarwal, T. Yang, C. Teo. Proc. ECIR, 2025. Paper.
"Learning to Rewrite Negation Queries in Product Search", M. Guo, M. Al-Darabsah, C. Teo, J. May, T. Agarwal, R. Bhagat. Proc. ICCL Industry Track, 2025. Paper.

2024

"Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length", X. Ma, X. Yang, W. Xiong, B. Chen, L. Yu, H. Zhang, J. May, L. Zettlemoyer, O. Levy, C. Zhou. Proc. NeurIPS, 2024. Paper.
"Explaining Mixtures of Sources in News Articles", A. Spangher, J. Youn, M. DeButts, N. Peng, E. Ferrara, J. May. Findings of EMNLP, 2024. Paper.
"Are Large Language Models Capable of Generating Human-Level Narratives?", Y. Tian, T. Huang, M. Liu, D. Jiang, A. Spangher, M. Chen, J. May, N. Peng. Proc. EMNLP, 2024. Outstanding Paper Award. Paper.
"Speechworthy Instruction-tuned Language Models", H. Cho, N. Jedema, L. Ribeiro, K. Sharma, P. Szekely, A. Moschitti, R. Janssen, J. May. Proc. EMNLP, 2024. Paper.
"We Can Have AI Without Antisemitism---If We Want It", J. May, V. Felkner, J. Thompson. AJS Perspectives, Summer, 2024. Issue.
"BotEval: Facilitating Interactive Human Evaluation", H. Cho, T. Gowda, Y. Huang, Z. Lu, T. Tong, J. May. Proc. ACL Demo Sessions 2024. Paper.
"More Victories, Less Cooperation: Assessing Cicero’s Diplomacy Play", W. Wongkamjan, F. Gu, Y. Wang, U. Hermjakob, J. May, B. Stewart, J. Kummerfeld, D. Peskoff, J. Boyd-Graber. Proc. ACL 2024. Paper.
"GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction", V. Felkner, J. Thompson, J. May. Proc. ACL 2024. Paper.
"Tracking the Newsworthiness of Public Documents", A. Spangher, S. Tumgoren, B. Welsh, N. Peng, E. Ferrara, J. May. Proc. ACL 2024. Paper.
"Multilingual Meta-Distillation Alignment for Semantic Retrieval", M. M'hamdi, J. May, F. Dernoncourt, T. Bui, and S. Yoon. Proc. SIGIR 2024. Paper.
"Can Language Model Moderators Improve the Health of Online Discourse?", H. Cho, S. Liu, T. Shi, D. Jain, B. Rizk, Y. Huang, Z. Lu, N. Wen, J. Gratch, E. Ferrara, J. May. Proc. NAACL 2024. Paper.
"Leitner-Guided Memory Replay for Cross-lingual Continual Learning", M. M'hamdi, J. May. Proc. NAACL 2024. Paper.
"LegalDiscourse: Interpreting When Laws Apply and To Whom", A. Spangher, Z. Xue, T. Wu, M. Hansen, J. May. Proc. NAACL 2024. Paper.
"CPL-NoViD: Context-Aware Prompt-based Learning for Norm Violation Detection in Online Communities", Z. He, J. May, K. Lerman. Proc. ICWSM 2024. Paper.

2023

"Challenges in Context-Aware Neural Machine Translation", L. Jin, J. He, J. May, X. Ma. Proc. EMNLP 2023. Paper.
"Continual Dialogue State Tracking via Example-Guided Question Answering", H. Cho, A. Madotto, Z. Lin, K. Chandu, S. Kottur, J. Xu, J. May, and C. Sankar. Proc. EMNLP 2023. Paper.
"Analyzing Norm Violations in Live-Stream Chat", J. Moon, D. Lee, H. Cho, W. Jin, C. Park, M. Kim, J. May, J. Pujara, S. Park. Proc. EMNLP 2023. Paper.
"Identifying Informational Sources in News Articles", A. Spangher, N. Peng, E. Ferrara, J. May. Proc. EMNLP 2023. Paper.
"Feedback Loops and Complex Dynamics of Harmful Speech in Online Discussions", R. Chang, J. May, K. Lerman. Proc. SBP-BRiMS 2023. Paper.
"Anger Breeds Controversy: Analyzing Controversy and Emotions on Reddit", K. Chen, Z. He, R. Chang, J. May, K. Lerman. Proc. SBP-BRiMS 2023. Paper.
"First Steps Towards a Source Recommendation Engine: Investigating How Sources Are Used in News Articles", A. Spangher, J. Youn, J. May, N. Peng. Proc. The Joint Computation + Journalism European Data & Computational Journalism Conference 2023. Paper.
"Cross-lingual Continual Learning", M. M'hamdi, X. Ren, J. May. Proc. ACL 2023. Paper.
"WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models", V. Felkner, H. Chang, E. Jang, J. May. Proc. ACL 2023. Paper.
"RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation", S. Liu, H. Cho, M. Freedman, X. Ma, J. May. Proc. ACL 2023. Paper.
"Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning", M. Gheini, X. Ma, J. May. Findings of ACL 2023. Paper.
"Blend and Match: Distilling Semantic Search Models with Different Inductive Biases and Model Architectures", H. Bonab, A. Joshi, R. Bhatia, A. Gandhi, V. Huddar, J. Naik, M. Al-Darabsah, C. Teo, J. May, T. Agarwal, V. Petricek. Proc. The 2nd Workshop on Interactive and Scalable Information Retrieval Methods for eCommerce (ISIR-eCom). Paper.
"Mega: Moving Average Equipped Gated Attention", X. Ma, C. Zhou, X. Kong, J. He, L. Gui, G. Neubig, J. May, L. Zettlemoyer. Proc. ICLR, 2023. Paper.
"Bridging the Gap between Native Text and Translated Text through Adversarial Learning: A Case Study on Cross-Lingual Event Extraction", P. Yu, J. May, H. Ji. Findings of EACL 2023. Paper.

2022

"Machine Translation Robustness to Natural Asemantic Variation", J. Bremerman, X. Ren, J. May. Proc. EMNLP, 2022. Paper.
"Segmenting Numerical Substitution Ciphers", N. Aldarrab and J. May. Proc. EMNLP, 2022. Paper.
"Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics", H. Cho, C. Sankar, C. Lin, K. R. Sadagopan, S. Shayandeh, A. Celikyilmaz, J. May, A. Beirami. Findings of EMNLP, 2022. Paper.
"Investigating the Benefits of Free-Form Rationales", J. Sun, S. Swayamdipta, J. May, X. Ma. Findings of EMNLP, 2022. Paper.
"Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models", V. Felkner, H. Chang, E. Jang, J. May. Proc. Queer in AI Workshop @ NAACL 2022. Preprint (non-archival).
"Building an Event Extractor with Only a Few Examples", P. Yu, Z. Zhang, C. Voss, J. May, H. Ji. Proc. DeepLo, 2022. Paper.
"Augmenting Training Data for Massive Semantic Matching Models in Low-Traffic E-commerce Stores", (A. Joshi, S. Vishwanath, C. H. Teo, V. Petricek, V. Vishwanathan, R. Bhagat, J. May). Proc. NAACL 2022 Industry Track. Paper.
"NewsEdits: A Dataset of News Article Revision Histories and a Novel Document-Level Reasoning Challenge", (A. Spangher, X. Ren, J. May, N. Peng). Proc. NAACL 2022. Outstanding Paper Award. Paper.
"Opponent Modeling in Negotiation Dialogues by Related Data Adaptation", (K. Chawla, G. Lucas, J. May, J. Gratch). Findings of NAACL 2022. Paper.

2021

"Explaining Face Presentation Attack Detection Using Natural Language", (H. Mirzaalian, M. Hussein, L. Spinoulas, J. May, W. Abd-Almageed). Proc. IEEE International Conference on Automatic Face and Gesture Recognition 2021. Preprint. Paper (paywalled).
"Luna: Linear Unified Nested Attention", (X. Ma, X. Kong, S. Wang, C. Zhou, J. May, H. Ma, L. Zettlemoyer). Proc. NeurIPS 2021. Paper.
"PERFUME: Programmatic Extraction and Refinement For Usability of Mathematical Expression", (N. Weideman, V. Felkner, W. Wu, J. May, C. Hauser, L. Garcia). Proc. CheckMATE 2021. Paper.
"Multitask Semi-Supervised Learning for Class-Imbalanced Discourse Classification", (A. Spangher, J. May, S. Shiang, L. Deng). Proc. EMNLP 2021. Paper.
"Salience-Aware Event Chain Modeling for Narrative Understanding", (X. Zhang, M. Chen, J. May). Proc. EMNLP 2021. Paper.
"Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation", (M. Gheini, X. Ren, J. May). Proc. EMNLP 2021. Paper.
"Summary-Oriented Question Generation for Informational Queries", (X. Yin, L. Zhou, K. Small, J. May). Proc. DialDoc 2021. Paper.
"Many-to-English Machine Translation Tools, Data, and Pretrained Models", (T. Gowda, Z. Zhang, C. Mattmann, J. May), Proc. ACL 2021 Demo Sessions. Paper.
"WARP: Word-level Adversarial ReProgramming", (K. Hambardzumyan, H. Khachatrian, J. May), Proc. ACL, 2021. Paper.
"Can Sequence-to-Sequence Models Crack Substitution Ciphers?", (N. Aldarrab, J. May). Proc. ACL, 2021. Paper.
"Macro-Average: Rare Types Are Important Too", (T. Gowda, W. You, C. Lignos and J. May). Proc. NAACL, 2021. Paper.
"X-METRA-ADA: Cross-lingual Meta-Transfer learning Adaptation to Natural Language Understanding and Question Answering", (M. M'hamdi, D. S. Kim, F. Dernoncourt, T. Bui, X. Ren and J. May). Proc. NAACL, 2021. Paper.
"CaSiNo: A Corpus of Campsite Negotiation Dialogues for Automatic Negotiation Systems", (K. Chawla, J. Ramirez, R. Clever, G. Lucas, J. May, J. Gratch). Proc. NAACL, 2021. Paper.

2020

"Proceedings of the 14th International Workshop on Semantic Evaluation", (A. Herbelot, X. Zhu, N. Schneider, A. Palmer, J. May, E. Shutova), 2020. Paper.
"Learning to Generalize for Sequential Decision Making" (X. Yin, R. Weischedel, J. May). Findings of EMNLP, 2020. Paper.
"Finding the Optimal Vocabulary Size for Neural Machine Translation" (T. Gowda, J. May). Findings of EMNLP, 2020. Paper.
"Experience Grounds Language" (Y. Bisk, A. Holtzman, J. Thomason, J. Andreas, Y. Bengio, J. Chai, M. Lapata, A. Lazaridou, J. May, A. Nisnevich, N. Pinto and J. Turian), Proc. EMNLP, 2020. Paper.
"Connecting the Dots: Event Graph Schema Induction with Path Language Modeling" (M. Li, Q. Zeng, Y. Lin, K. Cho, H. Ji, J. May, N. Chambers and C. Voss), Proc. EMNLP, 2020. Paper.
"Enabling Low-Resource Transfer Learning across COVID-19 Corpora by Combining Event-Extraction and Co-Training" (A. Spangher, N. Peng, J. May, E. Ferrara), Proc. 1st Workshop on NLP for COVID-19 at ACL 2020. Paper.
"Grounding Conversations with Improvised Dialogues" (J. Cho, J. May), Proc. ACL, 2020. Paper. Press. Project.
"'Don't quote me on that': Finding Mixtures of Sources in News Articles" (A. Spangher, J. May, E. Ferrara, N. Peng), Proc. Computation+Journalism, 2020. Get paper in pdf
"Cross-lingual Structure Transfer for Zero-resource Event Extraction" (D. Lu, A. Subburathinam, H. Ji, J. May, S. Chang, A. Sil, C. Voss), Proc. LREC, 2020. Paper.

2019

"Evidence and Artificial Intelligence" (J. Dane and J. May), in Begging The Question: Critical Reasoning in Chaucer Studies, Book History, and Humanistic Inquiry (Mythodologies II), 2019.
"Cross-lingual Joint Entity and Word Embedding to Improve Entity Linking and Parallel Sentence Mining" (X. Pan, T. Gowda, H. Ji, J. May, S. Miller), Proc. DeepLo, 2019. Get paper in pdf. Get BibTex.
"Contextualized Cross-Lingual Event Trigger Extraction with Minimal Resources" (M. M'hamdi, M. Freedman, J. May), Proc. CoNLL, 2019. Get paper in pdf. Get BibTex.
"Do Nuclear Submarines Have Nuclear Captains? A Challenge Dataset for Commonsense Reasoning over Adjectives and Objects", (J. Mullenbach, J. Gordon, N. Peng and J. May), Proc. EMNLP, 2019. Get paper in pdf. Get BibTex.
"What Matters for Neural Cross-Lingual Named Entity Recognition: An Empirical Analysis", (X. Huang, J. May and N. Peng), Proc. EMNLP, 2019. Get paper in pdf. Get BibTex.
"Cross-lingual Structure Transfer for Relation and Event Extraction", (A. Subburathinam, D. Lu, H. Ji, J. May, S. Chang, A. Sil and C. Voss), Proc. EMNLP, 2019. Get paper in pdf. Get BibTex.
"Comprehensible Context-driven Text Game Playing", (X. Yin and J. May), Proc. CoG, 2019. Get paper in pdf.
"SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage", (E. Boschee, J. Barry, J. Billa, M. Freedman, T. Gowda, C. Lignos, C. Palen-Michel, M. Pust, B. K. Khonglah, S. Madikeri, J. May and S. Miller), Proc. ACL Demo Sessions, 2019. Get paper in pdf. Get BibTex.
"Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation", (N. Pourdamghani, N. Aldarrab, M. Ghazvininejad, K. Knight, J. May), Proc. ACL, 2019. Get paper in pdf. Get BibTex.
"Proceedings of the 13th International Workshop on Semantic Evaluation", (J. May, E. Shutova, A. Herbelot, X. Zhu, M. Apidianaki, S. M. Mohammad), 2019. Get paper in pdf. Get BibTex.
"Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging", (L. Huang, H. Ji, J. May), Proc. NAACL, 2019. Get paper in pdf. Get BibTex.
"A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages", (R. Cardenas, Y. Lin, H. Ji, J. May), Proc. NAACL, 2019. Get preview. Get paper in pdf. Get BibTex.

2018

"Translating a Language You Don't Know In the Chinese Room", (U. Hermjakob, J. May, M. Pust, K. Knight), Proc. ACL Demo Sessions, 2018. Get paper in pdf. Get BibTex.
"Out-of-the-box Universal Romanization Tool uroman", (U. Hermjakob, J. May, K. Knight), Proc. ACL Demo Sessions, 2018. Best Demo Award. Get paper in pdf. Get BibTex.
"Towards Controllable Story Generation", (N. Peng, M. Ghazvininejad, J. May, K. Knight), Proc. of the 1st Workshop on Storytelling, 2018. Get paper in pdf. Get BibTex.
"Proceedings of the 12th International Workshop on Semantic Evaluation", (M. Apidianaki, S. M. Mohammad, J. May, E. Shutova, S. Bethard, M. Carpuat), 2018. Get paper in pdf. Get BibTex.
"ELISA-EDL: A Cross-Lingual Entity Extraction, Linking and Localization System", (B. Zhang, Y. Lin, X. Pan, D. Lu, J. May, K. Knight, H. Ji), Proc. NAACL Demo Sessions, 2018. Get paper in pdf. Get BibTex.
"Recurrent Neural Networks as Weighted Language Recognizers", (Y. Chen, S. Gilroy, A. Maletti, J. May, K. Knight), Proc. NAACL, 2018. Outstanding Paper Award. Get paper in pdf. Get BibTex.

2017

"Incident-Driven Machine Translation and Name Tagging for Low-resource Languages", (U. Hermjakob, Q. Li, D. Marcu, J. May, S. Mielke, N. Pourdamghani, M. Pust, X. Shi, K. Knight, T. Levinboim, K. Murray, D. Chiang, B. Zhang, X. Pan, D. Lu, Y. Lin, H. Ji), Machine Translation (October, 2017). Get paper. Get BibTex.
"Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems", (L. Huang, J. May, X. Pan, H. Ji, X. Ren, J. Han, L. Zhao, J. A. Hendler), Big Data, volume 5, no. 1 (March, 2017). Get paper in pdf. Get BibTex.
"SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation", (J. May, J. Priyadarshi), Proc. SemEval, 2017. Get paper in pdf. Get BibTeX.
"Cross-lingual Name Tagging and Linking for 282 Languages", (X. Pan, B. Zhang, J. May, J. Nothman, K. Knight, H. Ji), Proc. ACL, 2017. Get paper in pdf. Get BibTeX.
"Team ELISA System for DARPA LORELEI Speech Evaluation 2016", (P. Papadopoulos, R. Travadi, C. Vaz, N. Malandrakis, U. Hermjakob, N. Pourdamghani, M. Pust, B. Zhang, X. Pan, D. Lu, Y. Lin, O. Glembek, M. Karthick B, M. Karafiat, L. Burget, M. Hasegawa-Johnson, H. Ji, J. May, K. Knight, S. Narayanan), Proc. Interspeech, 2017. Paper.

2016

"Transfer Learning for Low-Resource Neural Machine Translation", (B. Zoph, D. Yuret, J. May, K. Knight), Proc. EMNLP, 2016. Get paper in pdf. Get BibTeX.
"SemEval-2016 Task 8: Meaning Representation Parsing", (J. May), Proc. SemEval, 2016. Get paper in pdf. Get BibTeX.
"Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies", (B. Zoph, A. Vaswani, J. May, K. Knight), Proc. NAACL, 2016. Get paper in pdf. Get BibTeX.
"Extracting Structured Scholarly Information from the Machine Translation Literature", (E. Choi, M. Horvat, J. May, K. Knight, D. Marcu), Proc. LREC, 2016. Get paper in pdf. Get BibTeX.

2015

"Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation", (M. Pust, U. Hermjakob, K. Knight, D. Marcu, J. May), Proc. EMNLP, 2015. Get paper in pdf. Get BibTeX.
"A Corpus of Rich Metaphor Annotation", (J. Gordon, J. Hobbs, J. May, M. Mohler, F. Morbini, B. Rink, M. Tomlinson, S. Wertheim), Proc. Workshop on Metaphor in NLP, 2015. Get paper in pdf. Get BibTeX.
"High-Precision Abductive Mapping of Multilingual Metaphors", (J. Gordon, J. Hobbs, J. May, F. Morbini), Proc. Workshop on Metaphor in NLP, 2015. Get paper in pdf. Get BibTeX.

2014

"An Arabizi-English Social Media Statistical Machine Translation System", (J. May, Y. Benjira, & A. Echihabi), Proc. AMTA, 2014. Get paper in pdf. Get BibTeX.

2013

"Identifying Useful Human Correction Feedback from an On-line Machine Translation Service", (A. Barrón-Cedeño, L. Màrquez, C. Henríquez Q., L. Formiga, E. Romero, & J. May), Proc. IJCAI, 2013. Get paper in pdf. Get BibTeX.
"Models of Translation Competitions", (M. Hopkins and J. May), Proc. ACL, 2013. Get paper in pdf. Get BibTeX.

2012

"An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output", (D. Pighin, L. Màrquez, and J. May), Proc. LREC, 2012. Get paper in pdf. Get BibTeX.

2011

"Tuning as Ranking", (M. Hopkins and J. May), Proc. EMNLP, 2011. Get paper in pdf. Get BibTeX. Get corrected version of paper in pdf. This version corrects a citation error for the "BLEU+1" approximation to BLEU that appears in the published version. Thanks to Colin Cherry and Chris Dyer for pointing it out. Get slides.

2010

"Efficient Inference Through Cascades of Weighted Tree Transducers", (J. May, K. Knight, and H. Vogler), Proc. ACL, 2010. Get paper in pdf. Get BibTeX.
"Re-Structuring, Re-Labeling, and Re-Aligning for Syntax-based Machine Translation", (W. Wang, J. May, K. Knight, and D. Marcu), Computational Linguistics, volume 36, no. 2 (June, 2010). Download pdf from MIT Press. Get BibTeX.

2009 (Not a good year for downloadable papers)

"Determinization of Weighted Tree Automata using Factorizations", (M. Büchse, J. May, and H. Vogler), Proc. FSMNLP, 2009.
"Backward and Forward Bisimulation Minimization of Tree Automata", (J. Högberg, A. Maletti, and J. May), Theoretical Computer Science, volume 410, no. 37 (September, 2009).
"Applications of Weighted Automata in Natural Language Processing", (K. Knight and J. May), Handbook of Weighted Automata (M. Droste, W. Kuich, H. Vogler, eds.), 2009.

2008

"Training Tree Transducers", (J. Graehl, K. Knight, and J. May), Computational Linguistics, volume 34, no. 3 (September, 2008). Download pdf from MIT Press. Get BibTeX.

2007

"Syntactic Re-Alignment Models for Machine Translation", (J. May and K. Knight), Proc. EMNLP, 2007. Get paper in pdf. Get paper in PostScript. Get BibTeX.
"Backward and Forward Bisimulation Minimisation of Tree Automata", (J. Högberg, A. Maletti, and J. May). Proc. International Conference on Implementation and Application of Automata (CIAA), Lecture Notes in Computer Science v.4783, copyright Springer-Verlag, 2007. Get BibTeX.
"Bisimulation Minimisation for Weighted Tree Automata", (J. Högberg, A. Maletti, and J. May). Proc. International Conference on Developments in Language Theory (DLT), Lecture Notes in Computer Science v.4588, copyright Springer-Verlag, 2007. Get paper in pdf. Get paper in PostScript. Get BibTeX. See talk slides (currently offline).

2006

"Tiburon: A Weighted Tree Automata Toolkit", (J. May and K. Knight), Proc. International Conference on Implementation and Application of Automata (CIAA), Lecture Notes in Computer Science v.4094, copyright Springer-Verlag, 2006. Get paper in pdf. Get paper in PostScript. Get BibTeX.
"A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata", (J. May and K. Knight), Proc. NAACL-HLT, 2006. Get paper in pdf. Get paper in PostScript. Get BibTeX.

2003

"Surprise! What's in a Cebuano or Hindi Name?", (J. May, A. Brunstein, P. Natarajan, R. Weischedel), ACM Transactions on Asian Language Information Processing, volume 2, no. 3 (September, 2003). Get paper in pdf. Get BibTeX.
"Answer Selection and Confidence Estimation", (J. Xu, A. Licuanan, J. May, S. Miller, R. Weischedel), New Directions in Question Answering, Papers from 2003 AAAI Spring Symposium, Stanford University, Stanford, CA: AAAI Press, 2003. Get BibTeX.

2002

"TREC 2002 QA at BBN: Answer Selection and Confidence Estimation", (J. Xu, A. Licuanan, J. May, S. Miller, R. Weischedel), Proc. TREC 2002. Get paper in pdf. Get BibTeX.

Thesis

"Weighted Tree Automata and Transducers for Syntactic Natural Language Processing" (J. May). Defense Passed April 20, 2010. Committee: Kevin Knight (chair), Daniel Marcu, David Chiang, Sven Koenig, Shrikanth Narayanan (outside member), Fernando Pereira (external member). Get the official, double-spaced, hard-to-read format. Or get the better looking, single-spaced format. See slides.

Non-Publications

2025

"Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants", J. Ranjit, H. Cho, C. Smerdon, Y. Nam, M. Phung, J. May, J. Blosnich, S. Swayamdipta. GenAI4Health Workshop at NeurIPS, 2025. Paper.

2023

"Identifying Informational Sources in News Articles," (A. Spangher, N. Peng, J. May, E. Ferrara), arXiv.
"Challenges in Context-Aware Neural Machine Translation," (L. Jin, J. He, J. May, X. Ma), arXiv.
"Continual Dialogue State Tracking via Example-Guided Question Answering," (H. Cho, A. Madotto, Z. Lin, K. Chandu, S. Kottur, J. Xu, J. May, C. Sankar), arXiv.
"Analyzing Norm Violations in Live-Stream Chat," (J. Moon, D. Lee, H. Cho, W. Jin, C. Park, M. Kim, J. May, J. Pujara, S. Park), arXiv.

2022

"Checks and Strategies for Enabling Code-Switched Machine Translation," (T. Gowda, M. Gheini, J. May), arXiv.

2021

"Viola: A Topic Agnostic Generate-and-Rank Dialogue System," (H. Cho, B. Shbita, K. Shenoy, S. Liu, N. Patel, H. Pindikanti, J. Lee, J. May), arXiv.
"StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning," (A. Spangher, J. May), arXiv.

2020

"Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics", (X. Yin, J. May), arXiv.

2019

"A Universal Parent Model for Low-Resource Neural Machine Translation Transfer", (M. Gheini, J. May), arXiv.
"Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Bandit Feedback to Learn Families of Text-Based Adventure Games", (X. Yin, J. May), arXiv.

2018

"Augmenting Statistical Machine Translation with Subword Translation of Out-of-Vocabulary Words", (N. F. Liu, J. May, M. Pust, K. Knight), arXiv.

2015

"Using Syntax-Based Machine Translation to Parse English into Abstract Meaning Representation", (M. Pust, U. Hermjakob, K. Knight, D. Marcu, J. May), arXiv, 2015. Get paper in pdf. Get BibTeX. Note this is substantially the same work as the 2015 EMNLP paper listed above and thus the publication should be cited rather than this non-publication in most instances.

2008

"A Weighted Tree Transducer Toolkit for Syntactic Natural Language Processing Models", (J. May). Thesis Proposal (Qualifying Exam). Passed July 24, 2008. Committee: Kevin Knight (chair), Daniel Marcu, David Chiang, David Kempe, Shrikanth Narayanan (outside member), Fernando Pereira (external member). Get paper in pdf. See slides (currently offline).

2007

"Bisimulation Minimisation of Weighted Tree Automata", (J. Högberg, A. Maletti, and J. May), Technical Report ISI-TR-634 (2007). Get paper in pdf.
"Backward and Forward Bisimulation Minimisation of Tree Automata", (J. Högberg, A. Maletti, and J. May), Technical Report ISI-TR-633 (2007). Get paper in pdf.
