
CS662

Advanced Natural Language Processing

Staff

Instructor

Jonathan May

Office Hours: Mondays and Wednesdays 3:00-4:00 pm GCS SB10 (LL2) or by appointment

Teaching Assistant

Katy Felkner

felkner@usc.edu

Office Hours: 1-3pm Wednesdays, GCS LL2, room SB3, or by appointment on Calendly

Lectures

  • Monday and Wednesday 10:00–11:50 am, DMC 261
  • See schedule for select days where class is canceled

Textbook

Grading

Percentage  Assessment Component
10%         In-class participation
10%         Posted questions before each in-class selected paper presentation, and possible quizzes
10%         In-class selected paper presentation
30%         Three homeworks (10% each)
40%         Project, done in small groups, comprising:
             - Proposal (5%)
             - First version of report (5%)
             - In-class presentation (10%)
             - Final report (20%)
  • Written homeworks and project components, except for the final project report, must be submitted by 23:59:59 AoE (Anywhere on Earth) on the date listed in the schedule.
  • The final project report is due Monday, December 15, 2025, at 10:00 AM PST.
  • A deduction of 1/5 of the total possible score will be assessed for each late day. After four late days (i.e. on the fifth), you get a 0 on the assignment (and you should come talk to us because your grade will likely suffer!)
  • You have four extension days, to be applied as you wish throughout the entire class, for homeworks and the project proposal / first report (NOT the final report). No deduction is assessed for a late day covered by an extension day. For example, suppose an assignment is due November 10, you submit it on November 12, and it earns 90/100. If you have two extension days remaining, you use both and your grade is not reduced; it remains 90/100. If you have one extension day, you use it, the other late day is penalized, and your grade is 70/100. If you have no extension days, both late days are penalized and your grade is 50/100.
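The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration, not an official grade calculator: the function name `apply_late_policy` and its parameters are hypothetical. It assumes that lateness is counted in whole calendar days (the AoE cutoff is ignored), that available extension days are applied before any penalty, and that the score is clamped at zero. It does not model the rule that extension days cannot be used on the final report.

```python
from datetime import date


def apply_late_policy(raw_score, max_score, due, submitted, extension_days_left):
    """Return (final_score, extension_days_remaining) for one assignment.

    Hypothetical helper. Assumes whole-day lateness and that extension
    days are spent first, before any deduction is taken.
    """
    days_late = max(0, (submitted - due).days)

    # Spend extension days first; each one cancels a late day.
    used = min(days_late, extension_days_left)
    penalized_days = days_late - used

    if penalized_days > 4:
        # After four late days (i.e. on the fifth), the score is 0.
        final = 0.0
    else:
        # Each penalized day costs 1/5 of the total possible score.
        final = max(0.0, raw_score - max_score * penalized_days / 5)

    return final, extension_days_left - used


# The syllabus example: due November 10, submitted November 12, raw score 90/100.
# The year is chosen only for illustration.
due, submitted = date(2025, 11, 10), date(2025, 11, 12)
for extensions in (2, 1, 0):
    print(extensions, apply_late_policy(90, 100, due, submitted, extensions))
```

With two, one, and zero extension days remaining, the sketch yields scores of 90, 70, and 50 respectively, matching the worked example in the policy.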

Contact us

On Slack, or in class/office hours. Please do not email (unless notified otherwise).

Topics

(subject to change per instructor/class whim) (will not necessarily be presented in this order):
Fundamentals
Linguistic Stack (graphemes/phones - words - syntax - semantics - pragmatics - discourse)
Corpora, Corpus statistics, Data cleaning, munging, and annotation
Evaluation
Linear and Nonlinear Models
Dense Representations and neural architectures (feed-forward, RNN, Transformer)
Language Models
Pre-training, Fine-tuning, Prompting, Reward Alignment
Ethics
Effective written and oral communication
Applications
Multilingualism and Translation
Syntax
Information Retrieval/Question Answering
Dialogue
Information Extraction
Multimodality
Speech Recognition and Generation
Agent Interaction
Discourse

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Oct 6
Efficient Inference
Narges Ghasemi Ghaleh Bahmani - LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
Questions by: Tianming Guo
Saeed Hedayatian - TreeRL: LLM Reinforcement Learning with On-Policy Tree Search
Questions by: Zhiyuan Gao
Oct 8
MEGA (Guest Lecture by Xuezhe Ma)
Mega Paper, Megalodon
Daniel Ruiz - TokAlign: Efficient Vocabulary Adaptation via Token Alignment
Questions by: Abhinav Vadhera
Ardysatrio Haroen - Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Questions by: Chufan Shi
Oct 10
Mid Drop (No W, No refund)

Week 8

Oct 13
Agents (Guest Lecture by Tenghao Huang)
WebArena, ToolLLM, Narrative Discourse, ReAct
Kiarash Vaziri Goodarzi - TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Questions by: Matthew Finlayson
Oct 15
Ethics (Guest Lecture by Katy Felkner)
The Social Impact of Natural Language Processing, Energy and Policy Considerations for Deep Learning in NLP, Model Cards for Model Reporting
Kaicheng Wang - MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
Questions by: Ardysatrio Haroen
Zhiyuan Gao - OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
Questions by: Naga Vamsi Ramana Dinavahi
Oct 17
HW 2 due

Week 9

Oct 20
Information Retrieval (IR) and Question Answering (QA)
JM 11
Faith Baca - Large Language Models Are Biased Because They Are Large Language Models
Questions by: Sajjad Shahabi
Ruth-Ann Armstrong - Biased LLMs can Influence Political Decision-Making
Questions by: Saba Hashemi Safaei
Oct 22
Machine Translation (MT)/Multilinguality slides1 slides2
JM 12; Weaver, Translation (1952)
Tianwen Fu - Improving Factuality with Explicit Working Memory
Questions by: Kaicheng Wang
Nikunj Gupta - Reinforced IR: A Self-Boosting Framework For Domain-Adapted Information Retrieval
Questions by: Faith Baca

Week 10

Week 11

Nov 3
Multimodal NLP (Guest Lecture by Xuezhe Ma)
Anzhe Cheng - SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
Questions by: Feiyu Zhu
Gonglin Chen - SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
Questions by: Wenbin Teng
Nov 5
Spoken Language Processing (SLP) (Guest Lecture by Sudarsana Reddy Kadiri)
JM 15
Wenbin Teng - Improve Vision Language Model Chain-of-thought Reasoning
Questions by: Kiarash Vaziri Goodarzi
Chufan Shi - ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Questions by: Tianwen Fu
Nov 7
Project Report Version 1 due

Week 12

Nov 10
Mind Reading (Guest Lecture by Sam Nastase)
Lydia Ignatova - Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems
Questions by: Gonglin Chen
Sichang (Stephen) He - Learning to Rewrite: Generalized LLM-Generated Text Detection
Questions by: Anzhe Cheng
Nov 12
Discourse Slides
Abhinav Vadhera - JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
Questions by: Ruth-Ann Armstrong
Yuxin Yang - A Troublemaker with Contagious Jailbreak Makes Chaos in Honest Towns
Questions by: Sadra Sabouri Halestani
Nov 14
Late Drop (W, No refund)

Week 13

Nov 17
TBD
Danny Deng - LocAgent: Graph-Guided LLM Agents for Code Localization
Questions by: Saeed Hedayatian
Matthew Finlayson - Geometric Signatures of Compositionality Across a Language Model’s Lifetime
Questions by: Lydia Ignatova
Nov 19
Auditing, Dissecting, and Evaluating Large Language Models (Guest Lecture by Robin Jia)
Naga Vamsi Ramana Dinavahi - Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models
Questions by: Danny Deng
Nov 21
HW 3 due

Week 14

Nov 24
Project Presentations
(10:00) TBD

Questions by: TBD

(10:18) TBD

Questions by: TBD

(10:36) TBD

Questions by: TBD

(10:54) TBD

Questions by: TBD

(11:12) TBD

Questions by: TBD

(11:30) TBD

Questions by: TBD

Nov 26
THANKSGIVING BREAK; NO CLASS

Week 15

Dec 1
Project Presentations
(10:00) TBD

Questions by: TBD

(10:18) TBD

Questions by: TBD

(10:36) TBD

Questions by: TBD

(10:54) TBD

Questions by: TBD

(11:12) TBD

Questions by: TBD

(11:30) TBD

Questions by: TBD

Dec 3
NO CLASS