Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Published in The 2022 Conference on Empirical Methods in Natural Language Processing, 2022
This paper addresses the challenge of knowledge conflicts that arise when models have access to rich, diverse knowledge sources. We propose methods for recalibrating models to appropriately handle conflicting evidence and reflect uncertainty in their predictions.
Recommended citation: Hung-Ting Chen, Michael J.Q. Zhang, Eunsol Choi. (2022). "Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence." The 2022 Conference on Empirical Methods in Natural Language Processing.
Download Paper
Published in The 2023 Conference on Empirical Methods in Natural Language Processing, 2023
We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach in which information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show that extractive QA models improve effectively from user feedback over time across different data regimes, with significant potential for domain adaptation.
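To make the interaction cycle concrete, here is a minimal sketch of the deploy-collect-retrain loop described above. It is an illustration under assumed interfaces (`model.predict`, `collect_feedback`, and `train` are hypothetical stand-ins), not the authors' implementation.

```python
# Minimal sketch of the iterative feedback loop (hypothetical interfaces,
# not the authors' implementation).

def collect_feedback(question, answer):
    """Hypothetical stand-in: in deployment, the information-seeking user
    rates the predicted answer (e.g., correct / partially correct / wrong)."""
    raise NotImplementedError("replace with a real feedback channel")

def feedback_loop(model, user_questions_per_round, train, n_rounds=5):
    """Alternate between serving users and updating the model on their feedback."""
    feedback_log = []
    for round_id in range(n_rounds):
        # Deploy: answer each user question and log the user's feedback.
        for question in user_questions_per_round(round_id):
            answer = model.predict(question)
            rating = collect_feedback(question, answer)
            feedback_log.append((question, answer, rating))
        # Improve: retrain or fine-tune on all feedback gathered so far.
        model = train(model, feedback_log)
    return model
```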
Recommended citation: Ge Gao*, Hung-Ting Chen*, Yoav Artzi, Eunsol Choi. (2023). "Continually Improving Extractive QA via Human Feedback." The 2023 Conference on Empirical Methods in Natural Language Processing.
Download Paper
Published in Conference on Language Modeling, 2024
This paper provides a comprehensive analysis of retrieval augmentation for long-form question answering. We examine how retrieval augmentation affects model performance across different types of questions and identify key factors that determine its effectiveness.
Recommended citation: Hung-Ting Chen, Fangyuan Xu*, Shane A. Arora*, Eunsol Choi. (2024). "Understanding Retrieval Augmentation for Long-Form Question Answering." Conference on Language Modeling 2024.
Download Paper
Published in The 63rd Annual Meeting of the Association for Computational Linguistics, 2025
This paper introduces CaLMQA, a benchmark for evaluating long-form question answering systems on culturally specific questions across 23 languages. We analyze how different models handle cultural nuances and language-specific knowledge.
Recommended citation: Shane Arora*, Marzena Karpinska*, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi. (2025). "CaLMQA: Exploring culturally specific long-form question answering across 23 languages." The 63rd Annual Meeting of the Association for Computational Linguistics.
Download Paper
Published in 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025
We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., will ChatGPT do more harm than good?). We curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS), where each example consists of a question and diverse perspectives associated with the question, sourced from survey questions and debate websites. On this data, retrievers paired with a corpus are evaluated on their ability to surface a document set that contains diverse perspectives. Our framing diverges from most retrieval tasks in that document relevancy cannot be decided by simple string matches to references. Instead, we build a language model-based automatic evaluator that decides whether each retrieved document contains a perspective. This allows us to evaluate three different types of corpora (Wikipedia, a web snapshot, and a corpus constructed on the fly from pages returned by a search engine) paired with retrievers. Retrieving diverse documents remains challenging, with the outputs of existing retrievers covering all perspectives on only 33.74% of the examples. We further study the impact of query expansion and diversity-focused reranking approaches and analyze retriever sycophancy. Together, we lay the foundation for future studies of retrieval diversity for complex queries.
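To make the evaluation protocol concrete, here is a minimal sketch of BERDS-style coverage scoring. It assumes a hypothetical `llm_judge(document, perspective)` function standing in for the paper's language model-based evaluator; this is an illustration, not the authors' released code.

```python
# Minimal sketch of perspective-coverage evaluation (hypothetical judge,
# not the authors' released code).

from typing import Callable, Dict, List


def llm_judge(document: str, perspective: str) -> bool:
    """Hypothetical stand-in for the LM-based evaluator: should return True
    if `document` expresses `perspective`. Replace with a real LLM call
    (e.g., a yes/no prompt over the document-perspective pair)."""
    raise NotImplementedError("plug in an LLM client here")


def covers_all_perspectives(docs: List[str], perspectives: List[str],
                            judge: Callable[[str, str], bool]) -> bool:
    """An example counts as covered only if every perspective appears
    in at least one retrieved document."""
    return all(any(judge(doc, p) for doc in docs) for p in perspectives)


def coverage_rate(examples: List[Dict], retrieve: Callable[[str], List[str]],
                  judge: Callable[[str, str], bool] = llm_judge) -> float:
    """Fraction of examples whose retrieved document set covers every
    perspective (the paper reports 33.74% for existing retrievers)."""
    covered = sum(
        covers_all_perspectives(retrieve(ex["question"]), ex["perspectives"], judge)
        for ex in examples
    )
    return covered / len(examples)
```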
Recommended citation: Hung-Ting Chen, Eunsol Choi. (2025). "Open-World Evaluation for Retrieving Diverse Perspectives." 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics.
Download Paper
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.