Reporting and Reviewing LLM-Integrated Systems in HCI: Challenges and Considerations

This note was generated by AI analysis.

Created: 2026-04-02
Source: https://arxiv.org/abs/2602.05128

Summary

Through interviews with 18 authors of LLM-integrated HCI papers, this study investigates how researchers navigate the challenges of publishing systems that use LLMs. Authors report that reviewers apply uniquely skeptical and inconsistent standards to LLM-integrated work, leading to defensive writing strategies like adding technical evaluations and de-emphasizing the LLM. The paper surfaces tensions between HCI and ML/NLP community norms and proposes guidelines for authors, reviewers, and communities.

Prerequisites

HCI research methodology — the paper critiques norms specific to CHI/CSCW-style research; understanding what counts as a “contribution” in HCI (design, evaluation, theory) vs. ML is essential.
Peer review dynamics — the study is about reviewer-author interactions; understanding how reviewing norms shape publication bias contextualizes the findings.
LLM non-determinism — the core technical challenge driving the reporting problems is LLM output variability; understanding why this breaks standard reproducibility expectations is foundational.

Core Idea

LLMs introduce a new kind of uncertainty into HCI systems: outputs are non-deterministic, model versions change, and prompt engineering is context-dependent. This undermines the trust-building rituals of peer review — reviewers cannot easily replicate or verify LLM behavior, creating skepticism. Authors respond by over-engineering evaluations or hiding the LLM’s role, which distorts research reporting. The deeper problem is a values clash: HCI prioritizes user experience and design contribution; ML/NLP prioritizes technical rigor and reproducibility. LLM-integrated systems sit at the boundary and satisfy neither community’s norms cleanly.
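A toy illustration of the non-determinism described above, assuming sampled decoding. No real model is involved, and the vocabulary, scores, and function names (`softmax`, `sample_next_token`, `run`) are hypothetical. With identical inputs, temperature sampling can return different tokens on each run. A fixed seed makes the sequence repeatable, and greedy decoding (temperature 0) always picks the top-scoring token.

```python
import math
import random

def softmax(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature, rng):
    """Greedy (argmax) when temperature is 0, otherwise sample from the softmax."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return vocab[best]
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical next-token scores for one fixed prompt (not from any real model).
VOCAB = ["helpful", "useful", "relevant", "accurate"]
LOGITS = [2.0, 1.6, 1.2, 0.8]

def run(temperature, seed=None, n=10):
    """Draw n tokens; seed=None seeds the generator from system entropy."""
    rng = random.Random(seed)
    return [sample_next_token(VOCAB, LOGITS, temperature, rng) for _ in range(n)]

if __name__ == "__main__":
    print(run(temperature=1.0))           # usually differs between executions
    print(run(temperature=1.0, seed=42))  # repeatable only because the seed is fixed
    print(run(temperature=0))             # greedy: always "helpful"
```

Hosted LLM services add variability that this sketch does not model, for example a provider updating the model behind an identifier. A fixed seed alone is therefore not a reproducibility guarantee.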

Results

18 author interviews (qualitative study).
Themes: inconsistent reviewer standards, mistrust mitigation strategies, tensions over prompt disclosure requirements, debate on open vs. proprietary model usage.
Six expert HCI researchers gave additional feedback on the final guidelines.

Limitations

Author-stated: the sample consists of self-selected authors of published papers; authors whose papers were not published may have had different experiences.
Unstated: the study captures a snapshot of norms in flux — as LLM-integrated work becomes more common, reviewer standards may stabilize in ways not captured here.

Reproducibility

Code: qualitative study; no code.
Datasets: interview transcripts (18 participants); not publicly released.
Compute: N/A.

Insights

This paper is important for anyone submitting LLM-integrated work to CHI or CSCW. The finding that authors de-emphasize LLM presence to avoid reviewer skepticism is a subtle publication bias — it means the published record may underrepresent how prevalent LLM integration actually is. The argument that prompt reporting is context-dependent (not universally required) is practically useful pushback against blanket “you must share all prompts” reviewer demands. The HCI-vs-ML norms tension is likely to intensify as more systems papers combine both traditions.
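A hypothetical sketch of what context-dependent prompt reporting might look like in practice. The `LLMRunRecord` class, its field names, and the hash-only option are illustrative assumptions, not guidelines from the paper. The record captures the factors that make LLM behavior hard for reviewers to verify (model identifier, dated snapshot, decoding settings). It then either includes the full prompt or only a SHA-256 digest, which lets readers check whether two reports used the same prompt without the text being disclosed.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LLMRunRecord:
    """Hypothetical record of the settings behind one LLM-powered component."""
    model: str               # model identifier as named by the provider
    model_snapshot: str      # dated version, since the model behind a name can change
    temperature: float       # decoding setting that controls output variability
    seed: Optional[int]      # None when no seed was set or the service ignores it
    prompt_template: str
    share_full_prompt: bool = True  # False: report only the prompt's digest

    def prompt_sha256(self) -> str:
        """Digest that identifies the exact prompt text without revealing it."""
        return hashlib.sha256(self.prompt_template.encode("utf-8")).hexdigest()

    def to_report(self) -> dict:
        """Build the dictionary an author might place in an appendix."""
        report = {
            "model": self.model,
            "model_snapshot": self.model_snapshot,
            "temperature": self.temperature,
            "seed": self.seed,
            "prompt_sha256": self.prompt_sha256(),
        }
        if self.share_full_prompt:
            report["prompt_template"] = self.prompt_template
        return report

if __name__ == "__main__":
    record = LLMRunRecord(
        model="example-model",        # placeholder, not a real model name
        model_snapshot="2025-01-01",  # placeholder date
        temperature=0.2,
        seed=7,
        prompt_template="Summarize the user's note in one sentence: {note}",
        share_full_prompt=False,
    )
    print(json.dumps(record.to_report(), indent=2))
```

Keeping the digest even when the full prompt is shared means a later reader can confirm that a released prompt matches the one that was reported.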

Connections

Raw Excerpt

Authors perceive that reviewers apply uniquely skeptical and inconsistent standards towards papers that report LLM-integrated systems, and mitigate mistrust by adding technical evaluations, justifying usage, and de-emphasizing LLM presence.
