arXiv Paper Digest

UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation

Authors: Ruiheng Zhang, Jingfeng Yao, Huangxuan Zhao, Hao Yan, Xiao He, Lei Chen, Zhou Wei, Yong Luo, Zengmao Wang, Lefei Zhang, Dacheng Tao, Bo Du

First: 2026-01-16T18:59:58+00:00 · Latest: 2026-01-16T18:59:58+00:00

Comments: Codes and models are available at https://github.com/ZrH42/UniX

Abstract

Despite recent progress, medical foundation models still struggle to unify visual understanding and generation, as these tasks have inherently conflicting goals: semantic abstraction versus pixel-level reconstruction. Existing approaches, typically based on parameter-shared autoregressive architectures, frequently lead to compromised performance in one or both tasks. To address this, we present UniX, a next-generation unified medical foundation model for chest X-ray understanding and generation. UniX decouples the two tasks into an autoregressive branch for understanding and a diffusion branch for high-fidelity generation. Crucially, a cross-modal self-attention mechanism is introduced to dynamically guide the generation process with understanding features. Coupled with a rigorous data cleaning pipeline and a multi-stage training strategy, this architecture enables synergistic collaboration between tasks while leveraging the strengths of diffusion models for superior generation. On two representative benchmarks, UniX achieves a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino), using only a quarter of the parameters of LLM-CXR. By achieving performance on par with task-specific models, our work establishes a scalable paradigm for synergistic medical image understanding and generation. Codes and models are available at https://github.com/ZrH42/UniX.
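The understanding result above is reported as Micro-F1. As a reminder of what that metric measures, here is a minimal sketch of micro-averaged F1 for multi-label findings. The toy labels and the `micro_f1` helper are illustrative assumptions, not the paper's evaluation code or data.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label predictions.

    y_true, y_pred: lists of equal-length 0/1 lists, one row per study and
    one column per finding label. Counts are pooled over all labels before
    precision and recall are combined.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 studies, 3 finding labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
score = micro_f1(y_true, y_pred)  # TP=3, FP=1, FN=2
```

Because counts are pooled across labels, frequent findings weigh more heavily than rare ones, which is the usual reason to report Micro-F1 alongside or instead of a macro average.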

Summary

UniX unifies chest X-ray understanding and generation by decoupling the two tasks: an autoregressive branch handles understanding and a diffusion branch handles high-fidelity generation. A cross-modal self-attention mechanism dynamically guides generation with understanding features. On two representative benchmarks, UniX reports a 46.1% improvement in understanding performance (Micro-F1) and a 24.2% gain in generation quality (FD-RadDino) while using only a quarter of the parameters of LLM-CXR. The authors present this as a scalable paradigm for synergistic medical image understanding and generation.
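The abstract does not spell out how the cross-modal self-attention is wired. One common way to let generation tokens draw on understanding features is to run self-attention over the concatenation of both token sets. The sketch below illustrates that generic idea only. It assumes a single head and identity Q/K/V projections, and the names `joint_self_attention`, `und_tokens`, and `gen_tokens` are hypothetical and not taken from the UniX code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(und_tokens, gen_tokens):
    """Single-head attention over the sequence [und_tokens; gen_tokens].

    Assumption: identity Q/K/V projections, no masking, no multi-head split.
    Each generation token queries every token in the joint sequence, so its
    update mixes in understanding features. Returns the updated generation
    tokens only.
    """
    seq = und_tokens + gen_tokens          # joint key/value sequence
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)             # standard scaled dot-product factor
    updated = []
    for q in gen_tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        weights = softmax(scores)
        updated.append(
            [sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)]
        )
    return updated


# Hypothetical toy inputs: 2 understanding tokens, 3 generation tokens, dim 4.
und_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
gen_tokens = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
out = joint_self_attention(und_tokens, gen_tokens)
```

A real model would add learned projections, multiple heads, and whatever conditioning the diffusion branch needs; the point here is only that the attention weights span both modalities.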

UniX 通过将视觉理解任务和高保真生成任务分别用自回归分支和扩散分支来解耦，引入了跨模态自注意力机制以动态地用理解特征来引导生成过程。在基准测试中，UniX 的理解性能提高了 46.1%（Micro-F1），生成质量提高了 24.2%（FD-RadDino），并且仅使用了 LLM-CXR 参数的四分之一。这项工作为协同的医学图像理解和生成建立了一个可扩展的范式。

Do explanations generalize across large reasoning models?

Authors: Koyena Pal, David Bau, Chandan Singh

First: 2026-01-16T18:55:29+00:00 · Latest: 2026-01-16T18:55:29+00:00