arXiv Paper Digest

WildRayZer: Self-supervised Large View Synthesis in Dynamic Environments

Authors: Xuweiyi Chen, Wentao Zhou, Zezhou Cheng

First: 2026-01-15T18:59:58+00:00 · Latest: 2026-01-15T18:59:58+00:00

Comments: Project Page: https://wild-rayzer.cs.virginia.edu/

Abstract

We present WildRayZer, a self-supervised framework for novel view synthesis (NVS) in dynamic environments where both the camera and objects move. Dynamic content breaks the multi-view consistency that static NVS models rely on, leading to ghosting, hallucinated geometry, and unstable pose estimation. WildRayZer addresses this by performing an analysis-by-synthesis test: a camera-only static renderer explains rigid structure, and its residuals reveal transient regions. From these residuals, we construct pseudo motion masks, distill a motion estimator, and use it to mask input tokens and gate loss gradients so supervision focuses on cross-view background completion. To enable large-scale training and evaluation, we curate Dynamic RealEstate10K (D-RE10K), a real-world dataset of 15K casually captured dynamic sequences, and D-RE10K-iPhone, a paired transient and clean benchmark for sparse-view transient-aware NVS. Experiments show that WildRayZer consistently outperforms optimization-based and feed-forward baselines in both transient-region removal and full-frame NVS quality with a single feed-forward pass.

中文标题/摘要

标题：WildRayZer：动态环境中的自监督大规模视图合成

我们提出了WildRayZer，一种面向动态环境的新视角合成（NVS）自监督框架，其中相机和物体都在运动。动态内容破坏了静态NVS模型所依赖的多视图一致性，导致鬼影、幻觉几何和不稳定的位姿估计。WildRayZer通过一种“合成式分析”（analysis-by-synthesis）测试来解决这一问题：仅建模相机运动的静态渲染器解释刚性结构，其残差则揭示瞬态区域。基于这些残差，我们构建伪运动掩码，蒸馏出一个运动估计器，并用它屏蔽输入token、对损失梯度进行门控，使监督集中于跨视图的背景补全。为了支持大规模训练和评估，我们整理了Dynamic RealEstate10K（D-RE10K），一个包含1.5万个随手拍摄的动态序列的真实世界数据集，以及D-RE10K-iPhone，一个用于稀疏视图瞬态感知NVS的瞬态与干净成对基准。实验表明，WildRayZer仅需单次前馈传递，就在瞬态区域去除和全帧NVS质量两方面持续优于基于优化的基线和前馈基线。

Summary / 总结

WildRayZer is a self-supervised framework for novel view synthesis in dynamic environments where both the camera and objects move. It handles the multi-view inconsistency caused by dynamic content by using a camera-only static renderer to explain rigid structure, building pseudo motion masks from the renderer's residuals, and distilling a motion estimator that masks input tokens and gates loss gradients so that supervision focuses on cross-view background completion. Experiments show that, with a single feed-forward pass, WildRayZer outperforms optimization-based and feed-forward baselines in both transient-region removal and full-frame NVS quality.

WildRayZer 是一种面向动态环境（相机和物体都在运动）的新视角合成自监督框架，针对动态内容引起的鬼影和位姿估计不稳定等问题。它用仅建模相机运动的静态渲染器解释刚性结构，并利用其残差构建伪运动掩码，从而将监督集中在跨视图的背景补全上。实验结果显示，WildRayZer 仅需一次前馈传递，就在瞬态区域去除和全帧 NVS 质量两方面优于基于优化的基线和前馈基线。
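
The residual-to-mask idea in the summary above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' implementation: images are plain grayscale grids (lists of lists), the function names `pseudo_motion_mask` and `gated_l2_loss` are hypothetical, and the fixed `threshold` is an assumed rule, since the abstract does not say how residuals are turned into masks. It also gates a per-pixel loss only, whereas the paper additionally masks input tokens.

```python
def pseudo_motion_mask(rendered, target, threshold=0.1):
    """Flag pixels that the static render fails to explain.

    `threshold` is an assumed, hypothetical cut-off on the absolute residual.
    True means "likely transient".
    """
    return [
        [abs(r - t) > threshold for r, t in zip(r_row, t_row)]
        for r_row, t_row in zip(rendered, target)
    ]


def gated_l2_loss(prediction, target, transient_mask):
    """Mean squared error over pixels that are NOT flagged as transient."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(prediction, target, transient_mask):
        for p, t, is_transient in zip(p_row, t_row, m_row):
            if not is_transient:
                total += (p - t) ** 2
                count += 1
    # If every pixel is gated out, there is nothing to supervise.
    return total / count if count else 0.0


# Toy scene: a flat background, with one pixel covered by a moving object
# in the observed target frame.
static_render = [[0.5] * 4 for _ in range(4)]
observed = [row[:] for row in static_render]
observed[1][2] = 1.0

mask = pseudo_motion_mask(static_render, observed)
loss = gated_l2_loss(static_render, observed, mask)
```

In this toy case only the pixel at row 1, column 2 is flagged, so the gated loss ignores the moving object and the background alone drives supervision.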

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Authors: Changle Qu, Sunhao Dai, Hengyi Cai, Jun Xu, Shuaiqiang Wang, Dawei Yin

First: 2026-01-15T18:59:23+00:00 · Latest: 2026-01-15T18:59:23+00:00

Abstract

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that integrates turn-level and trajectory-level signals, assigning distinct advantage values to individual interaction turns. Extensive experiments on three benchmarks demonstrate the superiority of MatchTIR. Notably, our 4B model surpasses the majority of 8B competitors, particularly in long-horizon and multi-turn tasks. Our codes are available at https://github.com/quchangle1/MatchTIR.

中文标题/摘要

标题：MatchTIR：通过二分匹配为工具集成推理提供细粒度监督

工具集成推理（TIR）通过交替进行推理步骤和外部工具交互，使大型语言模型（LLMs）能够处理复杂任务。然而，现有的强化学习方法通常依赖结果级或轨迹级奖励，为轨迹内的所有步骤分配相同的优势值。这种粗粒度的信用分配无法区分有效的工具调用与冗余或错误的调用，在长程多轮场景中尤其如此。为了解决这个问题，我们提出了MatchTIR框架，通过基于二分匹配的轮次级奖励分配和双层优势估计引入细粒度监督。具体来说，我们将信用分配形式化为预测轨迹与真实轨迹之间的二分匹配问题，并利用两种分配策略推导出稠密的轮次级奖励。此外，为了平衡局部步骤精度与全局任务成功，我们引入了一种双层优势估计方案，结合轮次级和轨迹级信号，为每个交互轮次分配不同的优势值。在三个基准上的大量实验证明了MatchTIR的优越性。值得注意的是，我们的4B模型超越了大多数8B竞争对手，在长程和多轮任务中尤为明显。我们的代码可在 https://github.com/quchangle1/MatchTIR 获取。

Summary / 总结

The paper introduces MatchTIR, a framework that provides fine-grained supervision for Tool-Integrated Reasoning (TIR) in large language models by combining bipartite matching-based turn-level reward assignment with dual-level advantage estimation. This lets training distinguish effective tool calls from redundant or erroneous ones in long-horizon multi-turn scenarios. In experiments on three benchmarks, the 4B MatchTIR model surpasses the majority of 8B competitors, particularly on long-horizon and multi-turn tasks.

MatchTIR 通过基于二分匹配的轮次级奖励分配和双层优势估计提供细粒度监督，以增强大型语言模型的工具集成推理 (TIR)。该方法能在长程多轮场景中区分有效的工具调用与冗余或错误的调用。在三个基准上的实验表明，4B 规模的 MatchTIR 模型超越了大多数 8B 竞争对手，在长程和多轮任务中尤为明显。
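
To make the credit-assignment idea above concrete, here is a minimal sketch of turn-level rewards from a one-to-one matching, followed by a simple mix of turn-level and trajectory-level signals. Everything specific in it is an assumption for illustration: the `(tool_name, arguments)` call format, the scoring in `call_similarity`, the additive combination in `dual_level_advantages`, and the `turn_weight` and `group_baseline` parameters are hypothetical, and the paper's two assignment strategies are not reproduced. The matching is a brute-force search over permutations, used as a stand-in for a proper assignment solver.

```python
from itertools import permutations


def call_similarity(pred_call, gt_call):
    """Hypothetical score: 1.0 for an exact match, 0.5 for the right tool only."""
    if pred_call == gt_call:
        return 1.0
    if pred_call[0] == gt_call[0]:
        return 0.5
    return 0.0


def match_calls(pred_calls, gt_calls):
    """Brute-force one-to-one matching between predicted and reference calls.

    Returns, for each predicted turn, the matched reference index or None.
    """
    padding = [None] * max(0, len(pred_calls) - len(gt_calls))
    slots = list(range(len(gt_calls))) + padding
    best_score, best_assignment = -1.0, ()
    for assignment in permutations(slots, len(pred_calls)):
        score = sum(
            call_similarity(pred_calls[i], gt_calls[j])
            for i, j in enumerate(assignment)
            if j is not None
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment


def turn_rewards(pred_calls, gt_calls):
    """Dense per-turn rewards; unmatched predicted turns receive 0.0."""
    assignment = match_calls(pred_calls, gt_calls)
    return [
        call_similarity(pred_calls[i], gt_calls[j]) if j is not None else 0.0
        for i, j in enumerate(assignment)
    ]


def dual_level_advantages(rewards, outcome_reward, group_baseline, turn_weight=1.0):
    """Assumed combination: trajectory advantage plus a centered turn-level term."""
    if not rewards:
        return []
    trajectory_adv = outcome_reward - group_baseline
    mean_reward = sum(rewards) / len(rewards)
    return [trajectory_adv + turn_weight * (r - mean_reward) for r in rewards]


predicted = [("search", "q1"), ("search", "q1"), ("calc", "2+2")]
reference = [("search", "q1"), ("calc", "2+2")]
rewards = turn_rewards(predicted, reference)  # [1.0, 0.0, 1.0]
advantages = dual_level_advantages(rewards, outcome_reward=1.0, group_baseline=0.5)
```

The duplicated `search` call is left unmatched and gets a zero reward, so its advantage ends up lower than that of the two useful turns even though all three share the same trajectory outcome. The permutation search grows factorially with the number of turns; for realistic trace lengths, an assignment solver such as SciPy's `linear_sum_assignment` would be the more practical choice.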

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

Authors: Cheng Chen, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Peng Di, Hang Yu, Lianli Gao

First: 2026-01-15T18:59:10+00:00 · Latest: 2026-01-15T18:59:10+00:00