arXiv Paper Digest

Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance

Authors: Jan U. Müller, Robin Tim Landsgesell, Leif Van Holland, Patrick Stotko, Reinhard Klein

First: 2025-12-12T18:59:55+00:00 · Latest: 2025-12-12T18:59:55+00:00

Abstract

The recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radiance fields. However, it relies on simplified, order-dependent alpha blending and coarse approximations of the density integral within the rasterizer, thereby limiting its ability to render complex, overlapping semi-transparent objects. In this paper, we extend rasterization-based rendering of 3D Gaussian representations with a novel method for high-fidelity transmittance computation, entirely avoiding the need for ray tracing or per-pixel sample sorting. Building on prior work in moment-based order-independent transparency, our key idea is to characterize the density distribution along each camera ray with a compact and continuous representation based on statistical moments. To this end, we analytically derive and compute a set of per-pixel moments from all contributing 3D Gaussians. From these moments, a continuous transmittance function is reconstructed for each ray, which is then independently sampled within each Gaussian. As a result, our method bridges the gap between rasterization and physical accuracy by modeling light attenuation in complex translucent media, significantly improving overall reconstruction and rendering quality.
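The order-independence at the heart of the moment-based idea can be illustrated with a minimal sketch. It assumes each contributing Gaussian is reduced to a single fragment with a depth and an opacity, which is a simplification and not the analytic per-Gaussian derivation the abstract describes; the names `accumulate_moments` and `sorted_alpha_blend` are hypothetical. The sketch only shows that the moments are plain sums (so no sorting is needed) and that the zeroth moment recovers the total transmittance; reconstructing the full transmittance function from higher moments is not shown.

```python
import numpy as np

def accumulate_moments(depths, optical_depths, num_moments=4):
    """Return b_k = sum_i tau_i * z_i**k for k = 0..num_moments-1.

    The result is a plain sum, so the order of the fragments does not matter.
    """
    exponents = np.arange(num_moments)
    powers = depths[:, None] ** exponents[None, :]
    return (optical_depths[:, None] * powers).sum(axis=0)

def sorted_alpha_blend(depths, alphas, colors):
    """Reference front-to-back compositing, which needs an explicit depth sort."""
    color, transmittance = 0.0, 1.0
    for i in np.argsort(depths):
        color += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return color, transmittance

rng = np.random.default_rng(0)
depths = rng.uniform(0.1, 1.0, size=8)    # hypothetical per-fragment depths
alphas = rng.uniform(0.05, 0.6, size=8)   # hypothetical per-fragment opacities
optical_depths = -np.log1p(-alphas)       # tau_i = -ln(1 - alpha_i)

shuffle = rng.permutation(len(depths))
moments = accumulate_moments(depths, optical_depths)
moments_shuffled = accumulate_moments(depths[shuffle], optical_depths[shuffle])

# The zeroth moment is the total optical depth, so exp(-b_0) is the transmittance
# behind all fragments; it matches the sorted reference without any sorting.
total_transmittance = np.exp(-moments[0])
_, reference_transmittance = sorted_alpha_blend(depths, alphas, np.ones(8))
```

Because every moment is an unordered sum, fragments can be accumulated in whatever order the rasterizer emits them, which is the property that removes the per-pixel sort.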

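Translated Title / Abstract

Title: Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance

The recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radiance fields. However, it relies on simplified, order-dependent alpha blending and a coarse approximation of the density integral in the rasterizer, which limits its ability to render complex, overlapping semi-transparent objects. In this paper, we extend rasterization-based rendering of 3D Gaussian representations with a new method that entirely avoids ray tracing and per-pixel sample sorting. Building on prior work in moment-based order-independent transparency, our key idea is to characterize the density distribution along each camera ray with a compact, continuous representation based on statistical moments. To this end, we derive and compute a set of per-pixel moments from all contributing 3D Gaussians. From these moments, a continuous transmittance function is reconstructed for each ray and then sampled independently within each Gaussian. As a result, our method bridges the gap between rasterization and physical accuracy by modeling light attenuation in complex translucent media, significantly improving overall reconstruction and rendering quality.

Summary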

This paper addresses the limitations of 3D Gaussian Splatting (3DGS) in rendering complex, overlapping semi-transparent objects by proposing a novel method for high-fidelity transmittance computation. The method avoids the need for ray tracing or per-pixel sample sorting, instead using statistical moments to characterize density distributions along camera rays. Key experimental findings show significant improvements in reconstruction and rendering quality, particularly in handling light attenuation in complex translucent media.

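This paper addresses the limitations of 3D Gaussian Splatting (3DGS) in rendering complex, overlapping semi-transparent objects and proposes a new method for high-fidelity transmittance computation. The method avoids ray tracing and per-pixel sample sorting, and instead uses statistical moments to characterize the density distribution along each camera ray. Experiments show that it significantly improves reconstruction and rendering quality when handling light attenuation in complex translucent media.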

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Authors: Ye Fang, Tong Wu, Valentin Deschaintre, Duygu Ceylan, Iliyan Georgiev, Chun-Hao Paul Huang, Yiwei Hu, Xuelin Chen, Tuanfeng Yang Wang

First: 2025-12-12T18:59:54+00:00 · Latest: 2025-12-12T18:59:54+00:00

Comments: Project Page: https://aleafy.github.io/vrgbx

Abstract

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages them for video synthesis, and supports editable intrinsic representations remains unexplored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that enables intuitive, physically grounded video editing through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, surpassing the performance of prior methods.

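Translated Title / Abstract

Title: V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), uses them for video synthesis, and supports editable intrinsic representations has not yet been explored. We present V-RGBX, the first end-to-end framework for intrinsic-aware video editing. V-RGBX unifies three key capabilities: (1) video inverse rendering into intrinsic channels, (2) photorealistic video synthesis from these intrinsic representations, and (3) keyframe-based video editing conditioned on intrinsic channels. At the core of V-RGBX is an interleaved conditioning mechanism that makes video editing intuitive and physically grounded through user-selected keyframes, supporting flexible manipulation of any intrinsic modality. Extensive qualitative and quantitative results show that V-RGBX produces temporally consistent, photorealistic videos while propagating keyframe edits across sequences in a physically plausible manner. We demonstrate its effectiveness in diverse applications, including object appearance editing and scene-level relighting, where it surpasses prior methods.

Summary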

V-RGBX is an end-to-end framework for intrinsic-aware video editing that integrates video inverse rendering, photorealistic video synthesis, and keyframe-based editing. It uses an interleaved conditioning mechanism to enable physically grounded video editing. V-RGBX produces temporally consistent, photorealistic videos and effectively propagates keyframe edits across sequences, outperforming previous methods in applications such as object appearance editing and scene-level relighting.

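V-RGBX is an end-to-end framework for intrinsic-aware video editing that combines video inverse rendering, photorealistic video synthesis, and keyframe-based editing. It uses an interleaved conditioning mechanism to enable intuitive, physically grounded video editing. V-RGBX produces temporally consistent, photorealistic videos and effectively propagates keyframe edits across sequences, outperforming prior methods in applications such as object appearance editing and scene-level relighting.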

Particulate: Feed-Forward 3D Object Articulation

Authors: Ruining Li, Yuxin Yao, Chuanxia Zheng, Christian Rupprecht, Joan Lasenby, Shangzhe Wu, Andrea Vedaldi

First: 2025-12-12T18:59:51+00:00 · Latest: 2025-12-12T18:59:51+00:00

Comments: Project page: https://ruiningli.com/particulate

Abstract

We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulation Transformer, which processes a point cloud of the input mesh using a flexible and scalable architecture to predict all the aforementioned attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets, enabling full-fledged extraction of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D generator. We further introduce a new challenging benchmark for 3D articulation estimation curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.

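Translated Title / Abstract

Title: Particulate: Feed-Forward 3D Object Articulation

We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, the Part Articulation Transformer, which processes a point cloud of the input mesh with a flexible and scalable architecture to predict all of these attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets; combined with an off-the-shelf image-to-3D generator, it enables full extraction of articulated 3D objects from a single (real or synthetic) image. We further introduce a new, challenging benchmark for 3D articulation estimation curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.

Summary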

Particulate is a feed-forward method that directly infers the 3D parts, kinematic structure, and motion constraints of an object from a single static 3D mesh using a transformer network. It trains end-to-end on diverse articulated 3D assets and can quickly generate a fully articulated 3D model during inference, outperforming previous approaches. It also accurately infers the articulated structure of AI-generated 3D assets, enabling extraction from a single image.

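Particulate is a feed-forward method that, given a static 3D mesh, directly infers its articulated structure, including 3D parts, kinematic structure, and motion constraints. It uses a transformer network, the Part Articulation Transformer, to predict these attributes from a point cloud of the input mesh. The network is trained end-to-end on diverse 3D assets and quickly produces a fully articulated 3D model at inference time, outperforming prior methods that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets and, combined with an off-the-shelf image-to-3D generator, can extract articulated 3D objects from a single (real or synthetic) image. The work introduces a new benchmark for 3D articulation estimation and redesigns the evaluation protocol to be more consistent with human preferences, showing significant improvements over existing methods.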

AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis

Authors: Junjie Ye, Rong Xue, Basile Van Hoorick, Pavel Tokmakov, Muhammad Zubair Irshad, Yue Wang, Vitor Guizilini

First: 2025-12-12T18:59:45+00:00 · Latest: 2025-12-12T18:59:45+00:00

Comments: Project page: https://jay-ye.github.io/AnchorDream/

Abstract

The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. While generative models present an attractive solution, existing methods often alter only visual appearances without creating new behaviors, or suffer from embodiment inconsistencies that yield implausible motions. To address these limitations, we introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. AnchorDream conditions the diffusion process on robot motion renderings, anchoring the embodiment to prevent hallucination while synthesizing objects and environments consistent with the robot's kinematics. Starting from only a handful of human teleoperation demonstrations, our method scales them into large, diverse, high-quality datasets without requiring explicit environment modeling. Experiments show that the generated data leads to consistent improvements in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly double performance in real-world studies. These results suggest that grounding generative world models in robot motion provides a practical path toward scaling imitation learning.

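Translated Title / Abstract

Title: AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis

Collecting large-scale, diverse robot demonstrations remains a major bottleneck for imitation learning, because real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. Generative models are an attractive solution, but existing methods often alter only visual appearance without creating new behaviors, or suffer from embodiment inconsistencies that yield implausible motions. To address these limitations, we introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. AnchorDream conditions the diffusion process on robot motion renderings, which prevents hallucination while synthesizing objects and environments consistent with the robot's kinematics. Starting from only a handful of human teleoperation demonstrations, our method scales them into large, diverse, high-quality datasets without explicit environment modeling. Experiments show that the generated data yields consistent improvements in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly doubled performance in real-world studies. These results suggest that grounding generative world models in robot motion offers a practical path toward scaling imitation learning.

Summary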

AnchorDream addresses the challenge of collecting large-scale and diverse robot demonstrations by using pretrained video diffusion models to generate embodiment-aware robot data. Starting from a few human teleoperation demonstrations, AnchorDream creates large, diverse, and high-quality datasets that improve policy learning in both simulators and real-world settings, showing relative gains of 36.4% in simulators and nearly double performance in real-world studies.

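AnchorDream uses pretrained video diffusion models to generate embodiment-aware robot data. Starting from a small number of human teleoperation demonstrations, it produces large, diverse, high-quality datasets without explicit environment modeling. The generated data performs well in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly doubled performance in real-world studies.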

Probing forced responses and causality in data-driven climate emulators: conceptual limitations and the role of reduced-order models

Authors: Fabrizio Falasca

First: 2025-06-27T18:04:36+00:00 · Latest: 2025-12-12T18:57:28+00:00

Abstract

A central challenge in climate science and applied mathematics is developing data-driven models of multiscale systems that capture both stationary statistics and responses to external perturbations. Current neural climate emulators aim to resolve the atmosphere-ocean system in all its complexity but often struggle to reproduce forced responses, limiting their use in causal studies such as Green's function experiments. To explore the origin of these limitations, we first examine a simplified dynamical system that retains key features of climate variability. We interpret the results through linear response theory, providing a rigorous framework to evaluate neural models beyond stationary statistics and to probe causal mechanisms. We argue that the ability of emulators of multiscale systems to reproduce perturbed statistics depends critically on (i) the choice of an appropriate coarse-grained representation and (ii) careful parameterizations of unresolved processes. These insights highlight reduced-order models, tailored to specific goals, processes, and scales, as valuable alternatives to general-purpose emulators. We next consider a real-world application by developing a neural model to investigate the joint variability of the surface temperature field and radiative fluxes. The model infers a multiplicative noise process directly from data, largely reproduces the system's probability distribution, and enables causal studies through forced responses. We discuss its limitations and outline directions for future work. Overall, these results expose key challenges in data-driven modeling of multiscale physical systems and underscore the value of coarse-grained, stochastic approaches, with response theory providing a principled framework to guide model design and enhance causal understanding.
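The link between unforced fluctuations and forced responses that linear response theory provides can be sketched on a toy system. The example below uses a one-dimensional Ornstein-Uhlenbeck process as a stand-in, which is an assumption made for illustration and not the simplified dynamical system studied in the paper. For this Gaussian process, the fluctuation-dissipation relation says that the impulse response (Green's function) at lag tau equals the normalized autocovariance C(tau)/C(0), so the response can be estimated from an unperturbed run alone. The function name `response_from_fluctuations` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, dt, n_steps = 1.0, 1.0, 0.01, 400_000   # damping, noise amplitude, time step

# Exact discretization of dx = -a x dt + sigma dW as an AR(1) process.
phi = np.exp(-a * dt)
kick_std = sigma * np.sqrt((1.0 - phi**2) / (2.0 * a))
kicks = kick_std * rng.normal(size=n_steps - 1)
x = np.empty(n_steps)
x[0] = 0.0
for t in range(n_steps - 1):
    x[t + 1] = phi * x[t] + kicks[t]

def response_from_fluctuations(series, lag):
    """Fluctuation-dissipation estimate of the impulse response: C(lag) / C(0)."""
    centered = series - series.mean()
    return np.mean(centered[lag:] * centered[:-lag]) / np.mean(centered * centered)

lag = 100                                  # corresponds to tau = lag * dt = 1.0
estimated = response_from_fluctuations(x, lag)
exact = np.exp(-a * lag * dt)              # analytic Green's function exp(-a * tau)
```

A data-driven emulator can be probed the same way: if its forced response disagrees with the response implied by the fluctuations of the true system, the emulator captures stationary statistics but not the causal structure.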

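Translated Title / Abstract

Title: Probing forced responses and causality in data-driven climate emulators: conceptual limitations and the role of reduced-order models

A central challenge in climate science and applied mathematics is developing data-driven models of multiscale systems that capture both stationary statistics and responses to external perturbations. Current neural climate emulators aim to resolve the atmosphere-ocean system in all its complexity but often struggle to reproduce forced responses, which limits their use in causal studies such as Green's function experiments. To explore the origin of these limitations, we first study a simplified dynamical system that retains key features of climate variability. We interpret the results through linear response theory, which provides a framework for evaluating neural models beyond stationary statistics and for probing causal mechanisms. We argue that the ability of emulators of multiscale systems to reproduce perturbed statistics depends on (i) the choice of an appropriate coarse-grained representation and (ii) careful parameterization of unresolved processes. These insights highlight reduced-order models, tailored to specific goals, processes, and scales, as valuable alternatives to general-purpose emulators. We next consider a real-world application by developing a neural model to study the joint variability of the surface temperature field and radiative fluxes. The model infers a multiplicative noise process directly from data, largely reproduces the system's probability distribution, and enables causal studies through forced responses. We discuss its limitations and outline directions for future work. Overall, these results expose key challenges in data-driven modeling of multiscale physical systems and underscore the value of coarse-grained, stochastic approaches, with response theory providing a principled framework to guide model design and enhance causal understanding.

Summary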

This study addresses the challenge of developing data-driven climate models that accurately capture both stationary statistics and responses to external perturbations. By examining a simplified dynamical system and applying linear response theory, the researchers identify that the ability of neural climate emulators to reproduce forced responses depends on the choice of coarse-grained representation and careful parameterization of unresolved processes. The study demonstrates that reduced-order models, tailored to specific goals and scales, can effectively capture perturbed statistics and enable causal studies. A real-world application using surface temperature and radiative fluxes further illustrates these findings, highlighting the limitations of general-purpose emulators and the value of stochastic approaches in data-driven modeling.

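This study examines the challenge of developing data-driven climate models that accurately capture both stationary statistics and responses to external perturbations in multiscale systems. By analyzing a simplified dynamical system and applying linear response theory, the researchers find that whether neural climate emulators can reproduce forced responses depends on the choice of coarse-grained representation and on careful parameterization of unresolved processes. The results indicate that reduced-order models tailored to specific goals and scales can effectively capture perturbed statistics and support causal studies. A real-world application using surface temperature and radiative fluxes further illustrates these findings and highlights the value of stochastic approaches in data-driven modeling.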

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Authors: Yang Fei, George Stoica, Jingyuan Liu, Qifeng Chen, Ranjay Krishna, Xiaojuan Wang, Benlin Liu

First: 2025-12-12T18:56:35+00:00 · Latest: 2025-12-12T18:56:35+00:00

Comments: Project Website: https://sam2videox.github.io/

Abstract

Reality is a dance between rigid constraints and deformable structures. For video models, that means generating motion that preserves fidelity as well as structure. Despite progress in diffusion models, producing realistic structure-preserving motion remains challenging, especially for articulated and deformable objects such as humans and animals. Scaling training data alone, so far, has failed to resolve physically implausible transitions. Existing approaches rely on conditioning with noisy motion representations, such as optical flow or skeletons extracted using an external imperfect model. To address these challenges, we introduce an algorithm to distill structure-preserving motion priors from an autoregressive video tracking model (SAM2) into a bidirectional video diffusion model (CogVideoX). With our method, we train SAM2VideoX, which contains two innovations: (1) a bidirectional feature fusion module that extracts global structure-preserving motion priors from a recurrent model like SAM2; (2) a Local Gram Flow loss that aligns how local features move together. Experiments on VBench and in human studies show that SAM2VideoX delivers consistent gains (+2.60% on VBench, 21-22% lower FVD, and 71.4% human preference) over prior baselines. Specifically, on VBench, we achieve 95.51%, surpassing REPA (92.91%) by 2.60%, and reduce FVD to 360.57, a 21.20% and 22.46% improvement over REPA- and LoRA-finetuning, respectively. The project website can be found at https://sam2videox.github.io/.

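Translated Title / Abstract

Title: Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Reality is a dance between rigid constraints and deformable structures. For video models, that means generating motion that preserves both fidelity and structure. Despite progress in diffusion models, producing realistic structure-preserving motion remains challenging, especially for articulated and deformable objects such as humans and animals. So far, scaling training data alone has not resolved physically implausible transitions. Existing approaches rely on noisy motion representations, such as optical flow or skeletons, extracted with an external imperfect model. To address these challenges, we propose an algorithm that distills structure-preserving motion priors from an autoregressive video tracking model (SAM2) into a bidirectional video diffusion model (CogVideoX). With our method, we train SAM2VideoX, which contains two innovations: (1) a bidirectional feature fusion module that extracts global structure-preserving motion priors from a recurrent model such as SAM2; (2) a Local Gram Flow loss that aligns how local features move together. Experiments on VBench and in human studies show that SAM2VideoX delivers consistent gains over prior baselines (+2.60% on VBench, 21-22% lower FVD, and 71.4% human preference). Specifically, on VBench, we achieve 95.51%, surpassing REPA (92.91%) by 2.60%, and reduce FVD to 360.57, a 21.20% and 22.46% improvement over REPA- and LoRA-finetuning, respectively. The project website is available at https://sam2videox.github.io/.

Summary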

The research aims to generate realistic and structure-preserving motion in videos, addressing the challenges of physically plausible transitions for articulated and deformable objects. The method involves using an autoregressive video tracking model (SAM2) to distill structure-preserving motion priors into a bidirectional video diffusion model (CogVideoX). Experiments show that SAM2VideoX outperforms prior baselines, achieving a 2.60% improvement on VBench and a 21-22% lower FVD, with 71.4% human preference. Specifically, it surpasses REPA by 2.60% on VBench and reduces FVD to 360.57, a 21.20% and 22.46% improvement over REPA- and LoRA-finetuning, respectively.

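This paper tackles the challenge of generating realistic, structure-preserving motion in video. It introduces SAM2VideoX, which combines a bidirectional feature fusion module that draws on an autoregressive video tracking model (SAM2) with a Local Gram Flow loss that aligns how local features move. Experiments show that SAM2VideoX outperforms prior methods on VBench and in human preference studies, with a 2.60% improvement on VBench and 21-22% lower FVD.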

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

Authors: Etienne Boursier, Claire Boyer

First: 2025-12-12T18:54:52+00:00 · Latest: 2025-12-12T18:54:52+00:00

Abstract

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we lean on the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. In the case of in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This, in turn, provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large prompt regimes.
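The large-prompt limit can be checked numerically in a simple special case. The sketch below assumes standard Gaussian keys and values that are a fixed linear map of the keys (`value_map` is a hypothetical choice); under those assumptions the softmax weights exponentially tilt the Gaussian measure toward the query, so the attention output should approach `value_map @ query`, which is linear in the query. This is only an illustration of the convergence the abstract refers to, not the paper's general framework or its bounds.

```python
import numpy as np

def softmax_attention(query, keys, values):
    """Single-query softmax attention over n (key, value) pairs."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())   # subtract the max for numerical stability
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
query = np.array([0.3, -0.2, 0.4])
value_map = np.array([[1.0, 0.5, 0.0],
                      [-0.5, 2.0, 1.0]])      # hypothetical linear map from keys to values

def gap_to_linear_limit(num_tokens):
    """Distance between finite-prompt softmax attention and its linear limit."""
    keys = rng.normal(size=(num_tokens, 3))   # i.i.d. standard Gaussian keys
    values = keys @ value_map.T
    limit = value_map @ query                 # infinite-prompt output, linear in the query
    return np.linalg.norm(softmax_attention(query, keys, values) - limit)

gaps = {n: gap_to_linear_limit(n) for n in (100, 10_000, 200_000)}
```

Printing `gaps` shows how the discrepancy behaves as the prompt length grows; the concentration bounds in the paper quantify this rate, which the sketch does not attempt to reproduce.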

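Translated Title / Abstract

Title: Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we use the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. For in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This in turn provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large-prompt regimes.

Summary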

The paper develops a measure-based framework to study softmax attention in transformers, focusing on its behavior in the large-prompt regime. By leveraging the infinite-prompt limit, the authors show that softmax attention converges to linear attention, allowing for the application of linear attention's analytical structure. Key findings include non-asymptotic concentration bounds for softmax attention outputs and gradients, and the stability of these properties during training. This framework enables optimization analyses for linear attention to be directly applied to softmax attention in long-prompt scenarios, providing insights into the training dynamics and statistical behavior of softmax layers.

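The paper develops a measure-based approach to studying the behavior of softmax attention in transformers at large prompt lengths. Using the fact that softmax converges to a linear operator in the infinite-prompt limit, the authors establish non-asymptotic concentration bounds for the output and gradient, showing that finite-prompt softmax attention rapidly approaches its infinite-prompt counterpart and remains stable throughout training. This analysis allows optimization techniques for linear attention to be applied directly to softmax attention at large prompt lengths, providing insight into the training dynamics and statistical behavior of softmax layers.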

MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator

Authors: Peiqing Yang, Shangchen Zhou, Kai Hao, Qingyi Tao

First: 2025-12-12T18:51:49+00:00 · Latest: 2025-12-12T18:51:49+00:00

Comments: Project page: https://pq-yang.github.io/projects/MatAnyone2/

Abstract

Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes lacking fine details. To this end, we introduce a learned Matting Quality Evaluator (MQE) that assesses semantic and boundary quality of alpha mattes without ground truth. It produces a pixel-wise evaluation map that identifies reliable and erroneous regions, enabling fine-grained quality assessment. The MQE scales up video matting in two ways: (1) as an online matting-quality feedback during training to suppress erroneous regions, providing comprehensive supervision, and (2) as an offline selection module for data curation, improving annotation quality by combining the strengths of leading video and image matting models. This process allows us to build a large-scale real-world video matting dataset, VMReal, containing 28K clips and 2.4M frames. To handle large appearance variations in long videos, we introduce a reference-frame training strategy that incorporates long-range frames beyond the local window for effective training. Our MatAnyone 2 achieves state-of-the-art performance on both synthetic and real-world benchmarks, surpassing prior methods across all metrics.

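Translated Title / Abstract

Title: MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator

Video matting remains limited by the scale and realism of existing datasets. Although segmentation data can improve semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes that lack fine details. To this end, we introduce a learned Matting Quality Evaluator (MQE) that assesses the semantic and boundary quality of alpha mattes without ground truth. It produces a pixel-wise evaluation map that identifies reliable and erroneous regions, enabling fine-grained quality assessment. The MQE scales up video matting in two ways: (1) as online matting-quality feedback during training that suppresses erroneous regions and provides comprehensive supervision, and (2) as an offline selection module for data curation that improves annotation quality by combining the strengths of leading video and image matting models. This process allows us to build a large-scale real-world video matting dataset, VMReal, containing 28K clips and 2.4M frames. To handle large appearance variations in long videos, we introduce a reference-frame training strategy that incorporates long-range frames beyond the local window for effective training. Our MatAnyone 2 achieves state-of-the-art performance on both synthetic and real-world benchmarks, surpassing prior methods across all metrics.

Summary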

The research addresses the limitations of existing video matting datasets in terms of scale and realism. It introduces a learned Matting Quality Evaluator (MQE) that assesses the quality of alpha mattes without ground truth, providing pixel-wise evaluation maps to identify reliable and erroneous regions. This method enhances semantic stability and boundary quality, enabling fine-grained quality assessment. The MQE is used both during training to suppress erroneous regions and for offline data curation to improve annotation quality. This leads to the creation of a large-scale real-world video matting dataset, VMReal, and a reference-frame training strategy that incorporates long-range frames for effective training. The proposed MatAnyone 2 model achieves state-of-the-art performance on both synthetic and real-world benchmarks.

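The research addresses the limited scale and realism of existing video matting datasets. It introduces a Matting Quality Evaluator (MQE) that needs no ground truth and provides fine-grained assessment of semantic and boundary quality. The MQE serves as an online feedback mechanism that suppresses erroneous regions during training and as an offline selection module for data curation that improves annotation quality. This leads to a large-scale real-world video matting dataset, VMReal, and a reference-frame training strategy that incorporates long-range frames for effective training. The proposed MatAnyone 2 model achieves state-of-the-art performance on both synthetic and real-world benchmarks.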

Agile Flight Emerges from Multi-Agent Competitive Racing

Authors: Vineet Pasumarti, Lorenzo Bianchi, Antonio Loquercio

First: 2025-12-12T18:48:50+00:00 · Latest: 2025-12-12T18:48:50+00:00

Abstract

Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world. Code: https://github.com/Jirl-upenn/AgileFlight_MultiAgent

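Translated Title / Abstract

Title: Agile Flight Emerges from Multi-Agent Competitive Racing

Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion that pushes the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, particularly as the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, even though both methods use the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also show some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world.

Summary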

The research aims to explore how agile flight and strategic behavior can emerge from multi-agent competition using reinforcement learning. The method involves training agents in a competitive racing environment with sparse high-level rewards, focusing on winning the race. Key findings show that this approach outperforms single-agent training with detailed rewards, especially in complex environments with obstacles. Additionally, multi-agent policies exhibit better sim-to-real transfer and some generalization to unseen opponents compared to single-agent policies trained with progress-based rewards.

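The research uses reinforcement learning to explore how multi-agent competitive racing leads to the emergence of agile flight and strategic behavior. Compared with training in isolation on progress-based rewards, the multi-agent approach performs better in complex environments (such as those with obstacles), transfers better from simulation to the real world, and shows some degree of generalization to unseen opponents, demonstrating that sparse task-level rewards can yield advanced low-level control in the physical world.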

Conditional Coverage Diagnostics for Conformal Prediction

Authors: Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach

First: 2025-12-12T18:47:39+00:00 · Latest: 2025-12-12T18:47:39+00:00

Abstract

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if any classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.
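The classification view of conditional coverage can be sketched as follows. The sketch assumes the Brier score as the proper loss and a toy quantile-binned classifier on a single feature; both are illustrative choices, and the function names (`brier_risk`, `fit_binned_classifier`, `excess_risk`) are hypothetical and are not the API of the released package. The quantity computed is the held-out risk of always predicting the target coverage minus the held-out risk of a classifier that predicts coverage from the features; a clearly positive value signals that coverage varies with the features.

```python
import numpy as np

def brier_risk(predicted_coverage, covered):
    """Mean squared error between a coverage prediction and the 0/1 coverage indicator."""
    return np.mean((covered - predicted_coverage) ** 2)

def fit_binned_classifier(x, covered, num_bins=10):
    """Toy classifier: empirical coverage rate inside quantile bins of a 1-D feature."""
    edges = np.quantile(x, np.linspace(0, 1, num_bins + 1))[1:-1]
    bins = np.digitize(x, edges)
    rates = np.array([covered[bins == b].mean() for b in range(num_bins)])
    return lambda x_new: rates[np.digitize(x_new, edges)]

def excess_risk(x, covered, target_coverage):
    """Risk of the constant target-coverage predictor minus risk of a fitted classifier."""
    half = len(x) // 2                      # first half fits the classifier, second half evaluates
    classifier = fit_binned_classifier(x[:half], covered[:half])
    x_test, covered_test = x[half:], covered[half:]
    constant = np.full(len(x_test), target_coverage)
    return brier_risk(constant, covered_test) - brier_risk(classifier(x_test), covered_test)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=40_000)
target = 0.9

# Hypothetical coverage indicators: marginal coverage is 0.9 in both cases, but in the
# first case coverage is 0.98 for x > 0 and 0.82 for x <= 0.
covered_uneven = (rng.random(x.size) < np.where(x > 0, 0.98, 0.82)).astype(float)
covered_even = (rng.random(x.size) < target).astype(float)

ert_uneven = excess_risk(x, covered_uneven, target)   # clearly positive
ert_even = excess_risk(x, covered_even, target)       # close to zero
```

With the Brier score, the population value of this difference for a perfect classifier is the mean squared gap between conditional and target coverage (0.08 squared, or 0.0064, in the uneven case); a weaker classifier can only underestimate it, which is the sense in which such an estimate is conservative.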

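Translated Title / Abstract

Title: Conditional Coverage Diagnostics for Conformal Prediction

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome the sample inefficiency and overfitting of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if any classifier can achieve lower risk than the target coverage. Through the choice of a proper loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage and of non-constant target coverages. We call the resulting family of metrics the excess risk of the target coverage (ERT). We show experimentally that modern classifiers provide much higher statistical power than the simple classifiers underlying established metrics such as CovGap. In addition, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.

Summary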

The paper addresses the challenge of evaluating conditional coverage in predictive systems. It proposes a new method by framing conditional coverage as a classification problem, using excess risk of the target coverage (ERT) to estimate violations. Experiments show that modern classifiers provide higher statistical power compared to traditional methods like CovGap, and the metric is used to benchmark conformal prediction methods. An open-source package for ERT and previous metrics is also released.

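The paper addresses the challenge of evaluating conditional coverage in predictive systems. It reframes conditional coverage estimation as a classification problem and uses the excess risk of the target coverage (ERT) to estimate violations. Experiments show that modern classifiers provide higher statistical power than traditional methods such as CovGap, and the metric is used to benchmark different conformal prediction methods. An open-source package for ERT and previous metrics is also released.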

The Adaptive Vekua Cascade: A Differentiable Spectral-Analytic Solver for Physics-Informed Representation

Authors: Vladimer Khasia

First: 2025-12-12T18:41:35+00:00 · Latest: 2025-12-12T18:41:35+00:00

Abstract

Coordinate-based neural networks have emerged as a powerful tool for representing continuous physical fields, yet they face two fundamental pathologies: spectral bias, which hinders the learning of high-frequency dynamics, and the curse of dimensionality, which causes parameter explosion in discrete feature grids. We propose the Adaptive Vekua Cascade (AVC), a hybrid architecture that bridges deep learning and classical approximation theory. AVC decouples manifold learning from function approximation by using a deep network to learn a diffeomorphic warping of the physical domain, projecting complex spatiotemporal dynamics onto a latent manifold where the solution is represented by a basis of generalized analytic functions. Crucially, we replace the standard gradient-descent output layer with a differentiable linear solver, allowing the network to optimally resolve spectral coefficients in a closed form during the forward pass. We evaluate AVC on a suite of five rigorous physics benchmarks, including high-frequency Helmholtz wave propagation, sparse medical reconstruction, and unsteady 3D Navier-Stokes turbulence. Our results demonstrate that AVC achieves state-of-the-art accuracy while reducing parameter counts by orders of magnitude (e.g., 840 parameters vs. 4.2 million for 3D grids) and converging 2-3x faster than implicit neural representations. This work establishes a new paradigm for memory-efficient, spectrally accurate scientific machine learning. The code is available at https://github.com/VladimerKhasia/vecua.
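The idea of resolving spectral coefficients in closed form, instead of learning an output layer by gradient descent, can be sketched with an ordinary least-squares solve. The sketch makes two loud simplifications: it uses a plain Fourier basis in place of the generalized analytic functions the abstract mentions, and it uses the raw coordinate in place of a learned diffeomorphic warp. `fourier_basis` is a hypothetical helper, and NumPy's solver is not differentiable; an autodiff framework would be needed to backpropagate through the solve as the abstract describes.

```python
import numpy as np

def fourier_basis(x, max_frequency):
    """Columns: 1, cos(2*pi*k*x), sin(2*pi*k*x) for k = 1..max_frequency."""
    k = np.arange(1, max_frequency + 1)
    angles = 2.0 * np.pi * x[:, None] * k[None, :]
    return np.hstack([np.ones((x.size, 1)), np.cos(angles), np.sin(angles)])

# Sample a field with two high-frequency components on a uniform grid.
x = np.linspace(0.0, 1.0, 200, endpoint=False)
target = np.sin(2 * np.pi * 5 * x) + 0.5 * np.cos(2 * np.pi * 12 * x)

# Closed-form step: solve min_c ||basis @ c - target||^2 in one linear solve.
basis = fourier_basis(x, max_frequency=15)
coefficients, *_ = np.linalg.lstsq(basis, target, rcond=None)
max_error = np.max(np.abs(basis @ coefficients - target))
```

Here 31 coefficients represent the signal to near machine precision, which illustrates why a well-chosen basis plus a linear solve can need far fewer parameters than a discrete feature grid; the specific parameter counts quoted in the abstract come from the paper, not from this toy setup.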

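Translated Title / Abstract

Title: The Adaptive Vekua Cascade: A Differentiable Spectral-Analytic Solver for Physics-Informed Representation

Coordinate-based neural networks have become a powerful tool for representing continuous physical fields, yet they face two fundamental problems: spectral bias, which hinders the learning of high-frequency dynamics, and the curse of dimensionality, which causes parameter explosion in discrete feature grids. We propose the Adaptive Vekua Cascade (AVC), a hybrid architecture that combines deep learning with classical approximation theory. AVC uses a deep network to learn a diffeomorphic warping of the physical domain, projecting complex spatiotemporal dynamics onto a latent manifold where the solution is represented by a basis of generalized analytic functions. Crucially, we replace the standard gradient-descent output layer with a differentiable linear solver, allowing the network to optimally resolve spectral coefficients in closed form during the forward pass. We evaluate AVC on five rigorous physics benchmarks, including high-frequency Helmholtz wave propagation, sparse medical reconstruction, and unsteady 3D Navier-Stokes turbulence. Our results show that AVC achieves state-of-the-art accuracy while reducing parameter counts by orders of magnitude (e.g., 840 parameters vs. 4.2 million for 3D grids) and converging 2-3x faster than implicit neural representations. This work establishes a new paradigm for memory-efficient, spectrally accurate scientific machine learning. The code is available at https://github.com/VladimerKhasia/vecua/.

Summary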

The paper addresses the limitations of coordinate-based neural networks in representing high-frequency dynamics and handling the curse of dimensionality. It introduces the Adaptive Vekua Cascade (AVC), which uses a deep network to learn a diffeomorphic warping of the physical domain, projecting the solution onto a latent manifold. AVC replaces the standard gradient-descent output layer with a differentiable linear solver, enabling spectral coefficient resolution during the forward pass. Experiments on physics benchmarks show AVC achieves superior accuracy with significantly fewer parameters and faster convergence compared to implicit neural representations.

Adaptive Vekua Cascade (AVC) 结合深度学习与经典逼近理论，通过深度网络对物理域进行扭曲，并在潜在流形上使用广义解析函数表示解，从而克服了频谱偏差和维度灾难。AVC 在各种物理基准测试中（如亥姆霍兹波传播和纳维-斯托克斯湍流）表现出色，参数数量显著减少且收敛速度更快，达到最先进的准确性。
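
The closed-form output layer mentioned in the abstract can be illustrated with ordinary least squares: once the basis features are fixed, the coefficients come from one linear solve instead of gradient steps. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a toy basis `[1, cos x, sin x]` in place of the generalized analytic functions, has no learned warping, and all function names are hypothetical.

```python
import math

def basis(x):
    """Toy basis [1, cos x, sin x]; an assumed stand-in for the paper's basis."""
    return [1.0, math.cos(x), math.sin(x)]

def solve_linear(a, b):
    """Solve a @ c = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_coefficients(xs, ys):
    """Least-squares coefficients from the normal equations (Phi^T Phi) c = Phi^T y."""
    phi = [basis(x) for x in xs]
    k = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve_linear(gram, rhs)

# Samples of a target that lies exactly in the span of the toy basis.
xs = [2 * math.pi * i / 20 for i in range(20)]
ys = [0.5 + 2.0 * math.cos(x) - 1.0 * math.sin(x) for x in xs]
coeffs = fit_coefficients(xs, ys)
```

In an automatic-differentiation framework the same solve would sit inside the forward pass, so gradients could flow through it to whatever produces the features; that part is omitted here.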

REDELEX: A Framework for Relational Deep Learning Exploration

Authors: Jakub Peleška, Gustav Šír

First: 2025-06-27T13:05:15+00:00 · Latest: 2025-12-12T18:15:25+00:00

Comments: Accepted to ECMLPKDD 2025 at Porto, Portugal

Abstract

Relational databases (RDBs) are widely regarded as the gold standard for storing structured information. Consequently, predictive tasks leveraging this data format hold significant application promise. Recently, Relational Deep Learning (RDL) has emerged as a novel paradigm wherein RDBs are conceptualized as graph structures, enabling the application of various graph neural architectures to effectively address these tasks. However, given its novelty, there is a lack of analysis into the relationships between the performance of various RDL models and the characteristics of the underlying RDBs. In this study, we present REDELEX, a comprehensive exploration framework for evaluating RDL models of varying complexity on the most diverse collection of over 70 RDBs, which we make available to the community. Benchmarked alongside key representatives of classic methods, we confirm the generally superior performance of RDL while providing insights into the main factors shaping performance, including model complexity, database sizes and their structural properties.

中文标题/摘要

标题：REDELEX：关系深度学习探索框架

关系数据库（RDB）被认为是存储结构化信息的黄金标准。因此，利用这种数据格式的预测任务具有重要的应用前景。最近，关系深度学习（RDL）作为一种新兴范式出现，其中RDB被概念化为图结构，使得可以应用各种图神经网络架构来有效解决这些任务。然而，由于其新颖性，目前缺乏对各种RDL模型性能与其底层RDB特性之间关系的分析。在本研究中，我们提出了一种名为REDELEX的全面探索框架，用于在超过70个最多样化的RDB集合上评估不同复杂度的RDL模型，并向社区提供这些RDB。与经典方法的关键代表进行基准测试后，我们确认了RDL的一般优越性能，并提供了影响性能的主要因素的见解，包括模型复杂性、数据库大小及其结构特性。

Summary / 总结

REDELEX is a framework designed to evaluate the performance of different RDL models on a diverse set of over 70 RDBs. The study confirms that RDL models generally outperform traditional methods, and provides insights into factors affecting performance such as model complexity, database size, and structural properties.

REDELEX 是一个框架，用于评估不同 RDL 模型在超过 70 个不同 RDB 上的表现。研究证实 RDL 模型通常优于传统方法，并提供了影响性能的因素，如模型复杂性、数据库大小和结构特性等方面的见解。该框架对社区开放，供进一步研究使用。
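
The abstract describes relational databases as being "conceptualized as graph structures". One common reading is that rows become nodes and foreign-key references become edges. The sketch below shows that reading only; it is an illustration under that assumption and not the REDELEX API. The table layout and the function name `rdb_to_graph` are hypothetical.

```python
def rdb_to_graph(tables, foreign_keys):
    """Turn rows into nodes and foreign-key references into edges.

    tables: {table_name: {primary_key: row_dict}}
    foreign_keys: [(child_table, fk_column, parent_table)]
    """
    nodes = [(t, pk) for t, rows in tables.items() for pk in rows]
    edges = []
    for child, column, parent in foreign_keys:
        for pk, row in tables[child].items():
            ref = row.get(column)
            # Skip dangling references that point at no existing parent row.
            if ref in tables[parent]:
                edges.append(((child, pk), (parent, ref)))
    return nodes, edges

# Hypothetical two-table database: each order references one customer.
tables = {
    "customer": {1: {"name": "Ada"}, 2: {"name": "Lin"}},
    "order": {10: {"customer_id": 1}, 11: {"customer_id": 1}, 12: {"customer_id": 2}},
}
nodes, edges = rdb_to_graph(tables, [("order", "customer_id", "customer")])
```

A graph neural network would then pass messages along these edges, with the row attributes serving as node features.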

MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems

Authors: Barak Or

First: 2025-11-08T21:29:18+00:00 · Latest: 2025-12-12T17:56:26+00:00

Comments: preprint

Abstract

Ensuring cognitive stability in autonomous multi-agent systems (MAS) is a central challenge for large-scale, distributed AI. While existing observability tools monitor system outputs, they cannot quantify how rapidly agentic workflows recover once reasoning coherence has been lost. We adapt classical reliability metrics (Mean Time-to-Recovery (MTTR), Mean Time Between Failures (MTBF), and related ratios) into the cognitive domain, defining MTTR-A (Mean Time-to-Recovery for Agentic Systems) as a runtime measure of cognitive recovery latency. MTTR-A quantifies the time required for a MAS to detect reasoning drift and restore consistent operation, capturing the recovery of reasoning coherence rather than infrastructural repair. A benchmark simulation using the AG News corpus and the LangGraph orchestration framework was conducted, modeling recovery latencies across multiple reflex modes. Automated reflexes restored stability within approximately 6 s on average, while human-approval interventions required about 12 s. Across 200 runs, the median simulated MTTR-A was 6.21 ± 2.14 s, MTBF = 6.7 ± 2.14 s, and NRR = 0.08, demonstrating measurable runtime resilience across reflex strategies. By formalizing recovery latency as a quantifiable property of distributed reasoning, and deriving reliability bounds linking recovery time and cognitive uptime, this work establishes a foundation for runtime dependability in agentic cognition, transforming cognitive recovery from an ad-hoc process into a standardized, interpretable performance

中文标题/摘要

标题：MTTR-A：多智能体系统中的认知恢复延迟度量

确保自主多智能体系统（MAS）的认知稳定性是大规模分布式人工智能中的核心挑战。现有可观测性工具监控系统输出，但无法量化智能体工作流在推理一致性丢失后恢复的速度。我们借鉴经典可靠性指标（平均恢复时间 MTTR、平均故障间隔时间 MTBF 及相关比率），将其引入认知领域，定义MTTR-A（智能体系统平均恢复时间）作为运行时的认知恢复延迟度量。MTTR-A量化了MAS检测推理漂移并恢复一致运行所需的时间，捕捉的是推理一致性的恢复而非基础设施的修复。使用AG News语料库和LangGraph编排框架进行了基准模拟，对多种反射模式下的恢复延迟进行了建模。自动反射平均在约6秒内恢复了稳定性，而人工审批干预则需要约12秒。在200次运行中，模拟的中位数MTTR-A为6.21±2.14秒，MTBF=6.7±2.14秒，NRR=0.08，展示了不同反射策略下的可测量运行时弹性。通过将恢复延迟形式化为分布式推理的可量化属性，并推导出恢复时间和认知运行时间之间的可靠性界限，这项工作为智能体认知的运行时可靠性奠定了基础，将认知恢复从一种随意的过程转变为标准化、可解释的性能指标

Summary / 总结

This paper introduces MTTR-A, a metric for measuring cognitive recovery latency in autonomous multi-agent systems (MAS), by adapting classical reliability metrics. The study uses a benchmark simulation with the AG News corpus and LangGraph to show that automated reflexes restore stability within about 6 seconds, while human interventions take around 12 seconds. The median simulated MTTR-A across 200 runs was 6.21±2.14 seconds, indicating measurable runtime resilience in reflex strategies.

该论文通过引入MTTR-A（Agentic系统的平均恢复时间），解决自主多智能体系统（MAS）的认知稳定性问题，MTTR-A是认知恢复延迟的运行时度量。研究使用AG News语料库和LangGraph编排框架进行基准模拟，评估不同反射模式下的恢复延迟，结果显示自动化反射平均在6秒内恢复稳定性，而人工审批干预则需要约12秒。模拟的MTTR-A中位数为6.21±2.14秒，表明反射策略下具有可测量的运行时弹性。
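
The classical metrics that MTTR-A adapts can be computed from a log of incidents. The sketch below assumes each incident is recorded as a pair `(drift_started_at, recovered_at)` in seconds and uses the textbook definitions: MTTR is the mean repair time and MTBF is the mean uptime between one recovery and the next failure. The paper's exact definitions, including NRR, are not reproduced here, and the function name and log values are hypothetical.

```python
from statistics import mean, median

def recovery_metrics(incidents):
    """incidents: list of (drift_started_at, recovered_at) in seconds, sorted by time."""
    # Time spent in each degraded episode.
    repair = [end - start for start, end in incidents]
    # Healthy time between one recovery and the next drift.
    uptime = [nxt[0] - cur[1] for cur, nxt in zip(incidents, incidents[1:])]
    return {
        "mttr_mean": mean(repair),
        "mttr_median": median(repair),
        "mtbf": mean(uptime) if uptime else float("inf"),
    }

# Made-up log of three drift episodes, for illustration only.
log = [(0.0, 6.0), (12.0, 18.5), (25.0, 30.5)]
metrics = recovery_metrics(log)
```

Reporting the median next to the mean, as the abstract does, keeps a few slow human-approval recoveries from dominating the figure.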

UpBench: A Dynamically Evolving Real-World Labor-Market Agentic Benchmark Framework Built for Human-Centric AI

Authors: Darvin Yi, Teng Liu, Mattie Terzolo, Lance Hasson, Ayan Sinha, Pablo Mendes, Andrew Rabinovich

First: 2025-11-15T17:39:37+00:00 · Latest: 2025-12-12T17:51:50+00:00

Abstract

As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collaboration. Existing benchmarks remain largely static, synthetic, or domain-limited, providing limited insight into how agents perform in dynamic, economically meaningful environments. We introduce UpBench, a dynamically evolving benchmark grounded in real jobs drawn from the global Upwork labor marketplace. Each task corresponds to a verified client transaction, anchoring evaluation in genuine work activity and financial outcomes. UpBench employs a rubric-based evaluation framework, in which expert freelancers decompose each job into detailed, verifiable acceptance criteria and assess AI submissions with per-criterion feedback. This structure enables fine-grained analysis of model strengths, weaknesses, and instruction-following fidelity beyond binary pass/fail metrics. Human expertise is integrated throughout the data pipeline (from job curation and rubric construction to evaluation), ensuring fidelity to real professional standards and supporting research on human-AI collaboration. By regularly refreshing tasks to reflect the evolving nature of online work, UpBench provides a scalable, human-centered foundation for evaluating agentic systems in authentic labor-market contexts, offering a path toward a collaborative framework in which AI amplifies human capability through partnership rather than replacement.

中文标题/摘要

标题：UpBench：一种动态演化的真实世界劳动力市场智能体基准框架，为以人为本的AI而构建

随着大型语言模型（LLM）代理越来越多地承担数字工作，需要可靠的框架来评估其在现实世界中的能力、适应性和与人类协作的能力。现有基准大多静态、合成或领域限制，提供的洞察有限，无法反映代理在动态、经济意义上重要的环境中表现如何。我们介绍了UpBench，这是一种基于全球Upwork劳动力市场的实际工作的动态演变基准。每个任务对应一个经过验证的客户交易，将评估锚定在真实的劳动活动和财务结果上。UpBench采用基于评分的评估框架，其中专家自由职业者将每个任务分解为详细的、可验证的接受标准，并对AI提交内容进行逐项反馈评估。这种结构使我们能够对模型的优势、弱点和指令遵循的准确性进行精细分析，超越了二元通过/未通过的度量标准。在整个数据管道中（从任务策划、评分标准构建到评估）整合人类专业知识，确保符合真实的专业标准，并支持人类-AI协作的研究。通过定期更新任务以反映在线工作的演变，UpBench为评估代理系统在真实的劳动力市场环境中的能力提供了可扩展、以人为本的基础，提供了一条通往合作框架的道路，在这种框架中，AI通过伙伴关系而非替代来增强人类能力。

Summary / 总结

UpBench is a dynamically evolving benchmark for evaluating the real-world competence, adaptability, and human collaboration skills of large language model agents. It uses tasks from the global Upwork labor marketplace, ensuring that evaluations are grounded in genuine work activity and financial outcomes. The benchmark employs a rubric-based evaluation framework where expert freelancers assess AI submissions with detailed, per-criterion feedback, enabling a fine-grained analysis of model strengths and weaknesses. By regularly refreshing tasks, UpBench provides a scalable, human-centered foundation for evaluating agentic systems in authentic labor-market contexts.

UpBench 是一个动态演化的基准框架，用于评估大型语言模型代理在现实世界中的专业能力、适应性和与人类的合作能力。它使用来自全球 Upwork 劳动力市场的任务，确保评估基于真实的日常工作活动和财务成果。该基准框架采用基于评分表的评估体系，其中专家自由职业者对 AI 提交进行详细的、可验证的评估反馈，从而实现对模型性能的更细致分析。主要发现表明，UpBench 提供了一个面向人类的、可扩展的基础框架，用于在真实的劳动力市场环境中评估代理系统，促进一种协作框架，其中 AI 通过合作而非替代来增强人类能力。
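
The rubric-based evaluation described above, with per-criterion judgments instead of a single pass/fail, can be sketched as a small scoring function. This is an illustration only: the per-criterion weights, the data layout, and the name `score_submission` are assumptions, and the abstract does not say how UpBench aggregates criteria.

```python
def score_submission(rubric, judgments):
    """Aggregate per-criterion judgments into a score and a feedback list.

    rubric: {criterion: weight}  (the weights are an assumption of this sketch)
    judgments: {criterion: (passed, feedback)}
    """
    def passed(criterion):
        # Criteria with no judgment count as not met.
        return judgments.get(criterion, (False, "not assessed"))[0]

    total = sum(rubric.values())
    earned = sum(weight for criterion, weight in rubric.items() if passed(criterion))
    failed = {
        criterion: judgments.get(criterion, (False, "not assessed"))[1]
        for criterion in rubric
        if not passed(criterion)
    }
    return earned / total, failed

# Hypothetical rubric for a design job, with one failed criterion.
rubric = {"runs without errors": 2, "matches brand colors": 1, "delivered as PDF": 1}
judgments = {
    "runs without errors": (True, ""),
    "matches brand colors": (False, "uses default palette"),
    "delivered as PDF": (True, ""),
}
score, failed = score_submission(rubric, judgments)
```

Returning the failed criteria along with the score preserves the fine-grained signal that a binary pass/fail metric would discard.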

LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems

Authors: Ernesto Casablanca, Oliver Schön, Paolo Zuliani, Sadegh Soudjani

Venue: AAAI 2026

First: 2025-12-12T17:46:50+00:00 · Latest: 2025-12-12T17:46:50+00:00

Comments: The manuscript has been accepted for publication in the main track of AAAI 2026

Abstract

Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic dynamics. To address these challenges, we introduce LUCID (Learning-enabled Uncertainty-aware Certification of stochastIc Dynamical systems), a verification engine for certifying safety of black-box stochastic dynamical systems from a finite dataset of random state transitions. As such, LUCID is the first known tool capable of establishing quantified safety guarantees for such systems. Thanks to its modular architecture and extensive documentation, LUCID is designed for easy extensibility. LUCID employs a data-driven methodology rooted in control barrier certificates, which are learned directly from system transition data, to ensure formal safety guarantees. We use conditional mean embeddings to embed data into a reproducing kernel Hilbert space (RKHS), where an RKHS ambiguity set is constructed that can be inflated to robustify the result to out-of-distribution behavior. A key innovation within LUCID is its use of a finite Fourier kernel expansion to reformulate a semi-infinite non-convex optimization problem into a tractable linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. LUCID thus offers a robust and efficient verification framework, able to handle the complexities of modern black-box systems while providing formal guarantees of safety. These unique capabilities are demonstrated on challenging benchmarks.

中文标题/摘要

标题：LUCID：学习驱动的不确定性感知认证方法用于随机动力学系统

确保AI驱动系统的安全性，特别是在自动驾驶和医疗等高风险领域，变得越来越重要。传统的形式验证工具在面对既包含不透明的黑盒AI组件又包含复杂随机动力学的系统时表现不佳。为了解决这些挑战，我们提出了LUCID（学习驱动的不确定性感知认证方法用于随机动力学系统），这是一种基于有限状态转换数据认证黑盒随机动力学系统安全性的验证引擎。LUCID是首个能够为这类系统提供量化安全保证的工具。由于其模块化架构和详尽的文档，LUCID设计易于扩展。LUCID采用基于控制屏障证书的数据驱动方法，直接从系统状态转换数据中学习，以确保形式上的安全保证。我们使用条件均值嵌入将数据嵌入到再生核希尔伯特空间（RKHS），并在其中构建一个RKHS不确定性集，可以膨胀以增强结果以应对离分布行为。LUCID的关键创新在于使用有限傅里叶核展开将半无限非凸优化问题重新表述为可处理的线性规划问题。由此产生的光谱屏障使我们能够利用快速傅里叶变换高效生成松弛问题，提供一种可扩展且分布鲁棒的验证安全性的框架。LUCID因此提供了一种稳健且高效的验证框架，能够处理现代黑盒系统的复杂性，同时提供形式上的安全保证。这些独特的能力在具有挑战性的基准测试中得到了验证。

Summary / 总结

LUCID is a verification engine that certifies the safety of black-box stochastic dynamical systems by learning control barrier certificates from a finite dataset of random state transitions. It uses conditional mean embeddings to embed the data into a reproducing kernel Hilbert space (RKHS), where an ambiguity set is constructed that can be inflated for robustness to out-of-distribution behavior. A finite Fourier kernel expansion reformulates the resulting semi-infinite non-convex optimization problem as a tractable linear program, and the fast Fourier transform is used to generate the relaxed problem efficiently. The abstract reports that these capabilities are demonstrated on challenging benchmarks.

LUCID 是一个验证引擎，旨在通过从有限数据集中学习不确定性感知的控制屏障证书来确保黑盒随机动力系统的安全性。它使用条件均值嵌入和有限傅里叶核扩展将半无限非凸优化问题重新表述为可处理的线性规划问题，从而实现高效且可扩展的安全验证。关键发现包括能够为复杂系统建立量化安全保证，并对离分布行为具有鲁棒性。

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

Authors: Minglei Shi, Haolin Wang, Borui Zhang, Wenzhao Zheng, Bohan Zeng, Ziyang Yuan, Xiaoshi Wu, Yuanxing Zhang, Huan Yang, Xintao Wang, Pengfei Wan, Kun Gai, Jie Zhou, Jiwen Lu

First: 2025-12-12T17:45:03+00:00 · Latest: 2025-12-12T17:45:03+00:00

Comments: Code Repository: https://github.com/KlingTeam/SVG-T2I; Model Weights: https://huggingface.co/KlingTeam/SVG-T2I

Abstract

Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale text-to-image diffusion models entirely within the VFM representation space remains largely unexplored. To bridge this gap, we scale the SVG (Self-supervised representations for Visual Generation) framework, proposing SVG-T2I to support high-quality text-to-image synthesis directly in the VFM feature domain. By leveraging a standard text-to-image diffusion pipeline, SVG-T2I achieves competitive performance, reaching 0.75 on GenEval and 85.78 on DPG-Bench. This performance validates the intrinsic representational power of VFMs for generative tasks. We fully open-source the project, including the autoencoder and generation model, together with their training, inference, evaluation pipelines, and pre-trained weights, to facilitate further research in representation-driven visual generation.

中文标题/摘要

标题：SVG-T2I：无需变分自编码器扩展文本到图像的潜在扩散模型

基于视觉基础模型（VFM）表示的视觉生成为整合视觉理解、感知和生成提供了极具前景的统一途径。尽管如此，完全在VFM表示空间内训练大规模文本到图像的扩散模型仍然鲜有探索。为填补这一空白，我们扩展了SVG（视觉生成的自监督表示）框架，提出SVG-T2I以直接在VFM特征域中支持高质量的文本到图像合成。通过利用标准的文本到图像扩散管道，SVG-T2I实现了竞争力的表现，分别在GenEval和DPG-Bench上达到0.75和85.78。这一表现验证了VFM在生成任务中的固有表示能力。我们完全开源了该项目，包括自动编码器和生成模型，以及它们的训练、推理、评估管道和预训练权重，以促进基于表示的视觉生成进一步研究。

Summary / 总结

The research aims to enhance text-to-image generation by scaling the SVG framework into SVG-T2I, which operates directly in the Visual Foundation Model (VFM) representation space. By using a standard text-to-image diffusion pipeline, SVG-T2I achieves competitive performance, scoring 0.75 on GenEval and 85.78 on DPG-Bench, demonstrating the representational power of VFM for generative tasks.

研究旨在通过将SVG框架扩展为SVG-T2I，在视觉基础模型（VFM）表示空间中直接进行文本到图像的生成。通过使用标准的文本到图像的扩散管道，SVG-T2I 达到了在GenEval上0.75和在DPG-Bench上85.78的性能，这表明VFM在生成任务中的表示能力。

CogniSNN: Enabling Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability with Random Graph Architectures in Spiking Neural Networks

Authors: Yongsheng Huang, Peibo Duan, Yujie Wu, Kai Sun, Zhipeng Liu, Changsheng Zhang, Bin Zhang, Mingkun Xu

First: 2025-12-12T17:36:31+00:00 · Latest: 2025-12-12T17:36:31+00:00

Abstract

Spiking neural networks (SNNs), regarded as the third generation of artificial neural networks, are expected to bridge the gap between artificial intelligence and computational neuroscience. However, most mainstream SNN research directly adopts the rigid, chain-like hierarchical architecture of traditional artificial neural networks (ANNs), ignoring key structural characteristics of the brain. Biological neurons are stochastically interconnected, forming complex neural pathways that exhibit Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability. In this paper, we introduce a new SNN paradigm, named Cognition-aware SNN (CogniSNN), by incorporating Random Graph Architecture (RGA). Furthermore, we address the issues of network degradation and dimensional mismatch in deep pathways by introducing an improved pure spiking residual mechanism alongside an adaptive pooling strategy. Then, we design a Key Pathway-based Learning without Forgetting (KP-LwF) approach, which selectively reuses critical neural pathways while retaining historical knowledge, enabling efficient multi-task transfer. Finally, we propose a Dynamic Growth Learning (DGL) algorithm that allows neurons and synapses to grow dynamically along the internal temporal dimension. Extensive experiments demonstrate that CogniSNN achieves performance comparable to, or even surpassing, current state-of-the-art SNNs on neuromorphic datasets and Tiny-ImageNet. The Pathway-Reusability enhances the network's continuous learning capability across different scenarios, while the dynamic growth algorithm improves robustness against interference and mitigates the fixed-timestep constraints during neuromorphic chip deployment. This work demonstrates the potential of SNNs with random graph structures in advancing brain-inspired intelligence and lays the foundation for their practical application on neuromorphic hardware.

中文标题/摘要

标题：CogniSNN：在脉冲神经网络中通过随机图架构实现神经元扩展性、路径重用性和动态配置性

脉冲神经网络（SNNs），被视为人工神经网络的第三代，有望弥合人工智能与计算神经科学之间的差距。然而，大多数主流SNN研究直接采用传统人工神经网络（ANNs）的刚性、链状层次结构，忽略了大脑的关键结构特征。生物神经元以随机方式相互连接，形成复杂的神经路径，表现出神经元扩展性、路径重用性和动态配置性。在本文中，我们通过引入随机图架构（RGA）提出了一种新的SNN范式，称为认知导向SNN（CogniSNN）。此外，我们通过引入改进的纯脉冲残差机制和自适应池化策略解决了网络退化和维度不匹配问题。然后，我们设计了一种基于关键路径的学习不遗忘（KP-LwF）方法，该方法选择性地重用关键神经路径，同时保留历史知识，实现高效的多任务迁移。最后，我们提出了一种动态生长学习（DGL）算法，允许神经元和突触沿内部时间维度动态生长。广泛的实验表明，CogniSNN在神经形态数据集和Tiny-ImageNet上实现了与当前最先进的SNN相当或更优的性能。路径重用性增强了网络在不同场景下的持续学习能力，而动态生长算法提高了抗干扰性，并缓解了在神经形态芯片部署过程中固定时间步长的限制。这项工作展示了具有随机图结构的SNNs在推进仿脑智能方面的潜力，并为其在神经形态硬件上的实际应用奠定了基础。

Summary / 总结

CogniSNN introduces a new SNN paradigm by integrating Random Graph Architecture (RGA) to achieve Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability. It addresses network degradation and dimensional mismatch through an improved spiking residual mechanism and adaptive pooling strategy. Key Pathway-based Learning without Forgetting (KP-LwF) selectively reuses critical pathways for efficient multi-task transfer, while Dynamic Growth Learning (DGL) allows neurons and synapses to grow dynamically. Experiments show CogniSNN outperforms or matches current state-of-the-art SNNs on neuromorphic datasets and Tiny-ImageNet, enhancing continuous learning and robustness against interference.

CogniSNN通过引入随机图架构（RGA）实现神经元扩展性、路径重用性和动态配置性。它通过改进的纯脉冲残差机制和自适应池化策略解决了网络退化和维度不匹配问题。基于关键路径的学习不遗忘（KP-LwF）方法选择性重用关键路径以实现高效的多任务迁移，而动态生长学习（DGL）允许神经元和突触沿内部时间维度动态生长。实验表明，CogniSNN在神经形态数据集和Tiny-ImageNet上达到或超过当前最先进的SNN，增强了持续学习能力和对干扰的鲁棒性。

Small-Gain Nash: Certified Contraction to Nash Equilibria in Differentiable Games

Authors: Vedansh Sharma

First: 2025-12-07T11:11:36+00:00 · Latest: 2025-12-12T17:29:26+00:00

Abstract

Classical convergence guarantees for gradient-based learning in games require the pseudo-gradient to be (strongly) monotone in Euclidean geometry, as shown by Rosen (1965), a condition that often fails even in simple games with strong cross-player couplings. We introduce Small-Gain Nash (SGN), a block small-gain condition in a custom block-weighted geometry. SGN converts local curvature and cross-player Lipschitz coupling bounds into a tractable certificate of contraction. It constructs a weighted block metric in which the pseudo-gradient becomes strongly monotone on any region where these bounds hold, even when it is non-monotone in the Euclidean sense. The continuous flow is exponentially contracting in this designed geometry, and projected Euler and RK4 discretizations converge under explicit step-size bounds derived from the SGN margin and a local Lipschitz constant. Our analysis reveals a certified "timescale band", a non-asymptotic, metric-based certificate that plays a TTUR-like role: rather than forcing asymptotic timescale separation via vanishing, unequal step sizes, SGN identifies a finite band of relative metric weights for which a single-step-size dynamics is provably contractive. We validate the framework on quadratic games where Euclidean monotonicity analysis fails to predict convergence, but SGN successfully certifies it, and extend the construction to mirror/Fisher geometries for entropy-regularized policy gradient in Markov games. The result is an offline certification pipeline that estimates curvature, coupling, and Lipschitz parameters on compact regions, optimizes block weights to enlarge the SGN margin, and returns a structural, computable convergence certificate consisting of a metric, contraction rate, and safe step-sizes for non-monotone games.

中文标题/摘要

标题：小增益纳什：在可微博弈中对纳什均衡的认证收缩

经典的基于梯度的学习在博弈中的收敛性保证需要伪梯度在欧几里得几何中是（强）单调的，如罗森（1965）所示，这一条件在许多简单博弈中由于强跨玩家耦合往往无法满足。我们引入了小增益纳什（SGN），这是一种自定义块加权几何中的块小增益条件。SGN将局部曲率和跨玩家Lipschitz耦合界转化为一个可处理的收缩证书。它在这些界成立的任何区域内构造了一个加权块度量，在这种度量下伪梯度在欧几里得几何中非单调时也变得强单调。在设计的几何中，连续流是指数收缩的，投影欧拉和RK4离散化在显式步长界下收敛，该步长界源自SGN的边际和局部Lipschitz常数。我们的分析揭示了一个认证的“时间尺度带”，这是一种非渐近、基于度量的证书，类似于TTUR的作用：SGN识别出一个相对度量权重的有限带，在该带内单步长动态是可证收缩的，而不是通过消失的、不等的步长来强制渐近时间尺度分离。我们在欧几里得单调性分析无法预测收敛的二次博弈中验证了该框架，并将其构造扩展到镜像/费舍尔几何中，以对马尔可夫博弈中的熵正则化策略梯度进行认证。结果是一个离线认证管道，它在紧凑区域内估计曲率、耦合和Lipschitz参数，优化块权重以扩大SGN的边际，并返回一个结构化的、可计算的收敛证书，包括一个度量、收缩率和非单调博弈中的安全步长。

Summary / 总结

The research addresses the convergence issues of gradient-based learning in games where the pseudo-gradient is not strongly monotone in Euclidean geometry. It introduces Small-Gain Nash (SGN), a block small-gain condition in a custom block-weighted geometry, which converts local curvature and cross-player coupling bounds into a certificate of contraction. The SGN ensures exponential contraction in the designed geometry and enables the convergence of projected Euler and RK4 discretizations under explicit step-size bounds. The framework is validated on quadratic games and extended to mirror/Fisher geometries for policy gradient in Markov games, providing a non-asymptotic, metric-based certificate for convergence in non-monotone games.

该研究针对博弈中基于梯度的学习在伪梯度于欧几里得几何下非（强）单调时缺乏收敛保证的问题，提出了小增益纳什（SGN），即自定义块加权几何中的块小增益条件，将局部曲率和跨玩家Lipschitz耦合界转化为收缩证书。SGN确保连续流在所设计的几何中指数收缩，并使投影欧拉和RK4离散化在显式步长界下收敛。研究在二次博弈中验证了该框架，并将其扩展到镜像/费舍尔几何，用于马尔可夫博弈中的熵正则化策略梯度，最终给出一个离线估计参数并优化块权重的认证流程。
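
The idea that a pseudo-gradient can be non-monotone in Euclidean geometry yet strongly monotone under a weighted metric can be checked by hand on a two-player linear game. For a linear pseudo-gradient F(x) = Jx and a diagonal weight matrix W, F is strongly monotone in the W-inner product exactly when the symmetric part of WJ is positive definite. The sketch below tests that standard condition on a hypothetical 2x2 Jacobian with hand-picked weights. It does not implement the paper's SGN condition or its weight optimization.

```python
def sym_weighted(jac, weights):
    """Symmetric part of W @ J for a 2x2 Jacobian J and diagonal weights W."""
    wj = [[weights[i] * jac[i][j] for j in range(2)] for i in range(2)]
    off = 0.5 * (wj[0][1] + wj[1][0])
    return [[wj[0][0], off], [off, wj[1][1]]]

def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0

# Hypothetical game Jacobian: unit curvature per player, asymmetric coupling.
jac = [[1.0, 4.0], [0.1, 1.0]]

# Equal weights (Euclidean geometry): the symmetric part is indefinite.
euclidean_ok = is_positive_definite_2x2(sym_weighted(jac, [1.0, 1.0]))

# Hand-picked unequal weights rebalance the coupling terms.
weighted_ok = is_positive_definite_2x2(sym_weighted(jac, [1.0, 40.0]))
```

With equal weights the off-diagonal term 2.05 exceeds the diagonal product, so the check fails. With weights (1, 40) the symmetric part becomes [[1, 4], [4, 40]], which is positive definite.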

Misspecification-robust amortised simulation-based inference using variational methods

Authors: Matthew O'Callaghan, Kaisey S. Mandel, Gerry Gilmore

First: 2025-09-06T14:10:49+00:00 · Latest: 2025-12-12T17:22:58+00:00

Comments: Updated metrics, fixed typos, adjusted title

Abstract

Recent advances in neural density estimation have enabled powerful simulation-based inference (SBI) methods that can flexibly approximate Bayesian inference for intractable stochastic models. Although these methods have demonstrated reliable posterior estimation when the simulator accurately represents the underlying data generative process (DGP), recent work has shown that they perform poorly in the presence of model misspecification. This poses a significant issue for their use in real-world problems, due to simulators always misrepresenting the true DGP to a certain degree. In this paper, we introduce robust variational neural posterior estimation (RVNP), a method which addresses the problem of misspecification in amortised SBI by bridging the simulation-to-reality gap using variational inference and error modelling. We test RVNP on multiple benchmark tasks, including using real data from astronomy, and show that it can recover robust posterior inference in a data-driven manner without adopting hyperparameters or priors governing the misspecification influence.

中文标题/摘要

标题：基于变分方法的对模型误设稳健的摊销式基于模拟的推断

神经密度估计的最新进展催生了强大的基于模拟的推断（SBI）方法，这些方法可以灵活地近似难以处理的随机模型的贝叶斯推断。尽管这些方法在模拟器准确表示底层数据生成过程（DGP）时能够可靠地估计后验，但最近的研究表明，它们在模型误设的情况下表现不佳。这对其在实际问题中的应用构成了重大障碍，因为模拟器总是在某种程度上偏离真实的DGP。在本文中，我们引入了鲁棒变分神经后验估计（RVNP）方法，该方法通过使用变分推断和误差建模来弥合模拟到现实的差距，从而解决摊销式SBI中的模型误设问题。我们在多个基准任务上测试了RVNP，包括使用来自天文学的真实数据，并表明它能够以数据驱动的方式恢复稳健的后验推断，而无需采用控制误设影响的超参数或先验。

Summary / 总结

This paper addresses the issue of model misspecification in simulation-based inference (SBI) methods by introducing robust variational neural posterior estimation (RVNP). RVNP uses variational inference and error modelling to bridge the gap between simulators and real-world data, enabling robust posterior inference even when simulators are not perfectly specified. Experiments on benchmark tasks, including real astronomical data, demonstrate that RVNP can provide reliable posterior estimates without requiring hyperparameters or priors related to misspecification.

本文通过引入鲁棒变分神经后验估计（RVNP）方法，解决了基于模拟的推断（SBI）方法中的模型误设问题。RVNP 使用变分推断和误差建模来弥合模拟器与真实数据之间的差距，即使模拟器不完全准确，也能提供稳健的后验估计。在包括真实天文数据在内的基准任务上的实验表明，RVNP 可以在不需要与误设相关的超参数或先验的情况下，提供可靠的后验估计。

Efficient Exploration of Chemical Kinetics

Authors: Rohit Goswami

First: 2025-10-24T11:52:08+00:00 · Latest: 2025-12-12T17:13:42+00:00

Comments: Doctoral dissertation, 121 pages, ISBN: 978-9935-9826-5-0. By design, all text and figures within this thesis are original and do not appear in the associated papers

Abstract

Estimating reaction rates and chemical stability is fundamental, yet efficient methods for large-scale simulations remain out of reach despite advances in modeling and exascale computing. Direct simulation is limited by short timescales; machine-learned potentials require large data sets and struggle with transition state regions essential for reaction rates. Reaction network exploration with sufficient accuracy is hampered by the computational cost of electronic structure calculations, and even simplifications like harmonic transition state theory rely on prohibitively expensive saddle point searches. Surrogate model-based acceleration has been promising but hampered by overhead and numerical instability. This dissertation presents a holistic solution, co-designing physical representations, statistical models, and systems architecture in the Optimal Transport Gaussian Process (OT-GP) framework. Using physics-aware optimal transport metrics, OT-GP creates compact, chemically relevant surrogates of the potential energy surface, underpinned by statistically robust sampling. Alongside EON software rewrites for long timescale simulations, we introduce reinforcement learning approaches for both minimum-mode following (when the final state is unknown) and nudged elastic band methods (when endpoints are specified). Collectively, these advances establish a representation-first, modular approach to chemical kinetics simulation. Large-scale benchmarks and Bayesian hierarchical validation demonstrate state-of-the-art performance and practical exploration of chemical kinetics, transforming a longstanding theoretical promise into a working engine for discovery.

中文标题/摘要

标题：化学动力学高效探索

估计反应速率和化学稳定性是一项基础性任务，然而尽管建模和百亿亿次（exascale）计算取得了进展，适用于大规模模拟的高效方法仍然难以实现。直接模拟受限于短时间尺度；机器学习势需要大量数据集，并且难以处理对反应速率至关重要的过渡态区域。足够精确的反应网络探索受到电子结构计算成本的阻碍，即使是简谐过渡态理论这样的简化方法也依赖于代价高昂的鞍点搜索。基于代理模型的加速很有前景，但受限于开销和数值不稳定性。本论文提出了一种整体解决方案，在Optimal Transport Gaussian Process (OT-GP)框架中协同设计物理表示、统计模型和系统架构。利用物理感知的最优传输度量，OT-GP 以统计稳健的采样为基础，创建了紧凑且具有化学意义的势能面代理模型。除了为长时间尺度模拟重写EON软件之外，我们还引入了强化学习方法，用于最小模式跟踪（当最终状态未知时）和微动弹性带方法（当端点已指定时）。这些进展共同确立了一种表示优先、模块化的化学动力学模拟方法。大规模基准测试和贝叶斯层次验证展示了最先进的性能和对化学动力学的实际探索，将一个长期存在的理论愿景转化为可实际运行的发现引擎。

Summary / 总结

This dissertation addresses the challenge of efficiently estimating reaction rates and chemical stability by developing the Optimal Transport Gaussian Process (OT-GP) framework. OT-GP combines physics-aware optimal transport metrics with statistically robust sampling to create compact surrogates of the potential energy surface. Alongside rewrites of the EON software for long-timescale simulations, reinforcement learning is applied to minimum-mode following and nudged elastic band methods. Large-scale benchmarks and Bayesian hierarchical validation show state-of-the-art performance and practical exploration of chemical kinetics.

本论文通过开发Optimal Transport Gaussian Process (OT-GP)框架来解决高效估计反应速率和化学稳定性的问题。该框架结合了物理感知的最优传输度量和统计稳健的采样，以创建势能面的紧凑代理模型。论文还引入了强化学习方法，用于最小模式跟踪和微动弹性带方法，从而实现对化学动力学的高效探索。大规模基准测试和贝叶斯层次验证显示了最先进的性能和实际适用性。

SOF: Sorted Opacity Fields for Fast Unbounded Surface Reconstruction

Authors: Lukas Radl, Felix Windisch, Thomas Deixelberger, Jozef Hladky, Michael Steiner, Dieter Schmalstieg, Markus Steinberger

Venue: SIGGRAPH Asia 2025

First: 2025-06-23T21:20:52+00:00 · Latest: 2025-12-12T17:12:11+00:00

Comments: SIGGRAPH Asia 2025; Project Page: https://r4dl.github.io/SOF/

Abstract

Recent advances in 3D Gaussian representations have significantly improved the quality and efficiency of image-based scene reconstruction. Their explicit nature facilitates real-time rendering and fast optimization, yet extracting accurate surfaces, particularly in large-scale, unbounded environments, remains a difficult task. Many existing methods rely on approximate depth estimates and global sorting heuristics, which can introduce artifacts and limit the fidelity of the reconstructed mesh. In this paper, we present Sorted Opacity Fields (SOF), a method designed to recover detailed surfaces from 3D Gaussians with both speed and precision. Our approach improves upon prior work by introducing hierarchical resorting and a robust formulation of Gaussian depth, which better aligns with the level-set. To enhance mesh quality, we incorporate a level-set regularizer operating on the opacity field and introduce losses that encourage geometrically-consistent primitive shapes. In addition, we develop a parallelized Marching Tetrahedra algorithm tailored to our opacity formulation, reducing meshing time by up to an order of magnitude. As demonstrated by our quantitative evaluation, SOF achieves higher reconstruction accuracy while cutting total processing time by more than a factor of three. These results mark a step forward in turning efficient Gaussian-based rendering into equally efficient geometry extraction.

中文标题/摘要

标题：SOF：用于快速无界表面重建的排序不透明度场

近年来，3D 高斯表示的进展显著提高了基于图像的场景重建的质量和效率。其显式性质便于实时渲染和快速优化，但提取准确的表面（特别是在大规模、无界环境中）仍然是一个难题。许多现有方法依赖于近似深度估计和全局排序启发式，这可能会引入伪影并限制重建网格的保真度。在本文中，我们提出了排序不透明度场（SOF），这是一种旨在从3D高斯中恢复精细表面的方法，兼具速度和精度。我们的方法通过引入分层重排序和更稳健的高斯深度公式改进了先前的工作，后者能更好地与水平集对齐。为了提高网格质量，我们引入了作用于不透明度场的水平集正则项，以及鼓励图元形状几何一致的损失。此外，我们开发了一种针对我们的不透明度公式定制的并行化Marching Tetrahedra算法，将网格生成时间最多减少一个数量级。正如我们的定量评估所显示的，SOF在提高重建精度的同时，将总处理时间缩短到原来的三分之一以下。这些结果标志着在将高效的高斯渲染转化为同样高效的几何提取方面迈出了一步。

Summary / 总结

The research aims to improve the accuracy and efficiency of surface reconstruction from 3D Gaussian representations, particularly in large-scale, unbounded environments. The Sorted Opacity Fields (SOF) method introduces hierarchical resorting and a robust Gaussian depth formulation that aligns better with the level-set, and adds a level-set regularizer and geometric-consistency losses to enhance mesh quality. It also features a parallelized Marching Tetrahedra algorithm that reduces meshing time by up to an order of magnitude. The quantitative evaluation shows that SOF achieves higher reconstruction accuracy while cutting total processing time by more than a factor of three.

SOF 是一种从 3D 高斯表示中快速准确地重建无界表面的方法。它通过分层重排序和稳健的高斯深度公式来提高准确性，并结合了水平集正则项和几何一致性损失来提升网格质量。该方法还包含一个并行化的 Marching Tetrahedra 算法，显著减少了网格生成时间。实验结果表明，SOF 达到了更高的重建精度，并将处理时间缩短到原来的三分之一以下。

ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning

Authors: Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste

First: 2025-12-12T17:07:59+00:00 · Latest: 2025-12-12T17:07:59+00:00

Abstract

Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras. However, the current practice of retraining a separate model for each camera suffers from high compute and communication costs, making it unscalable. We present ECCO, a new video analytics framework designed for resource-efficient continuous learning. The key insight is that the data drift, which necessitates model retraining, often shows temporal and spatial correlations across nearby cameras. By identifying cameras that experience similar drift and retraining a shared model for them, ECCO can substantially reduce the associated compute and communication costs. Specifically, ECCO introduces: (i) a lightweight grouping algorithm that dynamically forms and updates camera groups; (ii) a GPU allocator that dynamically assigns GPU resources across different groups to improve retraining accuracy and ensure fairness; and (iii) a transmission controller at each camera that configures frame sampling and coordinates bandwidth sharing with other cameras based on its assigned GPU resources. We conducted extensive evaluations on three distinctive datasets for two vision tasks. Compared to leading baselines, ECCO improves retraining accuracy by 6.7%-18.1% using the same compute and communication resources, or supports 3.3 times more concurrent cameras at the same accuracy.

中文标题/摘要

标题：ECCO：利用跨摄像头相关性进行高效的实时视频连续学习

近期在视频分析方面的进展通过持续重新训练针对每个摄像头的专用轻量级DNN模型来应对实时数据漂移。然而，为每个摄像头重新训练单独模型的做法会带来高昂的计算和通信成本，使其难以扩展。我们提出了ECCO，这是一种新的视频分析框架，旨在实现资源高效的连续学习。关键洞察是，需要重新训练的数据显示出跨附近摄像头的时序和空间相关性。通过识别经历相似漂移的摄像头并为它们共享一个模型，ECCO可以显著降低相关的计算和通信成本。具体来说，ECCO引入了：(i) 一种轻量级的分组算法，动态形成和更新摄像头组；(ii) 一种GPU分配器，动态分配不同组的GPU资源以提高重新训练的准确性和确保公平性；以及(iii) 每个摄像头的传输控制器，根据分配的GPU资源配置帧采样并与其他摄像头协调带宽共享。我们在两个视觉任务的三个不同数据集上进行了广泛的评估。与领先基准相比，ECCO在相同的计算和通信资源下提高了6.7%-18.1%的重新训练准确性，或者在相同准确性的前提下支持3.3倍更多的并发摄像头。

Summary / 总结

ECCO leverages cross-camera correlations to reduce the cost of continuous learning for live video analytics. By grouping nearby cameras that experience similar data drift and retraining one shared model per group, ECCO cuts compute and communication costs. The framework comprises a dynamic grouping algorithm, a GPU allocator that balances retraining accuracy and fairness across groups, and a per-camera transmission controller for frame sampling and bandwidth sharing. Compared to leading baselines, ECCO improves retraining accuracy by 6.7%-18.1% with the same compute and communication resources, or supports 3.3 times more concurrent cameras at the same accuracy.

ECCO 利用跨摄像头的相关性来降低实时视频分析中重新训练轻量级 DNN 模型的成本。通过将经历相似数据漂移的附近摄像头分组并共享模型进行重新训练，ECCO 显著减少了计算和通信成本。评估结果显示，与领先基线相比，ECCO 在相同计算和通信资源下提高了 6.7%-18.1% 的重新训练准确性，或在相同准确性的支持下最多可处理 3.3 倍的并发摄像头。
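
The grouping idea can be illustrated with a small sketch. It assumes each camera's drift is summarized by a non-zero feature vector (for example, a class-frequency histogram over a recent window); that representation, the similarity threshold, and the function name `group_cameras` are all hypothetical and are not taken from ECCO's actual algorithm.

```python
import numpy as np

def group_cameras(drift, threshold=0.9):
    """Greedily group cameras by cosine similarity of their drift vectors.

    A camera joins the first group whose founding member is at least
    `threshold`-similar; otherwise it starts a new group.
    (Hypothetical heuristic, for illustration only.)
    """
    drift = np.asarray(drift, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity.
    unit = drift / np.linalg.norm(drift, axis=1, keepdims=True)
    groups = []  # each group is a list of camera indices
    for cam in range(len(unit)):
        for group in groups:
            if unit[cam] @ unit[group[0]] >= threshold:
                group.append(cam)
                break
        else:
            groups.append([cam])
    return groups
```

Each resulting group would then share one retrained model; a real system would also need to re-run the grouping as drift evolves, which this sketch leaves out.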

Weak-to-Strong Generalization Enables Fully Automated De Novo Training of Multi-head Mask-RCNN Model for Segmenting Densely Overlapping Cell Nuclei in Multiplex Whole-slice Brain Images

Authors: Lin Bai, Xiaoyang Li, Liqiang Huang, Quynh Nguyen, Hien Van Nguyen, Saurabh Prasad, Dragan Maric, John Redell, Pramod Dash, Badrinath Roysam

First: 2025-12-12T17:02:01+00:00 · Latest: 2025-12-12T17:02:01+00:00

Abstract

We present a weak to strong generalization methodology for fully automated training of a multi-head extension of the Mask-RCNN method with efficient channel attention for reliable segmentation of overlapping cell nuclei in multiplex cyclic immunofluorescent (IF) whole-slide images (WSI), and present evidence for pseudo-label correction and coverage expansion, the key phenomena underlying weak to strong generalization. This method can learn to segment de novo a new class of images from a new instrument and/or a new imaging protocol without the need for human annotations. We also present metrics for automated self-diagnosis of segmentation quality in production environments, where human visual proofreading of massive WSI images is unaffordable. Our method was benchmarked against five current widely used methods and showed a significant improvement. The code, sample WSI images, and high-resolution segmentation results are provided in open form for community adoption and adaptation.

中文标题/摘要

标题：从弱到强的泛化能力使多头Mask-RCNN模型能够实现全自动从头训练以分割多重整切脑图像中紧密重叠的细胞核

我们提出了一种从弱到强的泛化方法，用于实现对多重循环免疫荧光（IF）全切片图像（WSI）中重叠细胞核的可靠分割的全自动多头Mask-RCNN方法训练，并展示了伪标签修正和覆盖扩展的关键现象，这些现象是弱到强泛化的基础。该方法可以在无需人工注释的情况下，从新的仪器和/或新的成像协议中学习分割新的图像类别。我们还提出了在生产环境中自动诊断分割质量的指标，其中人工视觉校对大规模WSI图像是不可行的。我们的方法与五种当前广泛使用的算法进行了基准测试，并显示出显著的改进。我们以开源形式提供了代码、样本WSI图像和高分辨率分割结果，供社区采用和适应。

Summary / 总结

The research aims to develop a fully automated method for segmenting densely overlapping cell nuclei in multiplex whole-slide images using a multi-head Mask-RCNN model with efficient channel attention, trained via weak-to-strong generalization. The method learns to segment a new class of images de novo, without human annotations, and the authors present evidence for pseudo-label correction and coverage expansion, the phenomena underlying weak-to-strong generalization. Benchmarked against five widely used methods, it shows a significant improvement. The work also provides metrics for automated self-diagnosis of segmentation quality in production environments.

研究旨在开发一种方法，用于自动训练多头Mask-RCNN模型以分割多谱系免疫荧光全切片图像中的紧密重叠的细胞核。该方法采用从弱到强的泛化策略，模型从较少标注的数据中学习以提高其在更具挑战性任务上的性能。实验结果表明，所提出的方法显著优于五种现有方法，并且可以在无需人工注释的情况下分割新的图像类别。该方法还包含用于生产环境中的自动自我诊断分割质量的指标。

Methodological Precedence in Health Tech: Why ML/Big Data Analysis Must Follow Basic Epidemiological Consistency. A Case Study

Authors: Marco Roccetti

First: 2025-11-10T16:12:57+00:00 · Latest: 2025-12-12T16:59:47+00:00

Comments: 6 Tables; ML/Big data paper on medical data

Abstract

The integration of advanced analytical tools, including Machine Learning (ML) and massive data processing, has revolutionized health research, promising unprecedented accuracy in diagnosis and risk prediction. However, the rigor of these complex methods is fundamentally dependent on the quality and integrity of the underlying datasets and the validity of their statistical design. We propose an emblematic case where advanced analysis (ML/Big Data) must necessarily be subsequent to the verification of basic methodological coherence and adherence to established medical protocols, such as the STROBE Statement. This study highlights a crucial cautionary principle: sophisticated analyses amplify, rather than correct, severe methodological flaws rooted in basic design choices, leading to misleading or contradictory findings. By applying simple, standard descriptive statistical methods and established national epidemiological benchmarks to a recently published cohort study on COVID-19 vaccine outcomes and severe adverse events, like cancer, we expose multiple, statistically irreconcilable paradoxes. These paradoxes, specifically the contradictory finding of an increased cancer incidence within an exposure subgroup, concurrent with a suppressed overall Crude Incidence Rate compared to national standards, definitively invalidate the reported risk of increased cancer in the total population. We demonstrate that the observed effects are mathematical artifacts stemming from an uncorrected selection bias in the cohort construction. This analysis serves as a robust reminder that even the most complex health studies must first pass the test of basic epidemiological consistency before any conclusion drawn from subsequent advanced statistical modeling can be considered valid or publishable.

中文标题/摘要

标题：健康科技中的方法学优先级：为什么必须在基本流行病学一致性之后进行机器学习/大数据分析。案例研究

高级分析工具，包括机器学习（ML）和大数据处理，已彻底改变了健康研究，为诊断和风险预测提供了前所未有的准确性。然而，这些复杂方法的严谨性从根本上依赖于底层数据集的质量和统计设计的有效性。我们提出一个典型案例，其中先进的分析（ML/大数据）必须在验证基本方法学的一致性和遵守既定医疗协议，如STROBE声明之后进行。本研究强调了一个重要的警示原则：复杂的分析放大了，而不是纠正了基本设计选择中根深蒂固的方法学缺陷，导致误导性或矛盾的结果。通过应用简单的标准描述性统计方法和既定的国家流行病学基准，对最近发表的关于COVID-19疫苗效果和严重不良事件（如癌症）的队列研究进行分析，我们揭示了多个无法调和的统计悖论。这些悖论，特别是暴露子组中癌症发病率增加与总体粗发病率低于国家标准的矛盾发现，最终否定了总人群中癌症风险增加的报告。我们证明，观察到的效果是由于队列构建中的未纠正的选择偏差导致的数学伪影。此分析强调，即使是最复杂的健康研究，在任何结论从随后的高级统计建模得出之前，都必须首先通过基本流行病学一致性的考验。

Summary / 总结

This study emphasizes the necessity of adhering to basic epidemiological principles before applying advanced analytical tools like Machine Learning (ML) and big data analysis in health research. By applying simple descriptive statistics and national benchmarks to a COVID-19 vaccine study, the authors reveal statistically irreconcilable paradoxes, invalidating claims of increased cancer risk. The findings highlight that complex analyses can amplify, rather than correct, methodological flaws, underscoring the importance of rigorous initial design and data verification.

本文认为，如机器学习（ML）和大数据分析等高级分析方法应在基本流行病学一致性得到验证之后应用。通过一个新冠疫苗研究案例，揭示了统计上不可调和的悖论，这些悖论否定了报告的癌症风险增加。研究显示，复杂的分析方法可能会放大而非纠正方法论上的缺陷，强调在应用复杂的统计模型之前，必须遵循基本的设计选择和标准，如STROBE声明，以确保结论的有效性和可发表性。

Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

Authors: Zhaodong Wang, Zhenting Qi, Sherman Wong, Nathan Hu, Samuel Lin, Jun Ge, Erwin Gao, Yining Yang, Ben Maurer, Wenlin Chen, David Recordon, Yilun Du, Minlan Yu, Ying Zhang

First: 2025-12-11T08:05:58+00:00 · Latest: 2025-12-12T16:59:12+00:00

Comments: Meta requires more thorough internal review process to ensure paper quality and experiments as well as compliance with the internal research publishing process

Abstract

Real-world AI software engineering demands coding agents that can reason over massive repositories, maintain durable memory across and within long sessions, and robustly coordinate complex toolchains at test time. Existing open-source coding agents provide transparency but frequently fall short when pushed to these industrial-scale workloads, while proprietary coding agents offer strong practical performance but limited extensibility, interpretability, and controllability. We present the Confucius Code Agent (CCA), an open-sourced AI software engineer that can operate at an industrial scale. CCA is built atop the Confucius SDK, an open-sourced agent development platform designed around three complementary perspectives: Agent Experience (AX), User Experience (UX), and Developer Experience (DX). The SDK introduces a unified orchestrator with hierarchical working memory for long-context reasoning, a persistent note-taking system for cross-session continual learning, and a modular extension module for robust tool use. Moreover, a meta-agent automates the synthesis, evaluation, and refinement of agent configurations through a build-test-improve loop, enabling rapid agent development on new tasks, environments, and tool stacks. Instantiated on Confucius SDK with these mechanisms, CCA delivers strong performance on real-world software engineering tasks. On SWE-Bench-Pro, CCA achieves a state-of-the-art Resolve@1 performance of 54.3%, substantially improving over prior coding agents. Together, the Confucius SDK and CCA provide a transparent, extensible, and reproducible foundation for AI agents, bridge gaps between research prototypes and production-grade systems, and support agent development and deployment at industrial scale.

中文标题/摘要

标题：孔夫子代码代理：工业规模的开源AI软件工程师

现实中的AI软件工程需要能够对大规模代码库进行推理、在长时间会话内外保持持久记忆，并在测试时稳健地协调复杂工具链的编码代理。现有的开源编码代理提供了透明性，但在推向工业规模的工作负载时经常表现不佳，而专有的编码代理则提供了强大的实际性能，但有限的可扩展性、可解释性和可控性。我们介绍了孔夫子代码代理（CCA），这是一种可以在工业规模上运行的开源AI软件工程师。CCA 建立在孔夫子SDK之上，这是一个围绕代理体验（AX）、用户体验（UX）和开发体验（DX）三个互补视角构建的开源代理开发平台。SDK 引入了一个统一的协调器，具有分层工作记忆以进行长上下文推理、持久的笔记系统以实现跨会话持续学习，以及模块化的扩展模块以实现稳健的工具使用。此外，一个元代理通过构建-测试-改进循环自动化编码代理配置的合成、评估和改进，从而实现新任务、环境和工具堆栈上的快速代理开发。通过这些机制在孔夫子SDK上实现，CCA 在实际软件工程任务上表现出色。在SWE-Bench-Pro上，CCA 达到了54.3%的最先进的Resolve@1性能，显著优于之前的编码代理。孔夫子SDK和CCA 一起提供了一个透明、可扩展且可重复的基础架构，为AI代理提供了一个从研究原型到生产级系统的桥梁，并支持工业规模的代理开发和部署。

Summary / 总结

The Confucius Code Agent (CCA) is an open-sourced AI software engineer designed to handle industrial-scale coding tasks. It leverages the Confucius SDK, which includes a unified orchestrator, hierarchical working memory, persistent note-taking, and a modular extension module. CCA demonstrates strong performance on real-world software engineering tasks, achieving a state-of-the-art Resolve@1 performance of 54.3% on SWE-Bench-Pro, surpassing previous coding agents. The CCA and Confucius SDK together offer a transparent, extensible, and reproducible foundation for AI agents, facilitating their development and deployment at industrial scale.

Confucius Code Agent (CCA) 是一个开源的 AI 软件工程师，旨在处理工业规模的编码任务。它利用了 Confucius SDK，该平台包括统一的协调器、层次化的工作记忆、持久的笔记系统和模块化的扩展模块。CCA 在实际软件工程任务中表现出色，其在 SWE-Bench-Pro 上的 Resolve@1 性能达到 54.3%，超越了之前的编码代理。CCA 和 Confucius SDK 一起提供了一个透明、可扩展且可重复的基础，支持 AI 代理的大规模开发和部署。

Reframing Music-Driven 2D Dance Pose Generation as Multi-Channel Image Generation

Authors: Yan Zhang, Han Zou, Lincong Feng, Cong Xie, Ruiqi Yu, Zhenpeng Zhan

First: 2025-12-12T16:57:46+00:00 · Latest: 2025-12-12T16:57:46+00:00

Abstract

Recent pose-to-video models can translate 2D pose sequences into photorealistic, identity-preserving dance videos, so the key challenge is to generate temporally coherent, rhythm-aligned 2D poses from music, especially under complex, high-variance in-the-wild distributions. We address this by reframing music-to-dance generation as a music-token-conditioned multi-channel image synthesis problem: 2D pose sequences are encoded as one-hot images, compressed by a pretrained image VAE, and modeled with a DiT-style backbone, allowing us to inherit architectural and training advances from modern text-to-image models and better capture high-variance 2D pose distributions. On top of this formulation, we introduce (i) a time-shared temporal indexing scheme that explicitly synchronizes music tokens and pose latents over time and (ii) a reference-pose conditioning strategy that preserves subject-specific body proportions and on-screen scale while enabling long-horizon segment-and-stitch generation. Experiments on a large in-the-wild 2D dance corpus and the calibrated AIST++2D benchmark show consistent improvements over representative music-to-dance methods in pose- and video-space metrics and human preference, and ablations validate the contributions of the representation, temporal indexing, and reference conditioning. See supplementary videos at https://hot-dance.github.io

中文标题/摘要

标题：将音乐驱动的2D舞蹈姿态生成重新构想为多通道图像生成

最近的姿势到视频模型可以将2D姿态序列转化为具有保真度的、身份保留的舞蹈视频，因此关键挑战是从音乐中生成时间上连贯、节奏对齐的2D姿态，尤其是在复杂、高变异性的真实世界分布下。我们通过将音乐到舞蹈生成重新构想为音乐标记条件下的多通道图像合成问题来解决这一问题：2D姿态序列被编码为一热图像，通过预训练的图像VAE压缩，并使用DiT风格的骨干模型进行建模，使我们能够从现代文本到图像模型中继承架构和训练方面的进步，并更好地捕捉2D姿态的高变异性分布。在此基础上，我们引入了(i)一种时间共享的时间索引方案，明确地在时间上同步音乐标记和姿态潜变量，以及(ii)一种参考姿态条件策略，该策略保留了特定主体的身体比例和屏幕上的比例，同时允许长时段的片段和缝合生成。在大型真实世界2D舞蹈语料库和校准的AIST++2D基准测试上进行的实验显示，在姿态和视频空间度量以及人类偏好方面，与代表性的音乐到舞蹈方法相比，具有一致的改进，并且消融实验验证了表示、时间索引和参考条件的贡献。请参见补充视频：https://hot-dance.github.io

Summary / 总结

This research aims to generate temporally coherent and rhythm-aligned 2D dance poses from music, addressing the challenge of high-variance in-the-wild distributions. The method reframes the problem as a multi-channel image synthesis task, using a pretrained image VAE and a DiT-style backbone to model 2D pose sequences encoded as one-hot images. Key contributions include a time-shared temporal indexing scheme and a reference-pose conditioning strategy, which improve pose and video metrics and human preference. Experiments on large in-the-wild and benchmark datasets show consistent improvements over existing methods. Ablations validate the effectiveness of the proposed techniques.

该研究旨在从音乐生成时空上一致且节奏对齐的2D舞蹈姿态，解决野外高变异性分布的挑战。方法将问题重新定义为多通道图像合成任务，使用预训练的图像VAE和DiT风格的骨干网络来建模编码为一热图像的2D姿态序列。关键贡献包括时间共享的时间索引方案和参考姿态条件策略，这些策略在姿态和视频指标以及人类偏好方面都显示出一致的改进。实验在大型野外数据集和基准数据集上验证了方法的有效性，并通过消融实验验证了所提技术的有效性。
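
To make the "pose sequence as a one-hot image" representation concrete, here is a minimal sketch. The abstract does not specify the exact layout, so the choice below (one channel per joint, a single pixel set to 1 at the joint's location) and the helper names `poses_to_onehot` and `onehot_to_poses` are assumptions for illustration.

```python
import numpy as np

def poses_to_onehot(poses, height, width):
    """Encode integer 2D keypoints as one-hot images.

    poses: (T, J, 2) array of (x, y) pixel coordinates.
    Returns a (T, J, height, width) array with exactly one 1 per joint channel.
    """
    poses = np.asarray(poses).astype(int)
    T, J, _ = poses.shape
    img = np.zeros((T, J, height, width), dtype=np.uint8)
    t_idx, j_idx = np.meshgrid(np.arange(T), np.arange(J), indexing="ij")
    # Clip so that out-of-frame joints land on the image border.
    x = np.clip(poses[..., 0], 0, width - 1)
    y = np.clip(poses[..., 1], 0, height - 1)
    img[t_idx, j_idx, y, x] = 1
    return img

def onehot_to_poses(img):
    """Decode (T, J, H, W) one-hot images back to (T, J, 2) (x, y) coordinates."""
    T, J, H, W = img.shape
    flat = img.reshape(T, J, H * W).argmax(axis=-1)
    return np.stack([flat % W, flat // W], axis=-1)
```

In the paper's pipeline such images are further compressed by a pretrained image VAE before being modeled; that step is not shown here.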

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Authors: Wei Chow, Linfeng Li, Lingdong Kong, Zefeng Li, Qi Xu, Hang Song, Tian Ye, Xian Wang, Jinbin Bai, Shilin Xu, Xiangtai Li, Junting Pan, Shaoteng Liu, Ran Zhou, Tianshu Yang, Songhua Liu

First: 2025-12-12T16:51:19+00:00 · Latest: 2025-12-12T16:51:19+00:00

Abstract

Recent advances in diffusion models (DMs) have achieved exceptional visual quality in image editing tasks. However, the global denoising dynamics of DMs inherently conflate local editing targets with the full-image context, leading to unintended modifications in non-target regions. In this paper, we shift our attention beyond DMs and turn to Masked Generative Transformers (MGTs) as an alternative approach to tackle this challenge. By predicting multiple masked tokens rather than holistic refinement, MGTs exhibit a localized decoding paradigm that endows them with the inherent capacity to explicitly preserve non-relevant regions during the editing process. Building upon this insight, we introduce the first MGT-based image editing framework, termed EditMGT. We first demonstrate that MGT's cross-attention maps provide informative localization signals for localizing edit-relevant regions and devise a multi-layer attention consolidation scheme that refines these maps to achieve fine-grained and precise localization. On top of these adaptive localization results, we introduce region-hold sampling, which restricts token flipping within low-attention areas to suppress spurious edits, thereby confining modifications to the intended target regions and preserving the integrity of surrounding non-target areas. To train EditMGT, we construct CrispEdit-2M, a high-resolution dataset spanning seven diverse editing categories. Without introducing additional parameters, we adapt a pre-trained text-to-image MGT into an image editing model through attention injection. Extensive experiments across four standard benchmarks demonstrate that, with fewer than 1B parameters, our model achieves similarity performance while enabling 6 times faster editing. Moreover, it delivers comparable or superior editing quality, with improvements of 3.6% and 17.6% on style change and style transfer tasks, respectively.

中文标题/摘要

标题：EditMGT: 在图像编辑中释放掩码生成变换器的潜力

最近在扩散模型（DMs）方面的进展在图像编辑任务中实现了卓越的视觉质量。然而，DMs的全局去噪动态会将局部编辑目标与全图上下文混淆，导致非目标区域出现意外修改。本文中，我们超越了DMs，转向掩码生成变换器（MGTs）作为解决这一挑战的替代方法。通过预测多个掩码标记而不是整体细化，MGTs表现出局部解码范式，赋予它们在编辑过程中显式保留非相关区域的内在能力。基于这一洞察，我们提出了第一个基于MGT的图像编辑框架，称为EditMGT。我们首先证明MGT的交叉注意力图为定位与编辑相关的区域提供了有用的信号，并设计了一种多层注意力整合方案，以细化这些图，实现精细和精确的定位。在这些自适应定位结果之上，我们引入了区域保持采样，该方法限制低注意力区域内的标记翻转，以抑制虚假编辑，从而将修改限制在目标区域，并保持周围非目标区域的完整性。为了训练EditMGT，我们构建了CrispEdit-2M，这是一个跨越七个不同编辑类别的高分辨率数据集。通过注意力注入将预训练的文本到图像MGT适应为图像编辑模型，而无需引入额外参数。在四个标准基准上的广泛实验表明，尽管参数少于10亿（1B），我们的模型在相似性方面表现出色，同时使编辑速度提高了6倍。此外，它在风格变化和风格迁移任务中提供了可比或更优的编辑质量，分别提高了3.6%和17.6%。

Summary / 总结

The paper addresses the issue of unintended modifications in non-target regions during image editing tasks using diffusion models (DMs). It introduces EditMGT, an MGT-based framework that predicts multiple masked tokens to achieve localized decoding, thereby preserving non-relevant regions. The model uses a multi-layer attention consolidation scheme and region-hold sampling to refine localization and suppress spurious edits, respectively. Experiments show that EditMGT, with fewer than 1B parameters, achieves similar performance to DMs while being 6 times faster and delivering comparable or superior editing quality.

本文通过提出基于Masked Generative Transformers (MGTs)的图像编辑框架EditMGT，解决了扩散模型在图像编辑中的局限性。MGTs通过预测多个掩码令牌，实现局部解码和非相关区域的显式保留。作者引入了多层注意力聚合方案和区域保持采样来细化定位并抑制意外编辑。实验表明，EditMGT在不到1B参数的情况下，实现了与扩散模型相似的性能，同时编辑速度提高了6倍，并在风格变化和转移任务中分别提高了3.6%和17.6%的编辑质量。
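
The region-hold idea (tokens in low-attention areas are not allowed to flip) can be sketched as a simple masking step. The fixed threshold `tau`, the flat token layout, and the function name `region_hold` are assumptions for illustration; the paper's actual sampling procedure and attention consolidation are not reproduced here.

```python
import numpy as np

def region_hold(original_tokens, predicted_tokens, attention, tau=0.5):
    """Accept predicted tokens only where attention >= tau.

    Tokens in low-attention positions keep their original value, so edits
    stay confined to the region the attention map marks as edit-relevant.
    Returns the merged tokens and the boolean mask of editable positions.
    """
    original_tokens = np.asarray(original_tokens)
    predicted_tokens = np.asarray(predicted_tokens)
    attention = np.asarray(attention, dtype=float)
    editable = attention >= tau
    merged = np.where(editable, predicted_tokens, original_tokens)
    return merged, editable
```

In an iterative masked-token decoder, a step like this would be applied at every sampling iteration rather than once at the end.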

Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection

Authors: Zongxian Yang, Jiayu Qian, Zegao Peng, Haoyu Zhang, Yu-An Huang, KC Tan, Zhi-An Huang

First: 2025-06-11T14:58:38+00:00 · Latest: 2025-12-12T16:49:44+00:00

Abstract

Large reasoning models excel in domains like mathematics where intermediate reasoning is straightforward to verify, but struggle to self-correct in medical fields where evaluating intermediate reasoning is cumbersome and expensive. This verification bottleneck hinders the development of reliable AI reasoners for high-stakes applications. Here we propose Med-REFL, a novel framework that learns fine-grained reflection without human labels or model distillation. Med-REFL introduces a deterministic structural assessment of the reasoning space to automatically generate preference data for reflection. By globally evaluating all explored reasoning paths in a tree-of-thoughts, our method quantifies the value of corrective actions, enabling the automated construction of direct preference optimization pairs. This trains the model to recognize and amend its own reasoning fallacies. Extensive experiments show Med-REFL delivers robust gains across diverse model architectures and medical benchmarks, boosting a general-purpose Llama3.1-8B by +5.82% and the state-of-the-art Huatuo-o1 by +4.13% on the MedQA benchmark. Our Med-REFL-8B achieves state-of-the-art performance among 7-8B models while even competing with models twice its size. Crucially, targeted ablations prove its success generalizes to other domains such as logical reasoning and mitigates the 'fake reflection' phenomenon in LRMs. Ultimately, our framework provides a scalable solution to the verification bottleneck, paving the way for more reliable AI reasoners in high-stakes domains like medicine. Med-REFL has been made publicly available at https://github.com/TianYin123/Med-REFL.

中文标题/摘要

标题：Med-REFL：通过自我纠正的细粒度反思提升医学推理能力

大型推理模型在数学等领域表现出色，因为中间推理易于验证，但在医学领域却难以自我纠正，因为评估中间推理既繁琐又昂贵。这种验证瓶颈阻碍了可靠AI推理器在高风险应用中的发展。为此，我们提出了一种名为Med-REFL的新框架，该框架无需人工标签或模型蒸馏即可学习细粒度的反思。Med-REFL引入了一种确定性的结构评估方法，以自动生成反思的偏好数据。通过全局评估思维树中探索的所有推理路径，我们的方法量化了纠正行动的价值，从而实现直接偏好优化对的自动化构建。这使模型能够识别并修正自身的推理谬误。广泛实验表明，Med-REFL在多种模型架构和医学基准测试中均表现出稳健的提升，分别将通用Llama3.1-8B和最先进的Huatuo-o1在MedQA基准测试上的性能提升了5.82%和4.13%。我们的Med-REFL-8B在7-8B模型中达到最先进的性能，甚至能与两倍大小的模型竞争。关键的是，有针对性的消融实验表明，其成功可以推广到其他领域如逻辑推理，并减轻LRMs中的“假反思”现象。最终，我们的框架提供了一种可扩展的解决方案，以克服验证瓶颈，为医学等高风险领域中的更可靠AI推理器铺平道路。Med-REFL已在https://github.com/TianYin123/Med-REFL/公开发布。

Summary / 总结

Med-REFL is a framework designed to enhance medical reasoning in AI models by enabling self-correction through fine-grained reflection without human labels or model distillation. It evaluates all reasoning paths in a tree-of-thoughts to generate preference data, allowing the model to recognize and correct its own reasoning errors. Experiments show Med-REFL improves performance across various model architectures and medical benchmarks, with notable gains of +5.82% and +4.13% on Llama3.1-8B and Huatuo-o1 respectively. The framework also generalizes to other domains and mitigates the 'fake reflection' issue, making it a scalable solution for high-stakes applications like medicine.

Med-REFL 是一种框架，旨在通过细粒度反思增强 AI 模型在医学领域的推理能力，无需人工标签或模型蒸馏。它通过评估思维树中的所有推理路径来生成偏好数据，使模型能够识别并纠正自身的推理错误。实验显示，Med-REFL 在各种模型架构和医学基准测试中表现出色，分别提高了 Llama3.1-8B 和 Huatuo-o1 的性能 5.82% 和 4.13%。该框架还能应用于其他领域，并解决了‘假反思’问题，使其成为高风险应用如医学领域中的一种可扩展解决方案。
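
One way to picture how preference pairs might be derived from a tree of explored reasoning paths is sketched below. The scoring rule (a step's value is the fraction of leaf answers beneath it that are correct), the `margin`, the dictionary layout, and all function names are assumptions for illustration; the abstract does not spell out Med-REFL's actual structural assessment.

```python
def leaf_stats(node):
    """Return (number of correct leaves, total leaves) under `node`."""
    children = node.get("children", [])
    if not children:
        return (1 if node["correct"] else 0), 1
    correct = total = 0
    for child in children:
        c, t = leaf_stats(child)
        correct += c
        total += t
    return correct, total

def node_value(node):
    """Fraction of reasoning paths through `node` that end correctly."""
    correct, total = leaf_stats(node)
    return correct / total

def preference_pairs(node, prefix=(), margin=0.5):
    """Collect (chosen, rejected) sibling steps whose values differ by >= margin."""
    pairs = []
    children = node.get("children", [])
    context = list(prefix) + [node["step"]]
    for good in children:
        for bad in children:
            if node_value(good) - node_value(bad) >= margin:
                pairs.append({"prompt": context,
                              "chosen": good["step"],
                              "rejected": bad["step"]})
    for child in children:
        pairs.extend(preference_pairs(child, tuple(context), margin))
    return pairs
```

Pairs in this shape (a shared prompt with a chosen and a rejected continuation) are what direct preference optimization consumes.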

Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios

Authors: João Lucas Luz Lima Sarcinelli, Ricardo Marcondes Marcacini

First: 2025-12-10T20:31:30+00:00 · Latest: 2025-12-12T16:45:54+00:00

Abstract

Large Language Models (LLMs) have become effective zero-shot classifiers, but their high computational requirements and environmental costs limit their practicality for large-scale annotation in high-performance computing (HPC) environments. To support more sustainable workflows, we present Text2Graph, an open-source Python package that provides a modular implementation of existing text-to-graph classification approaches. The framework enables users to combine LLM-based partial annotation with Graph Neural Network (GNN) label propagation in a flexible manner, making it straightforward to swap components such as feature extractors, edge construction methods, and sampling strategies. We benchmark Text2Graph on a zero-shot setting using five datasets spanning topic classification and sentiment analysis tasks, comparing multiple variants against other zero-shot approaches for text classification. In addition to reporting performance, we provide detailed estimates of energy consumption and carbon emissions, showing that graph-based propagation achieves competitive results at a fraction of the energy and environmental cost.

中文标题/摘要

标题：Text2Graph：结合轻量级LLM和GNN的高效文本分类方法

大型语言模型（LLMs）已成为有效的零样本分类器，但其高计算需求和环境成本限制了其在高性能计算（HPC）环境中的大规模注释实用性。为支持更可持续的工作流程，我们提出了Text2Graph，这是一个开源的Python包，提供了现有文本到图分类方法的模块化实现。该框架使用户能够以灵活的方式结合基于LLM的部分注释与图神经网络（GNN）标签传播，使得可以方便地更换特征提取器、边构建方法和采样策略等组件。我们在五个涵盖主题分类和情感分析任务的数据集上对Text2Graph进行了零样本设置下的基准测试，将多种变体与其他文本分类的零样本方法进行了比较。除了报告性能外，我们还提供了详细的能源消耗和碳排放估计，表明基于图的传播在极低的能源和环境成本下实现了具有竞争力的结果。

Summary / 总结

The research aims to address the high computational demands of Large Language Models (LLMs) for text classification in label-scarce scenarios. Text2Graph, an open-source Python package, combines LLMs and Graph Neural Networks (GNNs) for efficient label propagation. The framework allows users to customize components like feature extractors and sampling strategies. Experiments on five datasets show that Text2Graph achieves competitive performance with significantly lower energy consumption and carbon emissions compared to other zero-shot approaches.

研究旨在解决大型语言模型（LLMs）在标签稀缺场景下进行文本分类时的高计算需求。Text2Graph 是一个开源的 Python 包，结合了 LLMs 和图神经网络（GNNs）进行高效的标签传播。该框架允许用户自定义特征提取器和采样策略等组件。实验结果显示，Text2Graph 在五个数据集上的性能与传统方法相当，但能耗和碳排放却大幅降低。
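
The "annotate a few texts with an LLM, then spread labels over a graph" workflow can be sketched with classical label propagation on a k-nearest-neighbour graph. Note the substitution: this uses plain iterative propagation rather than a trained GNN, and it does not use the Text2Graph package's API; `propagate_labels` and its parameters are hypothetical. Embeddings are assumed to be non-zero vectors.

```python
import numpy as np

def propagate_labels(embeddings, seed_labels, n_classes, k=2, n_iter=20):
    """Spread a few seed labels over a cosine kNN graph.

    seed_labels: integer array with -1 marking unlabeled texts.
    Returns one predicted class index per text.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(seed_labels)
    n = len(X)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar texts per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # symmetrise the graph
    P = A / A.sum(axis=1, keepdims=True)      # row-normalised transitions
    F = np.zeros((n, n_classes))
    labeled = y >= 0
    F[labeled, y[labeled]] = 1.0
    seeds = F.copy()
    for _ in range(n_iter):
        F = P @ F
        F[labeled] = seeds[labeled]           # clamp the seed labels
    return F.argmax(axis=1)
```

Only the seed texts require an LLM call in this setup, which is where the energy savings reported in the abstract would come from.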

Geometry-Informed Neural Operator Transformer

Authors: Qibang Liu, Weiheng Zhong, Hadi Meidani, Diab Abueidda, Seid Koric, Philippe Geubelle

First: 2025-04-28T03:39:27+00:00 · Latest: 2025-12-12T16:45:00+00:00

Abstract

Machine-learning-based surrogate models offer significant computational efficiency and faster simulations compared to traditional numerical methods, especially for problems requiring repeated evaluations of partial differential equations. This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions on arbitrary geometries. GINOT employs a sampling and grouping strategy together with an attention mechanism to encode surface point clouds that are unordered, exhibit non-uniform point densities, and contain varying numbers of points for different geometries. The geometry information is seamlessly integrated with query points in the solution decoder through the attention mechanism. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.

中文标题/摘要

标题：几何导向的神经算子变换器

基于机器学习的代理模型在计算效率和更快的模拟方面比传统数值方法具有显著优势，尤其是在需要多次评估偏微分方程的问题中。本文介绍了几何导向的神经算子变换器（GINOT），它将变换器架构与神经算子框架相结合，以在任意几何形状上进行前向预测。GINOT 使用采样和分组策略以及注意力机制来编码无序、非均匀点密度且不同几何形状包含不同数量点的表面点云。几何信息通过注意力机制无缝地与解码器中的查询点结合。GINOT 的性能在多个具有挑战性的数据集上得到验证，展示了其在复杂和任意 2D 和 3D 几何形状上的高准确性和强大的泛化能力。

Summary / 总结

The research aims to make surrogate modeling of partial differential equations efficient on arbitrary geometries by developing the Geometry-Informed Neural Operator Transformer (GINOT), which combines the transformer architecture with the neural operator framework. GINOT uses a sampling and grouping strategy along with an attention mechanism to encode surface point clouds that are unordered, non-uniform in density, and variable in size, and it fuses this geometry encoding with the query points in the solution decoder through attention. Validation on multiple challenging datasets shows high accuracy and strong generalization for complex 2D and 3D geometries.

研究旨在提高偏微分方程代理模型的计算效率和准确性，特别是对于涉及复杂几何的问题。几何导向的神经操作变换器（GINOT）将变压器架构与神经操作结合，以处理任意几何形状。GINOT 使用采样和分组策略以及注意力机制来编码无序和非均匀点云，并将几何信息整合到解码器中。在多个数据集上的实验展示了 GINOT 在复杂 2D 和 3D 几何形状上的高准确性和强泛化能力。
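
The way attention lets query points read from an unordered, variable-size geometry encoding can be sketched with plain scaled dot-product cross-attention. This is a generic building block shown without learned projections or multiple heads; it is not GINOT's actual decoder, and the function name is hypothetical.

```python
import numpy as np

def cross_attention(queries, geom_tokens):
    """Scaled dot-product cross-attention.

    queries:     (Q, d) features of the solution query points.
    geom_tokens: (N, d) geometry tokens; N may differ between geometries.
    Returns the (Q, d) attended features and the (Q, N) attention weights.
    """
    q = np.asarray(queries, dtype=float)
    kv = np.asarray(geom_tokens, dtype=float)
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv, weights
```

Because the output is a weighted sum over tokens, it does not change when the geometry tokens are permuted, and the same code handles any number of tokens, which matches the properties the abstract requires of the point-cloud encoding.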

Efficient Action Counting with Dynamic Queries

Authors: Xiaoxuan Ma, Zishi Li, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang

First: 2024-03-03T15:43:11+00:00 · Latest: 2025-12-12T16:31:01+00:00

Comments: code: https://github.com/lizishi/DeTRC, proj page: https://shirleymaxx.github.io/DeTRC/

Abstract

Temporal repetition counting aims to quantify the repeated action cycles within a video. The majority of existing methods rely on the similarity correlation matrix to characterize the repetitiveness of actions, but their scalability is hindered due to the quadratic computational complexity. In this work, we introduce a novel approach that employs an action query representation to localize repeated action cycles with linear computational complexity. Based on this representation, we further develop two key components to tackle the essential challenges of temporal repetition counting. Firstly, to facilitate open-set action counting, we propose the dynamic update scheme on action queries. Unlike static action queries, this approach dynamically embeds video features into action queries, offering a more flexible and generalizable representation. Secondly, to distinguish between actions of interest and background noise actions, we incorporate inter-query contrastive learning to regularize the video representations corresponding to different action queries. As a result, our method significantly outperforms previous works, particularly in terms of long video sequences, unseen actions, and actions at various speeds. On the challenging RepCountA benchmark, we outperform the state-of-the-art method TransRAC by 26.5% in OBO accuracy, with a 22.7% mean error decrease and 94.1% computational burden reduction. Code is available at https://github.com/lizishi/DeTRC.

中文标题/摘要

标题：使用动态查询的高效动作计数

时间重复计数旨在量化视频中的重复动作周期。现有方法大多依赖于相似性相关矩阵来表征动作的重复性，但由于计算复杂度呈二次方，其扩展性受到限制。在本工作中，我们提出了一种新颖的方法，利用动作查询表示来以线性计算复杂度定位重复动作周期。基于此表示，我们进一步开发了两个关键组件来解决时间重复计数的基本挑战。首先，为了促进开放集动作计数，我们提出了动作查询的动态更新方案。与静态动作查询不同，这种方法动态地将视频特征嵌入到动作查询中，提供了一种更灵活和通用的表示。其次，为了区分感兴趣的动作和背景噪声动作，我们引入了查询间的对比学习来正则化不同动作查询对应的视频表示。因此，我们的方法在长视频序列、未见过的动作和不同速度的动作方面显著优于先前的工作。在具有挑战性的RepCountA基准上，我们的方法在OBO准确性上比最先进的方法TransRAC高出26.5%，平均误差降低22.7%，计算负担减少94.1%。代码可在https://github.com/lizishi/DeTRC/ 获取。

Summary / 总结

This work addresses temporal repetition counting with an action query representation that localizes repeated action cycles with linear computational complexity. A dynamic update scheme embeds video features into the action queries to support open-set counting, and inter-query contrastive learning helps separate actions of interest from background noise. On the RepCountA benchmark, the method outperforms the state-of-the-art TransRAC by 26.5% in OBO accuracy, with a 22.7% decrease in mean error and a 94.1% reduction in computational burden.

该研究通过提出使用动态动作查询的方法，以线性计算复杂度定位重复动作周期，显著提升了开放集动作计数、区分动作与背景噪声以及处理长视频序列的能力。该方法在RepCountA基准测试上比TransRAC提高了26.5%的OBO准确率，并减少了94.1%的计算负担。
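
For readers unfamiliar with the metrics quoted above, the sketch below computes off-by-one (OBO) accuracy and a normalized mean absolute error as they are commonly defined for repetition counting. The small `eps` in the denominator is an assumption to avoid division by zero; the exact constant used in the paper's evaluation code may differ.

```python
import numpy as np

def obo_accuracy(pred_counts, gt_counts):
    """Fraction of videos whose predicted count is within 1 of the ground truth."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(pred - gt) <= 1))

def normalized_mae(pred_counts, gt_counts, eps=0.1):
    """Mean absolute count error, normalised by the ground-truth count."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    return float(np.mean(np.abs(pred - gt) / (gt + eps)))
```

Higher OBO accuracy and lower normalized error are better, so the reported gains move the two metrics in opposite directions.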

Particle Image Velocimetry Refinement via Consensus ADMM

Authors: Alan Bonomi, Francesco Banelli, Antonio Terpin

First: 2025-12-12T16:20:04+00:00 · Latest: 2025-12-12T16:20:04+00:00

Comments: Code: https://github.com/antonioterpin/flowgym

Abstract

Particle Image Velocimetry (PIV) is an imaging technique in experimental fluid dynamics that quantifies flow fields around bluff bodies by analyzing the displacement of neutrally buoyant tracer particles immersed in the fluid. Traditional PIV approaches typically depend on tuning parameters specific to the imaging setup, making the performance sensitive to variations in illumination, flow conditions, and seeding density. On the other hand, even state-of-the-art machine learning methods for flow quantification are fragile outside their training set. In our experiments, we observed that flow quantification would improve if different tunings (or algorithms) were applied to different regions of the same image pair. In this work, we parallelize the instantaneous flow quantification with multiple algorithms and adopt a consensus framework based on the alternating direction method of multipliers, seamlessly incorporating priors such as smoothness and incompressibility. We perform several numerical experiments to demonstrate the benefits of this approach. For instance, we achieve a decrease in end-point-error of up to 20% of a dense-inverse-search estimator at an inference rate of 60Hz, and we show how this performance boost can be increased further with outlier rejection. Our method is implemented in JAX, effectively exploiting hardware acceleration, and integrated in Flow Gym, enabling (i) reproducible comparisons with the state-of-the-art, (ii) testing different base algorithms, (iii) straightforward deployment for active fluids control applications.

中文标题/摘要

标题：基于共识ADMM的粒子图像测速精化

粒子图像测速（PIV）是实验流体力学中的一种成像技术，通过分析浸没在流体中的中性浮力示踪粒子的位移来量化钝体周围的流场。传统的 PIV 方法通常依赖于针对特定成像设置调节的参数，使得性能对照明、流动条件和示踪粒子密度的变化十分敏感。另一方面，即使是最先进的流场量化机器学习方法，在训练集之外也很脆弱。在我们的实验中，我们观察到，如果对同一图像对的不同区域应用不同的调节（或算法），流场量化效果会得到改善。在这项工作中，我们用多种算法并行进行瞬时流场量化，并采用基于交替方向乘子法的共识框架，无缝地融入平滑性和不可压缩性等先验。我们进行了多项数值实验以展示该方法的优势。例如，在 60Hz 的推理速率下，我们将密集逆搜索估计器的端点误差降低了最多 20%，并展示了如何通过异常值剔除进一步提升这一性能增益。我们的方法使用 JAX 实现，有效利用了硬件加速，并集成在 Flow Gym 中，从而支持 (i) 与最先进方法进行可复现的比较，(ii) 测试不同的基础算法，(iii) 面向主动流体控制应用的便捷部署。

Summary / 总结

The research aims to improve the accuracy of flow quantification in Particle Image Velocimetry (PIV) using a consensus framework based on the alternating direction method of multipliers (ADMM). Multiple algorithms are run in parallel on the same image pair, and the consensus step fuses their estimates together with priors such as smoothness and incompressibility. Experiments show a decrease in end-point error of up to 20% for a dense-inverse-search estimator at a 60 Hz inference rate, with further gains possible through outlier rejection. The method is implemented in JAX and integrated into Flow Gym for reproducible comparisons and active fluids control applications.

研究旨在通过基于交替方向乘子法（ADMM）的共识框架融合多种算法的结果，提高粒子图像测速（PIV）中流场量化的精度。该方法整合了平滑性和不可压缩性等先验知识，在60Hz的推理速率下将密集逆搜索估计器的端点误差降低了最多20%，并可通过剔除异常值进一步提高性能。该方法使用JAX实现以利用硬件加速，并集成到Flow Gym中，以支持可复现的比较和主动流体控制应用。
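
The consensus idea in the PIV entry above can be illustrated with a minimal sketch of scaled-form consensus ADMM. This is not the authors' JAX implementation: the function name `consensus_admm`, the per-pixel confidence weights, and the toy 1D "flow field" are assumptions made for illustration, and the smoothness and incompressibility priors are left out, so the consensus step reduces to a plain average. Under these assumptions the iterates converge to the per-pixel weighted mean of the estimates, which gives a closed form to compare against.

```python
def consensus_admm(estimates, weights, rho=1.0, iters=300):
    """Fuse K estimates of the same field into one consensus field z.

    Each estimator i holds a local copy x_i with the quadratic data term
    (w_i / 2) * (x_i - a_i)^2 per pixel; ADMM enforces x_i = z.
    """
    k, n = len(estimates), len(estimates[0])
    z = [0.0] * n
    u = [[0.0] * n for _ in range(k)]  # scaled dual variables
    for _ in range(iters):
        # Local step: closed-form minimiser of data term plus coupling term.
        x = [[(weights[i][p] * estimates[i][p] + rho * (z[p] - u[i][p]))
              / (weights[i][p] + rho) for p in range(n)] for i in range(k)]
        # Consensus step: with no prior on z, average the shifted copies.
        z = [sum(x[i][p] + u[i][p] for i in range(k)) / k for p in range(n)]
        # Dual step: accumulate each copy's disagreement with the consensus.
        for i in range(k):
            for p in range(n):
                u[i][p] += x[i][p] - z[p]
    return z


# Two hypothetical estimators over four pixels; each is trusted more
# (larger weight) in a different half of the image.
estimates = [[1.0, 1.0, 2.0, 2.0], [1.4, 1.2, 3.0, 2.5]]
weights = [[1.0, 1.0, 0.2, 0.2], [0.2, 0.2, 1.0, 1.0]]
fused = consensus_admm(estimates, weights)

# Closed-form solution of the same problem, used as a check.
closed_form = [
    sum(w[p] * a[p] for a, w in zip(estimates, weights))
    / sum(w[p] for w in weights)
    for p in range(4)
]
```

A prior on `z` would change only the consensus step, which is one reason a consensus formulation makes it convenient to add regularisers without touching the individual estimators.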

Behaviour Policy Optimization: Provably Lower Variance Return Estimates for Off-Policy Reinforcement Learning

Authors: Alexander W. Goodall, Edwin Hamel-De le Court, Francesco Belardinelli

Venue: AAAI 2026

First: 2025-11-13T23:06:40+00:00 · Latest: 2025-12-12T16:18:48+00:00

Comments: Accepted at AAAI 2026 (main track)

Abstract

Many reinforcement learning algorithms, particularly those that rely on return estimates for policy improvement, can suffer from poor sample efficiency and training instability due to high-variance return estimates. In this paper we leverage new results from off-policy evaluation; it has recently been shown that well-designed behaviour policies can be used to collect off-policy data for provably lower variance return estimates. This result is surprising as it means collecting data on-policy is not variance optimal. We extend this key insight to the online reinforcement learning setting, where both policy evaluation and improvement are interleaved to learn optimal policies. Off-policy RL has been well studied (e.g., IMPALA), with correct and truncated importance-weighted samples for de-biasing and managing variance appropriately. Generally these approaches are concerned with reconciling data collected from multiple workers in parallel while the policy is updated asynchronously; the mismatch between the workers and the policy is corrected in a mathematically sound way. Here we consider only one worker, the behaviour policy, which is used to collect data for policy improvement with provably lower variance return estimates. In our experiments we extend two policy-gradient methods with this regime, demonstrating better sample efficiency and performance over a diverse set of environments.

中文标题/摘要

标题：行为策略优化：面向离策略强化学习的可证明更低方差的回报估计

许多强化学习算法，尤其是依赖于回报估计进行策略改进的算法，可能会因高方差的回报估计而表现出较差的样本效率和训练不稳定性。在本文中，我们利用了离策评估的新成果；最近的研究表明，精心设计的行为策略可以用于收集离策数据，从而获得可证明方差较低的回报估计。这一结果令人惊讶，因为它意味着在策略上收集数据并不是方差最优的。我们将这一关键见解扩展到在线强化学习环境中，在该环境中，策略评估和改进交替进行以学习最优策略。离策RL已经被广泛研究（例如，IMPALA），正确和截断的重要性加权样本用于去偏差和适当管理方差。通常，这些方法关注的是在策略异步更新时，从多个并行工作的工人收集的数据的协调，而工人与策略之间的不匹配以数学上正确的方式进行修正。在这里，我们只考虑一个工人——行为策略，它用于收集用于策略改进的数据，具有可证明方差较低的回报估计。在我们的实验中，我们扩展了两种策略梯度方法，展示了在各种环境中的更好样本效率和性能。

Summary / 总结

This paper addresses high-variance return estimates in reinforcement learning, which can lead to poor sample efficiency and training instability. Building on recent results from off-policy evaluation, the authors note that well-designed behaviour policies can yield provably lower-variance return estimates than on-policy data collection. They extend this insight to the online reinforcement learning setting, where a single behaviour policy collects data for policy improvement, and apply it to two policy-gradient methods, reporting better sample efficiency and performance across a diverse set of environments.

该论文解决了强化学习中高方差的回报估计问题，这可能导致样本效率低下和训练不稳定。通过利用来自离策略评估的新结果，作者表明使用精心设计的行为策略可以提供可证明更低方差的回报估计。他们将这一见解扩展到在线强化学习环境中，并证明两种策略梯度方法在这种模式下适应后，在多种环境中的样本效率和性能都更好。
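
The claim that on-policy data collection is not variance optimal can be checked on a toy one-step problem. The sketch below is a textbook importance-sampling illustration, not the paper's algorithm: the three-action bandit, its reward values, and the helper `is_variance` are assumptions. It computes the exact variance of the single-sample importance-weighted return estimate for a given behaviour policy, and compares on-policy sampling with a behaviour policy proportional to `target * |reward|`.

```python
def is_variance(target, behaviour, rewards):
    """Exact variance of (target(a) / behaviour(a)) * r(a) with a ~ behaviour.

    Assumes behaviour(a) > 0 wherever target(a) * r(a) != 0, so the
    estimator is unbiased for the target policy's expected reward.
    """
    value = sum(p * r for p, r in zip(target, rewards))
    second_moment = sum(
        (p * r) ** 2 / b
        for p, b, r in zip(target, behaviour, rewards)
        if p * r != 0.0
    )
    return second_moment - value ** 2


# Hypothetical one-step problem with deterministic, positive rewards.
target = [0.5, 0.3, 0.2]
rewards = [1.0, 2.0, 10.0]
value = sum(p * r for p, r in zip(target, rewards))

# On-policy: the behaviour policy equals the target policy.
on_policy_var = is_variance(target, target, rewards)

# Behaviour policy proportional to target(a) * |r(a)|: every weighted
# sample then equals the true value, so the variance vanishes.
tuned = [p * abs(r) / value for p, r in zip(target, rewards)]
tuned_var = is_variance(target, tuned, rewards)
```

In practice the rewards are unknown in advance, so a variance-reducing behaviour policy has to be learned; the sketch only shows that room for improvement over on-policy sampling exists.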

Hierarchical Bayesian Model for Gene Deconvolution and Functional Analysis in Human Endometrium Across the Menstrual Cycle

Authors: Crystal Su, Kuai Yu, Mingyuan Shao, Daniel Bauer

First: 2025-10-31T01:48:25+00:00 · Latest: 2025-12-12T16:09:58+00:00

Comments: This paper is withdrawn due to issues with attribution and citation accuracy

Abstract

Bulk tissue RNA sequencing of heterogeneous samples provides averaged gene expression profiles, obscuring cell type-specific dynamics. To address this, we present a probabilistic hierarchical Bayesian model that deconvolves bulk RNA-seq data into constituent cell-type expression profiles and proportions, leveraging a high-resolution single-cell reference. We apply our model to human endometrial tissue across the menstrual cycle, a context characterized by dramatic hormone-driven cellular composition changes. Our extended framework provides a principled inference of cell type proportions and cell-specific gene expression changes across cycle phases. We demonstrate the model's structure, priors, and inference strategy in detail, and we validate its performance with simulations and comparisons to existing methods. The results reveal dynamic shifts in epithelial, stromal, and immune cell fractions between menstrual phases, and identify cell-type-specific differential gene expression associated with endometrial function (e.g., decidualization markers in stromal cells during the secretory phase). We further conduct robustness tests and show that our Bayesian approach is resilient to reference mismatches and noise. Finally, we discuss the biological significance of our findings, potential clinical implications for fertility and endometrial disorders, and future directions, including integration of spatial transcriptomics.

中文标题/摘要

标题：用于月经周期中人类子宫内膜基因解卷积与功能分析的层次贝叶斯模型

批量组织RNA测序提供了混杂样本的平均基因表达谱，掩盖了细胞类型特异性动态。为了解决这一问题，我们提出了一种概率层次贝叶斯模型，将批量RNA-seq数据分解为组成细胞类型表达谱及其比例，利用高分辨率单细胞参考。我们将该模型应用于月经周期中的人类子宫内膜组织，这一背景下细胞组成经历了显著的激素驱动变化。我们的扩展框架提供了跨周期阶段细胞类型比例和细胞特异性基因表达变化的原理性推断。我们详细阐述了模型的结构、先验和推断策略，并通过模拟和与现有方法的比较验证了其性能。结果揭示了月经周期各阶段上皮细胞、间质细胞和免疫细胞比例的动态变化，并识别了与子宫功能相关的细胞类型特异性差异基因表达（例如，在分泌期间质细胞中的蜕膜化标记物）。我们还进行了稳健性测试，表明我们的贝叶斯方法对参考不匹配和噪声具有鲁棒性。最后，我们讨论了我们发现的生物学意义、对生育和子宫疾病临床影响的潜在意义以及未来方向，包括整合空间转录组学。

Summary / 总结

The research aims to deconvolute bulk RNA-seq data from human endometrial tissue to reveal cell-type-specific gene expression dynamics across the menstrual cycle. The study employs a probabilistic hierarchical Bayesian model, which integrates a high-resolution single-cell reference to infer cell type proportions and gene expression changes. Key findings include dynamic shifts in epithelial, stromal, and immune cell fractions and identification of cell-type-specific differential gene expression associated with endometrial function, such as decidualization markers in stromal cells during the secretory phase.

研究旨在通过使用概率层次贝叶斯模型，从人类子宫组织的bulk RNA-seq数据中解析出细胞类型特异性的基因表达动态，特别是在月经周期中。该研究整合了一个高分辨率的单细胞参考来推断细胞类型的比例和基因表达变化。主要发现包括在月经周期中上皮细胞、间质细胞和免疫细胞比例的动态变化，以及识别出与子宫功能相关的细胞类型特异性差异基因表达，例如分泌期间质细胞中的蜕膜化标记物。

Stable spectral neural operator for learning stiff PDE systems from limited data

Authors: Rui Zhang, Han Wan, Yang Liu, Hao Sun

First: 2025-12-12T16:09:38+00:00 · Latest: 2025-12-12T16:09:38+00:00

Abstract

Accurate modeling of spatiotemporal dynamics is crucial to understanding complex phenomena across science and engineering. However, this task faces a fundamental challenge when the governing equations are unknown and observational data are sparse. System stiffness, the coupling of multiple time-scales, further exacerbates this problem and hinders long-term prediction. Existing methods fall short: purely data-driven methods demand massive datasets, whereas physics-aware approaches are constrained by their reliance on known equations and fine-grained time steps. To overcome these limitations, we introduce an equation-free learning framework, namely, the Stable Spectral Neural Operator (SSNO), for modeling stiff partial differential equation (PDE) systems based on limited data. Instead of encoding specific equation terms, SSNO embeds spectrally inspired structures in its architecture, yielding strong inductive biases for learning the underlying physics. It automatically learns local and global spatial interactions in the frequency domain, while handling system stiffness with a robust integrating factor time-stepping scheme. Demonstrated across multiple 2D and 3D benchmarks in Cartesian and spherical geometries, SSNO achieves prediction errors one to two orders of magnitude lower than leading models. Crucially, it shows remarkable data efficiency, requiring only very few (2 to 5) training trajectories for robust generalization to out-of-distribution conditions. This work offers a robust and generalizable approach to learning stiff spatiotemporal dynamics from limited data without explicit a priori knowledge of PDE terms.

中文标题/摘要

标题：基于有限数据学习刚性PDE系统的稳定谱神经算子

准确建模时空动态对于理解科学和工程中的复杂现象至关重要。然而，当控制方程未知且观测数据稀疏时，这一任务面临根本性的挑战。系统刚性（即多个时间尺度的耦合）进一步加剧了这一问题，并阻碍长期预测。现有方法存在不足：纯数据驱动方法需要海量数据集，而物理感知方法则受限于其对已知方程和细粒度时间步长的依赖。为克服这些限制，我们提出了一种无方程学习框架，即稳定谱神经算子（SSNO），用于基于有限数据建模刚性偏微分方程（PDE）系统。SSNO 不编码特定的方程项，而是在其架构中嵌入受谱方法启发的结构，从而为学习潜在的物理规律提供强大的归纳偏置。它在频域中自动学习局部和全局的空间相互作用，并通过稳健的积分因子时间步进方案处理系统刚性。在笛卡尔几何和球面几何下的多个2D和3D基准测试中，SSNO 的预测误差比领先模型低一到两个数量级。至关重要的是，它表现出显著的数据效率，仅需极少量（2到5条）训练轨迹即可稳健地泛化到分布外条件。这项工作提供了一种无需显式先验了解PDE项即可从有限数据中学习刚性时空动态的稳健且通用的方法。

Summary / 总结

The research aims to accurately model complex spatiotemporal dynamics of stiff PDE systems when the governing equations are unknown and data are sparse. It introduces the Stable Spectral Neural Operator (SSNO), which embeds spectrally inspired structures in its architecture to learn local and global spatial interactions in the frequency domain and handles stiffness with an integrating factor time-stepping scheme. Across 2D and 3D benchmarks, SSNO achieves prediction errors one to two orders of magnitude lower than leading models and needs only 2 to 5 training trajectories to generalize robustly to out-of-distribution conditions.

研究旨在开发一种方法，以准确建模当未知动力学方程且数据稀少时的复杂时空动态，特别是对于刚性偏微分方程（PDE）系统。引入了稳定的谱神经算子（SSNO），该算子在其架构中嵌入谱结构以处理系统刚性并从有限数据中学习潜在的物理规律。SSNO在预测误差上显著低于现有模型，并且表现出出色的数据效率，只需要少量的训练轨迹即可实现对未见过分布条件的稳健泛化。
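
The SSNO abstract does not spell out its integrating factor scheme, so the following is only a generic sketch of the idea on a scalar test equation du/dt = -lam * u + N(u). Substituting v = exp(lam * t) * u removes the stiff linear term, and one explicit Euler step on v gives u_next = exp(-lam * h) * (u + h * N(u)). The function names, the constant forcing, and the values of `lam` and `h` are assumptions chosen so that plain explicit Euler is unstable (h * lam = 10).

```python
import math


def explicit_euler_step(u, h, lam, forcing):
    """Plain explicit Euler on du/dt = -lam * u + forcing(u)."""
    return u + h * (-lam * u + forcing(u))


def integrating_factor_euler_step(u, h, lam, forcing):
    """First-order integrating factor step: the stiff linear decay is
    applied exactly through exp(-lam * h); only the remaining term is
    stepped explicitly."""
    return math.exp(-lam * h) * (u + h * forcing(u))


def integrate(step, u0, h, lam, forcing, steps):
    """Apply the given one-step method repeatedly from u0."""
    u = u0
    for _ in range(steps):
        u = step(u, h, lam, forcing)
    return u


lam, h = 1000.0, 0.01          # stiff decay rate and a deliberately large step
forcing = lambda u: 1.0        # hypothetical non-stiff term

u_explicit = integrate(explicit_euler_step, 1.0, h, lam, forcing, 100)
u_if = integrate(integrating_factor_euler_step, 1.0, h, lam, forcing, 100)
```

With these values the explicit iterate is multiplied by roughly -9 at every step and grows without bound, while the integrating factor iterate stays small and positive. In a spectral method the linear operator is diagonal in frequency space, so the same exponential factor can be applied independently to each mode.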

MedRule-KG: A Knowledge-Graph-Steered Scaffold for Mathematical Reasoning with a Lightweight Verifier

Authors: Crystal Su

First: 2025-10-18T02:39:13+00:00 · Latest: 2025-12-12T16:08:36+00:00

Comments: This paper is withdrawn due to issues with attribution and citation accuracy

Abstract

Large language models (LLMs) often produce fluent reasoning steps while violating simple mathematical or logical constraints. We introduce MedRule-KG, a compact typed knowledge graph coupled with a symbolic verifier, designed to enforce mathematically interpretable rules in reasoning tasks. MedRule-KG encodes entities, relations, and three domain-inspired rules, while the verifier checks predictions and applies minimal corrections to guarantee consistency. On a 90-example FDA-derived benchmark, grounding in MedRule-KG improves exact match (EM) from 0.767 to 0.900, and adding the verifier yields 1.000 EM while eliminating rule violations entirely. We demonstrate how MedRule-KG provides a general scaffold for safe mathematical reasoning, discuss ablations, and release code and data to encourage reproducibility.

中文标题/摘要

标题：MedRule-KG：面向数学推理的知识图谱引导框架与轻量级验证器

大型语言模型（LLMs）通常会产生流畅的推理步骤，但违反了简单的数学或逻辑约束。我们引入了MedRule-KG，这是一种紧凑的类型化知识图谱，结合了一个符号验证器，旨在在推理任务中强制执行可解释的数学规则。MedRule-KG 编码实体、关系和三个领域启发式规则，而验证器检查预测并应用最小的修正以确保一致性。在由90个例子组成的FDA衍生基准测试中，基于MedRule-KG 的准确匹配率（EM）从0.767提高到0.900，添加验证器后EM达到1.000，同时完全消除了规则违反。我们展示了MedRule-KG 如何提供一个通用的框架以实现安全的数学推理，讨论了消融实验，并发布了代码和数据以促进可重复性。

Summary / 总结

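MedRule-KG couples a compact typed knowledge graph with a symbolic verifier to enforce mathematically interpretable rules in reasoning tasks. It encodes entities, relations, and three domain-inspired rules; the verifier checks predictions and applies minimal corrections. On a 90-example FDA-derived benchmark, grounding in MedRule-KG improves exact match from 0.767 to 0.900, and adding the verifier yields 1.000 while eliminating rule violations. The authors present the system as a general scaffold for safe mathematical reasoning.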

MedRule-KG 是一个基于知识图谱的系统，包含一个符号验证器以确保推理任务中的数学一致性。它编码实体、关系和领域特定规则，验证器检查预测并应用最小修正。在基准测试中，使用 MedRule-KG 提高了精确匹配率从 0.767 到 0.900，添加验证器实现了 1.000 的精确匹配率且无规则违反。该系统提供了一个安全数学推理的一般框架，并已发布以促进可重复性。

iPINNER: An Iterative Physics-Informed Neural Network with Ensemble Kalman Filter

Authors: Binghang Lu, Changhong Mou, Guang Lin

First: 2025-05-31T22:20:18+00:00 · Latest: 2025-12-12T16:05:56+00:00

Abstract

Physics-informed neural networks (PINNs) have emerged as a powerful tool for solving forward and inverse problems involving partial differential equations (PDEs) by incorporating physical laws into the training process. However, the performance of PINNs is often hindered in real-world scenarios involving noisy observational data and missing physics, particularly in inverse problems. In this work, we propose an iterative multi-objective PINN ensemble Kalman filter (iPINNER) framework that improves the robustness and accuracy of PINNs in both forward and inverse problems by using the ensemble Kalman filter (EnKF) and the non-dominated sorting genetic algorithm III (NSGA-III). Specifically, NSGA-III is used as a multi-objective optimizer that can generate various ensemble members of PINNs along the optimal Pareto front, while accounting for the model uncertainty in the solution space. These ensemble members are then utilized within the EnKF to assimilate noisy observational data. The EnKF's analysis is subsequently used to refine the data loss component for retraining the PINNs, thereby iteratively updating their parameters. The iterative procedure generates improved solutions to the PDEs. The proposed method is tested on two benchmark problems: the one-dimensional viscous Burgers equation and the time-fractional mixed diffusion-wave equation (TFMDWE). The numerical results show it outperforms standard PINNs in handling noisy data and missing physics.

中文标题/摘要

标题：iPINNER：迭代物理导向神经网络与集合卡尔曼滤波结合框架

物理导向神经网络（PINNs）已成为通过将物理定律纳入训练过程来解决涉及偏微分方程（PDEs）的前向和逆向问题的强大工具。然而，在涉及嘈杂观测数据和缺失物理现象的实际场景中，PINNs 的性能往往受到阻碍，尤其是在逆向问题中。本文提出了一种迭代多目标 PINN 集合卡尔曼滤波（iPINNER）框架，通过使用集合卡尔曼滤波和非支配排序遗传算法 III（NSGA-III）来提高 PINNs 在前向和逆向问题中的稳健性和准确性。具体而言，NSGA-III 用作多目标优化器，可以生成 PINNs 的各种集合成员，同时考虑解空间中的模型不确定性。这些集合成员随后在 EnKF 中用于吸收嘈杂的观测数据。EnKF 的分析随后用于细化数据损失项，以重新训练 PINNs，从而迭代更新其参数。迭代过程生成了 PDEs 的改进解。该方法在两个基准问题上进行了测试：一维粘性 Burgers 方程和时间分数混合扩散-波动方程（TFMDWE）。数值结果表明，该方法在处理嘈杂数据和缺失物理现象方面优于标准 PINNs。

Summary / 总结

The research aims to enhance the robustness and accuracy of physics-informed neural networks (PINNs) in forward and inverse problems, especially with noisy data and missing physics. The proposed iPINNER framework uses the non-dominated sorting genetic algorithm III (NSGA-III) to generate PINN ensemble members along the Pareto front, assimilates noisy observations with an ensemble Kalman filter (EnKF), and uses the EnKF analysis to refine the data loss for retraining, iterating this loop. Tested on the one-dimensional viscous Burgers equation and the time-fractional mixed diffusion-wave equation (TFMDWE), the method outperforms standard PINNs in handling noisy data and missing physics.

研究动机是提高物理感知神经网络（PINNs）在处理噪声观测数据和缺失物理现象，特别是在逆问题中的稳健性和准确性。方法是采用迭代多目标PINN集成卡尔曼滤波器（iPINNER）框架，使用非支配排序遗传算法III（NSGA-III）生成集成成员，并使用集成卡尔曼滤波器（EnKF）进行数据同化。关键实验结果表明，iPINNER在处理噪声数据和缺失物理现象方面优于标准PINNs，如在涉及一维粘性Burgers方程和时间分数混合扩散波方程（TFMDWE）的基准问题中所展示的那样。
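
The data assimilation step that iPINNER relies on can be sketched for the simplest possible case: a scalar state observed directly. This is the standard stochastic (perturbed-observation) EnKF analysis update, not the paper's code; in the paper's setting the ensemble members would be PINN predictions, whereas here they are plain numbers, and `enkf_analysis` and its arguments are names assumed for illustration.

```python
import math
import random
import statistics


def enkf_analysis(ensemble, observation, obs_var, rng):
    """Stochastic EnKF update for a directly observed scalar state.

    Returns the analysis ensemble and the Kalman gain computed from the
    forecast ensemble's sample variance.
    """
    forecast_var = statistics.variance(ensemble)   # sample variance (n - 1)
    gain = forecast_var / (forecast_var + obs_var)
    # Each member assimilates its own noisy copy of the observation, which
    # keeps the analysis spread consistent with the observation error.
    perturbed = [observation + rng.gauss(0.0, math.sqrt(obs_var))
                 for _ in ensemble]
    analysis = [x + gain * (y - x) for x, y in zip(ensemble, perturbed)]
    return analysis, gain


members = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical forecast ensemble
analysis, gain = enkf_analysis(members, observation=100.0, obs_var=2.5,
                               rng=random.Random(0))

# With a noise-free observation the gain is 1 and every member collapses
# onto the observed value.
exact, exact_gain = enkf_analysis(members, observation=100.0, obs_var=0.0,
                                  rng=random.Random(0))
```

The gain weighs forecast spread against observation noise: here the two variances are equal, so each member moves about halfway toward its perturbed observation.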

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

Authors: Tim Cofala, Christian Kalfar, Jingge Xiao, Johanna Schrader, Michelle Tang, Wolfgang Nejdl

First: 2025-12-12T16:01:48+00:00 · Latest: 2025-12-12T16:01:48+00:00

Comments: 7 pages, 3 figures

Abstract

Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance must contend with complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy of both the reasoning trace and the sequence of tool invocations critical. These considerations motivate evaluation protocols treating token-level reasoning and tool-usage behaviors as explicit supervision signals. This work presents insights derived from our participation in the CURE-Bench NeurIPS 2025 Challenge, which benchmarks therapeutic-reasoning systems using metrics that assess correctness, tool utilization, and reasoning quality. We analyze how retrieval quality for function (tool) calls influences overall model performance and demonstrate performance gains achieved through improved tool-retrieval strategies. Our work was awarded the Excellence Award in Open Science. Complete information can be found at https://curebench.ai/.

中文标题/摘要

标题：MedAI：在NeurIPS CURE-Bench 竞赛中评估TxAgent的治疗代理推理

临床医学中的治疗决策是一个高风险领域，AI 指导需要面对患者特征、疾病过程和药物之间的复杂相互作用。诸如药物推荐、治疗规划和不良反应预测等任务需要基于可靠生物医学知识的稳健、多步推理。以 TxAgent 为代表的智能体 AI 方法通过迭代检索增强生成（RAG）来应对这些挑战。TxAgent 使用微调后的 Llama-3.1-8B 模型，动态生成并执行对统一生物医学工具套件（ToolUniverse）的函数调用，整合 FDA Drug API、OpenTargets 和 Monarch 资源，以确保获取最新的治疗信息。与通用 RAG 系统不同，医疗应用施加了严格的安全约束，因此推理轨迹和工具调用序列的准确性都至关重要。这些考虑促使评估协议将词元级推理和工具使用行为视为显式监督信号。本文介绍了我们参加 CURE-Bench NeurIPS 2025 挑战赛所获得的见解，该挑战赛使用评估正确性、工具使用和推理质量的指标对治疗推理系统进行基准测试。我们分析了函数（工具）调用的检索质量如何影响整体模型性能，并展示了通过改进工具检索策略实现的性能提升。我们的工作获得了开放科学卓越奖。完整信息请参见 https://curebench.ai/。

Summary / 总结

This study evaluates TxAgent's therapeutic reasoning in the NeurIPS CURE-Bench competition, focusing on its ability to generate and execute function calls to a biomedical tool suite for drug recommendation and adverse-effect prediction. The research uses a fine-tuned Llama-3.1-8B model and demonstrates performance gains through enhanced tool-retrieval strategies, with insights into the impact of retrieval quality on overall model performance. The work received the Excellence Award in Open Science.

该研究在NeurIPS CURE-Bench竞赛中评估了TxAgent在治疗推理方面的表现，重点在于其整合生物医学知识和工具使用的能力，用于药物推荐和不良反应预测。TxAgent使用一个微调后的Llama-3.1-8B模型动态生成并执行功能调用，确保获取最新的治疗信息。关键发现表明，提高功能调用的检索质量可以提升整体模型性能，并且该工作获得了开放科学卓越奖。

Cross-modal Context-aware Learning for Visual Prompt Guided Multimodal Image Understanding in Remote Sensing

Authors: Xu Zhang, Jiabin Fang, Zhuoming Ding, Jin Yuan, Xuan Liu, Qianjun Zhang, Zhiyong Li

First: 2025-12-12T15:59:49+00:00 · Latest: 2025-12-12T15:59:49+00:00

Comments: 12 pages, 5 figures

Abstract

Recent advances in image understanding have enabled methods that leverage large language models for multimodal reasoning in remote sensing. However, existing approaches still struggle to steer models to the user-relevant regions when only simple, generic text prompts are available. Moreover, in large-scale aerial imagery many objects exhibit highly similar visual appearances and carry rich inter-object relationships, which further complicates accurate recognition. To address these challenges, we propose Cross-modal Context-aware Learning for Visual Prompt-Guided Multimodal Image Understanding (CLV-Net). CLV-Net lets users supply a simple visual cue, a bounding box, to indicate a region of interest, and uses that cue to guide the model to generate correlated segmentation masks and captions that faithfully reflect user intent. Central to our design is a Context-Aware Mask Decoder that models and integrates inter-object relationships to strengthen target representations and improve mask quality. In addition, we introduce a Semantic and Relationship Alignment module: a Cross-modal Semantic Consistency Loss enhances fine-grained discrimination among visually similar targets, while a Relationship Consistency Loss enforces alignment between textual relations and visual interactions. Comprehensive experiments on two benchmark datasets show that CLV-Net outperforms existing methods and establishes new state-of-the-art results. The model effectively captures user intent and produces precise, intention-aligned multimodal outputs.

中文标题/摘要

标题：面向遥感中视觉提示引导多模态图像理解的跨模态上下文感知学习

图像理解的最新进展使方法能够利用大型语言模型进行遥感多模态推理。然而，现有的方法在仅提供简单的通用文本提示时，仍然难以引导模型关注用户相关区域。此外，在大规模航空图像中，许多对象具有高度相似的视觉外观，并携带丰富的对象间关系，这进一步增加了准确识别的复杂性。为了解决这些挑战，我们提出了跨模态上下文感知学习用于视觉提示引导的多模态图像理解（CLV-Net）。CLV-Net 允许用户提供一个简单的视觉提示，一个边界框，以指示感兴趣的区域，并使用该提示引导模型生成与用户意图一致的相关分割掩码和描述。我们设计的核心是上下文感知掩码解码器，它建模并整合对象间关系以增强目标表示并提高掩码质量。此外，我们引入了语义和关系对齐模块：跨模态语义一致性损失增强了视觉上相似目标之间的细粒度区分，而关系一致性损失强制文本关系与视觉交互之间的对齐。在两个基准数据集上的全面实验表明，CLV-Net 超过了现有方法并建立了新的最先进的结果。该模型有效地捕捉了用户意图并产生了精确、意图一致的多模态输出。

Summary / 总结

The paper addresses the difficulty of steering multimodal image understanding in remote sensing toward user-relevant regions when only simple, generic text prompts are available. It introduces CLV-Net, which takes a bounding box as a visual cue and generates correlated segmentation masks and captions, using a Context-Aware Mask Decoder and a Semantic and Relationship Alignment module. Experiments on two benchmark datasets show that CLV-Net outperforms existing methods and sets new state-of-the-art results.

研究旨在通过解决仅用简单文本提示引导模型的问题，提高遥感中的多模态图像理解。CLV-Net 使用视觉提示，如边界框，生成准确的分割掩码和描述。关键发现表明，CLV-Net 在基准数据集上的表现优于现有方法，并建立了新的最佳水平。

Stochastics of shapes and Kunita flows

Authors: Stefan Sommer, Gefan Yang, Elizabeth Louise Baker

First: 2025-12-12T15:54:32+00:00 · Latest: 2025-12-12T15:54:32+00:00

Abstract

Stochastic processes of evolving shapes are used in applications including evolutionary biology, where morphology changes stochastically as a function of evolutionary processes. Due to the non-linear and often infinite-dimensional nature of shape spaces, the mathematical construction of suitable stochastic shape processes is far from immediate. We define and formalize properties that stochastic shape processes should ideally satisfy to be compatible with the shape structure, and we link this to Kunita flows that, when acting on shape spaces, induce stochastic processes that satisfy these criteria by their construction. We couple this with a survey of other relevant shape stochastic processes and show how bridge sampling techniques can be used to condition shape stochastic processes on observed data thereby allowing for statistical inference of parameters of the stochastic dynamics.

中文标题/摘要

标题：形状的随机性与Kunita流

演化形状的随机过程在进化生物学等应用中被使用，其中形态会作为进化过程的函数而随机变化。由于形状空间的非线性和通常的无限维性质，构造合适的随机形状过程远非易事。我们定义并形式化了理想的随机形状过程应满足的属性，使其与形状结构相兼容，并将此与Kunita流联系起来，当Kunita流作用于形状空间时，会诱导出满足这些标准的随机过程。我们结合了其他相关形状随机过程的综述，并展示了如何使用桥梁抽样技术来根据观测数据条件化形状随机过程，从而允许对随机动力学参数进行统计推断。

Summary / 总结

This paper addresses the challenge of modeling stochastic processes of evolving shapes, which is crucial in fields like evolutionary biology. The authors define and formalize the necessary properties for stochastic shape processes to be compatible with the shape structure. They link these properties to Kunita flows, which, when applied to shape spaces, generate stochastic processes that meet these criteria. Additionally, the paper explores other relevant shape stochastic processes and demonstrates how bridge sampling techniques can condition these processes on observed data, enabling parameter inference for the stochastic dynamics.

本文探讨了定义演化形状的随机过程的挑战，这些过程在进化生物学等领域至关重要。作者正式化了这些过程需要满足的属性，使其与形状结构兼容，并将这些属性与Kunita流联系起来，Kunita流在作用于形状空间时可以诱导满足这些标准的随机过程。他们还探讨了其他相关的形状随机过程，并展示了如何使用桥梁采样技术将这些过程调整到观测数据上，从而能够对随机动力学的参数进行统计推断。
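
One way to see why Kunita-type flows respect shape structure is to simulate landmark points that are all driven by the same spatially smooth noise fields. The sketch below is an illustrative Euler-Maruyama discretisation in one dimension, not the authors' construction: the Gaussian noise fields, their centres, the amplitude and length scale, and the function names are all assumptions, and the Ito-style discretisation is a simplification. Because every point is moved by the same smooth random map at each step, nearby points move coherently and, for small steps, the ordering of the landmarks is preserved.

```python
import math
import random


def kunita_step(points, centers, amplitude, length, dt, rng):
    """Move every point with the same random smooth displacement field."""
    # One shared Brownian increment per noise field (not per point).
    dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in centers]

    def displacement(x):
        # Sum of Gaussian-shaped noise fields evaluated at position x.
        return sum(
            amplitude * math.exp(-((x - c) ** 2) / (2.0 * length ** 2)) * w
            for c, w in zip(centers, dw)
        )

    return [x + displacement(x) for x in points]


def simulate(points, steps=200, dt=0.01, seed=0):
    """Run the flow on a list of 1D landmark positions."""
    rng = random.Random(seed)
    centers = [-1.0, 0.0, 1.0]   # hypothetical noise-field centres
    for _ in range(steps):
        points = kunita_step(points, centers, amplitude=0.3, length=1.0,
                             dt=dt, rng=rng)
    return points


start = [-1.5, -0.5, 0.0, 0.5, 1.5]   # ordered landmarks of a 1D "shape"
end = simulate(start)
```

Driving each landmark with its own independent Brownian motion would instead let neighbouring points cross, which is the kind of behaviour that is incompatible with the shape structure.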