arXiv Paper Digest

Snapshot: 20260206_0347

Reinforced Attention Learning

Authors: Bangzheng Li, Jianmo Ni, Chen Qu, Ian Miao, Liu Yang, Xingyu Fu, Muhao Chen, Derek Zhiyuan Cheng

First: 2026-02-04T18:59:52+00:00 · Latest: 2026-02-04T18:59:52+00:00

Abstract

Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.

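Translated Title / Abstract

Title: Reinforced Attention Learning

Reinforcement learning (RL) post-training has markedly improved the reasoning ability of large language models (LLMs) through test-time scaling. Extending this paradigm to multimodal LLMs (MLLMs) via lengthy rationales, however, brings only limited gains on perception and may even hurt performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions instead of output token sequences. By moving the optimization target from what to generate to where to attend, RAL encourages effective allocation of information and better grounding in complex multimodal inputs. Experiments on a range of image and video benchmarks show consistent improvements over GRPO and other baselines. We also introduce On-Policy Attention Distillation and show that transferring latent attention behavior gives stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.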

Summary

Reinforced Attention Learning (RAL) is a policy-gradient framework that optimizes internal attention distributions in Multimodal LLMs (MLLMs) rather than output token sequences. This approach enhances effective information allocation and improves grounding in complex multimodal inputs, leading to consistent gains across various image and video benchmarks over existing methods like GRPO. Additionally, the study introduces On-Policy Attention Distillation, which shows that transferring latent attention behaviors improves cross-modal alignment more effectively than standard knowledge distillation.

Reinforced Attention Learning (RAL) is a policy-gradient framework that directly optimizes the internal attention distributions of multimodal large language models (MLLMs) rather than their output token sequences. The approach improves information allocation and grounding in complex multimodal inputs, with consistent gains over existing methods such as GRPO on several image and video benchmarks. The work also introduces On-Policy Attention Distillation, showing that transferring latent attention behavior improves cross-modal alignment more effectively than standard knowledge distillation.

Protein Autoregressive Modeling via Multiscale Structure Generation

Authors: Yanru Qu, Cheng-Yen Hsieh, Zaixiang Zheng, Ge Liu, Quanquan Gu

First: 2026-02-04T18:59:49+00:00 · Latest: 2026-02-04T18:59:49+00:00

Comments: ByteDance Seed Tech Report; Page: https://par-protein.github.io/

Abstract

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature of proteins, PAR generates structures in a way that mimics sculpting a statue, forming a coarse topology and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, which is caused by the mismatch between the training and generation procedures and substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
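
The multi-scale downsampling in component (i) can be pictured with a toy sketch. The abstract does not say which downsampling operator PAR uses, so the average pooling below, the function names `downsample` and `multiscale`, and the factors `(4, 2, 1)` are all assumptions made for illustration.

```python
def downsample(coords, factor):
    """Average every `factor` consecutive 3D points into one coarse point."""
    coarse = []
    for start in range(0, len(coords), factor):
        chunk = coords[start:start + factor]
        coarse.append(tuple(sum(p[axis] for p in chunk) / len(chunk)
                            for axis in range(3)))
    return coarse


def multiscale(coords, factors=(4, 2, 1)):
    """Return coarse-to-fine views of one backbone trace, coarsest first."""
    return [downsample(coords, f) for f in factors]


# Toy "backbone": eight points on a straight line (hypothetical data).
backbone = [(float(i), 0.0, 0.0) for i in range(8)]
scales = multiscale(backbone)
print([len(s) for s in scales])
```

A next-scale predictor would then be trained to produce each finer view from the coarser ones; that part is not sketched here.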

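Translated Title / Abstract

Title: Protein Autoregressive Modeling via Multiscale Structure Generation

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework that generates protein backbones through coarse-to-fine next-scale prediction. Exploiting the hierarchical nature of proteins, PAR generates structures much as a sculptor carves a statue: it first forms a coarse topology and then refines structural detail scale by scale. PAR has three key components: (i) multi-scale downsampling operations that represent protein structures at different scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms from these embeddings. Autoregressive models also suffer from exposure bias, which arises from the mismatch between training and generation and noticeably degrades structure quality. We mitigate it with noisy context learning and scheduled sampling, making backbone generation more robust. Notably, PAR shows strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without fine-tuning. On the unconditional generation benchmark, PAR learns protein distributions effectively, produces backbones of high design quality, and scales favorably. These properties make PAR a promising framework for protein structure generation.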

Summary

The research introduces Protein Autoregressive Modeling (PAR), a novel multi-scale autoregressive framework for protein backbone generation. PAR uses coarse-to-fine next-scale prediction, leveraging hierarchical protein structures. It comprises multi-scale downsampling, an autoregressive transformer, and a flow-based backbone decoder. PAR addresses exposure bias through noisy context learning and scheduled sampling, enhancing backbone generation quality. Key findings include strong zero-shot generalization and high-quality unconditional protein backbone generation, demonstrating favorable scaling behavior.

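The study introduces protein autoregressive modeling (PAR), a multi-scale autoregressive framework that generates protein backbone structures through coarse-to-fine prediction. PAR combines multi-scale downsampling, an autoregressive transformer, and a flow-based backbone decoder. It addresses exposure bias in autoregressive models with noisy context learning and scheduled sampling, which improves structure generation quality. PAR generalizes strongly in the zero-shot setting, supporting flexible conditional generation and motif scaffolding without fine-tuning. On the benchmark, PAR learns protein distributions effectively, produces high-quality backbones, and shows favorable scaling behavior.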

Contrastive Continual Learning for Model Adaptability in Internet of Things

Authors: Ajesh Koyatan Chathoth

First: 2026-02-04T18:59:14+00:00 · Latest: 2026-02-04T18:59:14+00:00

Abstract

Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements can affect application utility. Continual learning (CL) addresses this by adapting models over time without catastrophic forgetting. Meanwhile, contrastive learning has emerged as a powerful representation-learning paradigm that improves robustness and sample efficiency in a self-supervised manner. This paper reviews the usage of contrastive continual learning (CCL) for IoT, connecting algorithmic design (replay, regularization, distillation, prompts) with IoT system realities (TinyML constraints, intermittent connectivity, privacy). We present a unifying problem formulation, derive common objectives that blend contrastive and distillation losses, propose an IoT-oriented reference architecture for on-device, edge, and cloud-based CCL, and provide guidance on evaluation protocols and metrics. Finally, we highlight open challenges unique to the IoT domain, such as spanning tabular and streaming IoT data, concept drift, federated settings, and energy-aware training.
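
An objective that blends contrastive and distillation losses can take many forms. The sketch below assumes one common shape: an InfoNCE term on the current embeddings plus a squared-distance penalty that keeps the new embedding close to the one produced by the previous model. The weight `lam`, the temperature, and all function names are illustrative assumptions, not the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def distillation(old_embedding, new_embedding):
    """Squared L2 distance between the old model's and new model's embeddings."""
    return sum((a - b) ** 2 for a, b in zip(old_embedding, new_embedding))


def ccl_loss(anchor, positive, negatives, old_anchor, lam=0.5):
    """Contrastive term plus a weighted distillation term (assumed blend)."""
    return info_nce(anchor, positive, negatives) + lam * distillation(old_anchor, anchor)


print(ccl_loss([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0]], old_anchor=[0.8, 0.2]))
```

Raising `lam` favors stability (less forgetting); lowering it favors plasticity on the new data.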

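Translated Title / Abstract

Title: Contrastive Continual Learning for Model Adaptability in Internet of Things

Internet of Things (IoT) deployments run in nonstationary, dynamic environments, where sensor drift, changing user behavior, and heterogeneous user privacy requirements can all affect application utility. Continual learning (CL) addresses this by adapting models over time without catastrophic forgetting. Contrastive learning, meanwhile, has become a powerful self-supervised representation-learning paradigm that improves robustness and sample efficiency. This paper reviews contrastive continual learning (CCL) for IoT, linking algorithmic design (replay, regularization, distillation, prompts) to the realities of IoT systems (TinyML constraints, intermittent connectivity, privacy). We give a unifying problem formulation, derive common objectives that combine contrastive and distillation losses, propose an IoT-oriented reference architecture for on-device, edge, and cloud-based CCL, and offer guidance on evaluation protocols and metrics. Finally, we highlight open challenges specific to the IoT domain, such as handling both tabular and streaming IoT data, concept drift, federated settings, and energy-aware training.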

Summary

This paper explores contrastive continual learning (CCL) for adapting IoT models in dynamic environments. It connects algorithmic design elements like replay, regularization, and distillation with IoT-specific constraints. Key findings include a unifying problem formulation, a reference architecture for on-device, edge, and cloud-based CCL, and guidance on evaluation protocols. The study highlights unique challenges in the IoT domain, such as concept drift and federated settings.

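This paper examines contrastive continual learning (CCL) for IoT in dynamic environments, relating algorithmic design elements such as replay, regularization, and distillation to IoT-specific constraints. Its main contributions are a unifying problem formulation, a reference architecture for on-device, edge, and cloud-based CCL, and guidance on evaluation protocols. It also highlights challenges particular to the IoT domain, such as concept drift and federated settings.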

Robust inverse material design with physical guarantees using the Voigt-Reuss Net

Authors: Sanath Keshav, Felix Fritzen

First: 2025-11-14T15:17:37+00:00 · Latest: 2026-02-04T18:59:08+00:00

Abstract

We propose a spectrally normalized surrogate for forward and inverse mechanical homogenization with hard physical guarantees. Leveraging the Voigt-Reuss bounds, we factor their difference via a Cholesky-like operator and learn a dimensionless, symmetric positive semi-definite representation with eigenvalues in $[0,1]$; the inverse map returns symmetric positive-definite predictions that lie between the bounds in the Löwner sense. In 3D linear elasticity on an open dataset of stochastic biphasic microstructures, a fully connected Voigt-Reuss net trained on $>\!7.5\times 10^{5}$ FFT-based labels with 236 isotropy-invariant descriptors and three contrast parameters recovers the isotropic projection with near-perfect fidelity (isotropy-related entries: $R^2 \ge 0.998$), while anisotropy-revealing couplings are unidentifiable from $SO(3)$-invariant inputs. Tensor-level relative Frobenius errors have median $\approx 1.7\%$ and mean $\approx 3.4\%$ across splits. For 2D plane strain on thresholded trigonometric microstructures, coupling spectral normalization with a differentiable renderer and a CNN yields $R^2>0.99$ on all components, subpercent normalized losses, accurate tracking of percolation-induced eigenvalue jumps, and robust generalization to out-of-distribution images. Treating the parametric microstructure as design variables, batched first-order optimization with a single surrogate matches target tensors within a few percent and returns diverse near-optimal designs. Overall, the Voigt-Reuss net unifies accurate, physically admissible forward prediction with large-batch, constraint-consistent inverse design, and is generic to elliptic operators and coupled-physics settings.
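
The bound-preserving idea is easiest to see in the scalar special case (for example, an effective conductivity of a two-phase mixture), where the tensors reduce to numbers and the Cholesky-like factor reduces to a square root. The sketch below is that scalar analogue only. In the tensor case one would presumably factor V - R = L L^T and map a normalized matrix X back as R + L X L^T; that form is a reading assumed here, and the function names are hypothetical.

```python
import math


def voigt_reuss_bounds(fractions, moduli):
    """Voigt (arithmetic mean) and Reuss (harmonic mean) bounds for a mixture."""
    voigt = sum(f * k for f, k in zip(fractions, moduli))
    reuss = 1.0 / sum(f / k for f, k in zip(fractions, moduli))
    return voigt, reuss


def encode(k_eff, voigt, reuss):
    """Map an effective modulus to a dimensionless value in [0, 1].

    Assumes voigt > reuss, i.e. the phases have different moduli.
    """
    return (k_eff - reuss) / (voigt - reuss)


def decode(raw_output, voigt, reuss):
    """Map an unconstrained network output back between the bounds."""
    theta = 1.0 / (1.0 + math.exp(-raw_output))  # sigmoid keeps theta in [0, 1]
    return reuss + theta * (voigt - reuss)


voigt, reuss = voigt_reuss_bounds([0.3, 0.7], [1.0, 10.0])
for raw in (-5.0, 0.0, 5.0):
    print(raw, decode(raw, voigt, reuss))
```

Because the sigmoid cannot leave [0, 1], every decoded prediction respects the bounds by construction, whatever the network outputs.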

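Translated Title / Abstract

Title: Robust inverse material design with physical guarantees using the Voigt-Reuss Net

We propose a spectrally normalized surrogate for forward and inverse mechanical homogenization with hard physical guarantees. Using the Voigt-Reuss bounds, we factor their difference with a Cholesky-like operator and learn a dimensionless, symmetric positive semi-definite representation whose eigenvalues lie in $[0,1]$; the inverse map returns symmetric positive-definite predictions that lie between the bounds in the Löwner sense. For 3D linear elasticity on an open dataset of stochastic biphasic microstructures, a fully connected Voigt-Reuss net trained on more than $7.5\times 10^{5}$ FFT-based labels, with 236 isotropy-invariant descriptors and three contrast parameters, recovers the isotropic projection with near-perfect fidelity (isotropy-related entries: $R^2 \ge 0.998$), whereas anisotropy-revealing couplings cannot be identified from $SO(3)$-invariant inputs. Tensor-level relative Frobenius errors have a median of about 1.7% and a mean of about 3.4% across splits. For 2D plane strain on thresholded trigonometric microstructures, combining spectral normalization with a differentiable renderer and a CNN gives $R^2>0.99$ on all components, subpercent normalized losses, accurate tracking of percolation-induced eigenvalue jumps, and robust generalization to out-of-distribution images. Treating the parametric microstructure as design variables, batched first-order optimization with a single surrogate matches target tensors to within a few percent and returns diverse near-optimal designs. Overall, the Voigt-Reuss net unifies accurate, physically admissible forward prediction with large-batch, constraint-consistent inverse design, and it applies generically to elliptic operators and coupled-physics settings.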

Rethinking the Trust Region in LLM Reinforcement Learning

Authors: Penghui Qi, Xiangxin Zhou, Zichen Liu, Tianyu Pang, Chao Du, Min Lin, Wee Sun Lee

First: 2026-02-04T18:59:04+00:00 · Latest: 2026-02-04T18:59:04+00:00

Abstract

Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid a huge memory footprint, we introduce efficient Binary and Top-K approximations that capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning.
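
The mismatch between the sampled-token ratio and the actual divergence can be checked on a three-token toy vocabulary. The sketch below computes the exact total variation between an old and a new policy and two cheaper estimates. The abstract names Binary and Top-K approximations but does not define them, so the definitions here (collapse the vocabulary to "sampled token versus rest", or to "top-K tokens plus one rest bucket") are assumptions, as are the function names and the example probabilities.

```python
def total_variation(p, q):
    """Exact total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def binary_tv(p, q, token):
    """TV after collapsing the vocabulary to {token, everything else}."""
    return abs(p[token] - q[token])


def top_k_tv(p, q, k):
    """TV over the k most likely tokens under p, plus one bucket for the rest."""
    top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
    p_rest = 1.0 - sum(p[i] for i in top)
    q_rest = 1.0 - sum(q[i] for i in top)
    return 0.5 * (sum(abs(p[i] - q[i]) for i in top) + abs(p_rest - q_rest))


old = [0.90, 0.09, 0.01]  # hypothetical old policy
new = [0.75, 0.23, 0.02]  # hypothetical updated policy

# Rare token: the ratio doubles although only 0.01 of probability mass moved.
print("ratio, rare token:", new[2] / old[2], "binary TV:", binary_tv(old, new, 2))
# Likely token: the ratio stays inside a [0.8, 1.2] clip range, yet 0.15 of mass moved.
print("ratio, likely token:", new[0] / old[0], "binary TV:", binary_tv(old, new, 0))
print("exact TV:", total_variation(old, new), "top-2 TV:", top_k_tv(old, new, 2))
```

Both collapsed estimates merge vocabulary entries, so they can never exceed the exact total variation; they trade tightness for memory.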

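Translated Title / Abstract

Title: Rethinking the Trust Region in LLM Reinforcement Learning

Reinforcement learning (RL) has become a cornerstone of fine-tuning large language models (LLMs), with Proximal Policy Optimization (PPO) as the de facto standard algorithm. Despite its ubiquity, we argue that PPO's core ratio clipping mechanism is structurally ill-suited to the large vocabularies inherent to LLMs. PPO constrains policy updates using the probability ratio of sampled tokens, which is in effect a noisy single-sample Monte Carlo estimate of the true policy divergence. The result is a sub-optimal learning dynamic: updates to low-probability tokens are over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, which makes training inefficient and unstable. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which replaces heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (for example, total variation or KL). To avoid a huge memory footprint, we introduce efficient Binary and Top-K approximations that capture the essential divergence with negligible overhead. Extensive empirical evaluation shows that DPPO trains more stably and efficiently than existing methods, providing a more robust foundation for RL-based LLM fine-tuning.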

Summary

The paper addresses the limitations of the ratio clipping mechanism in Proximal Policy Optimization (PPO) for fine-tuning Large Language Models (LLMs), proposing Divergence Proximal Policy Optimization (DPPO) to improve training stability and efficiency. DPPO uses a direct estimate of policy divergence, such as Total Variation or KL, instead of heuristic clipping, and introduces Binary and Top-K approximations to reduce memory usage. Experiments show that DPPO outperforms existing methods in terms of training stability and efficiency.

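The paper targets the limitations of the ratio clipping mechanism of Proximal Policy Optimization (PPO) in fine-tuning large language models (LLMs). It proposes Divergence Proximal Policy Optimization (DPPO), which replaces heuristic clipping with a principled constraint based on an estimate of policy divergence. DPPO uses Binary and Top-K approximations to capture the essential divergence efficiently. Experiments show that DPPO improves training stability and efficiency, making it a more robust method for RL-based LLM fine-tuning.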

CoWTracker: Tracking by Warping instead of Correlation

Authors: Zihang Lai, Eldar Insafutdinov, Edgar Sucar, Andrea Vedaldi

First: 2026-02-04T18:58:59+00:00 · Latest: 2026-02-04T18:58:59+00:00

Comments: Project website: cowtracker.github.io

Abstract

Dense point tracking is a fundamental problem in computer vision, with applications ranging from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volumes to match features across frames, but this approach incurs quadratic complexity in spatial resolution, limiting scalability and efficiency. In this paper, we propose CoWTracker, a novel dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Combined with a transformer architecture that performs joint spatiotemporal reasoning across all tracks, our design establishes long-range correspondences without computing feature correlations. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP. Remarkably, the model also excels at optical flow, sometimes outperforming specialized methods on the Sintel, KITTI, and Spring benchmarks. These results suggest that warping-based architectures can unify dense point tracking and optical flow estimation.
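
The warping step can be illustrated with plain bilinear sampling: for every query pixel, look up the target-frame feature at the location the current track estimate points to. The sketch below uses scalar features on a tiny grid and is a generic backward-warping example. It is not CoWTracker's implementation; `bilinear_sample` and `warp_to_query` are hypothetical names.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D grid (list of rows) at continuous (x, y), clamped to the border."""
    height, width = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feature_map[y0][x0] + ax * feature_map[y0][x1]
    bottom = (1 - ax) * feature_map[y1][x0] + ax * feature_map[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_to_query(target_features, track_estimate):
    """For each query pixel, fetch the target feature at its estimated (x, y)."""
    return [[bilinear_sample(target_features, *xy) for xy in row]
            for row in track_estimate]


target = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0]]
# Identity estimate: every query pixel (row r, column c) points to (x=c, y=r).
identity = [[(float(c), float(r)) for c in range(3)] for r in range(2)]
print(warp_to_query(target, identity))
print(bilinear_sample(target, 0.5, 0.5))
```

Each refinement step costs one lookup per query pixel, whereas a full cost volume compares every query location with every target location.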

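Translated Title / Abstract

Title: CoWTracker: Tracking by Warping instead of Correlation

Dense point tracking is a fundamental problem in computer vision, with applications from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volumes to match features across frames, but this incurs complexity quadratic in spatial resolution, which limits scalability and efficiency. In this paper we propose CoWTracker, a novel dense point tracker that drops cost volumes in favor of warping. Inspired by recent advances in optical flow, our method iteratively refines track estimates by warping features from the target frame to the query frame according to the current estimate. Together with a transformer architecture that reasons jointly over space and time across all tracks, the design establishes long-range correspondences without computing feature correlations. The model is simple and reaches state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP. Remarkably, it also does well at optical flow, sometimes outperforming specialized methods on the Sintel, KITTI, and Spring benchmarks. These results suggest that warping-based architectures can unify dense point tracking and optical flow estimation.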

Summary

CoWTracker is a novel dense point tracker that uses warping instead of cost volumes, addressing the quadratic complexity issue of traditional methods. It iteratively refines track estimates by warping features from the target frame to the query frame and employs a transformer for joint spatiotemporal reasoning. CoWTracker achieves state-of-the-art performance on dense point tracking benchmarks and also excels in optical flow estimation, sometimes outperforming specialized methods.

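CoWTracker is a novel dense point tracker that uses warping instead of cost volumes, avoiding the quadratic complexity of traditional methods. It iteratively warps features from the target frame to the query frame and uses a transformer for joint spatiotemporal reasoning. CoWTracker reaches state-of-the-art performance on dense point tracking benchmarks and also does well at optical flow estimation, sometimes outperforming specialized methods.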

PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

Authors: Jiahao Zhan, Zizhang Li, Hong-Xing Yu, Jiajun Wu

First: 2026-02-04T18:58:55+00:00 · Latest: 2026-02-04T18:58:55+00:00

Comments: Project website: https://johnzhan2023.github.io/PerpetualWonder/

Abstract

We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Current works fail at this task because their physical state is decoupled from their visual representation, which prevents generative refinements to update the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It features a novel unified representation that creates a bidirectional link between the physical state and visual primitives, allowing generative refinements to correct both the dynamics and appearance. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments demonstrate that from a single image, PerpetualWonder can successfully simulate complex, multi-step interactions from long-horizon actions, maintaining physical plausibility and visual consistency.

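Translated Title / Abstract

Title: PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

We introduce PerpetualWonder, a hybrid generative simulator that generates long-horizon, action-conditioned 4D scenes from a single image. Existing work fails at this task because its physical state is decoupled from its visual representation, so generative refinements cannot update the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It uses a novel unified representation that links the physical state and the visual primitives in both directions, so that generative refinements can correct dynamics and appearance together. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments show that, starting from a single image, PerpetualWonder can simulate complex multi-step interactions driven by long-horizon actions while maintaining physical plausibility and visual consistency.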

Summary

PerpetualWonder is a hybrid generative simulator designed for long-horizon, action-conditioned 4D scene generation from a single image. It addresses the limitation of previous works by integrating a unified representation that links physical state and visual primitives bidirectionally, enabling generative refinements to update both dynamics and appearance. Additionally, it introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments show that PerpetualWonder can simulate complex, multi-step interactions with physical plausibility and visual consistency from a single image input.

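PerpetualWonder is a hybrid generative simulator for long-horizon, action-conditioned 4D scene generation from a single image. It overcomes the limitations of earlier work with a true closed-loop system that links the physical state and the visual primitives bidirectionally, so that generative refinements can update dynamics and appearance together. Experiments show that PerpetualWonder can simulate complex multi-step interactions while maintaining physical plausibility and visual consistency.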

CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation

Authors: Yannick Denker, Alexander Gepperth

First: 2026-02-04T18:54:26+00:00 · Latest: 2026-02-04T18:54:26+00:00

Abstract

Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera and bumper sensor, and a robotic arm with seven joints. The former represents an agent in line-following and object-pushing scenarios, where variation of visual and structural parameters yields a large number of distinct tasks, whereas the latter is used in two goal-reaching scenarios: one with high-level Cartesian hand position control (modeled after the Continual World benchmark) and one with low-level control based on joint angles. For the robotic arm benchmarks, we provide additional kinematics-only variants that bypass the need for physical simulation (as long as no sensor readings are required) and can be run two orders of magnitude faster. CRoSS is designed to be easily extensible, enables controlled studies of continual reinforcement learning in robotic settings with high physical realism, and in particular allows the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out-of-the-box, and report performances of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. This highlights the suite's suitability as a scalable and reproducible benchmark for CRL research.

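Translated Title / Abstract

Title: CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation

Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously learned policies. In this work, we introduce a new CRL benchmark suite built on realistically simulated robots in the Gazebo simulator. The Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera, and bumper sensor, and a robotic arm with seven joints. The former serves as the agent in line-following and object-pushing scenarios, where varying visual and structural parameters yields a large number of distinct tasks. The latter is used in two goal-reaching scenarios, one with high-level Cartesian hand position control (modeled after the Continual World benchmark) and one with low-level control based on joint angles. For the robotic arm benchmarks we also provide kinematics-only variants that bypass physical simulation (as long as no sensor readings are required) and run two orders of magnitude faster. CRoSS is designed to be easy to extend, enables controlled studies of continual reinforcement learning in robotic settings with high physical realism, and in particular allows the use of almost arbitrary simulated sensors. For reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out of the box, and we report the performance of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. This highlights its suitability as a scalable and reproducible benchmark for CRL research.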

Summary

The research introduces CRoSS, a benchmark suite for continual reinforcement learning (CRL) using realistically simulated robots in Gazebo. It includes two robotic platforms: a two-wheeled differential-drive robot for line-following and object-pushing tasks, and a robotic arm for goal-reaching scenarios. The benchmarks are designed to be highly extensible and enable controlled studies of CRL in robotic settings with high physical realism. The suite provides performances of standard RL algorithms, indicating its suitability for scalable and reproducible CRL research.

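The study introduces CRoSS, a benchmark suite for continual reinforcement learning (CRL) built on realistically simulated robots in Gazebo. It includes two robotic platforms: a two-wheeled differential-drive robot for line-following and object-pushing tasks, and a robotic arm for goal-reaching scenarios. The benchmarks are designed to be highly extensible and to allow controlled studies in robotic settings with high physical realism. For the robotic arm, the suite offers both physically simulated and kinematics-only variants, the latter allowing faster execution. The authors report the performance of standard RL algorithms, indicating that CRoSS is suitable as a scalable and reproducible benchmark for CRL research.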

When LLaVA Meets Objects: Token Composition for Vision-Language-Models

Authors: Soumya Jahagirdar, Walid Bousselham, Anna Kukleva, Hilde Kuehne

First: 2026-02-04T18:50:46+00:00 · Latest: 2026-02-04T18:50:46+00:00

Abstract

Current autoregressive Vision Language Models (VLMs) usually rely on a large number of visual tokens to represent images, resulting in a need for more compute, especially at inference time. To address this problem, we propose Mask-LLaVA, a framework that leverages different levels of visual features to create a compact yet information-rich visual representation for autoregressive VLMs. Namely, we combine mask-based object representations with global tokens and local patch tokens. While all tokens are used during training, we show that the resulting model can flexibly reduce the number of tokens, especially mask-based object tokens, at test time. This allows the number of tokens to be adapted during inference without retraining the model and without a significant drop in performance. We evaluate the proposed approach on a suite of standard benchmarks, showing results competitive with current token-efficient methods and comparable to the original LLaVA baseline using only a fraction of the visual tokens. Our analysis demonstrates that combining multi-level features enables efficient learning with fewer tokens while allowing dynamic token selection at test time for good performance.
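
One plausible way to build a mask-based object token is masked average pooling: average the patch features that fall inside an object mask. The sketch below assumes that construction and assumes that object tokens are dropped by simple truncation at test time. Neither detail is stated in the abstract, and `mask_pool` and `compose_tokens` are hypothetical names.

```python
def mask_pool(patch_features, mask):
    """Average the patch features selected by a binary mask into one token."""
    selected = [f for f, keep in zip(patch_features, mask) if keep]
    if not selected:
        raise ValueError("mask selects no patches")
    return [sum(f[d] for f in selected) / len(selected)
            for d in range(len(selected[0]))]


def compose_tokens(global_token, object_tokens, local_tokens, max_objects=None):
    """Concatenate the three token levels, optionally keeping fewer object tokens."""
    kept = object_tokens if max_objects is None else object_tokens[:max_objects]
    return [global_token] + kept + local_tokens


# Four toy patches with 2-dimensional features and two object masks.
patches = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]

global_token = mask_pool(patches, [1, 1, 1, 1])
object_tokens = [mask_pool(patches, m) for m in masks]
full = compose_tokens(global_token, object_tokens, local_tokens=patches[:1])
reduced = compose_tokens(global_token, object_tokens, patches[:1], max_objects=1)
print(len(full), len(reduced))
```

The `max_objects` argument stands in for the test-time flexibility the abstract describes: the same model consumes a shorter token sequence without retraining.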

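Translated Title / Abstract

Title: When LLaVA Meets Objects: Token Composition for Vision-Language-Models

Current autoregressive vision-language models (VLMs) usually rely on a large number of visual tokens to represent an image, which demands more compute, especially at inference time. To address this, we propose Mask-LLaVA, a framework that uses visual features at different levels to build a compact yet information-rich visual representation for autoregressive VLMs. Specifically, we combine mask-based object representations with global tokens and local patch tokens. Although all tokens are used during training, the resulting model can flexibly reduce the number of mask-based object tokens at test time, so the token count can be adapted during inference without retraining and without a significant drop in performance. We evaluate the approach on a suite of standard benchmarks, where it is competitive with current token-efficient methods and comparable to the original LLaVA baseline while using only a fraction of the visual tokens. Our analysis shows that combining multi-level features enables efficient learning with fewer tokens while allowing dynamic token selection at test time with good performance.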

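Summary

The paper addresses the high computational demands of current Vision Language Models (VLMs) by proposing Mask-LLaVA, which uses a combination of mask-based object representations, global tokens, and local patch tokens to create a compact visual representation. The model can flexibly reduce the number of mask-based object tokens at inference time without significant performance loss. Evaluations on standard benchmarks show that Mask-LLaVA performs competitively with token-efficient methods and is comparable to the original LLaVA baseline while using only a fraction of the visual tokens.

The paper proposes Mask-LLaVA, a framework that combines mask-based object representations, global tokens, and local patch tokens into a compact visual representation, to address the high computational demands of current vision-language models (VLMs). The model can flexibly reduce the number of mask-based object tokens at inference time without significantly affecting performance. Evaluations on standard benchmarks show that Mask-LLaVA, while using fewer visual tokens, performs on par with existing token-efficient methods and the original LLaVA baseline.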

From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

Authors: Ryan Liu, Eric Qu, Tobias Kreiman, Samuel M. Blau, Aditi S. Krishnapriyan

First: 2026-02-04T18:50:10+00:00 · Latest: 2026-02-04T18:50:10+00:00

Comments: 13 pages main text, 10 pages reference & appendix, 8 figures

Abstract

Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we utilize an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce artifacts identified by our metric. By optimizing model design systematically based on BSCT, the resulting MLIP simultaneously achieves a low conventional E/F regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that cannot be efficiently evaluated by current MLIP benchmarks.
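
The flavor of a bond-deformation probe can be shown on a one-dimensional toy. The sketch below scans a bond length, counts local minima of the energy curve, and measures the largest jump in finite-difference forces between neighboring grid points. A Morse potential stands in for a well-behaved model, and the same curve with an abrupt energy step stands in for a model with a cutoff artifact. All of this is an illustrative assumption; the actual BSCT protocol and scoring are not described in the abstract.

```python
import math


def morse(r, depth=1.0, width=1.5, r_eq=1.0):
    """Smooth reference bond energy (Morse potential, arbitrary units)."""
    return depth * (1.0 - math.exp(-width * (r - r_eq))) ** 2


def with_cutoff_step(r):
    """Same curve with an abrupt 0.05 energy drop beyond r = 2.005."""
    return morse(r) - (0.05 if r > 2.005 else 0.0)


def scan(energy_fn, r_min=0.6, r_max=3.0, steps=241):
    """Evaluate the energy on a uniform grid of bond lengths."""
    radii = [r_min + i * (r_max - r_min) / (steps - 1) for i in range(steps)]
    return radii, [energy_fn(r) for r in radii]


def count_minima(energies):
    """Number of strict interior local minima along the scan."""
    return sum(1 for i in range(1, len(energies) - 1)
               if energies[i] < energies[i - 1] and energies[i] < energies[i + 1])


def max_force_jump(radii, energies):
    """Largest change in central-difference force between adjacent grid points."""
    h = radii[1] - radii[0]
    forces = [-(energies[i + 1] - energies[i - 1]) / (2 * h)
              for i in range(1, len(energies) - 1)]
    return max(abs(forces[i + 1] - forces[i]) for i in range(len(forces) - 1))


radii, smooth = scan(morse)
_, rough = scan(with_cutoff_step)
print(count_minima(smooth), count_minima(rough))
print(max_force_jump(radii, smooth), max_force_jump(radii, rough))
```

The stepped curve shows both symptoms the abstract lists: an artificial second minimum and a spurious force spike far from equilibrium, found with a few hundred energy evaluations rather than a dynamics run.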

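Translated Title / Abstract

Title: From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

Machine learning interatomic potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), causing erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and mainly probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES through controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability at a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we use an unconstrained Transformer backbone as a testbed and show how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce the artifacts our metric identifies. By optimizing the model design systematically against BSCT, the resulting MLIP simultaneously achieves low conventional energy/force regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT both as a validation metric and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that current MLIP benchmarks cannot evaluate efficiently.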

Summary

The research aims to improve the evaluation of Machine Learning Interatomic Potentials (MLIPs) by introducing the Bond Smoothness Characterization Test (BSCT), which efficiently detects non-smoothness in the potential energy surface (PES) through controlled bond deformations. The study demonstrates that BSCT strongly correlates with microcanonical molecular dynamics (MD) stability at a fraction of the computational cost. By using an unconstrained Transformer backbone as a testbed, the researchers show how BSCT can guide iterative model design, leading to MLIPs that achieve low energy and force regression errors, stable MD simulations, and robust atomistic property predictions.

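The study aims to improve MLIP evaluation by addressing the failure of MLIPs to reproduce the smoothness of the quantum potential energy surface (PES). It introduces the Bond Smoothness Characterization Test (BSCT), which detects non-smoothness in an MLIP through controlled bond deformations. BSCT correlates strongly with microcanonical molecular dynamics (MD) stability at lower computational cost. The work shows how BSCT can guide iterative model design, yielding an MLIP that combines low conventional energy/force regression error, stable MD simulations, and robust atomistic property predictions.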

Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization

Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli

First: 2026-01-30T16:58:40+00:00 · Latest: 2026-02-04T18:42:12+00:00

Comments: 18 pages, 3 figures

Abstract

Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that can be processed by LLMs. However, existing codecs typically operate at fixed frame rates, allocating tokens uniformly in time and producing unnecessarily long sequences. In this work, we introduce DyCAST, a Dynamic Character-Aligned Speech Tokenizer that enables variable-frame-rate tokenization through soft character-level alignment and explicit duration modeling. DyCAST learns to associate tokens with character-level linguistic units during training and supports alignment-free inference with direct control over token durations at decoding time. To improve speech resynthesis quality at low frame rates, we further introduce a retrieval-augmented decoding mechanism that enhances reconstruction fidelity without increasing bitrate. Experiments show that DyCAST achieves competitive speech resynthesis quality and downstream performance while using significantly fewer tokens than fixed-frame-rate codecs. Code and checkpoints will be released publicly at https://github.com/lucadellalib/dycast.
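
Variable-frame-rate tokenization can be pictured as pooling fixed-rate frames into one vector per unit, with the number of frames per unit given by a duration. The sketch below assumes hard integer durations and mean pooling, a simplification of the soft character-level alignment the abstract describes. `pool_by_duration` is a hypothetical helper, not part of the DyCAST code base.

```python
def pool_by_duration(frames, durations):
    """Mean-pool consecutive frames into one vector per duration entry."""
    if any(d <= 0 for d in durations) or sum(durations) != len(frames):
        raise ValueError("durations must be positive and cover every frame once")
    tokens, start = [], 0
    for d in durations:
        segment = frames[start:start + d]
        tokens.append([sum(column) / d for column in zip(*segment)])
        start += d
    return tokens


# Five fixed-rate frames (2-dimensional features) covering two characters.
frames = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0], [9.0, 9.0]]
tokens = pool_by_duration(frames, durations=[2, 3])
print(tokens)
```

Five frames become two tokens here; a fixed-frame-rate codec would emit five regardless of how many characters were spoken.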

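Translated Title / Abstract

Title: Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization

Neural audio codecs are at the core of modern conversational speech technologies, converting continuous speech into sequences of discrete tokens that LLMs can process. Existing codecs, however, typically run at fixed frame rates, allocating tokens uniformly in time and producing unnecessarily long sequences. In this work we introduce DyCAST, a Dynamic Character-Aligned Speech Tokenizer that achieves variable-frame-rate tokenization through soft character-level alignment and explicit duration modeling. DyCAST learns to associate tokens with character-level linguistic units during training and supports alignment-free inference with direct control over token durations at decoding time. To improve speech resynthesis quality at low frame rates, we further introduce a retrieval-augmented decoding mechanism that enhances reconstruction fidelity without increasing the bitrate. Experiments show that DyCAST achieves competitive speech resynthesis quality and downstream performance while using significantly fewer tokens than fixed-frame-rate codecs. Code and checkpoints will be released publicly at https://github.com/lucadellalib/dycast.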

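Summary

This work addresses the inefficiency of fixed-frame-rate tokenization in neural audio codecs by introducing DyCAST, a dynamic character-aligned speech tokenizer. DyCAST supports variable frame rates and explicit duration modeling, achieving competitive speech resynthesis quality and downstream performance with significantly fewer tokens than fixed-frame-rate codecs.

This work introduces DyCAST, a dynamic character-aligned speech tokenizer, to address the inefficiency of fixed-frame-rate tokenization in neural audio codecs. DyCAST supports variable frame rates and explicit duration modeling, reaching competitive speech resynthesis quality and downstream performance while using fewer tokens.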

LitS: A novel Neighborhood Descriptor for Point Clouds

Authors: Jonatan B. Bastos, Francisco F. Rivera, Oscar G. Lorenzo, David L. Vilariño, José C. Cabaleiro, Alberto M. Esmorís, Tomás F. Pena

First: 2026-02-04T18:31:02+00:00 · Latest: 2026-02-04T18:31:02+00:00

Abstract

With the advancement of 3D scanning technologies, point clouds have become fundamental for representing 3D spatial data, with applications that span various scientific and technological fields. Practical analysis of this data depends crucially on available neighborhood descriptors to accurately characterize the local geometries of the point cloud. This paper introduces LitS, a novel neighborhood descriptor for 2D and 3D point clouds. LitS are piecewise constant functions on the unit circle that allow points to keep track of their surroundings. Each element in the domain of LitS represents a direction with respect to a local reference system. Once constructed, evaluating LitS at any given direction gives the number of neighbors in a cone-like region centered around that direction. Thus, LitS convey a lot of information about the local neighborhood of a point, which can be leveraged to gain global structural understanding by analyzing how LitS change between close points. In addition, LitS come in two versions ('regular' and 'cumulative') and have two parameters, allowing them to adapt to various contexts and types of point clouds. Overall, they are a versatile neighborhood descriptor, capable of capturing the nuances of local point arrangements and resilient to common point cloud data issues such as variable density and noise.
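
A direction-indexed neighbor count of this kind can be sketched in 2D. The function below samples a finite set of directions on the unit circle and, for each one, counts the neighbors within a given radius whose bearing lies inside a cone centered on that direction. The abstract does not name the two parameters, so `radius` and `cone_angle` are guesses. Sampling `n_directions` is a simplification made here, and only a 'regular'-style count is shown, since the 'cumulative' version is not defined in the abstract.

```python
import math


def lits_regular(center, points, radius, cone_angle, n_directions=8):
    """Count neighbors inside a cone around each sampled direction (2D sketch)."""
    cx, cy = center
    counts = [0] * n_directions
    for px, py in points:
        dx, dy = px - cx, py - cy
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > radius:
            continue  # skip the point itself and anything beyond the radius
        bearing = math.atan2(dy, dx)
        for j in range(n_directions):
            direction = 2.0 * math.pi * j / n_directions
            # Smallest absolute angle between the bearing and this direction.
            diff = abs((bearing - direction + math.pi) % (2.0 * math.pi) - math.pi)
            if diff <= cone_angle / 2.0:
                counts[j] += 1
    return counts


cloud = [(1.0, 0.0), (0.5, 0.1), (0.0, 1.0), (-2.0, 0.0), (5.0, 5.0)]
print(lits_regular((0.0, 0.0), cloud, radius=3.0,
                   cone_angle=math.pi / 4, n_directions=4))
```

Comparing these count vectors between nearby points is the kind of analysis the abstract suggests for recovering larger structure, for example an edge where counts vanish on one side.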

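Translated Title / Abstract

Title: LitS: A novel Neighborhood Descriptor for Point Clouds

With advances in 3D scanning technology, point clouds have become fundamental for representing 3D spatial data, with applications across many scientific and technological fields. Practical analysis of such data depends on neighborhood descriptors that accurately characterize the local geometry of the point cloud. This paper introduces LitS, a novel neighborhood descriptor for 2D and 3D point clouds. LitS are piecewise constant functions on the unit circle that let points keep track of their surroundings. Each element of the domain of LitS represents a direction with respect to a local reference system. Once constructed, evaluating LitS at a given direction tells us how many neighbors lie in a cone-like region centered on that direction. LitS therefore convey a great deal of information about the local neighborhood of a point, and global structural understanding can be gained by analyzing how LitS change between nearby points. In addition, LitS come in two versions ('regular' and 'cumulative') and have two parameters, which lets them adapt to various contexts and types of point clouds. Overall, they are a versatile neighborhood descriptor that captures the nuances of local point arrangements and is robust to common point cloud data issues such as variable density and noise.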

Summary

This paper presents LitS, a novel neighborhood descriptor for 2D and 3D point clouds, designed to accurately characterize local geometries. LitS are piecewise constant functions on the unit circle that provide information about the number of neighbors in a cone-like region around a given direction. This descriptor helps in understanding the local neighborhood of points and can be adapted to different contexts through two versions and parameters. Key findings show that LitS effectively captures local point arrangements and is resilient to issues like variable density and noise.

本文提出了LitS，一种用于2D和3D点云的新颖邻域描述符，旨在准确刻画局部几何结构。LitS 是单位圆上的分段常数函数，通过在锥形区域计数邻点来跟踪点的周围环境。LitS 有两种版本和两个参数，使其能够适应不同类型的点云。主要发现表明，LitS 能有效捕捉局部点的排列方式，并且能够应对如密度变化和噪声等常见问题。

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

Authors: Zhaotian Weng, Antonis Antoniades, Deepak Nathani, Zhen Zhang, Xiao Pu, Xin Eric Wang

First: 2026-02-04T18:29:36+00:00 · Latest: 2026-02-04T18:29:36+00:00

Comments: 18 pages

Abstract

Open-ended self-improving agents can autonomously modify their own structural designs to advance their capabilities and overcome the limits of pre-defined architectures, thus reducing reliance on human intervention. We introduce Group-Evolving Agents (GEA), a new paradigm for open-ended self-improvements, which treats a group of agents as the fundamental evolutionary unit, enabling explicit experience sharing and reuse within the group throughout evolution. Unlike existing open-ended self-evolving paradigms that adopt tree-structured evolution, GEA overcomes the limitation of inefficient utilization of exploratory diversity caused by isolated evolutionary branches. We evaluate GEA on challenging coding benchmarks, where it significantly outperforms state-of-the-art self-evolving methods (71.0% vs. 56.7% on SWE-bench Verified, 88.3% vs. 68.3% on Polyglot) and matches or exceeds top human-designed agent frameworks (71.8% and 52.0% on two benchmarks, respectively). Analysis reveals that GEA more effectively converts early-stage exploratory diversity into sustained, long-term progress, achieving stronger performance under the same number of evolved agents. Furthermore, GEA exhibits consistent transferability across different coding models and greater robustness, fixing framework-level bugs in 1.4 iterations on average, versus 5 for self-evolving methods.

中文标题/摘要

标题：群体演化智能体：通过经验分享实现自主持续改进

开放式的自主持续改进智能体能够自主修改自身的结构设计，以提升能力并克服预定义架构的限制，从而减少对人类干预的依赖。我们提出了群体演化智能体（GEA）这一新的开放性自主持续改进范式，将一群智能体视为基本的进化单元，使群体内部能够明确地分享和重用经验。与现有的采用树状结构进化的开放性自主演化范式不同，GEA 能够克服孤立进化分支导致的探索多样性利用效率低下的局限。我们在具有挑战性的编码基准测试上评估了 GEA，结果显示它显著优于最先进的自主演化方法（SWE-bench Verified 上为 71.0% 对比 56.7%，Polyglot 上为 88.3% 对比 68.3%），并且在某些基准测试上达到了或超过了顶级人工设计的智能体框架（分别为 71.8% 和 52.0%）。分析表明，GEA 更有效地将早期探索多样性转化为持续的长期进步，在相同数量的进化智能体下实现了更强的性能。此外，GEA 在不同编码模型之间表现出一致的可迁移性，并且具有更大的鲁棒性，平均只需 1.4 个迭代就能修复框架级别的错误，而自主演化方法则需要 5 个迭代。

Summary / 总结

The research aims to develop open-ended self-improving agents that can autonomously modify their designs to enhance their capabilities. Group-Evolving Agents (GEA) are introduced as a new paradigm that treats a group of agents as the fundamental evolutionary unit, allowing for explicit experience sharing and reuse. GEA outperforms state-of-the-art self-evolving methods on coding benchmarks, achieving higher success rates and demonstrating stronger performance with the same number of evolved agents. Additionally, GEA shows better transferability and robustness, fixing bugs more efficiently.

研究引入了组演化代理（GEA）这一新的自适应改进范式，允许一组代理共享和重用经验，克服了孤立进化分支导致的低效问题。GEA 在编码基准测试中的表现优于最先进的自适应改进方法，取得了更高的成功率，并展示了更强的迁移能力和鲁棒性。

Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments

Authors: Zhao Tong, Chunlin Gong, Yimeng Gu, Haichao Shi, Qiang Liu, Shu Wu, Xiao-Yu Zhang

First: 2025-10-10T04:39:57+00:00 · Latest: 2026-02-04T18:29:24+00:00

Comments: 10 pages, 12 figures

Abstract

Online fake news profoundly distorts public judgment and erodes trust in social platforms. While existing detectors achieve competitive performance on benchmark datasets, they remain notably vulnerable to malicious comments designed specifically to induce misclassification. This evolving threat landscape necessitates detection systems that simultaneously prioritize predictive accuracy and structural robustness. However, current detectors often fail to generalize across diverse and novel comment attack patterns. To bridge this gap, we propose AdComment, an adaptive adversarial training framework for robustness enhancement against diverse malicious comments. Based on cognitive psychology, we categorize adversarial comments into Fact Distortion, Logical Confusion, and Emotional Manipulation, and leverage LLMs to synthesize diverse, category-specific perturbations. Central to our framework is an InfoDirichlet Resampling (IDR) mechanism that dynamically adjusts malicious comment proportions during training, thereby steering optimization toward the model's most susceptible regions. Experimental results demonstrate that our approach achieves state-of-the-art performance on three benchmark datasets, improving the F1 scores by 17.9%, 14.5% and 9.0%, respectively.
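
As a rough picture of what dynamically adjusting malicious comment proportions could look like, the sketch below draws category proportions from a Dirichlet distribution whose concentration grows with the model's current error rate on each category. This is an assumption-laden illustration, not the paper's IDR mechanism: the abstract does not specify how the concentration is computed, and `resample_proportions`, `strength`, and `floor` are hypothetical names.

```python
import random

# The three attack categories named in the abstract.
CATEGORIES = ("fact_distortion", "logical_confusion", "emotional_manipulation")


def resample_proportions(error_rates, strength=10.0, floor=0.5, rng=random):
    """Draw per-category sampling proportions from a Dirichlet distribution.

    Assumption: concentration alpha_c = floor + strength * error_rate_c, so
    categories the model currently gets wrong more often receive a larger
    expected share of the next training batch.
    """
    alphas = {c: floor + strength * error_rates[c] for c in CATEGORIES}
    # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws,
    # normalized to sum to one.
    draws = {c: rng.gammavariate(a, 1.0) for c, a in alphas.items()}
    total = sum(draws.values())
    return {c: d / total for c, d in draws.items()}


rng = random.Random(0)
errors = {"fact_distortion": 0.6, "logical_confusion": 0.2, "emotional_manipulation": 0.1}
proportions = resample_proportions(errors, rng=rng)
```

Sampling rather than taking the error rates directly keeps some randomness in the mix, so categories with low current error are still visited occasionally.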

中文标题/摘要

标题：针对恶意评论的分组自适应对抗学习以增强鲁棒的假新闻检测

在线假新闻严重扭曲公众判断并侵蚀社交平台的信任。尽管现有的检测器在基准数据集上取得了竞争力的表现，但它们仍然明显容易受到专门设计以诱导分类错误的恶意评论的影响。这种不断演变的威胁环境需要同时注重预测准确性和结构鲁棒性的检测系统。然而，当前的检测器往往无法在多样且新颖的评论攻击模式中泛化。为了弥合这一差距，我们提出了一种AdComment，这是一种针对多种恶意评论的自适应对抗训练框架，以增强鲁棒性。基于认知心理学，我们将对抗性评论分为事实扭曲、逻辑混淆和情感操控三类，并利用大语言模型（LLM）生成多样化的、类别特定的扰动。我们框架的核心是InfoDirichlet重采样（IDR）机制，该机制在训练过程中动态调整恶意评论的比例，从而引导优化向模型最脆弱的区域。实验结果表明，我们的方法在三个基准数据集上取得了最先进的性能，分别提高了F1分数17.9%、14.5%和9.0%。

Summary / 总结

The paper addresses the vulnerability of existing fake news detectors to malicious comments that induce misclassification. It introduces AdComment, an adaptive adversarial training framework that uses LLMs to generate diverse adversarial comments categorized into Fact Distortion, Logical Confusion, and Emotional Manipulation. The framework employs an InfoDirichlet Resampling (IDR) mechanism to dynamically adjust the proportions of malicious comments during training, enhancing the model's robustness. Experiments show that AdComment improves F1 scores by 17.9%, 14.5%, and 9.0% on three benchmark datasets compared to existing methods.

论文针对现有假新闻检测器对恶意评论的脆弱性，提出了AdComment，一种适应性对抗训练框架，利用LLMs生成事实扭曲、逻辑混淆和情感操控等多样化的恶意评论。该框架通过InfoDirichlet重采样机制动态调整训练过程中恶意评论的比例，增强模型的鲁棒性。实验结果显示，AdComment在三个基准数据集上的F1分数分别提高了17.9%、14.5%和9.0%。

It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Authors: Hannah Pinson

First: 2026-02-04T18:22:40+00:00 · Latest: 2026-02-04T18:22:40+00:00

Abstract

Our theoretical understanding of neural networks is lagging behind their empirical success. One of the important unexplained phenomena is why and how, during the process of training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. We here investigate the mechanism by which gradient descent achieves this through analyzing the learning dynamics at the level of individual neurons in single hidden layer ReLU networks. We identify three dynamical principles -- mutual alignment, unlocking and racing -- that together explain why we can often successfully reduce capacity after training through the merging of equivalent neurons or the pruning of low norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, or why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
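
The claim that equivalent neurons can be merged after training has a simple exact instance in a bias-free single hidden layer ReLU network: two hidden neurons whose input weights point in the same direction can be replaced by one. The sketch below illustrates that identity only; it is not the paper's analysis, and the function names and the tolerance `tol` are hypothetical.

```python
import math


def forward(x, hidden, out):
    """f(x) = sum_j out[j] * relu(hidden[j] . x), with no biases."""
    return sum(
        a * max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        for w, a in zip(hidden, out)
    )


def merge_aligned(hidden, out, tol=1e-9):
    """Merge hidden neurons whose weight vectors share the same direction.

    Uses relu(w . x) = ||w|| * relu(u . x) for the unit vector u = w / ||w||,
    so aligned neurons collapse into one neuron with summed output weight.
    """
    merged_w, merged_a = [], []
    for w, a in zip(hidden, out):
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            continue  # a zero-weight neuron contributes nothing without a bias
        unit = [wi / norm for wi in w]
        for j, u in enumerate(merged_w):
            if all(abs(ui - vi) <= tol for ui, vi in zip(u, unit)):
                merged_a[j] += a * norm
                break
        else:
            merged_w.append(unit)
            merged_a.append(a * norm)
    return merged_w, merged_a


hidden = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
out = [1.0, 0.5, -1.0]
small_hidden, small_out = merge_aligned(hidden, out)
```

Here three neurons reduce to two while the network function stays the same, which is the sense in which capacity can be reduced without changing what was learned.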

中文标题/摘要

标题：这不是彩票，而是比赛：理解梯度下降如何调整网络容量以适应任务

我们对神经网络的理论理解落后于其实证上的成功。其中一个重要的未解释现象是，在使用梯度下降训练的过程中，神经网络的理论容量为何以及如何被减少到一个适合任务的有效容量。我们通过分析单隐藏层ReLU网络中单个神经元层面的学习动态，研究了梯度下降实现这一点的机制。我们确定了三种动力学原则（相互对齐、解锁和比赛），它们共同解释了为什么我们在训练后通常能够通过合并等价神经元或剪除低范数权重来成功地减少容量。我们具体解释了彩票假说背后的机制，即为什么某些神经元特定的、有益的初始条件使它们获得更高的权重范数。

Domain Generalization Under Posterior Drift

Authors: Yilun Zhu, Naihao Deng, Naichen Shi, Aditya Gangrade, Clayton Scott

First: 2025-10-06T02:17:12+00:00 · Latest: 2026-02-04T18:18:17+00:00

Abstract

Domain generalization (DG) is the problem of generalizing from several distributions (or domains), for which labeled training data are available, to a new test domain for which no labeled data is available. For the prevailing benchmark datasets in DG, there exists a single classifier that performs well across all domains. In this work, we study a fundamentally different regime where the domains satisfy a posterior drift assumption, in which the optimal classifier might vary substantially with domain. We establish a decision-theoretic framework for DG under posterior drift, and investigate the practical implications of this framework through experiments on language and vision tasks.

中文标题/摘要

标题：在后验漂移下的领域泛化

领域泛化(DG)是指在可以获取标记训练数据的多个分布(或领域)上进行泛化，以适应一个没有标记数据的新测试领域。对于DG领域的现有基准数据集，存在一个分类器可以在所有领域中表现良好。在本文中，我们研究了一种根本不同的情况，其中领域满足后验漂移假设，在这种假设下，最优分类器可能会在不同领域之间有很大差异。我们为后验漂移下的领域泛化建立了一个决策理论框架，并通过语言和视觉任务的实验探讨了该框架的实际意义。

Summary / 总结

This paper addresses domain generalization (DG) under the posterior drift assumption, where the optimal classifier can vary significantly across domains. The authors develop a decision-theoretic framework for DG in this regime and investigate its practical implications through experiments on language and vision tasks.

本文探讨了后验漂移假设下的域泛化（DG），在这种假设下，最优分类器在不同域中可能会有很大差异。作者提出了一个决策理论框架来处理这种DG情况，并通过语言和视觉任务的实验探讨了该框架的实际意义。

Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning

Authors: Joydeep Chandra, Satyam Kumar Navneet, Aleksandr Algazinov, Yong Zhang

First: 2026-02-04T18:10:59+00:00 · Latest: 2026-02-04T18:10:59+00:00

Abstract

Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions -- all while providing reliability guarantees. We present STREAM-RL, a unified framework that introduces three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows with Benjamini-Yekutieli FDR control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory datasets demonstrate that STREAM-RL achieves 91.4% coverage efficiency, controls FDR at 4.1% under verified dependence, and improves safety rate to 95.2% compared to 69% for standard PPO while achieving higher reward, with 23ms end-to-end inference latency.
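
The Benjamini-Yekutieli procedure named in contribution (2) is a standard step-up rule that controls the false discovery rate under arbitrary dependence between tests. The sketch below shows that generic procedure applied to a list of p-values; how STREAM-RL derives p-values from its residual flow network is not described in the abstract and is not modeled here. The function name is a hypothetical choice.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a rejection flag per p-value using the Benjamini-Yekutieli rule.

    With m tests and c(m) = 1 + 1/2 + ... + 1/m, find the largest rank k such
    that the k-th smallest p-value is <= k * alpha / (m * c(m)), then reject
    the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True  # flagged as a discovery (for example, an anomaly)
    return rejected


flags = benjamini_yekutieli([0.001, 0.8, 0.004, 0.5], alpha=0.05)
```

The extra factor c(m) makes the thresholds stricter than the Benjamini-Hochberg rule, which is the price paid for not assuming independence or positive dependence between tests.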

中文标题/摘要

标题：基于不确定性感知共形预测和世界模型强化学习的安全城市交通控制

城市交通管理需要能够同时预测未来状况、检测异常并采取安全纠正措施的系统，同时还要提供可靠性保证。我们提出了STREAM-RL，一个统一框架，引入了三个新颖的算法贡献：(1) PU-GAT+，一种不确定性引导的自适应共形预测器，利用预测不确定性通过置信单调注意力动态重新加权图注意力，实现无分布假设的覆盖保证；(2) CRFN-BY，一种共形残差流网络，通过归一化流对不确定性归一化的残差建模，并在任意依赖下采用Benjamini-Yekutieli错误发现率（FDR）控制；(3) LyCon-WRL+，一种不确定性引导的安全世界模型强化学习代理，具有李亚普诺夫稳定性证书、经认证的利普希茨界和传播不确定性的想象推演。据我们所知，这是第一个将校准后的不确定性从预测经异常检测传播到安全策略学习、并具有端到端理论保证的框架。在多个真实世界交通轨迹数据集上的实验表明，STREAM-RL 的覆盖效率为 91.4%，在经验证的依赖条件下将 FDR 控制在 4.1%，安全率提高到 95.2%，而标准 PPO 为 69%，同时获得更高的奖励，端到端推理延迟为 23ms。

Summary / 总结

The research aims to develop a system for urban traffic management that can predict future traffic conditions, detect anomalies, and take safe corrective actions with reliability guarantees. The STREAM-RL framework introduces three novel algorithms: PU-GAT+, which uses prediction uncertainty to dynamically reweight graph attention; CRFN-BY, which models uncertainty-normalized residuals; and LyCon-WRL+, which ensures safe policy learning with theoretical guarantees. Experiments show that STREAM-RL achieves 91.4% coverage efficiency, controls FDR at 4.1%, and improves safety rates to 95.2% compared to 69% for standard PPO, with high reward and 23ms inference latency.

研究旨在解决城市交通管理系统需要同时预测未来交通状况、检测异常并采取安全纠正措施的需求。提出了STREAM-RL统一框架，包含PU-GAT+、CRFN-BY和LyCon-WRL+三个新颖组件，分别提升预测准确性、残差建模和安全强化学习。实验结果显示，STREAM-RL实现了91.4%的覆盖率效率，控制FDR在4.1%，并将安全率提高到95.2%，相比标准PPO的69%，同时具有23ms的端到端推理延迟。

XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas

Authors: Aqsa Sultana, Rayan Afsar, Ahmed Rahu, Surendra P. Singh, Brian Shula, Brandon Combs, Derrick Forchetti, Vijayan K. Asari

First: 2026-02-04T18:07:51+00:00 · Latest: 2026-02-04T18:07:51+00:00

Comments: 13 pages, 8 figures

Abstract

Accurate risk stratification of precancerous polyps during routine colonoscopy screenings is essential for lowering the risk of developing colorectal cancer (CRC). However, assessment of low-grade dysplasia remains limited by subjective histopathologic interpretation. Advancements in digital pathology and deep learning provide new opportunities to identify subtle and fine morphologic patterns associated with malignant progression that may be imperceptible to the human eye. In this work, we propose XtraLight-MedMamba, an ultra-lightweight state-space-based deep learning framework for classifying neoplastic tubular adenomas from whole-slide images (WSIs). The architecture blends a ConvNext-based shallow feature extractor with a parallel vision Mamba to efficiently model both long- and short-range dependencies and image generalization. An integrated Spatial and Channel Attention Bridge (SCAB) module enhances multiscale feature extraction, while a Fixed Non-Negative Orthogonal Classifier (FNOClassifier) enables substantial parameter reduction and improved generalization. The model was evaluated on a curated dataset acquired from patients with low-grade tubular adenomas, stratified into case and control cohorts based on subsequent CRC development. XtraLight-MedMamba achieved an accuracy of 97.18% and an F1-score of 0.9767 using approximately 32,000 parameters, outperforming transformer-based and conventional Mamba architectures that have significantly higher model complexity.

中文标题/摘要

标题：XtraLight-MedMamba用于肿瘤性管状腺瘤的分类

在常规结肠镜筛查中，对癌前息肉进行准确的风险分层对于降低罹患结直肠癌（CRC）的风险至关重要。然而，低度异型增生的评估仍然受限于主观的组织病理学解释。数字病理学和深度学习的进步为识别与恶性进展相关的细微和精细的形态学模式提供了新的机会，这些模式可能难以被肉眼察觉。在本工作中，我们提出了一种超轻量级的基于状态空间的深度学习框架XtraLight-MedMamba，用于从全切片图像（WSIs）中分类肿瘤性管状腺瘤。该架构结合了基于ConvNext的浅层特征提取器与并行的Vision Mamba，以高效地建模长程和短程依赖关系以及图像泛化。空间和通道注意桥接（SCAB）模块的集成增强了多尺度特征提取，而固定非负正交分类器（FNOClassifier）实现了参数显著减少和泛化能力的提升。该模型在从低度管状腺瘤患者中获取的精心构建的数据集上进行了评估，该数据集根据后续CRC的发展情况分为病例组和对照组。XtraLight-MedMamba使用约32,000个参数实现了97.18%的准确率和0.9767的F1分数，性能优于模型复杂度显著更高的基于Transformer的架构和传统Mamba架构。

Summary / 总结

The research aims to improve the accuracy of identifying precancerous polyps during colonoscopies through digital pathology and deep learning. XtraLight-MedMamba, an ultra-lightweight deep learning framework, was developed to classify neoplastic tubular adenomas. It combines ConvNext for feature extraction and parallel vision mamba for dependency modeling, with SCAB and FNOClassifier modules enhancing feature extraction and reducing parameters. The model achieved high accuracy and F1-score with minimal parameters, outperforming more complex architectures.

研究旨在通过提高对肿瘤性管状腺瘤的分类准确性来帮助在结肠镜筛查中对癌前息肉进行风险分层。研究引入了XtraLight-MedMamba，这是一种超轻量级的深度学习框架，使用了基于ConvNext的浅层特征提取器和并行的vision mamba来建模长程和短程依赖关系。该模型使用约32,000个参数实现了97.18%的准确率和0.9767的F1分数，优于更复杂的基于变换器和传统Mamba架构。

X2HDR: HDR Image Generation in a Perceptually Uniform Space

Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao, Rafał K. Mantiuk

First: 2026-02-04T17:59:51+00:00 · Latest: 2026-02-04T17:59:51+00:00

Comments: Project page: https://x2hdr.github.io/, Code: https://github.com/X2HDR/X2HDR

Abstract

High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dynamic-range (LDR) output due to the lack of large-scale HDR training data. In this work, we show that existing pretrained diffusion models can be easily adapted to HDR generation without retraining from scratch. A key challenge is that HDR images are natively represented in linear RGB, whose intensity and color statistics differ substantially from those of sRGB-encoded LDR images. This gap, however, can be effectively bridged by converting HDR inputs into perceptually uniform encodings (e.g., using PU21 or PQ). Empirically, we find that LDR-pretrained variational autoencoders (VAEs) reconstruct PU21-encoded HDR inputs with fidelity comparable to LDR data, whereas linear RGB inputs cause severe degradations. Motivated by this finding, we describe an efficient adaptation strategy that freezes the VAE and finetunes only the denoiser via low-rank adaptation in a perceptually uniform space. This results in a unified computational method that supports both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction. Experiments demonstrate that our perceptually encoded adaptation consistently improves perceptual fidelity, text-image alignment, and effective dynamic range, relative to previous techniques.
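
To make the notion of a perceptually uniform encoding concrete, the sketch below implements the PQ transfer function (SMPTE ST 2084), one of the two encodings the abstract names. It maps absolute luminance in cd/m² to a signal in [0, 1]. How the paper normalizes inputs or applies the encoding per channel is not stated in the abstract, so `pq_encode` and its clamping behavior are assumptions made for illustration; PU21 is not shown.

```python
def pq_encode(luminance_nits):
    """Map absolute luminance (cd/m^2) to a PQ signal value in [0, 1].

    Constants are those of the PQ inverse EOTF in SMPTE ST 2084.
    Inputs are clamped to [0, 10000] cd/m^2 (an assumption of this sketch).
    """
    m1 = 2610.0 / 16384.0
    m2 = 2523.0 / 4096.0 * 128.0
    c1 = 3424.0 / 4096.0
    c2 = 2413.0 / 4096.0 * 32.0
    c3 = 2392.0 / 4096.0 * 32.0
    y = min(max(luminance_nits / 10000.0, 0.0), 1.0)
    y_pow = y ** m1
    return ((c1 + c2 * y_pow) / (1.0 + c3 * y_pow)) ** m2


# Linear luminance spanning five orders of magnitude is squeezed into [0, 1].
samples = [pq_encode(nits) for nits in (0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0)]
```

Because equal steps in the encoded signal correspond roughly to equal perceived differences, the encoded values have statistics much closer to display-referred LDR images than linear RGB does, which is the gap the abstract describes.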

中文标题/摘要

标题：X2HDR：在感知均匀空间中生成高动态范围图像

高动态范围（HDR）格式和显示器正在变得越来越普遍，然而最先进的图像生成器（例如Stable Diffusion和FLUX）通常仍然局限于低动态范围（LDR）输出，因为缺乏大规模HDR训练数据。在本文中，我们展示了现有的预训练扩散模型可以轻松适应HDR生成，而无需从头开始重新训练。一个关键挑战是HDR图像以线性RGB形式原生表示，其强度和颜色统计与sRGB编码的LDR图像有显著差异。然而，通过将HDR输入转换为感知均匀编码（例如使用PU21或PQ），这一差距可以得到有效解决。实验证明，LDR预训练的变分自编码器（VAEs）可以以与LDR数据相当的保真度重建PU21编码的HDR输入，而线性RGB输入会导致严重的降解。受此发现的启发，我们描述了一种高效的适应策略，即冻结VAE并仅通过低秩适应在感知均匀空间中微调去噪器。这产生了一种统一的计算方法，支持文本到HDR合成和单张图像RAW到HDR重建。实验表明，与以前的技术相比，我们的感知编码适应始终提高了感知保真度、文本-图像对齐和有效动态范围。

Summary / 总结

This work addresses the challenge of generating HDR images using pretrained low-dynamic-range (LDR) models by converting HDR inputs into perceptually uniform encodings. The key finding is that LDR-pretrained variational autoencoders (VAEs) can reconstruct PU21-encoded HDR inputs with high fidelity, while linear RGB inputs lead to severe degradations. The proposed method, X2HDR, freezes the VAE and fine-tunes only the denoiser via low-rank adaptation in a perceptually uniform space, enabling both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction with improved perceptual fidelity, text-image alignment, and effective dynamic range.

研究通过将HDR输入转换为感知均匀编码来解决使用预训练LDR模型生成HDR图像的挑战。方法使用预训练在LDR数据上的VAE，并仅在感知均匀空间中微调去噪器，从而在感知保真度、文本-图像对齐和动态范围方面取得改进。该方法支持文本到HDR合成和单图像RAW到HDR重建，无需从头重新训练整个模型。

Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-based Agents

Authors: Shubham Vatsal, Harsh Dubey, Aditi Singh

Venue: IEEE Access, vol. 14, pp. 4840-4863, 2026

First: 2026-02-04T17:59:14+00:00 · Latest: 2026-02-04T17:59:14+00:00

Abstract

Large Language Model (LLM)-based agents that plan, use tools and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on various tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature largely consists of overviews which are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and Core Tasks & Subtasks with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented) whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% Not Implemented) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented) while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead (e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation), while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).

中文标题/摘要

标题：医疗健康领域中的代理型AI：基于大型语言模型代理的七维度分类学及其实证评估

基于大型语言模型（LLM）的代理已经开始塑造医疗健康领域。已有研究展示了这些代理在从电子病历分析和诊断到治疗规划和研究工作流等各项任务上的能力。然而，现有文献主要是一些综述性文章，要么是广泛的综述，要么是单一能力（如记忆、规划、推理）的深入探讨，缺乏一个共同的框架。我们通过使用七维度分类学（认知能力、知识管理、交互模式、适应与学习、安全与伦理、框架类型学和核心任务与子任务，包含29个操作子维度）来审查49项研究，解决了这一问题。我们使用明确的纳入和排除标准以及标签标准（完全实现、部分实现、未实现），将每项研究映射到分类学，并报告了能力的出现频率和共现模式的定量总结。我们的实证分析揭示了明显的不对称性。例如，在知识管理下的外部知识集成子维度通常被实现（约76%完全实现），而交互模式下的事件触发激活子维度几乎不存在（约92%未实现），适应与学习下的漂移检测与缓解子维度很少见（约98%未实现）。从架构上看，框架类型学下的多代理设计子维度是最常见的模式（约82%完全实现），而编排层仍然主要是部分实现。在核心任务与子任务中，以信息为中心的能力占主导地位，如医学问答与决策支持和基准测试与模拟，而以行动和发现为导向的领域，如治疗规划与处方，仍然存在显著差距（约59%未实现）。

Summary / 总结

This study aims to provide a common framework for evaluating LLM-based agents in healthcare by developing a seven-dimensional taxonomy. The method involves reviewing 49 studies and mapping them to 29 sub-dimensions. Key findings include clear asymmetries in capability realization, such as high implementation of external knowledge integration but low implementation of event-triggered activation and drift detection. Multi-agent design is the dominant architectural pattern, while action and discovery-oriented tasks still show substantial gaps.

该研究通过提出一个七维度分类法来评估医疗领域的LLM代理，包括认知能力、知识管理、交互模式、适应与学习、安全与伦理、架构类型和核心任务与子任务。对49项研究的实证分析显示，虽然外部知识集成被广泛实现，但事件触发激活和漂移检测与缓解很少见，而多代理设计是主要的架构模式。信息中心型能力比行动和发现导向型能力更常见。

SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization

Authors: Jiarui Yuan, Tailin Jin, Weize Chen, Zeyuan Liu, Zhiyuan Liu, Maosong Sun

First: 2026-02-04T17:58:32+00:00 · Latest: 2026-02-04T17:58:32+00:00

Comments: Under review

Abstract

True self-evolution requires agents to act as lifelong learners that internalize novel experiences to solve future problems. However, rigorously measuring this foundational capability is hindered by two obstacles: the entanglement of prior knowledge, where "new" knowledge may appear in pre-training data, and the entanglement of reasoning complexity, where failures may stem from problem difficulty rather than an inability to recall learned knowledge. We introduce SE-Bench, a diagnostic environment that obfuscates the NumPy library and its API doc into a pseudo-novel package with randomized identifiers. Agents are trained to internalize this package and evaluated on simple coding tasks without access to documentation, yielding a clean setting where tasks are trivial with the new API doc but impossible for base models without it. Our investigation reveals three insights: (1) the Open-Book Paradox, where training with reference documentation inhibits retention, requiring "Closed-Book Training" to force knowledge compression into weights; (2) the RL Gap, where standard RL fails to internalize new knowledge completely due to PPO clipping and negative gradients; and (3) the viability of Self-Play for internalization, proving models can learn from self-generated, noisy tasks when coupled with SFT, but not RL. Overall, SE-Bench establishes a rigorous diagnostic platform for self-evolution with knowledge internalization. Our code and dataset can be found at https://github.com/thunlp/SE-Bench.
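
The obfuscation idea, exposing a familiar library under randomized identifiers so that prior knowledge of the original names does not help, can be pictured with a toy example. The sketch below is not the SE-Bench tooling: it uses Python's standard `math` module as a stand-in for NumPy, and `obfuscate_module`, the `fn_` prefix, and the alias length are hypothetical.

```python
import math
import random
import string
import types


def obfuscate_module(module, names, seed=0):
    """Expose selected functions of `module` under random aliases.

    Returns (package, alias_of): `package` only has the random names, and
    `alias_of` maps each original name to its alias (the "new API doc").
    """
    rng = random.Random(seed)
    alias_of = {}
    used = set()
    for name in names:
        alias = "fn_" + "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        while alias in used:  # avoid the unlikely case of a collision
            alias = "fn_" + "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        used.add(alias)
        alias_of[name] = alias
    package = types.SimpleNamespace(
        **{alias: getattr(module, name) for name, alias in alias_of.items()}
    )
    return package, alias_of


package, alias_of = obfuscate_module(math, ["sqrt", "floor"], seed=1)
# With the mapping in hand the task is trivial; without it, the name is unguessable.
root = getattr(package, alias_of["sqrt"])(9.0)
```

This mirrors the property the abstract highlights: a task is easy with the mapping available and infeasible without it, so success at test time points to knowledge stored in the weights.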

中文标题/摘要

标题：SE-Bench：基于知识内化自我进化基准测试

真正的自我进化要求代理作为终身学习者，将新经验内化以解决未来的问题。然而，由于先前知识的纠缠，即“新”知识可能出现在预训练数据中，以及推理复杂性的纠缠，即失败可能源于问题难度而非无法回忆已学知识，使得严格测量这一基础能力变得困难。我们引入了SE-Bench，这是一种诊断环境，将NumPy库及其API文档混淆成一个伪新颖的包，并使用随机标识符。代理被训练内化这个包，并在没有访问文档的情况下评估其在简单编程任务上的表现，从而形成一个干净的环境，在新API文档下任务是简单的，但对于没有它的基础模型来说是不可能的。我们的研究揭示了三个见解：（1）开卷悖论，即使用参考文档进行训练会抑制记忆，需要“闭卷训练”来迫使知识压缩到权重中；（2）RL差距，即标准RL由于PPO剪辑和负梯度无法完全内化新知识；（3）自我对弈在内化中的可行性，证明当与SFT结合时，模型可以从自我生成的、嘈杂的任务中学习，但不是RL。总体而言，SE-Bench为基于知识内化的自我进化建立了严格的诊断平台。我们的代码和数据集可以在https://github.com/thunlp/SE-Bench找到。

Summary / 总结

SE-Bench is designed to evaluate an agent's ability to self-evolve and internalize new knowledge by obfuscating the NumPy library. The study reveals that training with reference documentation hinders knowledge retention, necessitating closed-book training. It also finds that standard reinforcement learning fails to fully internalize new knowledge, while self-play combined with supervised fine-tuning is effective. This benchmark provides a clean setting to diagnose self-evolution capabilities.

SE-Bench 通过混淆 NumPy 库及其 API 文档来评估知识内化的自我进化能力。研究揭示了三个关键见解：开卷悖论、强化学习差距以及自我博弈的有效性。开卷悖论表明，使用参考文档进行训练会阻碍知识保留，需要采用闭卷训练。强化学习差距显示，标准强化学习由于 PPO 剪辑和负梯度无法完全内化新知识。当与监督微调结合时，自我博弈被证明是有效的，但与强化学习结合则无效。SE-Bench 提供了一个严格的诊断平台，用于评估自我进化能力。

Dynamic Pyramid Network for Efficient Multimodal Large Language Model

Authors: Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen luo

First: 2025-03-26T08:44:11+00:00 · Latest: 2026-02-04T17:56:00+00:00

Abstract

Multimodal large language models (MLLMs) have demonstrated impressive performance in various vision-language (VL) tasks, but their expensive computations still limit the real-world application. To address this issue, recent efforts aim to compress the visual features to save the computational costs of MLLMs. However, direct visual compression methods, e.g. efficient projectors, inevitably destroy the visual semantics in MLLM, especially in difficult samples. To overcome this shortcoming, we propose a novel dynamic pyramid network (DPN) for efficient MLLMs. Specifically, DPN formulates MLLM as a hierarchical structure where visual features are gradually compressed with increasing depth. In this case, even with a high compression ratio, fine-grained visual information can still be perceived in shallow layers. To maximize the benefit of DPN, we further propose an innovative Dynamic Pooling Experts (DPE) that can dynamically choose the optimal visual compression rate according to input features. With this design, harder samples will be assigned larger computations, thus preserving the model performance. To validate our approach, we conduct extensive experiments on two popular MLLMs and ten benchmarks. Experimental results show that DPN can save up to 56% average FLOPs on LLaVA while further achieving +0.74% performance gains. Besides, the generalization ability of DPN is also validated on the existing high-resolution MLLM called LLaVA-HR. The source code will be released at https://github.com/aihao2000/DPN-LLaVA.

中文标题/摘要

标题：高效多模态大型语言模型的动态金字塔网络

多模态大型语言模型（MLLMs）在各种视觉-语言（VL）任务中表现出色，但其昂贵的计算成本仍然限制了其在实际中的应用。为解决这一问题，最近的努力集中在压缩视觉特征以节省MLLMs的计算成本。然而，直接的视觉压缩方法，例如高效的投影器，不可避免地会破坏MLLM中的视觉语义，尤其是在困难样本中。为克服这一不足，我们提出了一种新颖的动态金字塔网络（DPN）以实现高效的MLLMs。具体而言，DPN将MLLM建模为一个分层结构，在深度增加时逐步压缩视觉特征。在这种情况下，即使压缩比很高，浅层中仍然可以感知到细粒度的视觉信息。为了最大化DPN的优势，我们进一步提出了一种创新的动态池化专家（DPE），可以根据输入特征动态选择最佳的视觉压缩率。通过这种设计，更难的样本将分配更多的计算量，从而保持模型性能。为了验证我们的方法，我们在两个流行的MLLMs和十个基准上进行了广泛的实验。实验结果表明，DPN可以在LLaVA上节省高达56%的平均FLOPs，同时进一步实现0.74%的性能提升。此外，DPN的泛化能力也在现有的高分辨率MLLM LLaVA-HR上得到了验证。源代码将在https://github.com/aihao2000/DPN-LLaVA上发布。

Summary / 总结

The paper proposes a Dynamic Pyramid Network (DPN) to address the computational cost of multimodal large language models (MLLMs) in vision-language tasks. DPN compresses visual features hierarchically with increasing depth, so fine-grained information is still perceived in shallow layers. A Dynamic Pooling Experts (DPE) module adaptively chooses the compression rate based on input features. Experiments on two MLLMs and ten benchmarks show that DPN saves up to 56% of average FLOPs on LLaVA while improving performance by 0.74%; its generalization is also validated on LLaVA-HR.

研究旨在通过解决其高计算成本来提升多模态大型语言模型（MLLMs）的效率。提出了一种动态金字塔网络（DPN），该网络通过分层压缩视觉特征，保留浅层的细粒度视觉信息。此外，还提出了一种创新的动态池化专家（DPE），可以根据输入特征动态调整压缩率。实验表明，DPN在LLaVA上最多可节省56%的平均FLOPs，同时性能提升0.74%，其泛化能力也在LLaVA-HR上得到了验证。

Beyond Rewards in Reinforcement Learning for Cyber Defence

Authors: Elizabeth Bates, Chris Hicks, Vasilios Mavroudis

First: 2026-02-04T17:55:23+00:00 · Latest: 2026-02-04T17:55:23+00:00

Abstract

Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth evaluation approach which allows directly comparing between different reward functions, illuminating the nuanced inter-relationships between rewards, action space and the risks of suboptimal policies in cyber environments. Our results show that sparse rewards, provided they are goal aligned and can be encountered frequently, uniquely offer both enhanced training reliability and more effective cyber defence agents with lower-risk policies. Surprisingly, sparse rewards can also yield policies that are better aligned with cyber defender goals and make sparing use of costly defensive actions without explicit reward-based numerical penalties.

中文标题/摘要

标题：超越奖励：面向网络防御的强化学习

近年来，使用深度强化学习训练自主网络防御代理以防御计算机网络的兴趣激增。这些代理通常在精心设计的密集奖励函数的网络健身房环境中进行训练，这些奖励函数结合了多种惩罚和激励措施，以应对各种（不）希望的状态和昂贵的操作。密集奖励有助于缓解探索复杂环境的挑战，但也可能使代理偏向次优甚至更危险的解决方案，这是复杂网络环境中一个关键问题。我们使用稀疏和密集奖励函数、两个成熟的网络健身房、不同规模的网络以及策略梯度和值基强化学习算法，全面评估了奖励函数结构对学习和策略行为特征的影响。我们的评估得益于一种新颖的基准评估方法，该方法允许直接比较不同奖励函数之间的差异，揭示了奖励、动作空间和网络环境中次优策略风险之间的复杂关系。我们的结果表明，只要它们与目标对齐并且可以频繁遇到，稀疏奖励可以提供增强的训练可靠性和具有较低风险的更有效的网络防御代理。令人惊讶的是，稀疏奖励还可以产生与网络防御者目标更一致的策略，并在没有明确基于奖励的数值惩罚的情况下节省使用昂贵的防御行动。

Summary / 总结

The paper investigates the impact of reward function structure in reinforcement learning for cyber defence, comparing dense and sparse rewards. Using various cyber gym environments and RL algorithms, the study finds that sparse rewards, when aligned with goals and frequently encountered, enhance training reliability and produce lower-risk cyber defence policies. Surprisingly, sparse rewards also lead to more goal-aligned policies that use costly defensive actions sparingly.

论文探讨了奖励函数结构对强化学习在网络安全防御代理中的影响。研究在多种网络规模和网络环境中，使用策略梯度和值基RL算法评估密集和稀疏奖励函数。研究发现，当稀疏奖励与目标对齐且频繁出现时，可以提高训练可靠性并生成更有效的、风险更低的网络安全防御策略。令人惊讶的是，稀疏奖励还能导致更符合网络安全防御者目标的策略，并在不使用显式奖励数值惩罚的情况下减少昂贵防御行动的使用。

Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning

Authors: Wolfgang Maass, Sabine Janzen, Prajvi Saxena, Sach Mukherjee

First: 2026-02-04T17:53:28+00:00 · Latest: 2026-02-04T17:53:28+00:00

Comments: 16 pages, 6 figures

Abstract

We introduce Afferent Learning, a framework that produces Computational Afferent Traces (CATs) as adaptive, internal risk signals for damage-avoidance learning. Inspired by biological systems, the framework uses a two-level architecture: evolutionary optimization (outer loop) discovers afferent sensing architectures that enable effective policy learning, while reinforcement learning (inner loop) trains damage-avoidance policies using these signals. This formalizes afferent sensing as providing an inductive bias for efficient learning: architectures are selected based on their ability to enable effective learning (rather than directly minimizing damage). We provide theoretical convergence guarantees under smoothness and bounded-noise assumptions. We illustrate the general approach in the challenging context of biomechanical digital twins operating over long time horizons (multiple decades of the life-course). Here, we find that CAT-based evolved architectures achieve significantly higher efficiency and better age-robustness than hand-designed baselines, enabling policies that exhibit age-dependent behavioral adaptation (23% reduction in high-risk actions). Ablation studies validate CAT signals, evolution, and predictive discrepancy as essential. We release code and data for reproducibility.

中文标题/摘要

标题：进化传入架构：生物启发的损伤避免学习模型

我们引入了传入学习框架，该框架产生计算传入痕迹（CATs），作为适应性的内部风险信号，用于损伤避免学习。该框架受到生物系统的启发，采用两层架构：进化优化（外层循环）发现能够有效学习策略的传入感知架构，而强化学习（内层循环）则使用这些信号训练损伤避免策略。这将传入感知形式化为提供高效学习的归纳偏置：架构是基于其能够促进有效学习的能力来选择的（而不是直接最小化损伤）。在平滑性和有界噪声假设下，我们提供了理论收敛保证。我们通过生物力学数字孪生在长时间跨度（生命历程的多个十年）的挑战性背景下展示了该通用方法。在这里，我们发现基于CAT的进化架构在效率和年龄稳健性方面显著优于手工设计的基线，使策略能够表现出年龄相关的行为适应（高风险行为减少23%）。消融研究验证了CAT信号、进化和预测差异的必要性。我们发布了代码和数据以实现可重复性。

Summary / 总结

The research introduces Afferent Learning, a framework that generates Computational Afferent Traces (CATs) for damage-avoidance learning, inspired by biological systems. It employs an evolutionary optimization process to discover effective afferent sensing architectures, which are then used by reinforcement learning to train damage-avoidance policies. The study demonstrates that CAT-based evolved architectures outperform hand-designed baselines in biomechanical digital twins, achieving higher efficiency and better age-robustness, with a 23% reduction in high-risk actions. Ablation studies confirm the importance of CAT signals, evolution, and predictive discrepancy for the framework's success.

研究引入了Afferent Learning框架，生成计算性传入痕迹（CATs）作为适应性的风险信号，用于损伤避免学习。该框架使用外环的进化优化来发现有效的传入感知架构，然后在内环中使用强化学习进行训练。研究显示，CAT基进化的架构在生物力学数字孪生中优于手工设计的基线，实现更高的效率和更好的年龄稳健性，高风险行为减少23%。消融研究证实了CAT信号、进化和预测差异的重要性。

Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging

Authors: Jia-peng Zhang, Cheng-Feng Pu, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu

First: 2026-02-04T17:52:17+00:00 · Latest: 2026-02-04T17:52:17+00:00

Comments: 14 pages, 10 figures

Abs · PDF · Code1 · Code2

Abstract

The rapid proliferation of generative 3D models has created a critical bottleneck in animation pipelines: rigging. Existing automated methods are fundamentally limited by their approach to skinning, treating it as an ill-posed, high-dimensional regression task that is inefficient to optimize and is typically decoupled from skeleton generation. We posit this is a representation problem and introduce SkinTokens: a learned, compact, and discrete representation for skinning weights. By leveraging an FSQ-CVAE to capture the intrinsic sparsity of skinning, we reframe the task from continuous regression to a more tractable token sequence prediction problem. This representation enables TokenRig, a unified autoregressive framework that models the entire rig as a single sequence of skeletal parameters and SkinTokens, learning the complicated dependencies between skeletons and skin deformations. The unified model is then amenable to a reinforcement learning stage, where tailored geometric and semantic rewards improve generalization to complex, out-of-distribution assets. Quantitatively, the SkinTokens representation leads to a 98%-133% improvement in skinning accuracy over state-of-the-art methods, while the full TokenRig framework, refined with RL, enhances bone prediction by 17%-22%. Our work presents a unified, generative approach to rigging that yields higher fidelity and robustness, offering a scalable solution to a long-standing challenge in 3D content creation.

中文标题/摘要

标题：皮肤代币：统一自回归布线的学习紧凑表示

生成的3D模型的迅速增长在动画流水线中造成了关键瓶颈：布线。现有的自动化方法在布线方面受到其方法的限制，将其视为一个病态的、高维的回归任务，优化效率低下，通常与骨架生成脱钩。我们认为这是一个表示问题，并引入了SkinTokens：一种学习的、紧凑的和离散的布线权重表示。通过利用FSQ-CVAE捕捉布线的固有稀疏性，我们将任务重新定义为更易于处理的标记序列预测问题。这种表示使TokenRig成为一种统一的自回归框架，可以将整个布线建模为单一的骨骼参数和SkinTokens序列，学习骨骼和皮肤变形之间的复杂依赖关系。统一模型随后可以适应强化学习阶段，其中定制的几何和语义奖励可以提高对复杂、分布外资产的泛化能力。定量上，SkinTokens表示在布线准确性上比最先进的方法提高了98%-133%，而经过强化学习优化的完整TokenRig框架提高了骨骼预测17%-22%。我们的工作提供了一种统一的、生成的布线方法，具有更高的保真度和鲁棒性，为3D内容创作中的长期挑战提供了一个可扩展的解决方案。

Summary / 总结

The paper addresses the challenge of rigging in 3D animation pipelines by introducing SkinTokens, a learned compact representation for skinning weights. By using an FSQ-CVAE to capture the intrinsic sparsity of skinning, the authors reframe the skinning task as a token sequence prediction problem, enabling a unified autoregressive framework called TokenRig. This framework models the entire rig as a sequence of skeletal parameters and SkinTokens, improving the modeling of complex dependencies. The TokenRig framework, further refined with reinforcement learning, significantly enhances bone prediction accuracy by 17%-22% and improves skinning accuracy by 98%-133% compared to state-of-the-art methods.

该论文通过引入SkinTokens，一种用于皮肤权重的紧凑表示方法，解决了3D动画流水线中的布线瓶颈问题。通过使用FSQ-CVAE捕获皮肤的内在稀疏性，该方法将皮肤任务转化为一个标记序列预测问题，从而构建了一个统一的自回归框架TokenRig。TokenRig框架通过强化学习增强后，提高了骨骼预测和皮肤变形的准确性。定量结果显示，与最先进的方法相比，SkinTokens在皮肤变形准确性上提高了98%-133%，并且在骨骼预测准确性上提高了17%-22%。

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

Authors: Qing'an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

First: 2026-02-04T17:48:55+00:00 · Latest: 2026-02-04T17:48:55+00:00

Comments: 27 pages, 19 figures

Abs · PDF · Code1 · Code2 · Code3

Abstract

Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also frequently appears as visualized text embedded in images, raising the question of whether current VLMs handle such input requests comparably. We introduce VISTA-Bench, a systematic benchmark spanning multimodal perception, reasoning, and unimodal understanding domains. It evaluates visualized text understanding by contrasting pure-text and visualized-text questions under controlled rendering conditions. Extensive evaluation of over 20 representative VLMs reveals a pronounced modality gap: models that perform well on pure-text queries often degrade substantially when equivalent semantic content is presented as visualized text. This gap is further amplified by increased perceptual difficulty, highlighting sensitivity to rendering variations despite unchanged semantics. Overall, VISTA-Bench provides a principled evaluation framework to diagnose this limitation and to guide progress toward more unified language representations across tokenized text and pixels. The source dataset is available at https://github.com/QingAnLiu/VISTA-Bench.

中文标题/摘要

标题：VISTA-Bench：视觉语言模型真的能像纯文本一样理解可视化文本吗？

视觉语言模型（VLMs）在跨模态理解文本和视觉输入方面取得了令人印象深刻的性能，但现有的基准测试主要集中在纯文本查询上。在现实世界中，语言也经常以嵌入在图像中的可视化文本形式出现，这引发了这样一个问题：当前的VLMs是否能够以类似的方式处理此类输入请求。我们引入了VISTA-Bench，这是一个从多模态感知、推理到单模态理解领域的系统基准测试。它通过在受控渲染条件下对比纯文本和可视化文本问题来评估可视化文本理解。对超过20个代表性VLMs的广泛评估揭示了一个明显的模态差距：在纯文本查询上表现良好的模型在等效语义内容以可视化文本形式呈现时往往会大幅退化。随着感知难度的增加，这一差距进一步扩大，突显了尽管语义不变，对渲染变化的敏感性。总体而言，VISTA-Bench提供了一个原则性的评估框架，用于诊断这一局限性，并指导朝着更统一的语言表示方向的进步。源数据集可在https://github.com/QingAnLiu/VISTA-Bench获取。

Summary / 总结

VISTA-Bench evaluates the ability of Vision-Language Models (VLMs) to understand visualized text compared to pure text, revealing a significant modality gap where models perform well on pure-text queries but degrade when the same semantic content is presented as visualized text. The benchmark includes controlled rendering conditions to assess multimodal perception, reasoning, and unimodal understanding, and it highlights the sensitivity of VLMs to rendering variations. This work provides a principled framework for diagnosing and addressing this limitation in VLMs.

VISTA-Bench 评估了视觉语言模型（VLMs）在理解视觉化文本与纯文本之间的性能差异。该基准测试模型在不同领域中，对比纯文本和视觉化文本问题，在受控条件下进行测试。研究发现，那些在纯文本查询中表现优异的模型，在相同语义内容以视觉化文本形式呈现时，往往会表现不佳，尤其是在感知难度增加的情况下。这强调了需要在标记文本和像素之间实现更统一的语言表示的必要性。

Verification and Identification in ECG biometric on large-scale

Authors: Scagnetto Arjuna

First: 2026-02-02T20:30:35+00:00 · Latest: 2026-02-04T17:47:07+00:00

Abs · PDF · Code1 · Code2

Abstract

This work studies electrocardiogram (ECG) biometrics at large scale, directly addressing a critical gap in the literature: the scarcity of large-scale evaluations with operational metrics and protocols that enable meaningful standardization and comparison across studies. We show that identity information is already present in tabular representations (fiducial features): even a simple MLP-based embedding network yields non-trivial performance, establishing a strong baseline before waveform modeling. We then adopt embedding-based deep learning models (ArcFace), first on features and then on ECG waveforms, showing a clear performance jump when moving from tabular inputs to waveforms, and a further gain with larger training sets and consistent normalization across train/val/test. On a large-scale test set, verification achieves high TAR at strict FAR thresholds (TAR=0.908 @ FAR=1e-3; TAR=0.820 @ FAR=1e-4) with EER=2.53% (all-vs-all); closed-set identification yields Rank@1=0.812 and Rank@10=0.910. In open-set, a two-stage pipeline (top-K shortlist on embeddings + re-ranking) reaches DIR@FAR up to 0.976 at FAR=1e-3 and 1e-4. Overall, the results show that ECG carries a measurable individual signature and that large-scale testing is essential to obtain realistic, comparable metrics. The study provides an operationally grounded benchmark that helps standardize evaluation across protocols.

中文标题/摘要

标题：ECG生物特征识别与验证在大规模应用中的研究

本研究探讨了大规模心电图（ECG）生物特征识别，直接填补了文献中的关键空白：大规模评估中操作性指标和协议的稀缺性，这些指标和协议能够实现有意义的标准和比较。我们展示了身份信息已经存在于表格表示（关键特征）中：即使简单的基于MLP的嵌入网络也能获得非平凡的性能，从而在波形建模之前建立了强大的基线。然后我们采用基于嵌入的深度学习模型（ArcFace），首先在特征上，然后在ECG波形上，当从表格输入转换为波形时，性能有了明显的提升，并且随着更大训练集和训练/验证/测试中的一致归一化，进一步提高了性能。在大规模测试集上，验证在严格的FAR阈值下实现了高TAR（TAR=0.908 @ FAR=1e-3；TAR=0.820 @ FAR=1e-4）且EER=2.53%（全对全）；闭集识别的Rank@1=0.812，Rank@10=0.910。在开放集情况下，两阶段管道（前K短列表嵌入+重新排序）在FAR=1e-3和1e-4时达到了DIR@FAR高达0.976。总体而言，结果表明ECG携带可测量的个体特征，大规模测试对于获得现实且可比的指标至关重要。该研究提供了一个操作性基础的基准，有助于标准化不同协议的评估。

Summary / 总结

This work addresses the lack of large-scale evaluations in ECG biometrics by introducing operational metrics and protocols. Using a simple MLP-based embedding network, the study establishes a strong baseline, which is further improved by embedding-based deep learning models like ArcFace. On a large-scale test set, verification achieves high true acceptance rates (TAR) at low false acceptance rates (FAR), with EER at 2.53%. Identification yields good Rank@1 and Rank@10 scores, and an open-set two-stage pipeline achieves high DIR@FAR values. The study emphasizes the importance of large-scale testing for realistic and comparable metrics in ECG biometrics.

这项工作通过引入操作性指标和协议，解决了ECG生物识别领域大规模评估的稀缺性问题。使用简单的MLP嵌入网络建立了身份验证的基准。通过使用基于嵌入的深度学习模型（ArcFace）进行波形建模，性能显著提升，特别是在使用更大规模的训练集和一致的归一化处理后。在大规模测试集上，验证实现了在低误接受率（FAR）下的高真接受率（TAR），EER为2.53%。识别任务达到了具有竞争力的Rank@1和Rank@10得分，开放集中的两阶段管道在低FAR下的检测识别率（DIR）也达到了很高水平。研究强调了大规模测试对于获得现实且可比的指标的重要性。
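
The verification figures above are reported as TAR at a fixed FAR. As a rough illustration of how such an operating point can be read off from similarity scores, here is a minimal sketch. The function name `tar_at_far`, the score lists, and the convention of accepting scores strictly above the threshold are assumptions made for this example and are not taken from the paper.

```python
import math


def tar_at_far(genuine, impostor, far_target):
    """Return (TAR, empirical FAR, threshold) for similarity scores.

    Assumption for this sketch: a comparison is accepted when its score is
    strictly greater than the threshold. The threshold is chosen so that the
    empirical FAR on the impostor scores does not exceed far_target.
    """
    imp_sorted = sorted(impostor, reverse=True)
    # Maximum number of impostor comparisons we may accept.
    allowed = math.floor(far_target * len(imp_sorted))
    if allowed >= len(imp_sorted):
        threshold = -math.inf
    else:
        # Only scores strictly above this value are accepted, so at most
        # `allowed` impostor scores can pass.
        threshold = imp_sorted[allowed]
    tar = sum(s > threshold for s in genuine) / len(genuine)
    far = sum(s > threshold for s in impostor) / len(impostor)
    return tar, far, threshold


# Hypothetical scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.05, 0.15, 0.25, 0.35, 0.45]
tar, far, thr = tar_at_far(genuine_scores, impostor_scores, far_target=0.2)
print(f"TAR={tar:.3f} @ FAR={far:.3f} (threshold={thr})")
```

At strict targets such as FAR=1e-4, at least 10,000 impostor comparisons are needed before a single false accept is allowed, which is one practical reason large test sets matter for this metric.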

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Authors: Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang

First: 2026-02-04T17:41:53+00:00 · Latest: 2026-02-04T17:41:53+00:00

Comments: 14 pages, 7 figures

Abs · PDF · Code1 · Code2 · Code3

Abstract

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit prior knowledge in earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention in quality (e.g., 84.5 on VBench) and efficiency (e.g., 1.2-1.3x end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a 2.3x speedup and 19.7 FPS on an RTX 5090 GPU. Code will be released at https://github.com/chengtao-lv/LightForcing.

中文标题/摘要

标题：轻量强迫：通过稀疏注意加速自回归视频扩散

先进的自回归（AR）视频生成模型提高了视觉保真度和交互性，但注意力的二次复杂性仍然是高效部署的主要瓶颈。尽管现有的稀疏注意力解决方案在双向模型上显示出前景，但我们发现将其应用于AR模型会导致显著的性能下降，原因有两个：块生成的孤立考虑和过去信息性上下文的不足利用。受这些观察的启发，我们提出了Light Forcing，这是首个针对AR视频生成模型的稀疏注意力解决方案。它引入了块感知增长机制，定量估计每个块的贡献，从而确定其稀疏分配。这种渐进稀疏增加策略使当前块在生成过程中能够继承早期块的知识。此外，我们引入了分层稀疏注意力，以粗到细的方式捕捉历史和局部上下文信息。这种两层掩码选择策略（即帧和块级别）能够适应各种注意力模式。大量实验表明，我们的方法在质量和效率上都优于现有的稀疏注意力（例如，在VBench上为84.5，在端到端速度上为1.2至1.3倍的加速）。结合FP8量化和LightVAE，Light Forcing在RTX 5090 GPU上实现了2.3倍的加速和19.7 FPS。代码将在https://github.com/chengtao-lv/LightForcing发布。

Summary / 总结

The research aims to address the efficiency bottleneck in autoregressive (AR) video generation models by proposing Light Forcing, a novel sparse attention solution. It introduces a Chunk-Aware Growth mechanism and Hierarchical Sparse Attention to improve the utilization of past context and handle diverse attention patterns. Experiments show that Light Forcing outperforms existing methods in both quality and efficiency, achieving up to 2.3 times speedup and 19.7 FPS on an RTX 5090 GPU.

Light Forcing 是一种针对自回归视频生成模型的稀疏注意力解决方案，旨在解决现有稀疏注意力方法应用于这些模型时导致的性能下降问题。它引入了 Chunk-Aware Growth 机制和 Hierarchical Sparse Attention 来提高视频生成的效率和质量。实验表明，Light Forcing 在质量和效率上均优于现有方法，实现了 1.2 到 1.3 倍的加速，并且结合 FP8 量化和 LightVAE 后，可在 RTX 5090 GPU 上达到 19.7 FPS。

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Authors: Dianyi Wang, Chaofan Ma, Feng Han, Size Wu, Wei Song, Yibin Wang, Zhixiong Zhang, Tianhang Wang, Siyuan Wang, Zhongyu Wei, Jiaqi Wang

First: 2026-02-02T18:34:35+00:00 · Latest: 2026-02-04T17:38:12+00:00

Abs · PDF · Code1 · Code2

Abstract

Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rather than interconnected reasoning steps. To address this, we propose UniReason, a unified framework that harmonizes these two tasks through two complementary reasoning paradigms. We incorporate world knowledge-enhanced textual reasoning into generation to infer implicit knowledge, and leverage editing capabilities for fine-grained editing-like visual refinement to further correct visual errors via self-reflection. This approach unifies generation and editing within a shared architecture, mirroring the human cognitive process of planning followed by refinement. We support this framework by systematically constructing a large-scale reasoning-centric dataset (~300k samples) covering five major knowledge domains (e.g., cultural commonsense, physics, etc.) for textual reasoning, alongside an agent-generated corpus for visual refinement. Extensive experiments demonstrate that UniReason achieves advanced performance on reasoning-intensive benchmarks such as WISE, KrisBench and UniREditBench, while maintaining superior general synthesis capabilities.

中文标题/摘要

标题：UniReason 1.0：统一的知识对齐图像生成与编辑推理框架

统一的多模态模型在处理需要深入推理的复杂合成任务时往往表现不佳，通常将文本到图像生成和图像编辑视为孤立的能力，而不是相互关联的推理步骤。为了解决这个问题，我们提出了UniReason，这是一种通过两种互补的推理范式将这两种任务统一起来的框架。我们通过增强文本推理来生成，以推断隐含知识，并利用编辑能力进行精细的视觉修正，通过自我反思进一步纠正视觉错误。这种方法在共享架构中统一了生成和编辑，类似于人类认知过程中的计划和改进。我们通过系统地构建一个大规模的以推理为中心的数据集（约30万样本），涵盖了五个主要的知识领域（例如，文化常识、物理等）来支持文本推理，以及一个代理生成的语料库来支持视觉改进。广泛的实验表明，UniReason在如WISE、KrisBench和UniREditBench等推理密集型基准测试中实现了先进的性能，同时保持了优越的一般合成能力。

Summary / 总结

The paper introduces UniReason 1.0, a unified framework that combines text-to-image generation and image editing through two reasoning paradigms, incorporating world knowledge-enhanced textual reasoning and leveraging editing capabilities for visual refinement. The framework demonstrates superior performance on reasoning-intensive benchmarks and maintains strong general synthesis capabilities.

UniReason 1.0 是一个统一框架，通过两种推理范式将文本到图像生成和图像编辑结合起来，增强模型处理复杂合成任务的能力。通过引入增强文本推理和利用编辑能力进行视觉细化，UniReason 在推理密集型基准测试中表现出色，同时保持强大的通用合成能力。该框架得到了涵盖五个主要知识领域的大型数据集和一个代理生成的语料库的支持，用于视觉细化。

Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias

Authors: Philip A. LeMaitre, Marius Krumm, Hans J. Briegel

Venue: Artificial Intelligence 352, 104489 (2026)

First: 2024-02-15T18:48:32+00:00 · Latest: 2026-02-04T17:34:48+00:00

Comments: 41 pages, 9 figures; Code repository at https://github.com/MariusKrumm/ManyBodyMEPS. Updated to be consistent with AIJ version

Abs · PDF · Code1 · Code2 · Code3

Abstract

With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature, making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field, known as Projective Simulation (PS), models a chain-of-thought as a random walk of a particle on a graph with vertices that have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot be naturally used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition for a dynamic hypergraph is put forward to describe the agent's training history along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.

中文标题/摘要

标题：基于多体物理启发式偏置的多激发投影模拟

随着深度学习的飞速发展，依赖机器学习的应用程序越来越多地融入日常生活。然而，大多数深度学习模型具有不透明的黑箱性质，使其难以解释和理解其决策。这一问题导致了可解释人工智能（XAI）这一领域的出现。该领域中的一种方法，即投影模拟（PS），将思维链描述为粒子在具有概念节点的图上的随机游走。虽然这种方法具有多种优势，包括量化可能性，但它无法自然地用于同时组合多个概念的思维建模。为克服这一限制，我们引入了多激发投影模拟（mePS），这是一种将思维链视为多个粒子在超图上的随机游走的一般化方法。提出了动态超图的定义来描述代理的训练历史，并将其应用于人工智能和超图可视化。我们从量子多体物理中极其成功的少数体相互作用模型中汲取灵感，为我们的经典mePS框架形式化了一种归纳偏置，并将其应用于解决朴素超图实现的指数复杂性问题。我们证明，我们的归纳偏置将复杂性从指数级降低到多项式级，指数表示可以相互作用的粒子数量的截止值。我们通过两个玩具环境和一个更复杂的场景（模拟计算机故障诊断）对我们的方法进行了数值应用，这些环境展示了适当选择归纳偏置带来的资源节省，并展示了可解释性方面的特点。我们还简要概述了mePS的量子模型，并讨论了其未来方向。

Summary / 总结

The research aims to address the interpretability issue in deep learning models by developing a new method called Multi-Excitation Projective Simulation (mePS). This method models a chain-of-thought as a random walk of multiple particles on a hypergraph, overcoming the limitation of traditional Projective Simulation which cannot naturally handle simultaneous concept combinations. The key finding is that an inductive bias inspired by quantum many-body physics reduces the complexity from exponential to polynomial, making the model more efficient. The method is numerically validated in two toy environments and a complex scenario, demonstrating resource savings and interpretability benefits.

研究旨在通过提出Multi-Excitation Projective Simulation (mePS) 来解决深度学习模型的可解释性问题，该方法将传统的Projective Simulation (PS) 扩展到通过超图上的随机行走处理多概念思考。该方法引入了受量子物理中成功的小范围相互作用模型启发的归纳偏置，将超图的复杂性从指数级降低到多项式级。在玩具和复杂场景上的实验显示，mePS 可以提供资源节省并增强可解释性。还简要讨论了 mePS 的量子模型及其未来方向。
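
The exponential-to-polynomial claim above can be made concrete with a simple counting argument. The sketch below is illustrative only: it assumes that the quantity of interest is the number of non-empty groups of vertices that may interact, which the abstract does not state explicitly, and the function names are hypothetical.

```python
from math import comb


def naive_group_count(n: int) -> int:
    """Non-empty subsets of n vertices: 2^n - 1, exponential in n."""
    return 2 ** n - 1


def cutoff_group_count(n: int, cutoff: int) -> int:
    """Non-empty subsets of at most `cutoff` vertices.

    For a fixed cutoff this sum is a polynomial in n of degree `cutoff`.
    """
    return sum(comb(n, k) for k in range(1, cutoff + 1))


for n in (10, 20, 40):
    print(n, naive_group_count(n), cutoff_group_count(n, cutoff=2))
```

With a cutoff of 2, the count grows like n^2 / 2, whereas the unrestricted count doubles with every added vertex.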

Self-Improving Pretraining: using post-trained models to pretrain better models

Authors: Ellen Xiaoqing Tan, Shehzaad Dhuliawala, Jing Xu, Ping Yu, Sainbayar Sukhbaatar, Jason Weston, Olga Golovneva

First: 2026-01-29T07:09:30+00:00 · Latest: 2026-02-04T17:31:10+00:00

Abs · PDF · Code1 · Code2

Abstract

Ensuring safety, factuality and overall quality in the generations of large language models is a critical challenge, especially as these models are increasingly deployed in real-world applications. The prevailing approach to addressing these issues involves collecting expensive, carefully curated datasets and applying multiple stages of fine-tuning and alignment. However, even this complex pipeline cannot guarantee the correction of patterns learned during pretraining. Therefore, addressing these issues during pretraining is crucial, as it shapes a model's core behaviors and prevents unsafe or hallucinated outputs from becoming deeply embedded. To tackle this issue, we introduce a new pretraining method that streams documents and uses reinforcement learning (RL) to improve the next K generated tokens at each step. A strong, post-trained model judges candidate generations -- including model rollouts, the original suffix, and a rewritten suffix -- for quality, safety, and factuality. Early in training, the process relies on the original and rewritten suffixes; as the model improves, RL rewards high-quality rollouts. This approach builds higher quality, safer, and more factual models from the ground up. In experiments, our method gives 36.2% and 18.5% relative improvements over standard pretraining in terms of factuality and safety, and up to 86.3% win rate improvements in overall generation quality.

中文标题/摘要

标题：自我提升预训练：使用后训练模型预训练更好的模型

在生成大规模语言模型时确保安全、事实性和整体质量是一个关键挑战，尤其是在这些模型越来越多地应用于实际应用中。目前解决这些问题的方法是收集昂贵的、精心策划的数据集，并应用多个阶段的微调和对齐。然而，即使这个复杂的流水线也不能保证纠正预训练期间学到的模式。因此，在预训练阶段解决这些问题至关重要，因为这塑造了模型的核心行为，并防止了不安全或虚构的输出变得根深蒂固。为了解决这一问题，我们提出了一种新的预训练方法，该方法逐文档流式传输，并使用强化学习（RL）在每一步改进生成的下一个K个标记。一个强大的、后训练的模型评估候选生成物——包括模型的展开、原始后缀和重写后的后缀——的质量、安全性和事实性。在训练初期，该过程依赖于原始后缀和重写后的后缀；随着模型的改进，RL奖励高质量的展开。这种方法从基础开始构建更高质量、更安全和更事实性的模型。在实验中，我们的方法在事实性和安全性方面分别比标准预训练提高了36.2%和18.5%，并在整体生成质量方面最高提高了86.3%的胜率。

Summary / 总结

The research aims to improve the safety, factuality, and overall quality of large language models by addressing issues during pretraining. It introduces a new method that uses reinforcement learning to iteratively refine the next K tokens generated at each step. A post-trained model evaluates candidate generations for quality, safety, and factuality. Experiments show a 36.2% and 18.5% relative improvement in factuality and safety, and up to an 86.3% win rate improvement in overall generation quality.

研究旨在通过在预训练阶段解决问题来提高大型语言模型的安全性、事实性和整体质量。它提出了一种新方法，使用强化学习生成和评估候选文本，并由一个后训练模型判断生成文本的质量、安全性和事实性。实验结果显示，与标准预训练方法相比，在事实性和安全性方面分别提高了36.2%和18.5%，整体生成质量的胜率提高了高达86.3%。

Legendre Memory Unit with A Multi-Slice Compensation Model for Short-Term Wind Speed Forecasting Based on Wind Farm Cluster Data

Authors: Mumin Zhang, Haochen Zhang, Xin Zhi Khoo, Yilin Zhang, Nuo Chen, Ting Zhang, Junjie Tang

First: 2026-02-04T17:28:42+00:00 · Latest: 2026-02-04T17:28:42+00:00

Comments: 10 pages, 11 figures

Abs · PDF · Code1 · Code2

Abstract

With more wind farms clustered for integration, the short-term wind speed prediction of such wind farm clusters is critical for normal operation of power systems. This paper focuses on achieving accurate, fast, and robust wind speed prediction by making full use of cluster data with spatial-temporal correlation. First, weighted mean filtering (WMF) is applied to denoise wind speed data at the single-farm level. The Legendre memory unit (LMU) is then innovatively applied for the wind speed prediction, in combination with the Compensating Parameter based on Kendall rank correlation coefficient (CPK) of wind farm cluster data, to construct the multi-slice LMU (MSLMU). Finally, an innovative ensemble model WMF-CPK-MSLMU is proposed herein, with three key blocks: data pre-processing, forecasting, and multi-slice compensation. Advantages include: 1) LMU jointly models linear and nonlinear dependencies among farms to capture spatial-temporal correlations through backpropagation; 2) MSLMU enhances forecasting by using CPK-derived weights instead of random initialization, allowing spatial correlations to fully activate hidden nodes across clustered wind farms; 3) CPK adaptively weights the compensation model in MSLMU and complements missing data spatially, making the whole model highly accurate and robust. Test results on different wind farm clusters indicate the effectiveness and superiority of the proposed ensemble model WMF-CPK-MSLMU in the short-term prediction of wind farm clusters compared to the existing models.

中文标题/摘要

标题：基于风电场集群数据的短时风速预测的勒让德记忆单元与多片补偿模型

随着越来越多的风电场集群化并网，风电场集群的短时风速预测对于电力系统的正常运行至关重要。本文旨在充分利用具有空间-时间相关性的集群数据，实现准确、快速和鲁棒的风速预测。首先，应用加权平均滤波（WMF）对单风电场的风速数据进行去噪。然后，创新地应用勒让德记忆单元（LMU）结合基于风场集群数据肯德尔秩相关系数的补偿参数（CPK）进行风速预测，构建多片LMU（MSLMU）。最后，提出了一种创新的集成模型WMF-CPK-MSLMU，包含三个关键模块：数据预处理、预测和多片补偿。优点包括：1）LMU通过反向传播联合建模农场间的线性和非线性依赖关系，捕捉空间-时间相关性；2）MSLMU通过使用CPK衍生的权重而非随机初始化增强预测，使空间相关性完全激活集群风电场中的隐藏节点；3）CPK在MSLMU中自适应加权补偿模型，填补空间上缺失的数据，使整个模型更加准确和鲁棒。在不同风电场集群上的测试结果表明，WMF-CPK-MSLMU集成模型在风电场集群的短时预测中比现有模型更有效和优越。

Summary / 总结

This paper addresses the need for accurate short-term wind speed prediction for wind farm clusters to ensure power system operations. It introduces a novel ensemble model, WMF-CPK-MSLMU, which combines weighted mean filtering, a compensating parameter based on Kendall rank correlation coefficient, and a multi-slice Legendre memory unit. The model effectively captures spatial-temporal correlations and enhances forecasting accuracy and robustness. Experimental results show that this model outperforms existing methods in predicting wind farm clusters.

本文旨在通过风场集群数据准确预测短期风速，以确保电力系统的正常运行。提出了一种新颖的集成模型WMF-CPK-MSLMU，结合了加权平均滤波、基于肯德尔等级相关系数的补偿参数和多片Legendre记忆单元。该模型能够有效捕捉空间-时间相关性，提高预测准确性和鲁棒性。实验结果表明，该模型在预测风场集群方面优于现有方法。
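
The abstract above builds its compensation weights from the Kendall rank correlation between farms but does not give the exact formula. The sketch below shows one plausible reading, under stated assumptions: Kendall tau-a is computed between a target farm and each neighbouring farm, negative correlations are clipped to zero, and the results are normalized to sum to one. The function names and the clipping and normalization choices are hypothetical and are not the paper's CPK definition.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs / total pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1      # concordant pair
            elif prod < 0:
                score -= 1      # discordant pair (ties contribute 0)
    return score / (n * (n - 1) / 2)


def correlation_weights(target, neighbours):
    """Normalized non-negative weights from rank correlation (assumed scheme)."""
    taus = [max(kendall_tau(target, nb), 0.0) for nb in neighbours]
    total = sum(taus)
    if total == 0:
        # No positively correlated neighbour: fall back to uniform weights.
        return [1.0 / len(neighbours)] * len(neighbours)
    return [t / total for t in taus]


# Hypothetical wind speed series (m/s) for a target farm and three neighbours.
target = [1.0, 2.0, 3.0, 4.0]
neighbours = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0]]
print(correlation_weights(target, neighbours))
```

The pairwise loop is quadratic in the series length, which is acceptable for a sketch; a library routine such as `scipy.stats.kendalltau` would be the usual choice for long series.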

Dynamical Regimes of Multimodal Diffusion Models

Authors: Emil Albrychiewicz, Andrés Franco Valiente, Li-Ching Chen

First: 2026-02-04T17:16:12+00:00 · Latest: 2026-02-04T17:16:12+00:00

Comments: 40 pages, 14 figures

Abs · PDF · Code1 · Code2

Abstract

Diffusion based generative models have achieved unprecedented fidelity in synthesizing high dimensional data, yet the theoretical mechanisms governing multimodal generation remain poorly understood. Here, we present a theoretical framework for coupled diffusion models, using coupled Ornstein-Uhlenbeck processes as a tractable model. By using the nonequilibrium statistical physics of dynamical phase transitions, we demonstrate that multimodal generation is governed by a spectral hierarchy of interaction timescales rather than simultaneous resolution. A key prediction is the ``synchronization gap'', a temporal window during the reverse generative process where distinct eigenmodes stabilize at different rates, providing a theoretical explanation for common desynchronization artifacts. We derive analytical conditions for speciation and collapse times under both symmetric and anisotropic coupling regimes, establishing strict bounds for coupling strength to avoid unstable symmetry breaking. We show that the coupling strength acts as a spectral filter that enforces a tunable temporal hierarchy on generation. We support these predictions through controlled experiments with diffusion models trained on MNIST datasets and exact score samplers. These results motivate time dependent coupling schedules that target mode specific timescales, offering a potential alternative to ad hoc guidance tuning.

中文标题/摘要

标题：多模态扩散模型的动力学机制

基于扩散的生成模型在合成高维数据方面取得了前所未有的保真度，但多模态生成的理论机制仍然知之甚少。在这里，我们提出了一种耦合扩散模型的理论框架，使用耦合的Ornstein-Uhlenbeck过程作为可处理的模型。通过非平衡统计物理中的动力学相变，我们证明了多模态生成受相互作用时间尺度的谱层次支配，而不是同时解决。一个关键预测是“同步间隙”，在生成过程的反向过程中，不同的特征模态以不同的速率稳定，提供了对常见的脱同步现象的理论解释。我们推导了在对称和各向异性耦合条件下分异和崩溃时间的解析条件，建立了耦合强度避免不稳定对称破坏的严格界限。我们表明，耦合强度充当了一个谱过滤器，对生成过程施加可调的时间层次结构。我们通过在MNIST数据集上训练的扩散模型和精确分数采样器进行的受控实验支持了这些预测。这些结果促使了针对特定模态时间尺度的时间依赖耦合调度，提供了一种潜在的替代自适应指导调优的方法。

Summary / 总结

The research aims to understand the theoretical mechanisms of multimodal generation in diffusion-based generative models. The authors use coupled Ornstein-Uhlenbeck processes to develop a theoretical framework and demonstrate that multimodal generation is governed by a spectral hierarchy of interaction timescales. Key findings include the existence of a 'synchronization gap' during the reverse generative process, which explains desynchronization artifacts. The study also derives analytical conditions for coupling strength and shows that it acts as a spectral filter, enforcing a tunable temporal hierarchy on generation, supported by experiments on MNIST datasets and exact score samplers.

研究旨在理解基于扩散的生成模型中多模态生成的理论机制。通过使用耦合的Ornstein-Uhlenbeck过程进行分析，发现多模态生成由交互时间尺度的光谱层次结构所支配。主要发现包括在反向生成过程中存在的‘同步缺口’，这解释了脱同步现象。研究还提供了避免不稳定对称破坏的耦合强度的分析条件，并建议使用时间依赖的耦合调度以实现更好的模态特定生成。
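
To give a feel for how coupling can separate timescales, the following toy calculation uses a two-variable linear system with symmetric coupling. This is an assumed minimal model chosen for illustration, not the paper's coupled Ornstein-Uhlenbeck setup: the drift matrix [[a, -g], [-g, a]], the parameters `a` and `g`, and the function name are all hypothetical.

```python
def mode_timescales(a: float, g: float):
    """Relaxation times of the two eigenmodes of dx/dt = -A x,
    with A = [[a, -g], [-g, a]] (assumed toy drift matrix).

    The symmetric mode (1, 1) has eigenvalue a - g and the antisymmetric
    mode (1, -1) has eigenvalue a + g. Both decay only if |g| < a.
    """
    if a <= 0 or abs(g) >= a:
        raise ValueError("need a > 0 and |g| < a for a stable system")
    slow = 1.0 / (a - g)   # symmetric mode relaxes more slowly when g > 0
    fast = 1.0 / (a + g)   # antisymmetric mode relaxes faster when g > 0
    return slow, fast


for g in (0.0, 0.5, 0.9):
    slow, fast = mode_timescales(1.0, g)
    print(f"g={g}: slow={slow:.2f}, fast={fast:.2f}, ratio={slow / fast:.1f}")
```

In this toy system the ratio of the two timescales is (a + g) / (a - g), so stronger coupling widens the gap between the modes, and the system loses stability once |g| reaches a.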

Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification

Authors: Yuqi Li, Matthew M. Engelhard

First: 2026-02-04T17:12:04+00:00 · Latest: 2026-02-04T17:12:04+00:00

Abs · PDF · Code1 · Code2

Abstract

In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework specifically for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. This framework enables an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. This approach naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC ($AUC^*$), characterizing the physical limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework's correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.

中文标题/摘要

标题：基于区间的方法AUC (iAUC): 将ROC分析扩展到不确定性感知分类

在高风险预测中，通过区间值预测量化不确定性对于可靠决策至关重要。然而，标准评估工具如受试者操作特征（ROC）曲线和曲线下面积（AUC）是为点分数设计的，无法捕捉预测不确定性对排名性能的影响。我们提出了一种专门针对区间值预测的不确定性感知ROC框架，引入了两个新的度量：$AUC_L$ 和 $AUC_U$。该框架允许ROC平面的有信息性的三区域分解，将成对排名划分为正确的、错误的和不确定的排序。这种方法自然支持选择性预测，允许模型在区间重叠的情况下放弃排名，从而优化弃权率与判别可靠性的权衡。我们证明，在有效的类条件覆盖下，$AUC_L$ 和 $AUC_U$ 提供了理论最优AUC ($AUC^*$) 的正式下界和上界，表征了可实现的判别能力的物理极限。所提出的框架广泛适用于区间值预测模型，无论区间构建方法如何。基于自助法的区间在真实基准数据集上的实验验证了该框架的正确性，并展示了其在不确定性感知评估和决策中的实际用途。

Summary / 总结

The paper introduces the Interval-Based AUC (iAUC) framework to evaluate interval-valued predictions in high-stakes risk prediction tasks. It proposes $AUC_L$ and $AUC_U$ measures to quantify the impact of predictive uncertainty on ranking performance. The framework decomposes the ROC plane into three regions, supporting selective prediction and optimizing the trade-off between abstention rate and discriminative reliability. Experiments on real-world datasets confirm the framework's effectiveness in evaluating uncertainty-aware classification.

论文提出了基于区间预测的AUC（iAUC）方法，以评估高风险预测中的区间预测，解决了传统ROC分析的局限性。它提出了$AUC_L$和$AUC_U$来量化预测不确定性的影响，并使ROC平面分解为三个区域。实验结果证实了该框架在评估和优化弃权率与区分可靠性之间的权衡方面的有效性。
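
As a rough illustration of the three-region decomposition described in this entry, the sketch below classifies every positive-negative pair as correct, incorrect, or uncertain according to whether the two prediction intervals overlap. The function name `interval_auc_bounds` and the exact definitions used here (the lower measure as the fraction of correctly ordered non-overlapping pairs, the upper measure as that fraction plus the uncertain fraction) are assumptions made for illustration; the paper's formal definitions of $AUC_L$ and $AUC_U$ may differ, for example in how ties are handled.

```python
def interval_auc_bounds(pos_intervals, neg_intervals):
    """Return (auc_l, auc_u, uncertain_fraction) for interval-valued scores.

    Each interval is a (lower, upper) tuple. The definitions are assumed
    for illustration and are not taken from the paper.
    """
    correct = incorrect = uncertain = 0
    for p_lo, p_hi in pos_intervals:
        for n_lo, n_hi in neg_intervals:
            if p_lo > n_hi:
                # Positive interval lies entirely above the negative one.
                correct += 1
            elif p_hi < n_lo:
                # Positive interval lies entirely below the negative one.
                incorrect += 1
            else:
                # Intervals overlap: the ordering of this pair is uncertain.
                uncertain += 1
    total = correct + incorrect + uncertain
    if total == 0:
        raise ValueError("need at least one positive and one negative case")
    auc_l = correct / total
    auc_u = (correct + uncertain) / total
    return auc_l, auc_u, uncertain / total


# Toy data: two positive and two negative cases with interval scores.
pos = [(0.7, 0.9), (0.4, 0.6)]
neg = [(0.1, 0.3), (0.5, 0.8)]
auc_l, auc_u, abstain = interval_auc_bounds(pos, neg)
print(auc_l, auc_u, abstain)
```

Under these assumed definitions, the gap between the two values equals the fraction of pairs on which a selective predictor would abstain.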

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Authors: Blake Bordelon, Francesco Mori

First: 2026-02-04T17:11:36+00:00 · Latest: 2026-02-04T17:11:36+00:00

Abstract

Setting the learning rate for a deep learning model is a critical part of successful training, yet choosing this hyperparameter is often done empirically with trial and error. In this work, we explore a solvable model of optimal learning rate schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_T^\star(t)$ where $t$ is the current iterate and $T$ is the total training horizon. This schedule is computed both numerically and analytically (when possible) using optimal control methods. Our analysis reveals two regimes which we term the easy phase and hard phase. In the easy phase the optimal schedule is a polynomial decay $η_T^\star(t) \simeq T^{-ξ} (1-t/T)^δ$ where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant (in $T$) initial learning rate and annealing performed over a vanishing (in $T$) fraction of training steps. We investigate joint optimization of learning rate and batch size, identifying a degenerate optimality condition. Our model also predicts the compute-optimal scaling laws (where model size and training steps are chosen optimally) in both easy and hard regimes. Going beyond SGD, we consider optimal schedules for the momentum $β(t)$, where speedups in the hard phase are possible. We compare our optimal schedule to various benchmarks in our task including (1) optimal constant learning rates $η_T(t) \sim T^{-ξ}$ (2) optimal power laws $η_T(t) \sim T^{-ξ} t^{-χ}$, finding that our schedule achieves better rates than either of these. Our theory suggests that learning rate transfer across training horizon depends on the structure of the model and task. We explore these ideas in simple experimental pretraining setups.

Summary / 总结

This work studies optimal learning rate schedules for a power-law random feature model trained with stochastic gradient descent (SGD). The analysis identifies two regimes: an easy phase, where the optimal schedule is a polynomial decay, and a hard phase, where it resembles warmup-stable-decay. In the authors' task, the optimal schedule achieves better rates than optimal constant or power-law learning rates, and the theory suggests that learning rate transfer across training horizons depends on the structure of the model and task. The model also predicts compute-optimal scaling laws in both regimes, and the ideas are explored in simple experimental pretraining setups.

该研究探讨了使用随机梯度下降（SGD）训练的幂律随机特征模型的最优学习率调度。研究识别了两种阶段，即容易阶段和困难阶段，二者具有不同的最优学习率调度：在容易阶段，最优调度为多项式衰减；在困难阶段，它类似于预热-稳定-衰减（warmup-stable-decay）。模型还预测了计算最优的缩放定律，并表明学习率在不同训练时长之间的迁移取决于模型和任务结构。在作者的任务中，该最优调度的收敛速率优于最优恒定学习率和最优幂律学习率，相关想法还在简单的预训练实验设置中得到了探索。
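
The two schedule shapes named in the abstract can be written down directly. The sketch below is a minimal illustration: `polynomial_decay_lr` follows the easy-phase form $T^{-ξ}(1-t/T)^δ$ quoted above, while `wsd_lr` is a generic warmup-stable-decay shape. The function names, the `scale` and `peak` parameters, the linear warmup and linear anneal, and the default fractions are all assumptions for illustration; the abstract does not specify them, and the values of $ξ$ and $δ$ depend on the features and task.

```python
def polynomial_decay_lr(t, T, xi, delta, scale=1.0):
    """Easy-phase shape: scale * T**(-xi) * (1 - t/T)**delta."""
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    return scale * T ** (-xi) * (1.0 - t / T) ** delta


def wsd_lr(t, T, peak, warmup_frac=0.05, decay_frac=0.1):
    """Generic warmup-stable-decay: linear warmup, plateau, linear anneal to zero.

    The fractions are hypothetical defaults, not values from the paper.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    if warmup_frac < 0 or decay_frac <= 0 or warmup_frac + decay_frac > 1:
        raise ValueError("invalid warmup/decay fractions")
    warmup_steps = warmup_frac * T
    decay_start = (1.0 - decay_frac) * T
    if t < warmup_steps:
        return peak * t / warmup_steps
    if t < decay_start:
        return peak
    return peak * (T - t) / (T - decay_start)


# Print both schedules at a few points of a 100-step run.
T = 100
for t in (0, 25, 50, 95, 100):
    print(t, polynomial_decay_lr(t, T, xi=0.5, delta=1.0), wsd_lr(t, T, peak=0.3))
```

In the hard phase described above, the annealing fraction shrinks as $T$ grows, which in this sketch would correspond to choosing a smaller `decay_frac` for longer horizons.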

Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

Authors: Puyu Wang, Junyu Zhou, Philipp Liznerski, Marius Kloft

First: 2026-01-29T23:43:26+00:00 · Latest: 2026-02-04T17:03:18+00:00

Comments: 41 pages, 3 figures

Abstract

Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(ε,δ)$-DP and obtain a utility bound of order $\sqrt{d}/(nε)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.

中文标题/摘要

标题：优化、泛化和Kolmogorov-Arnold网络的差分隐私边界

Kolmogorov--Arnold网络（KANs）最近作为一种结构化的替代标准MLPs出现，但对其训练动力学、泛化和隐私属性的原理性理论仍然有限。在本文中，我们分析了训练两层KANs的梯度下降（GD）并推导出一般界，这些界可以表征其训练动力学、泛化和差分隐私（DP）下的实用性。作为具体的实例，我们专门分析了在NTK可分假设下的逻辑损失函数，证明了多项式对数网络宽度足以使GD达到优化速率$1/T$和泛化速率$1/n$，其中$T$表示GD迭代次数，$n$表示样本数量。在私人设置中，我们刻画了所需的$(ε,δ)$-DP噪声，并获得了一个实用性的界，其阶数为$\sqrt{d}/(nε)$（其中$d$是输入维度），这与一般凸Lipschitz问题的经典下界相匹配。我们的结果表明，在差分隐私下，多项式对数宽度不仅是充分的，而且是必要的，揭示了非私人（仅充分性）和私人（必要性也出现）训练模式之间的定性差距。实验进一步说明了这些理论见解如何指导实际选择，包括网络宽度选择和提前停止。
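
To make the private setting more concrete, the sketch below shows one noisy gradient step using the textbook Gaussian mechanism: per-example gradients are clipped, averaged, and perturbed with Gaussian noise. This is not the paper's noise calibration. The calibration $σ = Δ\sqrt{2\ln(1.25/δ)}/ε$ is the classical single-release bound (valid for $0 < ε < 1$), the helper names are hypothetical, and privacy accounting across $T$ iterations is deliberately left out.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale for one release (0 < epsilon < 1)."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip(vec, max_norm):
    """Scale vec down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def noisy_mean_gradient(per_example_grads, clip_norm, sigma, rng):
    """Average clipped per-example gradients and add N(0, sigma^2) noise per coordinate."""
    n = len(per_example_grads)
    d = len(per_example_grads[0])
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    return [sum(g[j] for g in clipped) / n + rng.gauss(0.0, sigma) for j in range(d)]


# Toy usage. Replacing one example changes the clipped mean by at most
# 2 * clip_norm / n, which is the sensitivity assumed here.
grads = [[3.0, 4.0], [0.0, 0.0]]
clip_norm = 1.0
sigma = gaussian_sigma(2.0 * clip_norm / len(grads), epsilon=0.5, delta=1e-5)
print(noisy_mean_gradient(grads, clip_norm, sigma, random.Random(0)))
```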

Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations

Authors: Hang Yu, Yu-Hu Yan, Peng Zhao

First: 2026-02-04T16:58:53+00:00 · Latest: 2026-02-04T16:58:53+00:00

Abstract

Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.

中文标题/摘要

标题：改进的维度依赖性：带梯度变化的凸优化多臂老虎机

梯度变化在线学习由于与博弈论、最优化等领域的深刻联系而引起了越来越多的关注。它在完全信息设置下得到了广泛研究，但在带反馈信息的设置下研究较少。在本文中，我们关注带两点反馈的凸优化多臂老虎机（BCO）中的梯度变化。通过提出对带反馈信息的梯度变化中非连续梯度变化的精细分析，我们改进了凸函数和强凸函数的维度依赖性，优于已知的最佳结果（Chiang等，2013）。我们对非连续梯度变化的改进分析还隐含了其他有利的问题依赖性保证，如梯度方差和小损失后悔。超越两点设置，我们通过实现第一个带反馈信息的一点线性优化在超矩形域中的梯度变化界，展示了我们技术的通用性。最后，我们在动态/通用后悔最小化和多臂博弈等更具挑战性的任务中验证了我们结果的有效性，建立了两点BCO中的第一个梯度变化动态和通用后悔界，并在多臂博弈中实现了快速收敛率。

Summary / 总结

This work addresses gradient-variation online learning in Bandit Convex Optimization (BCO) with two-point feedback, improving the dimension dependence for both convex and strongly convex functions over the best known results (Chiang et al., 2013). The improvement comes from a refined analysis of the non-consecutive gradient variation, which also implies other problem-dependent guarantees such as gradient-variance and small-loss regrets. Beyond the two-point setup, the technique gives the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. The study further establishes the first gradient-variation dynamic and universal regret bounds for two-point BCO, along with fast convergence rates in bandit games.

该研究关注两点反馈下赌博机凸优化（BCO）中的梯度变化在线学习，通过改进对非连续梯度变化的分析，改善了凸函数和强凸函数的维度依赖性，优于此前已知的最佳结果。该分析还带来了其他有利的问题依赖性保证，并首次为超矩形域上的一点赌博机线性优化给出了梯度变化界。此外，该方法还应用于动态和通用后悔最小化以及赌博机博弈，首次为两点BCO建立了梯度变化动态和通用后悔界，并在赌博机博弈中实现了快速收敛率。
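
For readers unfamiliar with two-point feedback, the sketch below shows the standard two-point gradient estimator commonly used in this setting: the learner queries the loss at two symmetric perturbations of the current point and rescales the difference. It illustrates the feedback model only, not the paper's algorithm or analysis. The function names and the toy quadratic loss are assumptions, and the averaging loop is there purely to show that the estimates concentrate around the true gradient.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0:
            return [x / norm for x in v]


def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate the gradient of f at x from the two queries f(x + delta*u) and f(x - delta*u)."""
    d = len(x)
    u = random_unit_vector(d, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    coeff = d * (f_plus - f_minus) / (2.0 * delta)
    return [coeff * ui for ui in u]


def toy_loss(x):
    """Toy convex loss with gradient 2 * (x - 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


rng = random.Random(0)
x0 = [0.0, 0.0, 0.0]
num_samples = 20000
avg = [0.0, 0.0, 0.0]
for _ in range(num_samples):
    g = two_point_gradient_estimate(toy_loss, x0, 0.01, rng)
    avg = [a + gi / num_samples for a, gi in zip(avg, g)]
print(avg)  # each coordinate should be close to the true gradient value of -2
```

In an online round the learner sees only one such pair of evaluations, so a single estimate is noisy; the factor `d` in the rescaling is one place where dimension enters this kind of estimator.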

A Dual-TransUNet Deep Learning Framework for Multi-Source Precipitation Merging and Improving Seasonal and Extreme Estimates

Authors: Yuchen Ye, Zixuan Qi, Shixuan Li, Wei Qi, Yanpeng Cai, Chaoxia Yuan

First: 2026-02-04T16:55:43+00:00 · Latest: 2026-02-04T16:55:43+00:00

Comments: 75 pages, 20 figures

Abstract

Multi-source precipitation products (MSPs) from satellite retrievals and reanalysis are widely used for hydroclimatic monitoring, yet spatially heterogeneous biases and limited skill for extremes still constrain their hydrologic utility. Here we develop a dual-stage TransUNet-based multi-source precipitation merging framework (DDL-MSPMF) that integrates six MSPs with four ERA5 near-surface physical predictors. A first-stage classifier estimates daily precipitation occurrence probability, and a second-stage regressor fuses the classifier outputs together with all predictors to estimate daily precipitation amount at 0.25 degree resolution over China for 2001-2020. Benchmarking against multiple deep learning and hybrid baselines shows that the TransUNet - TransUNet configuration yields the best seasonal performance (R = 0.75; RMSE = 2.70 mm/day) and improves robustness relative to a single-regressor setting. For heavy precipitation (>25 mm/day), DDL-MSPMF increases equitable threat scores across most regions of eastern China and better reproduces the spatial pattern of the July 2021 Zhengzhou rainstorm, indicating enhanced extreme-event detection beyond seasonal-mean corrections. Independent evaluation over the Qinghai-Tibet Plateau using TPHiPr further supports its applicability in data-scarce regions. SHAP analysis highlights the importance of precipitation occurrence probabilities and surface pressure, providing physically interpretable diagnostics. The proposed framework offers a scalable and explainable approach for precipitation fusion and extreme-event assessment.

中文标题/摘要

标题：一种用于多源降水融合及季节性和极端事件估计改进的双阶段TransUNet深度学习框架

多源降水产品（MSPs）来自卫星反演和再分析，广泛用于水文气候监测，但空间异质性偏差和极端事件技能有限仍限制了其水文应用。本文开发了一种基于双阶段TransUNet的多源降水融合框架（DDL-MSPMF），整合了六种MSPs和四种ERA5地表物理预测因子。第一阶段分类器估计每日降水发生概率，第二阶段回归器将分类器输出与所有预测因子融合，估计2001-2020年中国0.25度分辨率的每日降水总量。与多种深度学习和混合基线进行基准测试表明，TransUNet-TransUNet配置在季节性表现最佳（R=0.75；RMSE=2.70毫米/天），并提高了相对于单一回归器设置的鲁棒性。对于大于25毫米/天的强降水，DDL-MSPMF在东部中国大部分地区提高了公平威胁评分，并更好地再现了2021年7月郑州暴雨的空间模式，表明其在季节平均修正之外增强了极端事件检测。青藏高原上的TPHiPr独立评估进一步支持了其在数据稀缺地区的适用性。SHAP分析强调了降水发生概率和地表气压的重要性，提供了物理可解释的诊断。所提出的框架为降水融合和极端事件评估提供了一种可扩展且可解释的方法。

Summary / 总结

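The research aims to improve the accuracy and hydrologic utility of multi-source precipitation products (MSPs) by addressing spatially heterogeneous biases and limited skill for extremes. It introduces a dual-stage TransUNet-based framework (DDL-MSPMF) that integrates six MSPs with four ERA5 near-surface predictors: a first-stage classifier estimates daily precipitation occurrence probability, and a second-stage regressor estimates daily precipitation amount. Against multiple deep learning and hybrid baselines, the TransUNet-TransUNet configuration gives the best seasonal performance (R = 0.75; RMSE = 2.70 mm/day) and is more robust than a single-regressor setting, and it raises equitable threat scores for heavy precipitation (>25 mm/day) across most regions of eastern China. Independent evaluation over the Qinghai-Tibet Plateau supports its applicability in data-scarce regions, and SHAP analysis provides physically interpretable diagnostics.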

研究旨在通过解决多源降水产品（MSPs）的空间偏差和极端事件估计限制，提高其在水文气候监测中的准确性和实用性。该研究提出了一种基于双阶段TransUNet的框架（DDL-MSPMF），整合了六种MSPs和四种ERA5物理预测因子，以估算每日降水发生概率和总量。该框架在季节性能方面表现出色，R = 0.75，RMSE = 2.70 mm/天，并增强了对东部中国及2021年郑州暴雨等极端降水事件的检测。SHAP分析强调了发生概率和表面气压在模型预测中的重要性，提供了物理可解释的诊断结果。
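
The heavy-precipitation results above are reported as equitable threat scores (ETS). As a reference point, the sketch below computes the standard ETS (also known as the Gilbert skill score) for a single exceedance threshold from paired observed and predicted daily amounts. The function name and the toy data are illustrative assumptions; the 25 mm/day default mirrors the threshold quoted in the abstract, and the paper's exact evaluation protocol is not reproduced here.

```python
def equitable_threat_score(observed, predicted, threshold=25.0):
    """Standard ETS for the event 'amount > threshold' (amounts in mm/day)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("need at least one sample")
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
    # Number of hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    if denom == 0:
        return float("nan")  # undefined, e.g. when no events are observed or predicted
    return (hits - hits_random) / denom


# Toy daily series in mm/day (made-up numbers, not data from the paper).
observed = [30.0, 5.0, 40.0, 0.0, 26.0, 10.0, 0.0, 3.0]
predicted = [28.0, 2.0, 10.0, 0.0, 35.0, 30.0, 1.0, 0.0]
ets = equitable_threat_score(observed, predicted)
print(ets)
```

A perfect forecast scores 1, and a forecast no better than chance scores around 0, which is why ETS is a common choice for comparing skill on rare events.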

When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?

Authors: Xinyu Zhou, Chang Jin, Carsten Eickhoff, Zhijiang Guo, Seyed Ali Bahrainian

First: 2026-02-04T16:54:47+00:00 · Latest: 2026-02-04T16:54:47+00:00

Comments: Accepted to ICLR 2026

Abstract

Large language models (LLMs) rarely admit uncertainty, often producing fluent but misleading answers, rather than abstaining (i.e., refusing to answer). This weakness is even evident in temporal question answering, where models frequently ignore time-sensitive evidence and conflate facts across different time-periods. In this paper, we present the first empirical study of training LLMs with an abstention ability while reasoning about temporal QA. Existing approaches such as calibration might be unreliable in capturing uncertainty in complex reasoning. We instead frame abstention as a teachable skill and introduce a pipeline that couples Chain-of-Thought (CoT) supervision with Reinforcement Learning (RL) guided by abstention-aware rewards. Our goal is to systematically analyze how different information types and training techniques affect temporal reasoning with abstention behavior in LLMs. Through extensive experiments studying various methods, we find that RL yields strong empirical gains on reasoning: a model initialized by Qwen2.5-1.5B-Instruct surpasses GPT-4o by $3.46\%$ and $5.80\%$ in Exact Match on TimeQA-Easy and Hard, respectively. Moreover, it improves the True Positive rate on unanswerable questions by $20\%$ over a pure supervised fine-tuned (SFT) variant. Beyond performance, our analysis shows that SFT induces overconfidence and harms reliability, while RL improves prediction accuracy but exhibits similar risks. Finally, by comparing implicit reasoning cues (e.g., original context, temporal sub-context, knowledge graphs) with explicit CoT supervision, we find that implicit information provides limited benefit for reasoning with abstention. Our study provides new insights into how abstention and reasoning can be jointly optimized, providing a foundation for building more reliable LLMs.

中文标题/摘要

标题：沉默胜金：大语言模型能否学会在时间问答及更广泛的领域中保持沉默？

大语言模型（LLMs）很少承认不确定性，通常产生流畅但误导的回答，而不是保持沉默（即拒绝回答）。即使在时间问答中，模型也经常忽略时间敏感的证据，并混淆不同时间段的事实。在本文中，我们首次探讨了在时间问答中训练具有保持沉默能力的LLMs的实证研究。现有的校准方法可能无法准确捕捉复杂推理中的不确定性。相反，我们将保持沉默视为可教授的技能，并引入了一种结合链式思考（CoT）监督和由保持沉默意识奖励引导的强化学习（RL）的管道。我们的目标是系统地分析不同类型的信息和训练技术如何影响LLMs中的时间推理和保持沉默行为。通过广泛实验研究各种方法，我们发现RL在推理方面取得了显著的实证收益：一个由Qwen2.5-1.5B-Instruct初始化的模型在TimeQA-Easy和Hard上的精确匹配率分别超过了GPT-4o的3.46%和5.80%。此外，它在无法回答的问题上的真阳性率比纯监督微调（SFT）变体提高了20%。除了性能，我们的分析表明，SFT导致过度自信并损害可靠性，而RL提高预测准确性但表现出类似的风险。最后，通过比较隐式推理线索（例如原始上下文、时间子上下文、知识图谱）与显式CoT监督，我们发现隐式信息对保持沉默的推理提供的益处有限。我们的研究为如何同时优化保持沉默和推理提供了新的见解，为构建更可靠的LLMs奠定了基础。

Summary / 总结

This paper addresses the tendency of large language models (LLMs) to produce fluent but misleading answers rather than abstain, focusing on temporal question answering. The authors introduce a pipeline that couples Chain-of-Thought (CoT) supervision with Reinforcement Learning (RL) guided by abstention-aware rewards. With this pipeline, a model initialized from Qwen2.5-1.5B-Instruct surpasses GPT-4o by 3.46% and 5.80% in Exact Match on TimeQA-Easy and Hard, respectively, and improves the True Positive rate on unanswerable questions by 20% over a pure supervised fine-tuned (SFT) variant. The analysis also finds that SFT induces overconfidence and harms reliability, while RL improves prediction accuracy but exhibits similar risks. Implicit reasoning cues such as the original context, temporal sub-context, and knowledge graphs provide limited benefit for reasoning with abstention.

该研究针对大型语言模型（LLMs）很少承认不确定性、倾向于给出流畅但误导性回答的问题，以时间问答为对象，提出了一种将链式思考（CoT）监督与由弃权意识奖励引导的强化学习（RL）相结合的方法。实验结果显示，由Qwen2.5-1.5B-Instruct初始化的模型在TimeQA-Easy和Hard上的精确匹配率分别比GPT-4o高出3.46%和5.80%，并且在无法回答的问题上的真阳性率比纯监督微调（SFT）变体提高了20%。研究还发现，SFT会导致过度自信并损害可靠性，而RL可以提高预测准确性但同样存在类似风险。隐式推理线索对弃权推理的帮助有限。
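
To illustrate what an abstention-aware reward could look like, the sketch below scores a model output against a gold answer while treating a designated abstention string as a valid response for unanswerable questions. Everything specific here is hypothetical: the `ABSTAIN` marker, the normalization, and the reward values (+1, -0.5, -1) are chosen for illustration and are not the reward design used in the paper.

```python
ABSTAIN = "unanswerable"  # hypothetical abstention marker


def normalize(text):
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def exact_match(prediction, gold):
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)


def abstention_reward(prediction, gold, answerable):
    """Hypothetical reward: pay for correct answers and for correct abstentions."""
    abstained = normalize(prediction) == ABSTAIN
    if not answerable:
        # The only correct behaviour on an unanswerable question is to abstain.
        return 1.0 if abstained else -1.0
    if abstained:
        # Abstaining on an answerable question is penalized, but less than a wrong answer.
        return -0.5
    return 1.0 if exact_match(prediction, gold) else -1.0


print(abstention_reward("1999", "1999", True))          # correct answer
print(abstention_reward("Unanswerable", None, False))   # correct abstention
print(abstention_reward("2001", "1999", True))          # wrong answer
```

The asymmetry between the penalties is one possible design choice: if unnecessary abstention cost as much as a wrong answer, the policy would have little incentive to abstain when uncertain.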

UNO: Unifying One-stage Video Scene Graph Generation via Object-Centric Visual Representation Learning

Authors: Huy Le, Nhat Chung, Tung Kieu, Jingkang Yang, Ngan Le

Venue: WACV 2026

First: 2025-09-07T18:30:41+00:00 · Latest: 2026-02-04T16:49:45+00:00

Comments: 11 pages, 7 figures. Accepted at WACV 2026

Abstract

Video Scene Graph Generation (VidSGG) aims to represent dynamic visual content by detecting objects and modeling their temporal interactions as structured graphs. Prior studies typically target either coarse-grained box-level or fine-grained panoptic pixel-level VidSGG, often requiring task-specific architectures and multi-stage training pipelines. In this paper, we present UNO (UNified Object-centric VidSGG), a single-stage, unified framework that jointly addresses both tasks within an end-to-end architecture. UNO is designed to minimize task-specific modifications and maximize parameter sharing, enabling generalization across different levels of visual granularity. The core of UNO is an extended slot attention mechanism that decomposes visual features into object and relation slots. To ensure robust temporal modeling, we introduce object temporal consistency learning, which enforces consistent object representations across frames without relying on explicit tracking modules. Additionally, a dynamic triplet prediction module links relation slots to corresponding object pairs, capturing evolving interactions over time. We evaluate UNO on standard box-level and pixel-level VidSGG benchmarks. Results demonstrate that UNO not only achieves competitive performance across both tasks but also offers improved efficiency through a unified, object-centric design. Code is available at: https://github.com/Fsoft-AIC/UNO

中文标题/摘要

标题：UNO：通过基于对象的视觉表示学习实现统一的一阶段视频场景图生成

视频场景图生成（VidSGG）旨在通过检测对象并将其在时间上的交互建模为结构化图来表示动态视觉内容。先前的研究通常针对粗粒度的框级或细粒度的全景像素级VidSGG，通常需要特定任务的架构和多阶段训练管道。在本文中，我们提出了UNO（统一的对象中心VidSGG），这是一种单阶段的统一框架，可以在端到端架构中同时解决这两个任务。UNO旨在最小化特定任务的修改并最大化参数共享，从而在不同粒度的视觉层次上实现泛化。UNO的核心是一个扩展的槽注意力机制，将视觉特征分解为对象和关系槽。为了确保稳健的时间建模，我们引入了对象时间一致性学习，该学习机制在不依赖显式跟踪模块的情况下，确保帧间对象表示的一致性。此外，动态三元组预测模块将关系槽链接到相应的对象对，捕捉随时间变化的交互。我们在标准的框级和像素级VidSGG基准上评估了UNO。结果表明，UNO不仅在两个任务上都取得了竞争力的表现，而且通过统一的对象中心设计还提高了效率。代码可在：https://github.com/Fsoft-AIC/UNO 获取。

Summary / 总结

UNO is a single-stage framework for Video Scene Graph Generation that unifies both box-level and pixel-level tasks within an end-to-end architecture. It uses an extended slot attention mechanism to decompose visual features and an object temporal consistency learning method to ensure robust temporal modeling. UNO achieves competitive performance on standard benchmarks and offers improved efficiency through its unified design.

UNO 是一个单阶段的 Video Scene Graph 生成框架，统一了盒级和像素级的任务，采用扩展的槽注意力机制将视觉特征分解为对象和关系槽，并引入对象时序一致性学习以确保稳健的时间建模。此外，UNO 还包含一个动态三元组预测模块来捕捉时间上的交互变化。实验结果表明，UNO 在盒级和像素级的 VidSGG 基准测试中均取得了竞争力的表现，并通过统一设计提高了效率。
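
The idea of keeping object representations consistent across frames without a tracker can be illustrated with a simple loss. The sketch below penalizes the cosine distance between the same slot index in consecutive frames. This is only one plausible form of such an objective: the function names, the use of cosine distance, and the assumption that slot `k` refers to the same object in every frame are all illustrative, and UNO's actual object temporal consistency learning may be formulated differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def temporal_consistency_loss(slots_per_frame):
    """Mean cosine distance between slot k at frame t and slot k at frame t+1.

    slots_per_frame[t][k] is the feature vector of slot k at frame t.
    """
    terms = []
    for prev, curr in zip(slots_per_frame, slots_per_frame[1:]):
        for a, b in zip(prev, curr):
            terms.append(1.0 - cosine_similarity(a, b))
    if not terms:
        raise ValueError("need at least two frames with at least one slot each")
    return sum(terms) / len(terms)


# Two slots over two frames: identical representations give (near) zero loss.
stable = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
# One slot whose representation rotates by 90 degrees between frames.
drifting = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(temporal_consistency_loss(stable), temporal_consistency_loss(drifting))
```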
