arXiv 论文速递

Snapshot: 20260217_0339

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

Authors: Albert J. Zhai, Kuo-Hao Zeng, Jiasen Lu, Ali Farhadi, Shenlong Wang, Wei-Chiu Ma

First: 2026-02-13T18:59:10+00:00 · Latest: 2026-02-13T18:59:10+00:00

Abstract

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired grasp-trajectory filtering in simulation. This simulation step extends the trajectory data with grasp suitability labels, which allows for supervised learning of task-oriented grasping capabilities. We show through real-world experiments that our framework can be used to learn precise manipulation skills efficiently without any robot data, resulting in significantly more robust performance than using a grasp generator naively.
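The paired grasp-trajectory filtering step described above can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation: `GraspCandidate`, `label_grasps`, and `toy_simulator` are hypothetical names, and the height rule stands in for a real physics simulation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float, float]  # simplified (x, y, z) waypoint; an assumption


@dataclass
class GraspCandidate:
    name: str
    approach_height: float  # hypothetical property used by the toy simulator


def label_grasps(
    grasps: Sequence[GraspCandidate],
    trajectory: Sequence[Pose],
    simulate_rollout: Callable[[GraspCandidate, Sequence[Pose]], bool],
) -> List[Tuple[GraspCandidate, Sequence[Pose], int]]:
    """Pair every grasp with the trajectory and attach a 0/1 suitability label."""
    return [(g, trajectory, int(simulate_rollout(g, trajectory))) for g in grasps]


def toy_simulator(grasp: GraspCandidate, trajectory: Sequence[Pose]) -> bool:
    """Stand-in for a simulator: succeed only if every waypoint stays at or
    above the grasp's approach height."""
    return all(z >= grasp.approach_height for _, _, z in trajectory)


trajectory = [(0.0, 0.0, 0.10), (0.1, 0.0, 0.20), (0.2, 0.0, 0.15)]
grasps = [GraspCandidate("top", 0.05), GraspCandidate("side", 0.12)]
labeled = label_grasps(grasps, trajectory, toy_simulator)
labels = [label for _, _, label in labeled]
```

The labeled triples are the kind of supervised data a grasp-scoring model could be trained on; here the "side" grasp is marked unsuitable because the first waypoint falls below its approach height.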

中文标题/摘要

标题：模仿有效之处：基于人类视频的模拟筛选模块化策略学习

通过观看人类的视频来学习操作技能的能力有可能为机器人学习提供一种新的、高度可扩展的数据来源。在这里，我们探讨了抓取操作，这类任务涉及在执行各种后抓握动作之前抓取物体。人类视频为学习后抓握动作提供了强烈的信号，但对于学习先决抓握行为却不太有用，尤其是对于没有人类般手部的机器人来说。一种有前景的方法是使用模块化策略设计，利用专门的抓取生成器来生成稳定的抓取。然而，任意的稳定抓取往往与任务不兼容，阻碍了机器人执行所需的下游动作的能力。为了解决这一挑战，我们提出了感知-模拟-模仿（PSI）框架，该框架使用配对的抓取轨迹筛选在模拟中处理的人类视频运动数据来训练模块化操作策略。这一模拟步骤通过添加抓取适宜性标签来扩展轨迹数据，这使得能够监督学习任务导向的抓取能力。我们通过现实世界的实验表明，我们的框架可以有效地学习精确的操作技能，而无需任何机器人数据，从而比简单地使用抓取生成器获得显著更稳健的性能。

Summary / 总结

The paper aims to leverage human videos to teach robots manipulation skills, focusing on prehensile tasks. It introduces a Perceive-Simulate-Imitate (PSI) framework that uses a modular policy design, where a grasp generator produces stable grasps, and a simulation step filters these grasps based on their suitability for the task. This approach enables the robot to learn precise manipulation skills without relying on robot-collected data, leading to more robust performance compared to using a grasp generator alone.

该论文提出了一种Perceive-Simulate-Imitate (PSI)框架，以解决从人类视频中学习操作技能的挑战。方法使用模块化策略设计和抓取生成器来生成稳定的抓取，通过模拟过程过滤出任务不兼容的抓取。实验表明，这种方法可以在不依赖机器人数据的情况下学习精确的操作技能，相比直接使用抓取生成器，性能更加稳健。

Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision

Authors: Aadarsh Sahoo, Georgia Gkioxari

First: 2026-02-13T18:58:30+00:00 · Latest: 2026-02-13T18:58:30+00:00

Comments: Project webpage: https://glab-caltech.github.io/converseg/

Abstract

Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categorical and spatial queries (e.g., "left-most apple") and overlooks functional and physical reasoning (e.g., "where can I safely store the knife?"). We address this gap and introduce Conversational Image Segmentation (CIS) and ConverSeg, a benchmark spanning entities, spatial relations, intent, affordances, functions, safety, and physical reasoning. We also present ConverSeg-Net, which fuses strong segmentation priors with language understanding, and an AI-powered data engine that generates prompt-mask pairs without human supervision. We show that current language-guided segmentation models are inadequate for CIS, while ConverSeg-Net trained on our data engine achieves significant gains on ConverSeg and maintains strong performance on existing language-guided segmentation benchmarks. Project webpage: https://glab-caltech.github.io/converseg/

中文标题/摘要

标题：对话图像分割：通过可扩展监督锚定抽象概念

对话图像分割将抽象、意图驱动的概念锚定到像素准确的掩码中。先前的参考图像接地工作主要集中在类别和空间查询（例如，“最左边的苹果”）上，并忽视了功能性和物理推理（例如，“我可以在哪里安全地存放刀具？”）。我们填补了这一空白并引入了对话图像分割（CIS）和ConverSeg基准，该基准涵盖了实体、空间关系、意图、功能、安全性和物理推理。我们还提出了ConverSeg-Net，它将强大的分割先验与语言理解融合在一起，并使用AI驱动的数据引擎生成无需人工监督的提示-掩码对。我们展示了当前的语言引导分割模型对于CIS是不足的，而使用我们数据引擎训练的ConverSeg-Net在ConverSeg上取得了显著的提升，并在现有的语言引导分割基准上保持了强大的性能。项目网页：https://glab-caltech.github.io/converseg/

Summary / 总结

The research aims to ground abstract concepts into pixel-accurate masks through conversational image segmentation, addressing the limitations of prior work which focuses on categorical and spatial queries. The study introduces Conversational Image Segmentation (CIS) and ConverSeg, a benchmark that includes entities, spatial relations, intent, affordances, functions, safety, and physical reasoning. ConverSeg-Net, a model that combines strong segmentation priors with language understanding, is proposed and trained using an AI-powered data engine. The results show that current language-guided segmentation models are insufficient for CIS, but ConverSeg-Net achieves significant improvements on ConverSeg and maintains strong performance on existing benchmarks.

研究旨在通过对话式图像分割将抽象概念转化为像素级准确的掩码，弥补了先前工作仅关注类别和空间查询的不足。研究引入了对话式图像分割（CIS）和ConverSeg基准，该基准涵盖了实体、空间关系、意图、功能、安全性和物理推理。提出了结合强分割先验和语言理解的ConverSeg-Net模型，并使用AI驱动的数据引擎进行训练。结果表明，当前的语言引导分割模型在CIS上不够充分，但ConverSeg-Net在ConverSeg上取得了显著改进，并在现有基准上保持了良好的性能。

Semantic Chunking and the Entropy of Natural Language

Authors: Weishun Zhong, Doron Sivan, Tankut Can, Mikhail Katkov, Misha Tsodyks

First: 2026-02-13T18:58:10+00:00 · Latest: 2026-02-13T18:58:10+00:00

Comments: 29 pages, 9 figures

Abstract

The entropy rate of printed English is famously estimated to be about one bit per character, a benchmark that modern large language models (LLMs) have only recently approached. This entropy rate implies that English contains nearly 80 percent redundancy relative to the five bits per character expected for random text. We introduce a statistical model that attempts to capture the intricate multi-scale structure of natural language, providing a first-principles account of this redundancy level. Our model describes a procedure of self-similarly segmenting text into semantically coherent chunks down to the single-word level. The semantic structure of the text can then be hierarchically decomposed, allowing for analytical treatment. Numerical experiments with modern LLMs and open datasets suggest that our model quantitatively captures the structure of real texts at different levels of the semantic hierarchy. The entropy rate predicted by our model agrees with the estimated entropy rate of printed English. Moreover, our theory further reveals that the entropy rate of natural language is not fixed but should increase systematically with the semantic complexity of corpora, which is captured by the only free parameter in our model.
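The "nearly 80 percent redundancy" figure follows from simple arithmetic, which the sketch below reproduces. The 27-symbol alphabet (26 letters plus space) is an assumption used here to show where "about five bits" for random text can come from; the abstract itself only states the rounded value.

```python
import math


def redundancy(entropy_rate_bits: float, alphabet_size: int) -> float:
    """Redundancy = 1 - (actual entropy rate / maximum entropy per symbol)."""
    max_bits = math.log2(alphabet_size)
    return 1.0 - entropy_rate_bits / max_bits


# Assumption: 26 letters plus space, all equally likely in "random text".
max_bits_27 = math.log2(27)      # roughly 4.75 bits per character
r_27 = redundancy(1.0, 27)       # roughly 0.79

# Using the rounded five bits per character quoted in the abstract.
r_5bits = 1.0 - 1.0 / 5.0        # 0.8
```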

中文标题/摘要

标题：语义切块与自然语言的熵

印刷英语的熵率著名地估计为每个字符大约一个比特，这是现代大型语言模型（LLMs）仅最近才接近的基准。该熵率意味着英语相对于预期的每个字符五比特的随机文本，几乎含有80%的冗余。我们引入了一个统计模型，试图捕捉自然语言复杂的多层次结构，提供了一个从第一原理出发的冗余水平解释。该模型描述了一种自相似地将文本切块为语义上连贯的片段的过程，直到单个单词级别。文本的语义结构可以逐级分解，从而允许进行分析处理。现代LLMs和开源数据集的数值实验表明，我们的模型在语义层次的不同水平上定量地捕捉到了真实文本的结构。我们的模型预测的熵率与印刷英语估计的熵率一致。此外，我们的理论进一步揭示，自然语言的熵率不是固定的，而应该随着语料库的语义复杂性系统地增加，这由我们模型中的唯一自由参数来捕捉。

Summary / 总结

The paper offers a first-principles account of the redundancy of natural language, whose entropy rate is estimated at about one bit per character. The authors propose a statistical model that self-similarly segments text into semantically coherent chunks down to the single-word level and hierarchically decomposes the resulting semantic structure. Numerical experiments with modern LLMs and open datasets suggest that the model quantitatively captures the structure of real texts at different levels of the semantic hierarchy, and its predicted entropy rate agrees with the estimated entropy rate of printed English. The theory also implies that the entropy rate is not fixed but increases with the semantic complexity of the corpus, which is captured by the model's single free parameter.

论文旨在通过估算自然语言的熵率（约为每个字符一个比特）来理解其中的冗余性。为此，作者提出了一种统计模型，将文本自相似地分割成从句子到单个单词的语义连贯片段，并逐级分解语义结构。实验表明，该模型能够准确捕捉实际文本在不同语义层次上的结构，预测的熵率与印刷英语的估计熵率相符。此外，该模型还表明，熵率会随着语料库的语义复杂性系统地增加，这是模型中的唯一自由参数所反映的。

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Authors: Sayan Deb Sarkar, Rémi Pautrat, Ondrej Miksik, Marc Pollefeys, Iro Armeni, Mahdi Rad, Mihai Dusmanu

First: 2026-02-13T18:57:31+00:00 · Latest: 2026-02-13T18:57:31+00:00

Comments: Project Page: https://sayands.github.io/cope/

Abstract

Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. To fit within the maximum context window, current methods use keyframe sampling, which can miss both macro-level events and micro-level details because of its sparse temporal coverage. Furthermore, processing full images and their tokens for each frame incurs substantial computational overhead. To address these limitations, we propose to leverage video codec primitives (specifically motion vectors and residuals), which natively encode video redundancy and sparsity without requiring expensive full-image encoding for most frames. To this end, we introduce lightweight transformer-based encoders that aggregate codec primitives and align their representations with image encoder embeddings through a pre-training strategy that accelerates convergence during end-to-end fine-tuning. Our approach reduces the time-to-first-token by up to 86% and token usage by up to 93% compared to standard VideoLMs. Moreover, by varying the keyframe and codec primitive densities, we are able to maintain or exceed performance on 14 diverse video understanding benchmarks spanning general question answering, temporal reasoning, long-form understanding, and spatial scene understanding.
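To make "motion vectors and residuals" concrete, the following sketch rebuilds a predicted frame from a reference frame using block-wise motion compensation plus a residual. It illustrates the codec primitives only, not the paper's encoders; the 4x4 block size, the clamping rule at the frame border, and the function name `reconstruct_frame` are assumptions.

```python
import numpy as np


def reconstruct_frame(reference, motion_vectors, residual, block=4):
    """Rebuild a frame from a reference frame, per-block (dy, dx) motion
    vectors, and a per-pixel residual."""
    h, w = reference.shape
    predicted = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Clamp the source block so it stays inside the reference frame.
            sy = int(np.clip(by + dy, 0, h - block))
            sx = int(np.clip(bx + dx, 0, w - block))
            predicted[by:by + block, bx:bx + block] = reference[sy:sy + block, sx:sx + block]
    return predicted + residual


rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(8, 8)).astype(np.int32)
target = np.roll(reference, shift=1, axis=1)  # content moved one pixel to the right

# Every block copies from one pixel to the left in the reference frame.
mv = np.zeros((2, 2, 2), dtype=np.int32)
mv[..., 1] = -1

prediction = reconstruct_frame(reference, mv, np.zeros_like(reference))
residual = target - prediction            # what the encoder would store
rebuilt = reconstruct_frame(reference, mv, residual)
```

In this toy case the right half of the frame is predicted exactly by the motion vectors, so its residual is zero; only the blocks at the left border, where the source block has to be clamped, need residual data.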

中文标题/摘要

标题：CoPE-VideoLM：视频语言模型的编解码器基础

视频语言模型（VideoLMs）使AI系统能够理解视频中的时间动态。为了符合最大上下文窗口限制，当前方法使用关键帧采样，这可能会因时间覆盖稀疏而错过宏观事件和微观细节。此外，处理每一帧的完整图像及其标记会带来巨大的计算开销。为了解决这些限制，我们提出利用视频编解码器基础（特别是运动向量和残差），这些基础能够原生编码视频冗余和稀疏性，而无需对大多数帧进行昂贵的完整图像编码。为此，我们引入了轻量级的基于变压器的编码器，通过预训练策略将编解码器基础的表示与图像编码器嵌入对齐，从而加速端到端微调期间的收敛。我们的方法将第一个标记的时间减少高达86%，标记使用量减少高达93%，与标准VideoLMs相比。此外，通过调整关键帧和编解码器基础的密度，我们能够在涵盖一般问题回答、时间推理、长视频理解以及空间场景理解的14个不同视频理解基准测试中保持或超越性能。

DRL-Based Beam Positioning for LEO Satellite Constellations with Weighted Least Squares

Authors: Po-Heng Chou, Chiapin Wang, Kuan-Hao Chen, Wei-Chen Hsiao

First: 2025-11-12T00:14:10+00:00 · Latest: 2026-02-13T18:56:52+00:00

Comments: 6 pages, 3 figures, 1 table, and submitted to 2026 IEEE ICC Workshops

Abstract

In this paper, we propose a reinforcement-learning-based beam weighting framework that couples a policy network with an augmented weighted least squares (WLS) estimator for accurate and low-complexity positioning in multi-beam LEO constellations. Unlike conventional geometry- or CSI-dependent approaches, the policy learns directly from uplink pilot responses and geometry features, enabling robust localization without explicit CSI estimation. An augmented WLS jointly estimates position and receiver clock bias, improving numerical stability under dynamic beam geometry. Across representative scenarios, the proposed method reduces the mean positioning error by 99.3% compared with the geometry-based baseline, achieving 0.395 m RMSE with near real-time inference.
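A weighted least squares estimator that jointly solves for position and receiver clock bias can be sketched with a few Gauss-Newton iterations. This is a generic textbook-style formulation under stated assumptions, not the paper's estimator: the satellite coordinates, the fixed `weights` (which in the paper would come from the learned policy), and the noiseless pseudoranges are all made up for illustration.

```python
import numpy as np


def wls_position(sat_pos, pseudoranges, weights, iters=10):
    """Gauss-Newton weighted least squares for [x, y, z, clock_bias_m]."""
    state = np.zeros(4)
    sqrt_w = np.sqrt(weights)
    for _ in range(iters):
        diff = state[:3] - sat_pos                 # vectors from satellites to user
        ranges = np.linalg.norm(diff, axis=1)
        predicted = ranges + state[3]
        # Jacobian: unit line-of-sight vectors plus a column of ones for the clock bias.
        jac = np.hstack([diff / ranges[:, None], np.ones((len(ranges), 1))])
        step, *_ = np.linalg.lstsq(
            jac * sqrt_w[:, None], (pseudoranges - predicted) * sqrt_w, rcond=None
        )
        state = state + step
    return state


# Toy geometry in a local frame (meters); values are illustrative only.
sat_pos = np.array([
    [0.0, 0.0, 550e3],
    [300e3, 0.0, 500e3],
    [-300e3, 100e3, 520e3],
    [0.0, 400e3, 480e3],
    [200e3, -350e3, 560e3],
    [-250e3, -250e3, 600e3],
])
truth = np.array([1200.0, -800.0, 30.0, 45.0])     # x, y, z, clock bias (m)
pseudoranges = np.linalg.norm(truth[:3] - sat_pos, axis=1) + truth[3]
weights = np.array([1.0, 0.8, 0.6, 0.9, 0.5, 0.7])  # hypothetical per-beam weights
estimate = wls_position(sat_pos, pseudoranges, weights)
```

Appending the column of ones to the Jacobian is what turns the three-unknown position problem into the four-unknown "augmented" problem that also absorbs the receiver clock bias.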

中文标题/摘要

标题：基于DRL的LEO卫星星座波束定位方法：加权最小二乘法

在本文中，我们提出了一种基于强化学习的波束加权框架，该框架结合了策略网络和增强的加权最小二乘（WLS）估计器，以实现多波束LEO星座中的准确且低复杂度定位。与传统的几何或CSI依赖方法不同，策略直接从上行链路导频响应和几何特征中学习，从而实现鲁棒定位，无需显式CSI估计。增强的WLS联合估计位置和接收器时钟偏差，提高在动态波束几何下的数值稳定性。在代表性场景中，所提出的方法将定位误差均值降低了99.3%，实现了0.395米的RMSE，并接近实时推理。

Summary / 总结

This paper proposes a reinforcement-learning-based beam weighting framework for positioning in multi-beam LEO satellite constellations, coupling a policy network with an augmented weighted least squares (WLS) estimator. The policy learns directly from uplink pilot responses and geometry features, providing robust localization without explicit CSI estimation, while the augmented WLS jointly estimates position and receiver clock bias to improve numerical stability. The method reduces mean positioning error by 99.3% compared to a geometry-based baseline, achieving a 0.395 m RMSE with near real-time inference.

本文提出了一种基于强化学习的波束加权框架，用于低地球轨道（LEO）卫星星座的精确定位。该方法结合了策略网络和增强的加权最小二乘（WLS）估计器，直接从上行试点响应和几何特征中学习。与基于几何的方法相比，该方法将平均定位误差降低了99.3%，实现了0.395米的均方根误差（RMSE），并具有接近实时的推理能力。

Learning-based Radio Link Failure Prediction Based on Measurement Dataset in Railway Environments

Authors: Po-Heng Chou, Da-Chih Lin, Hung-Yu Wei, Walid Saad, Yu Tsao

First: 2025-11-12T00:13:37+00:00 · Latest: 2026-02-13T18:53:36+00:00

Comments: 6 pages, 3 figures, 2 tables, and submitted to 2026 IEEE ICC Workshops

Abstract

This paper presents a measurement-driven case study on early radio link failure (RLF) warning as device-side network sensing and analytics for proactive mobility management in 5G non-standalone (NSA) railway environments. Using 10 Hz metro-train measurement traces with serving- and neighbor-cell indicators, we benchmark six representative learning models, including CNN, LSTM, XGBoost, Anomaly Transformer, PatchTST, and TimesNet, under multiple observation windows and prediction horizons. Rather than proposing a new prediction architecture, this study focuses on quantifying the feasibility of early warning and the trade-offs among observation context, prediction horizon, and alarm reliability under real railway mobility. Experimental results show that learning models can anticipate RLF-related reliability degradation seconds in advance using lightweight features available on commercial devices. The presented benchmark provides practical insights for sensing-assisted communication control, such as proactive redundancy activation and adaptive handover strategies, aligning with the 6G vision of integrating sensing and analytics into mobility control.
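The interplay of observation window and prediction horizon can be illustrated by how training samples might be cut from a 10 Hz trace. This is a sketch under assumptions: the function `make_windows`, the single scalar feature, and the labeling rule ("any RLF inside the horizon") are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE_HZ = 10  # the abstract states 10 Hz measurement traces


def make_windows(
    features: Sequence[float],
    rlf_flags: Sequence[int],
    obs_seconds: float,
    horizon_seconds: float,
) -> List[Tuple[List[float], int]]:
    """Slide over the trace; each sample is (past observation window, label),
    where the label is 1 if an RLF occurs within the prediction horizon."""
    obs = int(obs_seconds * SAMPLE_RATE_HZ)
    horizon = int(horizon_seconds * SAMPLE_RATE_HZ)
    samples = []
    for end in range(obs, len(features) - horizon + 1):
        window = list(features[end - obs:end])
        label = int(any(rlf_flags[end:end + horizon]))
        samples.append((window, label))
    return samples


# Synthetic 5-second trace with one RLF event at t = 4.0 s (index 40).
features = [-90.0 - 0.2 * i for i in range(50)]  # made-up signal-strength values
rlf_flags = [0] * 50
rlf_flags[40] = 1

samples = make_windows(features, rlf_flags, obs_seconds=1.0, horizon_seconds=2.0)
positives = sum(label for _, label in samples)
```

Lengthening the horizon turns more windows into positives (earlier warning) but makes each positive harder to predict, which is the trade-off the study quantifies.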

中文标题/摘要

标题：基于测量数据的铁路环境中基于学习的早期无线链路失败预测

本文基于测量数据，对5G非独立（NSA）铁路环境中设备侧网络感知与分析中的早期无线链路失败（RLF）预警进行了案例研究，以实现主动移动管理。使用10~Hz的地铁列车测量轨迹和服务小区及邻区小区指示器，我们对六种代表性学习模型进行了基准测试，包括CNN、LSTM、XGBoost、异常变换器、PatchTST和TimesNet，考察了多种观测窗口和预测时间范围下的表现。本研究并未提出新的预测架构，而是专注于在实际铁路移动性条件下早期预警的可行性以及观测上下文、预测时间范围和警报可靠性之间的权衡。实验结果表明，学习模型可以提前几秒利用商用设备上可用的轻量级特征预测RLF相关的可靠性退化。所提出的基准为基于感知的通信控制提供了实用见解，如主动冗余激活和自适应切换策略，与6G愿景中将感知与分析集成到移动控制中相一致。

Summary / 总结

This paper evaluates six learning models (CNN, LSTM, XGBoost, Anomaly Transformer, PatchTST, and TimesNet) for early radio link failure prediction in 5G NSA railway environments using 10 Hz measurement traces. The study finds that these models can predict RLF-related reliability degradation seconds in advance using lightweight features available on commercial devices, highlighting the feasibility of early warning and the importance of observation context and prediction horizon in alarm reliability.

该研究评估了六种学习模型（CNN、LSTM、XGBoost、Anomaly Transformer、PatchTST 和 TimesNet），使用每秒10次的测量轨迹，在5G NSA铁路环境中预测无线电链路失败（RLF）相关的可靠性降级。研究发现，这些模型可以使用商用设备上可用的轻量级特征提前几秒预测RLF相关的可靠性降级，强调了早期预警的可行性以及观测上下文和预测时间窗对警报可靠性的关键作用。

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Authors: Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu

First: 2025-08-07T03:38:16+00:00 · Latest: 2026-02-13T18:53:32+00:00

Abstract

Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver's capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks and labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
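One way to reward tasks "near the edge of the Solver's capability" is to score a task highest when the Solver succeeds about half the time. The sketch below is an illustrative assumption, since the abstract does not give the reward formula; `self_consistency` (agreement with the majority answer, used because no labels exist) and `edge_reward` are hypothetical names.

```python
from collections import Counter
from typing import Sequence


def self_consistency(answers: Sequence[str]) -> float:
    """Fraction of sampled Solver answers that agree with the most common one."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def edge_reward(p: float) -> float:
    """Peaks at 1.0 when p = 0.5 and falls to 0.0 when p is 0 or 1."""
    return 1.0 - 2.0 * abs(p - 0.5)


# A task the Solver always answers the same way gives the Challenger no reward.
easy = edge_reward(self_consistency(["4"] * 8))

# A task where only half of the samples agree sits at the capability edge.
borderline = edge_reward(self_consistency(["4", "4", "4", "4", "5", "6", "7", "8"]))
```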

中文标题/摘要

标题：R-Zero: 从零数据自我进化的推理大语言模型

自我进化的大型语言模型（LLMs）通过自主生成、优化和从自身经验中学习，提供了一条通向超智能的可扩展路径。然而，训练此类模型的现有方法仍然高度依赖大量的人工标注任务和标签，通常通过微调或强化学习实现，这构成了推动AI系统超越人类智能能力的基本瓶颈。为克服这一限制，我们引入了R-Zero，这是一种完全自主的框架，可以从头开始生成自己的训练数据。从一个基础LLM开始，R-Zero初始化两个具有不同角色的独立模型，一个挑战者和一个解决者。这些模型分别优化并相互进化：挑战者因提出接近解决者能力边缘的任务而获得奖励，解决者因解决挑战者提出的越来越具挑战性的任务而获得奖励。这一过程产生了一个有针对性的、自我改进的课程，无需任何预先存在的任务和标签。实验证明，R-Zero显著提高了不同基础LLM的推理能力，例如，在数学推理基准测试中将Qwen3-4B-Base的性能提升了6.49，在通用领域推理基准测试中提升了7.54。

Summary / 总结

R-Zero is a self-evolving framework that generates its own training data autonomously, starting from a single base LLM. It introduces a Challenger and a Solver, which co-evolve by the Challenger proposing increasingly difficult tasks and the Solver solving them. This process enhances reasoning capabilities across different LLMs, notably improving Qwen3-4B-Base by 6.49 on math-reasoning benchmarks and 7.54 on general-domain reasoning benchmarks.

R-Zero 是一个自主生成训练数据的框架，从一个基础 LLM 开始。它引入了挑战者和解题者，两者通过挑战者提出越来越难的任务，解题者解决这些任务来共同进化。这一过程提升了不同 LLM 的推理能力，特别是在数学推理基准上提升了 Qwen3-4B-Base 的 6.49 分，在通用领域推理基准上提升了 7.54 分。

FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

Authors: Mingzhi Sheng, Zekai Gu, Peng Li, Cheng Lin, Hao-Xiang Guo, Ying-Cong Chen, Yuan Liu

First: 2026-02-13T18:52:11+00:00 · Latest: 2026-02-13T18:52:11+00:00

Comments: Codes: https://github.com/IGL-HKUST/FlexAM

Abstract

Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
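Multi-frequency positional encoding of a 3D point is commonly implemented with sinusoids at geometrically spaced frequencies. The sketch below shows that standard construction as an assumption about what such an encoding could look like; the abstract does not specify FlexAM's exact formulation, and `multi_frequency_encoding` is a hypothetical name.

```python
import math
from typing import List, Sequence


def multi_frequency_encoding(point: Sequence[float], num_frequencies: int = 4) -> List[float]:
    """Encode each coordinate with sin/cos at frequencies pi, 2*pi, 4*pi, ..."""
    features = []
    for coord in point:
        for k in range(num_frequencies):
            freq = (2.0 ** k) * math.pi
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


enc = multi_frequency_encoding((0.0, 0.5, 0.25))

# Two nearby points differ more in the high-frequency channels than in the low ones.
a = multi_frequency_encoding((0.50,), num_frequencies=4)
b = multi_frequency_encoding((0.52,), num_frequencies=4)
low_freq_gap = abs(a[0] - b[0])    # sin at frequency pi
high_freq_gap = abs(a[6] - b[6])   # sin at frequency 8*pi
```

The higher-frequency channels are what let a downstream network tell apart small displacements, which is the motivation for using them to capture fine-grained motion.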

中文标题/摘要

标题：FlexAM：灵活的外观-运动分解框架以实现多功能视频生成控制

在视频生成中实现有效的且可泛化的控制仍然是一个重大挑战。尽管许多方法依赖于模糊或任务特定的信号，我们认为“外观”和“运动”的基本分离提供了一条更稳健和可扩展的途径。我们提出了FlexAM，这是一种基于新颖3D控制信号的统一框架。该信号将视频动力学表示为点云，引入了三个关键增强：多频率位置编码以区分细粒度运动、深度感知位置编码以及灵活的控制信号以平衡精确度和生成质量。这种表示使FlexAM能够有效分离外观和运动，从而实现包括I2V/V2V编辑、摄像机控制和空间对象编辑在内的广泛任务。广泛的实验表明，FlexAM在所有评估任务中均实现了优越的性能。

Summary / 总结

FlexAM is a unified framework that disentangles appearance and motion to make control in video generation more effective and generalizable. It builds on a novel 3D control signal that represents video dynamics as a point cloud, with multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. Extensive experiments report superior performance on tasks such as I2V/V2V editing, camera control, and spatial object editing.

研究旨在通过提出FlexAM，一种统一框架，利用一种新颖的3D控制信号来解耦外观和运动，以解决视频生成中有效且通用的控制难题。该信号将视频动态表示为点云，包括用于细粒度运动的多频率位置编码、深度感知位置编码以及用于平衡精度和生成质量的灵活控制信号。实验结果表明，FlexAM在包括I2V/V2V编辑、摄像机控制和空间对象编辑在内的各种任务中均优于现有方法。

Selection of CMIP6 Models for Regional Precipitation Projection and Climate Change Assessment in the Jhelum and Chenab River Basins

Authors: Saad Ahmed Jamal, Ammara Nusrat, Muhammad Azmat, Muhammad Osama Nusrat

First: 2026-02-13T18:41:40+00:00 · Latest: 2026-02-13T18:41:40+00:00

Comments: 28 pages

Abstract

Effective water resource management depends on accurate projections of flows in water channels. For projected climate data, different General Circulation Models (GCMs) produce contrasting results. This study demonstrates the selection of GCMs from the latest generation, CMIP6, for hydroclimate change impact studies. An envelope-based method was used for the selection; it includes components based on machine learning techniques, allowing GCMs to be selected without in-situ reference data. To our knowledge, this is the first time such a comparison has been performed for the CMIP6 Shared Socioeconomic Pathway (SSP) scenario data. In addition, the effect of climate change under the SSP scenarios was studied, along with the calculation of extreme indices. Finally, GCMs were compared to quantify spatiotemporal differences between CMIP5 and CMIP6 data. The results identify NorESM2-LM and FGOALS-g3 as the selected models for the Jhelum and Chenab River basins. Highly vulnerable regions under the effect of climate change, which included parts of Punjab, Jammu, and Kashmir, were highlighted through spatial maps. Upon comparison of CMIP5 and CMIP6, no discernible difference was found between the RCP and SSP scenario precipitation projections. In the future, more detailed statistical comparisons could further reinforce this finding.

中文标题/摘要

标题：选择CMIP6模型进行杰卢姆河和 Chenab河流域降水投影及气候变化评估

有效的水资源管理依赖于对水道流量的准确预测。对于预测气候数据，不同的大气环流模型（GCM）模拟出不同的结果。本研究展示了选择最新一代CMIP6模型进行水文气候变化影响研究的方法。使用了基于包络的方法进行选择，该方法包括基于机器学习技术的组件，允许在无需现场参考数据的情况下选择GCMs。据我们所知，这是首次对CMIP6共享社会经济路径（SSP）情景数据进行此类比较。此外，还研究了SSP情景下的气候变化影响，计算了极端指数。最后，比较了CMIP5和CMIP6数据的空间和时间差异。结果表明，NorESM2 LM和FGOALS g3是选择的模型，气候变化影响下高度脆弱的区域通过空间地图突出显示，包括旁遮普、克什米尔和贾姆地区。在CMIP5和CMIP6的比较中，未发现RCP和SSP情景降水预测之间有明显差异。未来，更详细的统计比较将进一步加强这一观点。

Summary / 总结

This study selects General Circulation Models (GCMs) from CMIP6 for regional precipitation projection and climate change assessment in the Jhelum and Chenab River basins. An envelope-based method incorporating machine learning techniques was used to select GCMs without requiring in-situ reference data. The authors state that, to their knowledge, this is the first such comparison on CMIP6 Shared Socioeconomic Pathway (SSP) scenario data. NorESM2-LM and FGOALS-g3 were selected for the region, spatial maps highlighted highly vulnerable areas including parts of Punjab, Jammu, and Kashmir, and a comparison of CMIP5 and CMIP6 found no discernible difference between RCP and SSP precipitation projections.

该研究旨在从CMIP6中选择合适的气候模型，用于杰卢姆和 Chenab河流域的降水预测和气候变化评估。研究使用了包含机器学习技术的包络法，无需现场参考数据即可选择气候模型。该研究首次比较了CMIP6共享社会路径（SSP）情景数据，发现RCP和SSP的降水预测之间没有明显差异。研究确定了NorESM2 LM和FGOALS g3为选定模型，并通过空间地图指出了在气候变化影响下高度脆弱的区域，包括旁遮普、克什米尔和贾姆穆地区。

Improved Regret Guarantees for Online Mirror Descent using a Portfolio of Mirror Maps

Authors: Swati Gupta, Jai Moondra, Mohit Singh

First: 2026-02-13T18:37:26+00:00 · Latest: 2026-02-13T18:37:26+00:00

Abstract

Online Mirror Descent (OMD) and its variants give a flexible framework for Online Convex Optimization (OCO) where the performance depends crucially on the choice of the mirror map. While the geometries underlying Online Projected Gradient Descent (OPGD) and Online Exponentiated Gradient (OEG), both special cases of OMD, are well understood, how to construct an optimal mirror map for any given constraint set and a general family of loss functions, e.g., sparse losses, remains a challenging open question. Motivated by parameterizing a near-optimal set of mirror maps, we consider a simpler question: is it even possible to obtain polynomial gains in regret by using mirror maps for geometries that interpolate between $L_1$ and $L_2$, gains that may not be possible when restricting to only OEG ($L_1$) or OPGD ($L_2$)? Our main result answers this question positively. We show that mirror maps based on block norms adapt better to the sparsity of loss functions, compared to previous $L_p$ (for $p \in [1, 2]$) interpolations. In particular, we construct a family of online convex optimization instances in $\mathbb{R}^d$, where block norm-based mirror maps achieve a provable polynomial (in $d$) improvement in regret over OEG and OPGD for sparse loss functions. We then turn to the setting in which the sparsity level of the loss functions is unknown. In this case, the choice of geometry itself becomes an online decision problem. We first show that naively switching between OEG and OPGD can incur linear regret, highlighting the intrinsic difficulty of geometry selection. To overcome this issue, we propose a meta-algorithm based on multiplicative weights that dynamically selects among a family of uniform block norms. We show that this approach effectively tunes OMD to the sparsity of the losses, yielding adaptive regret guarantees. Overall, our results demonstrate that online mirror-map selection can significantly enhance the ability of OMD to exploit sparsity in online convex optimization.
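The two special cases named above differ only in the mirror map: the negative entropy gives a multiplicative update, while the squared Euclidean norm gives a gradient step followed by projection. The sketch below shows one step of each on the probability simplex; it covers only these two standard baselines, not the block-norm mirror maps or the meta-algorithm proposed in the paper, and the step size and loss gradient are made up.

```python
import math
from typing import List, Sequence


def oeg_step(x: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Exponentiated gradient: mirror descent with the entropy mirror map."""
    w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(vi - theta, 0.0) for vi in v]


def opgd_step(x: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Projected gradient descent: mirror descent with the squared L2 mirror map."""
    return project_to_simplex([xi - lr * gi for xi, gi in zip(x, grad)])


x0 = [0.25, 0.25, 0.25, 0.25]
sparse_grad = [1.0, 0.0, 0.0, 0.0]  # a loss that touches a single coordinate
x_oeg = oeg_step(x0, sparse_grad, lr=0.5)
x_opgd = opgd_step(x0, sparse_grad, lr=0.5)
```

With the same sparse gradient, the Euclidean step drives the penalized coordinate all the way to zero, whereas the entropic step only shrinks it multiplicatively; this difference in geometry is what the choice of mirror map controls.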

Summary / 总结

The paper addresses the challenge of selecting an optimal mirror map for online convex optimization (OCO) to improve regret guarantees. It explores the use of block norms as a flexible alternative to interpolate between $L_1$ and $L_2$ geometries, showing that these can achieve polynomial improvements in regret for sparse loss functions compared to previous methods. Additionally, it introduces a meta-algorithm based on multiplicative weights to dynamically select among a family of uniform block norms, adapting to the unknown sparsity level of the loss functions and providing adaptive regret guarantees.

论文旨在通过使用镜像映射组合来改进在线镜像下降（OMD）中的遗憾保证。它探索了使用块范数更好地适应损失函数的稀疏性，展示了这些方法可以在遗憾上实现多项式改进，超过之前的OEG和OPGD方法。此外，它提出了一种元算法，动态选择一组均匀块范数来适应未知的稀疏性水平，从而获得自适应遗憾保证。

Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace

Authors: Seth Donahue, J. D. Peiffer, R. Tyler Richardson, Yishan Zhong, Shaun Q. Y. Tan, Benoit Marteau, Stephanie R. Russo, May D. Wang, R. James Cotton, Ross Chafetz

First: 2026-02-13T18:36:27+00:00 · Latest: 2026-02-13T18:36:27+00:00

Abstract

Objective: To validate a clinically accessible approach for quantifying the Upper Extremity Reachable Workspace (UERW) using a single (monocular) camera and Artificial Intelligence (AI)-driven Markerless Motion Capture (MMC) for biomechanical analysis. Objective assessment and validation of these techniques for specific clinically oriented tasks are crucial for their adoption in clinical motion analysis. AI-driven monocular MMC reduces the barriers to adoption in the clinic and has the potential to reduce the overhead for analysis of this common clinical assessment. Nine adult participants with no impairments performed the standardized UERW task, which entails reaching targets distributed across a virtual sphere centered on the torso, with targets displayed in a VR headset. Movements were simultaneously captured using a marker-based motion capture system and a set of eight FLIR cameras. We performed monocular video analysis on two of these camera views to compare frontal and offset camera configurations. The frontal camera orientation demonstrated strong agreement with the marker-based reference, exhibiting a minimal mean bias of 0.61 ± 0.12% of reachspace reached per octant (mean ± standard deviation). In contrast, the offset camera view underestimated the percent workspace reached (-5.66 ± 0.45% of reachspace reached). Conclusion: The findings support the feasibility of a frontal monocular camera configuration for UERW assessment, particularly for anterior workspace evaluation where agreement with marker-based motion capture was highest. The overall performance demonstrates clinical potential for practical, single-camera assessments. This study provides the first validation of a monocular MMC system for the assessment of the UERW task. By reducing technical complexity, this approach enables broader implementation of quantitative upper extremity mobility assessment.
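The per-octant bias statistic can be illustrated with a small calculation. This is a sketch under assumptions: the octant indexing convention, the function names, and all numbers are made up for illustration and are not the study's data or analysis pipeline.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def octant_index(x: float, y: float, z: float) -> int:
    """Map a torso-centered point to one of 8 octants by the signs of its coordinates."""
    return (x >= 0) * 4 + (y >= 0) * 2 + (z >= 0) * 1


def bias_summary(candidate: Sequence[float], reference: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of (candidate - reference) differences."""
    diffs: List[float] = [c - r for c, r in zip(candidate, reference)]
    return mean(diffs), stdev(diffs)


# Hypothetical percent-of-workspace-reached values for four octants.
monocular = [50.0, 62.0, 41.0, 70.0]
marker_based = [49.0, 61.0, 41.0, 69.0]
bias_mean, bias_sd = bias_summary(monocular, marker_based)
```

A positive mean bias means the monocular estimate reports more workspace reached than the marker-based reference; a negative one, as reported for the offset camera, means it underestimates.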

中文标题/摘要

标题：单目无标记运动捕捉使上肢可达工作空间的定量评估成为可能

验证使用单目相机和基于人工智能（AI）的无标记运动捕捉（MMC）进行上肢可达工作空间（UERW）量化评估的临床可访问方法。对这些技术进行客观评估和验证对于它们在临床运动分析中的应用至关重要。基于AI的单目MMC降低了临床应用的障碍，并有可能降低对该常见临床评估的分析成本。九名无损伤的成年参与者完成了标准化的UERW任务，即在虚拟球体中心的躯干周围分布的目标上进行目标的伸手操作，目标通过VR头显显示。同时使用基于标记的运动捕捉系统和一组八台FLIR相机捕捉动作。我们对其中两个视频相机视图进行了单目视频分析，比较了正前方和偏移相机配置。正前方相机配置与基于标记的参考系统表现出很强的一致性，最小均值偏差为$0.61 \pm 0.12$%的可达空间（均值$\pm$标准差）。相比之下，偏移相机视图低估了所达到的工作空间百分比（$-5.66 \pm 0.45$%的可达空间）。结论：研究结果支持正前方单目相机配置在UERW评估中的可行性，特别是在前部工作空间评估中，与基于标记的运动捕捉的一致性最高。总体性能表明，单目相机评估具有临床应用潜力。本研究提供了首个验证单目MMC系统进行UERW任务评估的实例。通过降低技术复杂性，该方法使定量上肢活动范围评估的广泛应用成为可能。

Summary / 总结

This study validates a monocular camera-based Markerless Motion Capture (MMC) system using Artificial Intelligence (AI) to assess the Upper Extremity Reachable Workspace (UERW) in a clinically accessible manner. Nine participants performed a standardized UERW task, with movements captured by both a marker-based system and monocular cameras. The frontal camera configuration showed strong agreement with the marker-based reference, while the offset camera underestimated the workspace reached. The findings support the feasibility of using a frontal monocular camera for UERW assessment, particularly for anterior workspace evaluation, indicating potential for practical, single-camera clinical assessments.

该研究验证了一种使用AI驱动的单目无标记运动捕捉方法来评估上肢可及工作空间（UERW）的方法。九名参与者完成了标准化的UERW任务，运动由标记式系统和单目摄像头同时捕捉。前视摄像头配置与标记式系统有很强的一致性，而偏移摄像头则低估了可及的工作空间。研究结果支持使用前视单目摄像头进行UERW评估，特别是在前向工作空间评估方面，表明该方法具有临床应用的潜力，可以实现简便的单摄像头评估。

Privacy-Preserving Federated Learning with Verifiable Fairness Guarantees

Authors: Mohammed Himayath Ali, Mohammed Aqib Abdullah, Syed Muneer Hussain, Mohammed Mudassir Uddin, Shahnawaz Alam

First: 2026-01-18T15:06:30+00:00 · Latest: 2026-02-13T18:35:53+00:00

Abstract

Federated learning enables collaborative model training across distributed institutions without centralizing sensitive data; however, ensuring algorithmic fairness across heterogeneous data distributions while preserving privacy remains fundamentally unresolved. This paper introduces CryptoFair-FL, a novel cryptographic framework providing the first verifiable fairness guarantees for federated learning systems under formal security definitions. The proposed approach combines additively homomorphic encryption with secure multi-party computation to enable privacy-preserving verification of demographic parity and equalized odds metrics without revealing protected attribute distributions or individual predictions. A novel batched verification protocol reduces computational complexity from $O(n^2)$ to $O(n \log n)$ while maintaining $(\epsilon, \delta)$-differential privacy with $\epsilon = 0.5$ and $\delta = 10^{-6}$. Theoretical analysis establishes information-theoretic lower bounds on the privacy cost of fairness verification, demonstrating that the proposed protocol achieves near-optimal privacy-fairness tradeoffs. Comprehensive experiments across four benchmark datasets (MIMIC-IV healthcare records, Adult Income, CelebA, and a novel FedFair-100 benchmark) demonstrate that CryptoFair-FL reduces fairness violations from 0.231 to 0.031 demographic parity difference while incurring only 2.3 times computational overhead compared to standard federated averaging. The framework successfully defends against attribute inference attacks, maintaining adversarial success probability below 0.05 across all tested configurations. These results establish a practical pathway for deploying fairness-aware federated learning in regulated industries requiring both privacy protection and algorithmic accountability.
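The two fairness metrics being verified have simple plaintext definitions, sketched below. This shows only what is being measured: the paper's contribution is verifying these quantities under encryption, which this sketch does not attempt. The function names and the toy predictions are assumptions.

```python
from typing import Dict, List, Sequence


def demographic_parity_difference(preds: Sequence[int], groups: Sequence[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    by_group: Dict[str, List[int]] = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)


def equalized_odds_difference(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[str]
) -> float:
    """Larger of the true-positive-rate gap and the false-positive-rate gap."""
    gaps = []
    for outcome in (0, 1):  # outcome 1 gives the TPR gap, outcome 0 the FPR gap
        subset = [(p, g) for p, y, g in zip(preds, labels, groups) if y == outcome]
        gaps.append(
            demographic_parity_difference([p for p, _ in subset], [g for _, g in subset])
        )
    return max(gaps)


# Toy data: group "a" receives positive predictions far more often than group "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(preds, groups)
eod = equalized_odds_difference(preds, labels, groups)
```

On this scale, the reported drop from 0.231 to 0.031 in demographic parity difference means the gap in positive-prediction rates between groups shrank from about 23 to about 3 percentage points.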

中文标题/摘要

标题：隐私保护的联邦学习及其可验证的公平性保证

联邦学习允许分布式机构在不集中敏感数据的情况下进行协作模型训练；然而，在异构数据分布下确保算法公平性的同时保护隐私仍然是一个根本性难题。本文提出了CryptoFair-FL，这是一种新颖的密码学框架，首次在形式安全定义下为联邦学习系统提供了可验证的公平性保证。所提出的方法结合了加性同态加密和安全多方计算，能够在不泄露保护属性分布或个体预测的情况下，实现隐私保护的公平性度量验证。一种新颖的批量验证协议将计算复杂度从O(n^2)降低到O(n log n)，同时保持dparam = 0.5和deltap = 10^-6的(\dparam, \deltap)-差分隐私。理论分析建立了公平性验证的隐私成本信息论下界，证明了所提出的协议实现了接近最优的隐私-公平性权衡。在四个基准数据集（MIMIC-IV医疗记录、成人收入、CelebA以及一个新颖的FedFair-100基准）上进行的全面实验表明，CryptoFair-FL将人口统计平价差异从0.231降低到0.031，计算开销仅比标准联邦平均高出2.3倍。该框架成功抵御了属性推断攻击，在所有测试配置中保持了低于0.05的对抗成功率。这些结果为在需要同时保护隐私和算法问责性的受监管行业中部署公平感知的联邦学习提供了实际途径。

Summary / 总结

The paper addresses the challenge of ensuring algorithmic fairness in federated learning while preserving privacy. It introduces CryptoFair-FL, a cryptographic framework that provides verifiable fairness guarantees through the use of additively homomorphic encryption and secure multi-party computation. The approach enables privacy-preserving verification of demographic parity and equalized odds metrics without revealing sensitive data. Experiments across four datasets show that CryptoFair-FL reduces fairness violations significantly while incurring only a 2.3 times increase in computational overhead compared to standard federated averaging. The framework also effectively defends against attribute inference attacks.

该论文解决了在保护隐私的同时确保联邦学习算法公平性的挑战。它引入了CryptoFair-FL，这是一种结合加性同态加密和安全多方计算的密码学框架，提供了可验证的公平性保证。该方法能够在不泄露敏感信息的情况下，实现对人口统计均等（demographic parity）和均等化几率（equalized odds）指标的隐私保护验证。实验结果显示，CryptoFair-FL显著减少了公平性偏差，同时计算开销仅为标准联邦平均的2.3倍，并成功抵御了属性推断攻击。
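The demographic parity difference reported in the CryptoFair-FL abstract (0.231 reduced to 0.031) can be illustrated with a plaintext calculation. The sketch below is not the encrypted verification protocol from the paper; it only shows what the metric measures, assuming binary predictions and a single protected attribute. The function name and the toy data are hypothetical.

```python
def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy data (hypothetical): group "a" gets 3/4 positives, group "b" gets 1/4.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dpd = demographic_parity_difference(preds, groups)
```

A value of 0 would mean every group receives positive predictions at the same rate; the privacy-preserving part of the paper concerns computing such a quantity without revealing `preds` or `groups`.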

LongStream: Long-Sequence Streaming Autoregressive Visual Geometry

Authors: Chong Cheng, Xianda Chen, Tao Xie, Wei Yin, Weiqiang Ren, Qian Zhang, Xiaoyuang Guo, Hao Wang

First: 2026-02-13T18:30:51+00:00 · Latest: 2026-02-13T18:30:51+00:00

Abs · PDF · Code1 · Code2 · Project1

Abstract

Long-sequence streaming 3D reconstruction remains a significant open challenge. Existing autoregressive models often fail when processing long sequences. They typically anchor poses to the first frame, which leads to attention decay, scale drift, and extrapolation errors. We introduce LongStream, a novel gauge-decoupled streaming visual geometry model for metric-scale scene reconstruction across thousands of frames. Our approach is threefold. First, we discard the first-frame anchor and predict keyframe-relative poses. This reformulates long-range extrapolation into a constant-difficulty local task. Second, we introduce orthogonal scale learning. This method fully disentangles geometry from scale estimation to suppress drift. Finally, we solve Transformer cache issues such as attention-sink reliance and long-term KV-cache contamination. We propose cache-consistent training combined with periodic cache refresh. This approach suppresses attention degradation over ultra-long sequences and reduces the gap between training and inference. Experiments show LongStream achieves state-of-the-art performance. It delivers stable, metric-scale reconstruction over kilometer-scale sequences at 18 FPS. Project Page: https://3dagentworld.github.io/longstream/

中文标题/摘要

标题：LongStream：长序列流式自回归视觉几何

长序列流式3D重建仍然是一个重要的开放挑战。现有的自回归模型在处理长序列时往往失败。它们通常将姿态锚定在第一帧，导致注意力衰减、尺度漂移和外推误差。我们提出了LongStream，一种新的解耦流式视觉几何模型，用于跨越数千帧的度量尺度场景重建。我们的方法分为三步。首先，我们丢弃第一帧的锚点，预测关键帧相对姿态。这将长距离外推重新定义为一个恒定难度的局部任务。其次，我们引入正交尺度学习。该方法完全解耦几何与尺度估计，抑制漂移。最后，我们解决了Transformer缓存问题，如注意力陷阱依赖和长期KV缓存污染。我们提出了缓存一致训练结合周期性缓存刷新。这种方法抑制了超长序列中的注意力退化，并减少了训练与推理之间的差距。实验表明，LongStream 达到了最先进的性能。它在18 FPS下实现了千米级序列的稳定、度量尺度重建。项目页面：https://3dagentworld.github.io/longstream/

Summary / 总结

LongStream addresses the challenge of long-sequence streaming 3D reconstruction by introducing a gauge-decoupled streaming visual geometry model. It predicts keyframe-relative poses instead of anchoring to the first frame, and introduces orthogonal scale learning to disentangle geometry from scale estimation. Additionally, it solves Transformer cache issues through cache-consistent training and periodic cache refresh. Experiments demonstrate that LongStream achieves state-of-the-art performance, providing stable, metric-scale reconstruction over kilometer-scale sequences at 18 FPS.

LongStream通过引入规范解耦（gauge-decoupled）的流式视觉几何模型，预测关键帧相对姿态而非锚定在第一帧，从而缓解注意力衰减和尺度漂移问题。它还引入正交尺度学习，以分离几何与尺度估计，并采用缓存一致训练结合周期性缓存刷新来抑制注意力退化。实验表明，LongStream达到了最先进的性能，能以18 FPS在千米级序列上实现稳定的度量尺度重建。
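The keyframe-relative pose idea in the LongStream abstract can be sketched as pose composition: each frame's pose is expressed relative to a nearby keyframe, and a global pose is recovered by chaining transforms. The example below uses 2D poses `(x, y, theta)` purely as a simplification; LongStream itself works with 3D camera poses, and the `compose` helper is a hypothetical name.

```python
import math


def compose(a, b):
    """Compose two 2D poses (x, y, theta): express pose b, given in a's frame, globally."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


# Keyframes are chained from keyframe-to-keyframe relative poses.
keyframes = [(0.0, 0.0, 0.0)]
keyframes.append(compose(keyframes[-1], (1.0, 0.0, math.pi / 2)))

# A regular frame only needs its pose relative to the latest keyframe,
# so the prediction target stays local however long the sequence is.
frame_pose = compose(keyframes[-1], (1.0, 0.0, 0.0))
```

Under this formulation the magnitude of each predicted relative pose depends on the distance to the keyframe, not on the distance to the first frame.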

MissionHD: Hyperdimensional Refinement of Distribution-Deficient Reasoning Graphs for Video Anomaly Detection

Authors: Sanggeon Yun, Raheeb Hassan, Ryozo Masukawa, Nathaniel D. Bastian, Mohsen Imani

First: 2025-08-20T14:43:04+00:00 · Latest: 2026-02-13T18:30:09+00:00

Abs · PDF · Code1 · Code2

Abstract

LLM-generated reasoning graphs, referred to as mission-specific graphs (MSGs), are increasingly used for video anomaly detection (VAD) and recognition (VAR). However, they are typically treated as fixed despite being generic and distribution-deficient. Conventional graph structure refinement (GSR) methods are ill-suited to this setting, as they rely on learning structural distributions that are absent in LLM-generated graphs. We propose HDC-constrained Graph Structure Refinement (HDC-GSR), a new paradigm that directly optimizes a decodable, task-aligned graph representation in a single hyperdimensional space without distribution modeling. Leveraging Hyperdimensional Computing (HDC), our framework encodes graphs via binding and bundling operations, aligns the resulting graph code with downstream loss, and decodes edge contributions to refine the structure. We instantiate this approach as MissionHD for weakly supervised VAD/VAR and demonstrate consistent performance gains on benchmark datasets.

中文标题/摘要

标题：MissionHD：面向视频异常检测的分布不足推理图的超维精炼

由LLM生成的推理图，称为使命特定图（MSGs），越来越多地用于视频异常检测（VAD）和识别（VAR）。然而，它们通常被视为固定的，尽管它们是通用且分布不足的。传统的图结构精炼（GSR）方法在这种情况下并不适用，因为它们依赖于学习LLM生成图中不存在的结构分布。我们提出了一种新的HDC约束图结构精炼（HDC-GSR）范式，该范式直接在超维度空间中优化一个可解码的任务对齐的图表示，而无需进行分布建模。利用超维度计算（HDC），我们的框架通过绑定和捆绑操作编码图，使结果图代码与下游损失对齐，并解码边贡献以精炼结构。我们以MissionHD的形式实例化此方法，用于弱监督VAD/VAR，并在基准数据集上展示了持续的性能提升。

Summary / 总结

The research aims to improve the performance of video anomaly detection and recognition by refining mission-specific graphs (MSGs) generated by language models. The proposed HDC-GSR method directly optimizes a task-aligned graph representation in a hyperdimensional space without modeling distributions. MissionHD, an instantiation of this approach, shows consistent performance gains on benchmark datasets for weakly supervised VAD/VAR tasks.

论文针对使用通用且分布不足的推理图（称为任务特定图 MSGs）在视频异常检测（VAD）和识别（VAR）中的问题。提出了一种新的 HDC-约束图结构精炼（HDC-GSR）方法，直接在超维空间中优化与任务对齐的图表示，而不进行分布建模。MissionHD 是该方法的一个实例，在弱监督 VAD/VAR 任务上的基准数据集上展示了持续的性能提升。
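The binding and bundling operations mentioned in the MissionHD abstract are standard Hyperdimensional Computing primitives. The following is a minimal sketch with bipolar hypervectors, assuming elementwise multiplication for binding and elementwise summation for bundling; the node names, the dimension, and the edge-scoring function are illustrative assumptions and not the paper's encoding.

```python
import math
import random

DIM = 10000
rng = random.Random(0)


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Binding: elementwise product, yielding a vector dissimilar to both inputs."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundling: elementwise sum, yielding a vector similar to each input."""
    return [sum(column) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical reasoning-graph nodes and edges.
nodes = {name: random_hv() for name in ("smoke", "fire", "crowd", "running")}
edges = [("smoke", "fire"), ("crowd", "running")]
graph_code = bundle([bind(nodes[u], nodes[v]) for u, v in edges])


def edge_score(u, v):
    """Decode how strongly edge (u, v) contributes to the graph code."""
    return cosine(graph_code, bind(nodes[u], nodes[v]))
```

Edges present in the graph score well above zero, while absent edges score near zero, which is what makes a single graph code decodable edge by edge.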

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Authors: Asmit Kumar Singh, Haozhe Wang, Laxmi Naga Santosh Attaluri, Tak Chiam, Weihua Zhu

First: 2026-02-13T18:25:00+00:00 · Latest: 2026-02-13T18:25:00+00:00

Abs · PDF · Code1 · Code2

Abstract

Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployments typically use a tiered static-dynamic design: a static cache of curated, offline-vetted responses mined from logs, backed by a dynamic cache populated online. In practice, both tiers are commonly governed by a single embedding similarity threshold, which induces a hard tradeoff: conservative thresholds miss safe reuse opportunities, while aggressive thresholds risk serving semantically incorrect responses. We introduce Krites, an asynchronous, LLM-judged caching policy that expands static coverage without changing serving decisions. On the critical path, Krites behaves exactly like a standard static threshold policy. When the nearest static neighbor of the prompt falls just below the static threshold, Krites asynchronously invokes an LLM judge to verify whether the static response is acceptable for the new prompt. Approved matches are promoted into the dynamic cache, allowing future repeats and paraphrases to reuse curated static answers and expanding static reach over time. In trace-driven simulations on conversational and search workloads, Krites increases the fraction of requests served with curated static answers (direct static hits plus verified promotions) by up to 3.9 times for conversational traffic and search-style queries relative to tuned baselines, with unchanged critical path latency.

中文标题/摘要

标题：异步验证语义缓存用于分层LLM架构

大型语言模型（LLMs）现在处于搜索、辅助和代理工作流的关键路径上，使得语义缓存对于减少推理成本和延迟变得至关重要。生产部署通常采用分层的静态-动态设计：一个静态缓存，包含从日志中挖掘的经过离线审核的精选响应，由一个在线填充的动态缓存支持。实际上，两个层级通常由单一的嵌入相似度阈值共同管理，这导致了硬性权衡：保守的阈值会错过安全的重用机会，而激进的阈值则有风险提供语义上不正确的响应。我们引入了**Krites**，一种异步、由LLM判断的缓存策略，它在不改变服务决策的情况下扩展静态覆盖范围。在关键路径上，Krites 行为与标准静态阈值策略完全相同。当提示的最近静态邻居刚好低于静态阈值时，Krites 异步调用一个LLM判断者来验证静态响应是否适用于新提示。批准的匹配项被提升到动态缓存中，允许未来的重复和改写重用精选的静态答案，并随着时间的推移扩展静态覆盖范围。在基于跟踪的模拟中，对于对话流量和搜索风格的查询，Krites 将使用精选静态答案（直接静态命中加上验证提升）的比例提高了最多**3.9**倍，而关键路径延迟保持不变。

Summary / 总结

The paper addresses the challenge of semantic caching in large language models (LLMs) to reduce inference cost and latency. It proposes Krites, an asynchronous caching policy that uses LLMs to verify static cache responses, expanding static coverage without altering serving decisions. Krites increases the fraction of requests served with curated static answers by up to 3.9 times for conversational traffic and search-style queries compared to tuned baselines, with no change in critical path latency.

论文旨在解决在缓存大型语言模型响应时平衡语义正确性和效率的挑战。它提出了一种名为Krites的异步缓存策略，该策略使用LLM验证静态缓存响应，从而在不改变服务决策的情况下扩大静态缓存的覆盖范围。实验结果显示，Krites可以将使用经过验证的静态答案的比例提高多达3.9倍，适用于对话和搜索查询，同时不会增加关键路径延迟。
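The Krites policy described above can be sketched as a static threshold check on the serving path plus an off-path verification queue. This is a simplified illustration under stated assumptions, not the authors' implementation: `jaccard` is a toy stand-in for embedding similarity, the dynamic cache uses exact-match lookup, the judge is a plain callable, and the class name, threshold, and grey-zone margin are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Toy stand-in for embedding similarity: word-set overlap in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


class TieredCache:
    def __init__(self, static_entries, judge, threshold=0.9, grey_margin=0.1):
        self.static = static_entries   # list of (prompt, curated_response)
        self.dynamic = {}              # promoted prompt -> curated response
        self.judge = judge             # callable(prompt, response) -> bool
        self.threshold = threshold
        self.grey_margin = grey_margin
        self.pending = deque()         # work queue for the off-path judge

    def serve(self, prompt, backend):
        if prompt in self.dynamic:
            return self.dynamic[prompt]
        score, response = max(
            ((jaccard(prompt, p), r) for p, r in self.static),
            key=lambda pair: pair[0],
        )
        if score >= self.threshold:
            return response            # ordinary static hit
        if score >= self.threshold - self.grey_margin:
            # Near miss: queue for verification, but do not change this answer.
            self.pending.append((prompt, response))
        return backend(prompt)

    def run_judge(self):
        """Runs off the critical path; approved matches are promoted."""
        while self.pending:
            prompt, response = self.pending.popleft()
            if self.judge(prompt, response):
                self.dynamic[prompt] = response


cache = TieredCache(
    [("how do i reset my password", "Use the reset link on the sign-in page.")],
    judge=lambda prompt, response: True,  # hypothetical judge that approves everything
)
first = cache.serve("how do i reset password", backend=lambda p: "generated answer")
cache.run_judge()
second = cache.serve("how do i reset password", backend=lambda p: "generated answer")
```

The first request is answered by the backend exactly as a plain threshold policy would answer it; only after the judge approves the near miss does the repeat request receive the curated static answer.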

In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach

Authors: Yiran Gao, Kim Hammar, Tao Li

Venue: AAAI

First: 2026-02-13T18:09:30+00:00 · Latest: 2026-02-13T18:09:30+00:00

Comments: 2026 AAAI Summer Symposium on Human-Aware AI Agents for the Cyber Battlefield

Abs · PDF · Code1 · Code2

Abstract

Rapidly evolving cyberattacks demand incident response systems that can autonomously learn and adapt to changing threats. Prior work has extensively explored the reinforcement learning approach, which involves learning response strategies through extensive simulation of the incident. While this approach can be effective, it requires handcrafted modeling of the simulator and suppresses useful semantics from raw system logs and alerts. To address these limitations, we propose to leverage the pre-trained security knowledge and in-context learning of large language models (LLMs) to create an end-to-end agentic solution for incident response planning. Specifically, our agent integrates four functionalities, perception, reasoning, planning, and action, into one lightweight LLM (14B model). Through fine-tuning and chain-of-thought reasoning, our LLM agent is capable of processing system logs and inferring the underlying network state (perception), updating its conjecture of attack models (reasoning), simulating consequences under different response strategies (planning), and generating an effective response (action). By comparing LLM-simulated outcomes with actual observations, the LLM agent repeatedly refines its attack conjecture and corresponding response, thereby demonstrating in-context adaptation. Our agentic approach is free of modeling and can run on commodity hardware. When evaluated on incident logs reported in the literature, our agent achieves recovery up to 23% faster than frontier LLMs.

中文标题/摘要

标题：上下文自主网络事件响应：端到端大型语言模型代理方法

快速演化的网络攻击要求事件响应系统能够自主学习和适应不断变化的威胁。先前的工作已经广泛探索了强化学习方法，该方法通过事件的大量模拟来学习响应策略。虽然这种方法可能有效，但它需要手工构建模拟器模型，并抑制了来自原始系统日志和警报的有用语义。为了解决这些限制，我们提出利用大型语言模型（LLM）的预训练安全知识和上下文学习来创建一个端到端的代理解决方案，用于事件响应规划。具体而言，我们的代理将感知、推理、规划和行动四个功能整合到一个轻量级的LLM（14B模型）中。通过微调和链式思考推理，我们的LLM代理能够处理系统日志并推断出底层网络状态（感知），更新其对攻击模型的猜测（推理），在不同的响应策略下模拟后果（规划），并生成有效的响应（行动）。通过将LLM模拟结果与实际观察结果进行比较，LLM代理不断细化其对攻击的猜测及其相应的响应，从而展示了上下文适应性。我们的代理方法无需建模，可以在普通硬件上运行。当在文献中报告的事件日志上进行评估时，我们的代理的恢复速度比前沿LLM最多快23%。

Summary / 总结

This paper addresses the need for autonomous incident response systems capable of adapting to evolving cyberattacks. It proposes an end-to-end agent approach that integrates perception, reasoning, planning, and action into a single lightweight large language model (LLM). The agent, fine-tuned and using chain-of-thought reasoning, processes system logs to infer network states, updates its conjecture of attack models, simulates the consequences of response strategies, and generates effective responses. The approach requires no handcrafted simulator modeling and, when evaluated on incident logs reported in the literature, achieves recovery up to 23% faster than frontier LLMs.

本文旨在应对不断演变的网络攻击，提出了一种使用大型语言模型（LLM）进行感知、推理、规划和行动的端到端代理方法。该代理通过链式思考推理进行微调，处理系统日志以推断网络状态，更新攻击模型，模拟响应策略，并生成有效的响应。该方法无需建模，当在文献中的事件日志上进行评估时，与领先的LLM相比，展示了更快的恢复速度。

Learning to Approximate Uniform Facility Location via Graph Neural Networks

Authors: Chendi Qian, Christopher Morris, Stefanie Jegelka, Christian Sohler

First: 2026-02-13T18:08:23+00:00 · Latest: 2026-02-13T18:08:23+00:00

Abs · PDF · Code1 · Code2

Abstract

There has been a growing interest in using neural networks, especially message-passing neural networks (MPNNs), to solve hard combinatorial optimization problems heuristically. However, existing learning-based approaches for hard combinatorial optimization tasks often rely on supervised training data, reinforcement learning, or gradient estimators, leading to significant computational overhead, unstable training, or a lack of provable performance guarantees. In contrast, classical approximation algorithms offer such performance guarantees under worst-case inputs but are non-differentiable and unable to adaptively exploit structural regularities in natural input distributions. We address this dichotomy with the fundamental example of Uniform Facility Location (UniFL), a variant of the combinatorial facility location problem with applications in clustering, data summarization, logistics, and supply chain design. We develop a fully differentiable MPNN model that embeds approximation-algorithmic principles while avoiding the need for solver supervision or discrete relaxations. Our approach admits provable approximation and size generalization guarantees to much larger instances than seen during training. Empirically, we show that our approach outperforms standard non-learned approximation algorithms in terms of solution quality, closing the gap with computationally intensive integer linear programming approaches. Overall, this work provides a step toward bridging learning-based methods and approximation algorithms for discrete optimization.

中文标题/摘要

标题：通过图神经网络学习近似均匀设施定位

神经网络，尤其是消息传递神经网络（MPNN），在解决组合优化问题方面引起了越来越多的兴趣。然而，现有的基于学习的方法通常依赖于监督训练数据、强化学习或梯度估计器，导致显著的计算开销、训练不稳定或缺乏可证明的性能保证。相比之下，经典的近似算法在最坏情况下提供了这样的性能保证，但它们是非可微的，无法适应自然输入分布中的结构规律。我们通过均匀设施定位（UniFL）这一组合设施定位问题的实例来解决这一二分法，该问题在聚类、数据摘要、物流和供应链设计中有应用。我们开发了一个完全可微的MPNN模型，该模型嵌入了近似算法的基本原理，同时避免了求解器监督或离散松弛的需求。我们的方法在比训练中看到的更大实例上提供了可证明的近似和规模泛化保证。实验表明，我们的方法在解决方案质量上优于标准的非学习近似算法，缩小了与计算密集型整数线性规划方法之间的差距。总体而言，这项工作为将基于学习的方法与近似算法结合用于离散优化提供了一步进展。

Summary / 总结

This paper addresses the challenge of using neural networks to solve hard combinatorial optimization problems, particularly Uniform Facility Location (UniFL), by developing a fully differentiable message-passing neural network (MPNN) model. This model incorporates approximation-algorithmic principles without requiring supervised training or discrete relaxations, offering provable performance guarantees. Experiments demonstrate that the proposed approach outperforms standard non-learned approximation algorithms in terms of solution quality and generalizes well to larger instances, closing the gap with more computationally intensive methods like integer linear programming.

本文解决了使用神经网络解决组合优化难题的问题，特别是针对统一设施定位（UniFL）问题。作者开发了一个完全可微的消息传递神经网络（MPNN）模型，该模型结合了近似算法的原则，无需监督或离散松弛。该模型为更大的实例提供了可证明的近似和大小泛化保证。实验证明，该方法在解决方案质量上优于标准的非学习近似算法，并且与计算密集型的整数线性规划方法缩小了差距。
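To make the Uniform Facility Location objective concrete, the sketch below evaluates the cost of a candidate solution, assuming the common formulation in which every point may host a facility, all facilities share one opening cost, and each point connects to its nearest open facility. The function name, the opening cost of 1.0, and the toy points are illustrative assumptions; this is the objective only, not the paper's MPNN.

```python
import math


def unifl_cost(points, open_indices, opening_cost=1.0):
    """Uniform opening cost per facility plus each point's distance to its nearest facility."""
    if not open_indices:
        raise ValueError("at least one facility must be open")
    connection = sum(
        min(math.dist(p, points[i]) for i in open_indices) for p in points
    )
    return opening_cost * len(open_indices) + connection


# Two well-separated clusters (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
one_facility = unifl_cost(points, [0])
two_facilities = unifl_cost(points, [0, 2])
```

Opening a second facility costs one extra unit but removes two long connections, so the two-facility solution is cheaper here; the optimization problem is choosing which set to open.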

Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection

Authors: Bharadwaj Dogga, Kaaustaaub Shankar, Gibin Raju, Wilhelm Louw, Kelly Cohen

First: 2026-02-04T22:33:18+00:00 · Latest: 2026-02-13T17:56:58+00:00

Abs · PDF · Code1 · Code2

Abstract

Deep learning models such as U-Net and its variants have established state-of-the-art performance in edge detection tasks and are used by generative AI services worldwide for their image generation models. However, their decision-making processes remain opaque, operating as "black boxes" that obscure the rationale behind specific boundary predictions. This lack of transparency is a critical barrier in safety-critical applications where verification is mandatory. To bridge the gap between high-performance deep learning and interpretable logic, we propose the Rule-Based Spatial Mixture-of-Experts U-Net (sMoE U-Net). Our architecture introduces two key innovations: (1) Spatially-Adaptive Mixture-of-Experts (sMoE) blocks integrated into the decoder skip connections, which dynamically gate between "Context" (smooth) and "Boundary" (sharp) experts based on local feature statistics; and (2) a Takagi-Sugeno-Kang (TSK) Fuzzy Head that replaces the standard classification layer. This fuzzy head fuses deep semantic features with heuristic edge signals using explicit IF-THEN rules. We evaluate our method on the BSDS500 benchmark, achieving an Optimal Dataset Scale (ODS) F-score of 0.7628, effectively matching purely deep baselines like HED (0.7688) while outperforming the standard U-Net (0.7437). Crucially, our model provides pixel-level explainability through "Rule Firing Maps" and "Strategy Maps," allowing users to visualize whether an edge was detected due to strong gradients, high semantic confidence, or specific logical rule combinations.

中文标题/摘要

标题：基于规则的空间混合专家U-网用于可解释的边缘检测

深度学习模型如U-网及其变体，在边缘检测任务中已经取得了最先进的性能，并被全球的生成式AI服务用于其图像生成模型。然而，它们的决策过程仍然不透明，作为“黑箱”操作，模糊了特定边界预测背后的理由。这种透明度的缺乏是安全关键应用中的一个关键障碍，其中验证是强制性的。为了弥合高性能深度学习与可解释逻辑之间的差距，我们提出了基于规则的空间混合专家U-网（sMoE U-网）。我们的架构引入了两个关键创新：（1）集成到解码器跳连接中的空间自适应混合专家（sMoE）块，根据局部特征统计动态门控“上下文”（平滑）和“边界”（锐利）专家；（2）用Takagi-Sugeno-Kang（TSK）模糊头替换标准分类层。该模糊头使用显式的IF-THEN规则融合了深层语义特征和启发式边缘信号。我们在BSDS500基准上评估了我们的方法，实现了0.7628的最优数据集规模（ODS）F分数，有效地匹配了纯粹的深度基线如HED（0.7688），同时优于标准U-网（0.7437）。最关键的是，我们的模型通过“规则激活图”和“策略图”提供了像素级的可解释性，使用户能够可视化边缘是由于强烈的梯度、高语义置信度还是特定逻辑规则组合而被检测到的。

Summary / 总结

The paper addresses the need for transparent decision-making in edge detection tasks, particularly in safety-critical applications. It introduces the Rule-Based Spatial Mixture-of-Experts U-Net, which combines spatially-adaptive gating mechanisms and a fuzzy logic head to enhance interpretability. The model achieves an ODS F-score of 0.7628 on the BSDS500 benchmark, close to the purely deep HED baseline (0.7688) and above the standard U-Net (0.7437), while providing pixel-level explainability through rule firing and strategy maps.

该论文提出了基于规则的空间混合专家U-Net（sMoE U-Net）以增强边缘检测模型的可解释性。它引入了空间自适应混合专家块和塔加基-苏杰-康模糊头，分别动态地在上下文和边界专家之间切换，并将深层语义特征与启发式边缘信号融合。在BSDS500基准上，该模型的ODS F分数达到0.7628，与HED等深层基线相当，同时通过规则激活图和策略图提供像素级的可解释性。
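The Takagi-Sugeno-Kang fuzzy head can be illustrated with generic first-order TSK inference: each IF-THEN rule fires with a strength given by its membership functions, and the output is the strength-weighted average of per-rule linear consequents. The rule set, Gaussian memberships, and coefficients below are invented for illustration and are not taken from the paper; the two inputs stand for a semantic confidence and a gradient magnitude, both assumed to lie in [0, 1].

```python
import math


def gaussian(x, center, width):
    """Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))


# Each rule: (semantic center, gradient center, (bias, w_semantic, w_gradient)).
# All values are hypothetical.
RULES = [
    (0.9, 0.9, (0.2, 0.4, 0.4)),  # IF semantic is high AND gradient is high
    (0.9, 0.1, (0.1, 0.6, 0.0)),  # IF semantic is high AND gradient is low
    (0.1, 0.9, (0.0, 0.0, 0.5)),  # IF semantic is low AND gradient is high
    (0.1, 0.1, (0.0, 0.0, 0.0)),  # IF semantic is low AND gradient is low
]


def tsk_edge_score(semantic, gradient, width=0.3):
    """Return the edge score and the normalized firing strength of each rule."""
    strengths, outputs = [], []
    for sem_c, grad_c, (bias, w_sem, w_grad) in RULES:
        strengths.append(gaussian(semantic, sem_c, width) * gaussian(gradient, grad_c, width))
        outputs.append(bias + w_sem * semantic + w_grad * gradient)
    total = sum(strengths)
    score = sum(s * y for s, y in zip(strengths, outputs)) / total
    return score, [s / total for s in strengths]
```

The normalized firing strengths are what make such a head inspectable: evaluating them per pixel yields a map of which rule dominated each prediction, which is the kind of information the abstract's "Rule Firing Maps" expose.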

Generating Physical Dynamics under Priors

Authors: Zihan Zhou, Xiaoxue Wang, Tianshu Yu

Venue: ICLR

First: 2024-09-01T14:43:47+00:00 · Latest: 2026-02-13T17:52:00+00:00

Abs · PDF · Code1 · Code2

Abstract

Generating physically feasible dynamics in a data-driven context is challenging, especially when adhering to physical priors expressed in specific equations or formulas. Existing methodologies often overlook the integration of physical priors, resulting in violation of basic physical laws and suboptimal performance. In this paper, we introduce a novel framework that seamlessly incorporates physical priors into diffusion-based generative models to address this limitation. Our approach leverages two categories of priors: 1) distributional priors, such as roto-translational invariance, and 2) physical feasibility priors, including energy and momentum conservation laws and PDE constraints. By embedding these priors into the generative process, our method can efficiently generate physically realistic dynamics, encompassing trajectories and flows. Empirical evaluations demonstrate that our method produces high-quality dynamics across a diverse array of physical phenomena with remarkable robustness, underscoring its potential to advance data-driven studies in AI4Physics. Our contributions signify a substantial advancement in the field of generative modeling, offering a robust solution to generate accurate and physically consistent dynamics.

中文标题/摘要

标题：基于先验的物理动力学生成

在数据驱动的背景下生成物理上可行的动力学具有挑战性，尤其是在遵循特定方程或公式表达的物理先验时。现有方法往往忽视了物理先验的整合，导致违反基本物理定律并表现出次优性能。在本文中，我们提出了一种新颖的框架，将物理先验无缝地整合到基于扩散的生成模型中以解决这一限制。我们的方法利用两类先验：1) 分布先验，如旋转变换不变性，以及2) 物理可行性先验，包括能量和动量守恒定律和偏微分方程约束。通过将这些先验嵌入生成过程，我们的方法可以高效地生成物理上现实的动力学，涵盖轨迹和流。实证评估表明，我们的方法在各种物理现象中生成高质量的动力学具有显著的鲁棒性，突显了其在AI4Physics领域数据驱动研究中的潜在价值。我们的贡献标志着生成建模领域的一个重要进展，提供了一种生成准确且物理一致的动力学的稳健解决方案。

Summary / 总结

This paper addresses the challenge of generating physically feasible dynamics in a data-driven manner by integrating physical priors into diffusion-based generative models. It introduces a framework that incorporates two types of priors: distributional and physical feasibility priors, such as energy and momentum conservation laws. Experimental results show that the proposed method can generate high-quality, physically realistic dynamics across various physical phenomena, demonstrating its robustness and potential for advancing AI4Physics.

本文通过将物理先验知识整合到基于扩散的生成模型中，解决了在数据驱动方式下生成物理可行动态的挑战。该方法引入了一种框架，结合了分布先验和物理可行性先验，如能量和动量守恒定律。该方法生成了高质量、物理上真实的动态，覆盖了各种物理现象，并展示了强大的鲁棒性和在AI4Physics研究中的潜在应用价值。

FlashSchNet: Fast and Accurate Coarse-Grained Neural Network Molecular Dynamics

Authors: Pingzhi Li, Hongxuan Li, Zirui Liu, Xingcheng Lin, Tianlong Chen

First: 2026-02-13T17:49:12+00:00 · Latest: 2026-02-13T17:49:12+00:00

Comments: Code is at https://github.com/UNITES-Lab/flash-molecular-dynamics

Abs · PDF · Code1 · Code2 · Code3

Abstract

Graph neural network (GNN) potentials such as SchNet improve the accuracy and transferability of molecular dynamics (MD) simulation by learning many-body interactions, but remain slower than classical force fields due to fragmented kernels and memory-bound pipelines that underutilize GPUs. We show that a missing principle is making GNN-MD IO-aware, carefully accounting for reads and writes between GPU high-bandwidth memory (HBM) and on-chip SRAM. We present FlashSchNet, an efficient and accurate IO-aware SchNet-style GNN-MD framework built on four techniques: (1) flash radial basis, which fuses pairwise distance computation, Gaussian basis expansion, and cosine envelope into a single tiled pass, computing each distance once and reusing it across all basis functions; (2) flash message passing, which fuses cutoff, neighbor gather, filter multiplication, and reduction to avoid materializing edge tensors in HBM; (3) flash aggregation, which reformulates scatter-add via CSR segment reduce, reducing atomic writes by a factor of feature dimension and enabling contention-free accumulation in both forward and backward passes; (4) channel-wise 16-bit quantization that exploits the low per-channel dynamic range in SchNet MLP weights to further improve throughput with negligible accuracy loss. On a single NVIDIA RTX PRO 6000, FlashSchNet achieves 1000 ns/day aggregate simulation throughput over 64 parallel replicas on coarse-grained (CG) protein containing 269 beads (6.5x faster than CGSchNet baseline with 80% reduction of peak memory), surpassing classical force fields (e.g. MARTINI) while retaining SchNet-level accuracy and transferability.

中文标题/摘要

标题：FlashSchNet：快速且准确的粗粒度神经网络分子动力学

图神经网络（GNN）势能，如SchNet，通过学习多体相互作用提高了分子动力学（MD）模拟的准确性和可转移性，但由于碎片化的内核和内存限制的流水线，其速度仍低于经典力场。我们展示了缺失的原则是使GNN-MD I/O感知，仔细考虑GPU高带宽内存（HBM）和片上SRAM之间的读写操作。我们提出了FlashSchNet，这是一种基于四种技术的高效且准确的SchNet风格GNN-MD框架：（1）闪存径向基函数，将成对距离计算、高斯基底扩展和余弦包络融合为一个镶嵌遍历，每个距离计算一次并重用于所有基函数；（2）闪存消息传递，将截止、邻居收集、过滤乘法和归约融合，以避免在HBM中实现边张量；（3）闪存聚合，通过CSR段归约重新表述散列加法，通过特征维度减少原子写入，使前向和反向传递中的累积操作无竞争；（4）通道级16位量化，利用SchNet MLP权重中的低通道动态范围进一步提高吞吐量，同时几乎不损失准确性。在单个NVIDIA RTX PRO 6000上，FlashSchNet在64个并行副本上实现了粗粒度（CG）蛋白质（包含269个珠子）的1000 ns/天综合模拟吞吐量，比CGSchNet基线快6.5倍，峰值内存减少80%，超越了经典力场（例如MARTINI），同时保持SchNet级别的准确性和可转移性。

Summary / 总结

FlashSchNet is an efficient and accurate framework for molecular dynamics simulations using graph neural networks. It improves the performance of SchNet-style potentials by addressing IO inefficiencies through four techniques: flash radial basis, flash message passing, flash aggregation, and channel-wise 16-bit quantization. On a single NVIDIA RTX PRO 6000, FlashSchNet achieves 6.5 times faster simulation throughput compared to the CGSchNet baseline while using 80% less peak memory and maintaining SchNet-level accuracy and transferability.

研究旨在通过使用图神经网络（GNN）如SchNet来提高分子动力学模拟的效率和准确性。FlashSchNet是一种基于SchNet的IO感知GNN-MD框架，包含四种技术：闪存径向基函数、闪存消息传递、闪存聚合和通道级16位量化。这些技术使模拟更快且更节省内存，速度是CGSchNet基线的6.5倍，峰值内存使用量减少80%，同时保持与SchNet相同的准确性和可转移性。在单个NVIDIA RTX PRO 6000上，FlashSchNet在包含269个珠子的粗粒度蛋白质上，通过64个并行副本实现了1000 ns/天的综合模拟吞吐量，超越了如MARTINI等经典力场。
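The "flash aggregation" technique in the FlashSchNet abstract replaces scatter-add with a CSR segment reduce. The sketch below contrasts the two on the CPU with scalar messages, assuming edges are already sorted by destination node; it illustrates the reformulation only, not the fused GPU kernel, and all function names are hypothetical.

```python
def scatter_add(messages, dst, num_nodes):
    """Accumulate each message into its destination; on a GPU every += is an atomic write."""
    out = [0.0] * num_nodes
    for message, node in zip(messages, dst):
        out[node] += message
    return out


def build_row_ptr(dst_sorted, num_nodes):
    """CSR row pointer: messages for node i live in [row_ptr[i], row_ptr[i + 1])."""
    row_ptr = [0] * (num_nodes + 1)
    for node in dst_sorted:
        row_ptr[node + 1] += 1
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr


def segment_reduce(messages_sorted, row_ptr):
    """Each node sums its own contiguous slice, so no two writers share an output."""
    return [
        sum(messages_sorted[row_ptr[i]:row_ptr[i + 1]])
        for i in range(len(row_ptr) - 1)
    ]


messages = [1.0, 2.0, 3.0, 4.0, 5.0]
dst = [0, 0, 1, 3, 3]  # sorted by destination node
row_ptr = build_row_ptr(dst, num_nodes=4)
```

Both routines produce the same per-node sums; the segment form writes each output exactly once, which is what allows contention-free accumulation.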

Weight Decay may matter more than muP for Learning Rate Transfer in Practice

Authors: Atli Kosson, Jeremy Welborn, Yang Liu, Martin Jaggi, Xi Chen

Venue: ICLR 2026

First: 2025-10-21T21:36:14+00:00 · Latest: 2026-02-13T17:48:03+00:00

Comments: ICLR 2026

Abs · PDF · Code1 · Code2

Abstract

Transferring the optimal learning rate from small to large neural networks can enable efficient training at scales where hyperparameter tuning is otherwise prohibitively expensive. To this end, the Maximal Update Parameterization (muP) proposes a learning rate scaling designed to keep the update dynamics of internal representations stable across different model widths. However, the scaling rules of muP rely on strong assumptions, particularly about the geometric alignment of a layer's inputs with both its weights and gradient updates. In this large-scale empirical investigation, we show that these assumptions hold only briefly at the start of training in the practical setups where learning rate transfer is most valuable, such as LLM training. For the remainder of training it is weight decay rather than muP that correctly stabilizes the update dynamics of internal representations across widths, facilitating learning rate transfer. This suggests muP's scaling primarily acts as a form of implicit learning rate warmup, allowing us to largely replace it with modified warmup schedules. Together these findings fundamentally challenge prevailing beliefs about learning rate transfer and can explain empirical observations such as why muP requires the independent weight decay variant for good transfer.

中文标题/摘要

标题：权重衰减可能比muP在实践中对学习率转移更重要

将小型神经网络的最佳学习率转移到大型神经网络中，可以在超参数调优成本高昂的情况下实现高效的训练。为此，最大更新参数化（muP）提出了一种学习率缩放方法，旨在保持不同模型宽度下内部表示的更新动态稳定。然而，muP的缩放规则依赖于强烈的假设，特别是关于一层的输入与权重以及梯度更新之间的几何对齐。在对大规模实验的调查中，我们表明，在LLM训练等最需要学习率转移的实用设置中，这些假设仅在训练初期短暂成立。在训练的其余时间里，是权重衰减而不是muP正确地稳定了不同宽度下内部表示的更新动态，促进了学习率转移。这表明muP的缩放主要作为一种隐式的学习率预热，使我们能够用修改后的预热计划表来很大程度上替代它。这些发现从根本上挑战了关于学习率转移的现有观点，并可以解释诸如为什么muP需要独立的权重衰减变体才能实现良好转移等经验观察。

Summary / 总结

This study investigates how Maximal Update Parameterization (muP) and weight decay contribute to learning rate transfer from smaller to larger neural networks. It shows that muP's geometric alignment assumptions hold only briefly at the start of training in practical setups such as LLM training; for the remainder of training, weight decay rather than muP stabilizes the update dynamics of internal representations across widths. The findings suggest that muP primarily acts as an implicit learning rate warmup and can be largely replaced by modified warmup schedules, challenging prevailing beliefs about learning rate transfer.

研究探讨了Maximal Update Parameterization (muP) 在小规模和大规模神经网络之间进行学习率转移的有效性。研究发现，muP 的几何对齐假设仅在训练初期成立，在大规模模型的其余训练过程中，权重衰减比muP 更重要，能够稳定内部表示的更新动态。研究结果表明，muP 主要起到隐式学习率预热的作用，可以通过修改预热策略来替代它，从而从根本上改变了对大规模语言模型学习率转移的认知。

From Prompt to Product: A Human-Centered Benchmark of Agentic App Generation Systems

Authors: Marcos Ortiz, Justin Hill, Collin Overbay, Ingrida Semenec, Frederic Sauve-Hoover, Jim Schwoebel, Joel Shor

First: 2025-12-19T21:37:15+00:00 · Latest: 2026-02-13T17:26:15+00:00

Abs · PDF · Code1 · Code2

Abstract

Agentic AI systems capable of generating full-stack web applications from natural language prompts ("prompt-to-app") represent a significant shift in software development. However, evaluating these systems remains challenging, as visual polish, functional correctness, and user trust are often misaligned. As a result, it is unclear how existing prompt-to-app tools compare under realistic, human-centered evaluation criteria. In this paper, we introduce a human-centered benchmark for evaluating prompt-to-app systems and conduct a large-scale comparative study of three widely used platforms: Replit, Bolt, and Firebase Studio. Using a diverse set of 96 prompts spanning common web application tasks, we generate 288 unique application artifacts. We evaluate these systems through a large-scale human-rater study involving 205 participants and 1,071 quality-filtered pairwise comparisons, assessing task-based ease of use, visual appeal, perceived completeness, and user trust. Our results show that these systems are not interchangeable: Firebase Studio consistently outperforms competing platforms across all human-evaluated dimensions, achieving the highest win rates for ease of use, trust, visual appeal, and visual appropriateness. Bolt performs competitively on visual appeal but trails Firebase on usability and trust, while Replit underperforms relative to both across most metrics. These findings highlight a persistent gap between visual polish and functional reliability in prompt-to-app systems and demonstrate the necessity of interactive, task-based evaluation. We release our benchmark framework, prompt set, and generated artifacts to support reproducible evaluation and future research in agentic application generation.

中文标题/摘要

标题：从提示到产品：代理应用程序生成系统的以人为本基准

能够从自然语言提示生成全栈网络应用程序的代理AI系统代表了软件开发中的重大转变。然而，评估这些系统仍然具有挑战性，因为视觉美观、功能正确性和用户信任往往不一致。因此，在现实的人本评价标准下，现有提示到应用程序工具之间的比较尚不清楚。在本文中，我们引入了一种以人为本的基准来评估提示到应用程序系统，并对三个广泛使用的平台进行了大规模比较研究：Replit、Bolt和Firebase Studio。使用涵盖常见网络应用程序任务的96个提示集，我们生成了288个独特的应用程序制品。我们通过涉及205名参与者和1,071对质量筛选后的成对比较的大规模人工评分研究，评估了这些系统的任务易用性、视觉吸引力、感知完整性以及用户信任。我们的结果显示，这些系统并非互换的：Firebase Studio在所有人工评估维度上始终优于竞争对手平台，获得最高的易用性、信任、视觉吸引力和视觉适宜性胜率。Bolt在视觉吸引力方面表现竞争，但在易用性和信任方面落后于Firebase，而Replit在大多数指标上相对两者表现较差。这些发现突显了提示到应用程序系统中视觉美观与功能可靠性之间持续存在的差距，并证明了交互式、任务导向评估的必要性。我们发布了我们的基准框架、提示集和生成的制品，以支持可重复的评估和未来代理应用程序生成的研究。

Summary / 总结

This paper introduces a human-centered benchmark for evaluating agentic AI systems that generate full-stack web applications from natural language prompts. It compares Replit, Bolt, and Firebase Studio using 96 diverse prompts and a large-scale human-rater study involving 205 participants. The results show that Firebase Studio outperforms the other platforms in ease of use, trust, visual appeal, and visual appropriateness; Bolt is competitive on visual appeal but trails Firebase Studio on usability and trust, and Replit underperforms both on most metrics. These findings underscore the gap between visual polish and functional reliability in prompt-to-app systems and emphasize the need for interactive, task-based evaluation.

本文介绍了一个以人为本的基准来评估能够从自然语言提示生成全栈web应用程序的代理式AI系统。研究使用96个不同的提示比较了Replit、Bolt和Firebase Studio，并进行了大规模的人类评估者研究，涉及205名参与者。结果显示，Firebase Studio在易用性、信任度、视觉吸引力和视觉适宜性方面均优于其他平台，而Bolt在视觉吸引力方面具有竞争力，但在易用性和信任度方面落后，Replit在大多数指标上表现不佳。这些发现突显了提示到应用程序系统中视觉吸引力与功能可靠性之间的差距，并强调了进行交互式、基于任务的评估的必要性。
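The win rates reported for the prompt-to-app benchmark come from pairwise human comparisons. A minimal way to tally such comparisons is sketched below, assuming each record names the two systems shown and the one the rater preferred. The system labels and records are invented placeholders, not data from the study, and the paper may aggregate its comparisons differently.

```python
from collections import Counter


def win_rates(comparisons):
    """comparisons: iterable of (system_a, system_b, winner). Returns wins / appearances."""
    wins, appearances = Counter(), Counter()
    for system_a, system_b, winner in comparisons:
        appearances[system_a] += 1
        appearances[system_b] += 1
        wins[winner] += 1
    return {system: wins[system] / appearances[system] for system in appearances}


# Hypothetical records for three anonymous systems.
comparisons = [
    ("A", "B", "A"),
    ("A", "C", "A"),
    ("B", "C", "B"),
    ("A", "B", "B"),
]
rates = win_rates(comparisons)
```

Computing one such table per evaluated dimension (ease of use, trust, visual appeal) gives the per-dimension win rates the abstract refers to.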

How to Train Your LLM Web Agent: A Statistical Diagnosis

Authors: Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

First: 2025-07-05T17:12:33+00:00 · Latest: 2026-02-13T17:24:17+00:00

Abs · PDF · Code1 · Code2

Abstract

LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.

中文标题/摘要

标题：如何训练你的LLM网络代理：一种统计诊断

基于LLM的网络代理最近取得了显著进展，但大部分进展发生在闭源系统中，与开源替代方案之间的差距越来越大。进展受到两个关键挑战的阻碍：首先，研究过于集中于单步任务，忽视了多步骤网络交互的复杂性；其次，对基于LLM的网络代理进行后训练需要高昂的计算成本。为了解决这个问题，我们提出了第一个具有统计依据的LLM网络代理后训练计算分配研究。我们的方法使用两阶段流程：先通过监督微调（SFT）训练Llama 3.1 8B学生模型模仿Llama 3.3 70B教师模型，再进行同策略（on-policy）强化学习。我们发现这个过程对超参数选择高度敏感，使得穷举搜索不切实际。为了节省他人昂贵的试错成本，我们采样了1,370种配置，并使用自助法估计有效的超参数。我们的结果显示，将SFT与同策略RL结合使用，在WorkArena和MiniWob++上始终优于单独使用任一方法。此外，这种策略只需55%的计算量即可达到纯SFT在MiniWob++上的峰值性能，有效地推动了计算-性能帕累托前沿，并且是唯一能够缩小与闭源模型差距的策略。

Summary / 总结

This study addresses the cost and difficulty of post-training LLM-based web agents with a two-stage pipeline: supervised fine-tuning (SFT) of a Llama 3.1 8B student on a Llama 3.3 70B teacher, followed by on-policy reinforcement learning. Combining the two stages consistently outperforms either approach alone on WorkArena and MiniWob++, and needs only 55% of the compute to match the peak performance of pure SFT on MiniWob++. Because the pipeline is highly sensitive to hyperparameters, the authors sample 1,370 configurations and use bootstrapping to estimate effective settings, sparing others expensive trial-and-error.

研究提出了一种两阶段流程，先进行监督微调，再进行同策略强化学习，以应对基于LLM的网络代理后训练的挑战。研究发现，这种结合方法在WorkArena和MiniWob++上优于单独使用任何一种方法。此外，该方法只需55%的计算量即可达到纯监督微调在MiniWob++上的峰值性能。这项工作提供了一种具有统计依据的超参数选择方法，降低了训练LLM网络代理的成本。

Batch-CAM: Introduction to better reasoning in convolutional deep learning models

Authors: Giacomo Ignesti, Davide Moroni, Massimo Martinelli

First: 2025-10-01T08:47:00+00:00 · Latest: 2026-02-13T17:11:23+00:00

Comments: 10 pages, 6 figures, submitted to Signal, Image and Video Processing, Springer Nature

Abstract

Deep learning opacity often impedes deployment in high-stakes domains. We propose a training framework that aligns model focus with class-representative features without requiring pixel-level annotations. To this end, we introduce Batch-CAM, a vectorised implementation of Gradient-weighted Class Activation Mapping that integrates directly into the training loop with minimal computational overhead. We propose two regularisation terms: a Prototype Loss, which aligns individual-sample attention with the global class average, and a Batch-CAM Loss, which enforces consistency within a training batch. These are evaluated using L1, L2, and SSIM metrics. Validated on MNIST and Fashion-MNIST using ResNet18 and ConvNeXt-V2, our method generates significantly more coherent and human-interpretable saliency maps compared to baselines. While maintaining competitive classification accuracy, the framework successfully suppresses spurious feature activation, as evidenced by qualitative reconstruction analysis. Batch-CAM appears to offer a scalable pathway for training intrinsically interpretable models by leveraging batch-level statistics to guide feature extraction, effectively bridging the gap between predictive performance and explainability.

中文标题/摘要

标题：Batch-CAM：提高卷积深度学习模型推理能力的介绍

深度学习的不透明性常阻碍其在高风险领域的部署。我们提出了一种训练框架，使模型的关注点与类代表性特征对齐，而无需像素级标注。为此，我们引入了Batch-CAM，这是梯度加权类激活映射的一种向量化实现，可以直接集成到训练循环中，且计算开销极小。我们提出了两种正则化项：原型损失，使单个样本的注意力与全局类平均值对齐；Batch-CAM损失，在训练批次内部强制一致性。这些损失使用L1、L2和SSIM度量进行评估。在MNIST和Fashion-MNIST上使用ResNet18和ConvNeXt-V2进行验证，我们的方法生成的显著性图比基线方法明显更连贯、更易于人类解读。在保持有竞争力的分类准确性的同时，该框架成功抑制了虚假特征激活，定性重构分析印证了这一点。Batch-CAM似乎提供了一条可扩展的途径，通过利用批次级统计信息引导特征提取来训练本质上可解释的模型，从而有效弥合预测性能与可解释性之间的差距。

Summary / 总结

The research aims to improve the interpretability of deep learning models in critical applications by aligning model focus with class-representative features. The proposed Batch-CAM framework integrates Gradient-weighted Class Activation Mapping into the training loop with minimal computational overhead, using two regularisation terms: Prototype Loss and Batch-CAM Loss. The method generates more coherent and human-interpretable saliency maps on MNIST and Fashion-MNIST compared to baselines, while maintaining competitive classification accuracy and suppressing spurious feature activation.

研究通过提出Batch-CAM框架来解决深度学习模型的不透明性问题，该框架在不需要像素级标注的情况下增强模型的可解释性。引入了两种正则化项，即原型损失和Batch-CAM损失，分别用于使个体样本的注意力与全局类别平均值对齐，并在批次内部强制一致性。该方法在MNIST和Fashion-MNIST上使用ResNet18和ConvNeXt-V2进行评估，生成了比基线方法更连贯和易于人类理解的显著性图，同时保持了竞争力的分类准确性。

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

Authors: Sher Badshah, Ali Emami, Hassan Sajjad

First: 2026-02-13T17:10:43+00:00 · Latest: 2026-02-13T17:10:43+00:00

Abstract

Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation. Despite their practicality, LLM judges remain prone to miscalibration and systematic biases. This paper proposes SCOPE (Selective Conformal Optimized Pairwise Evaluation), a framework for selective pairwise judging with finite-sample statistical guarantees. Under exchangeability, SCOPE calibrates an acceptance threshold such that the error rate among non-abstained judgments is at most a user-specified level $α$. To provide SCOPE with a bias-neutral uncertainty signal, we introduce Bidirectional Preference Entropy (BPE), which queries the judge under both response positions, aggregates the implied preference probabilities to enforce invariance to response order, and converts the aggregated probability into an entropy-based uncertainty score. Across MT-Bench, RewardBench, and Chatbot Arena, BPE improves uncertainty quality over standard confidence proxies, providing a stronger selection signal that enables SCOPE to consistently meet the target risk level while retaining good coverage across judge scales. In particular, at $α = 0.10$, SCOPE consistently satisfies the risk bound across all benchmarks and judge scales (empirical risk $\approx 0.097$ to $0.099$), while retaining substantial coverage, reaching $0.89$ on RewardBench with Qwen-14B and $0.98$ on RewardBench with Qwen-32B. Compared to naïve baselines, SCOPE accepts up to $2.4\times$ more judgments on MT-Bench with Qwen-7B under the same target risk constraint, demonstrating that BPE enables reliable and high-coverage LLM-based evaluation.
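
The BPE construction described in the abstract (query both orders, aggregate, convert to entropy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how the two probabilities are aggregated, so the arithmetic mean used here is an assumption, and the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def bidirectional_preference_entropy(p_a_first: float, p_a_second: float) -> float:
    """Order-invariant uncertainty score for one pairwise comparison.

    p_a_first:  judge's P(A preferred) when A is shown in the first position.
    p_a_second: judge's P(A preferred) when A is shown in the second position.
    Assumption: the two probabilities are aggregated by their arithmetic mean.
    """
    p_a = 0.5 * (p_a_first + p_a_second)
    return binary_entropy(p_a)


# A judge that prefers A in both orders is scored as confident ...
consistent = bidirectional_preference_entropy(0.95, 0.90)
# ... while a judge that flips with position is scored as highly uncertain.
position_biased = bidirectional_preference_entropy(0.95, 0.10)
```

A selective judge would then accept only comparisons whose score falls below a calibrated threshold and abstain on the rest.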

中文标题/摘要

标题：SCOPE：选择性共形优化成对LLM评判

大型语言模型（LLMs）越来越多地被用作评判者，在成对评估中替代昂贵的人类偏好标签。尽管具有实用性，LLM评判者仍然容易出现校准不当和系统性偏差。本文提出SCOPE（选择性共形优化成对评估），一个具有有限样本统计保证的选择性成对评判框架。在可交换性假设下，SCOPE校准一个接受阈值，使得非弃权评判中的错误率至多为用户指定的水平α。为了给SCOPE提供对偏差中性的不确定性信号，我们引入双向偏好熵（BPE）：它在两种响应位置下分别查询评判者，聚合隐含的偏好概率以保证对响应顺序的不变性，并将聚合后的概率转换为基于熵的不确定性评分。在MT-Bench、RewardBench和Chatbot Arena上，BPE的不确定性质量优于标准置信度代理，提供了更强的选择信号，使SCOPE能够始终达到目标风险水平，同时在各种评判者规模上保持良好的覆盖率。特别是，在α=0.10时，SCOPE在所有基准和评判者规模上始终满足风险上界（经验风险约为0.097到0.099），同时保持较高的覆盖率，在RewardBench上使用Qwen-14B时达到0.89，使用Qwen-32B时达到0.98。与朴素基线相比，在相同的目标风险约束下，SCOPE在MT-Bench上使用Qwen-7B时接受的评判数量最多可达2.4倍，表明BPE使基于LLM的评估变得可靠且具有高覆盖率。

Summary / 总结

SCOPE is a framework for selective pairwise LLM judging with finite-sample statistical guarantees: under exchangeability, it calibrates an acceptance threshold so that the error rate among non-abstained judgments is at most a user-specified level α. It introduces Bidirectional Preference Entropy (BPE), an order-invariant, entropy-based uncertainty score that serves as a bias-neutral selection signal. SCOPE consistently meets the target risk level while maintaining good coverage across benchmarks and judge scales. At α = 0.10, its empirical risk is approximately 0.097 to 0.099, with coverage reaching 0.89 on RewardBench with Qwen-14B and 0.98 with Qwen-32B. Compared to naive baselines, SCOPE accepts up to 2.4 times more judgments on MT-Bench with Qwen-7B under the same risk constraint.

SCOPE 是一个使用大型语言模型（LLMs）进行选择性成对评判的框架，具有有限样本统计保证。它引入了双向偏好熵（BPE），作为对响应顺序不变、对偏差中性的不确定性信号，用于决定接受还是弃权。在各种基准测试中，SCOPE 始终达到目标风险水平，同时保持良好的覆盖率；与朴素基线相比，在相同风险约束下可接受更多评判。

Which Algorithms Can Graph Neural Networks Learn?

Authors: Solveig Wittig, Antonis Vasileiou, Robert R. Nerem, Timo Stoll, Floris Geerts, Yusu Wang, Christopher Morris

First: 2026-02-13T17:09:50+00:00 · Latest: 2026-02-13T17:09:50+00:00

Abstract

In recent years, there has been growing interest in understanding neural architectures' ability to learn to execute discrete algorithms, a line of work often referred to as neural algorithmic reasoning. The goal is to integrate algorithmic reasoning capabilities into larger neural pipelines. Many such architectures are based on (message-passing) graph neural networks (MPNNs), owing to their permutation equivariance and ability to deal with sparsity and variable-sized inputs. However, existing work is either largely empirical and lacks formal guarantees or it focuses solely on expressivity, leaving open the question of when and how such architectures generalize beyond a finite training set. In this work, we propose a general theoretical framework that characterizes the sufficient conditions under which MPNNs can learn an algorithm from a training set of small instances and provably approximate its behavior on inputs of arbitrary size. Our framework applies to a broad class of algorithms, including single-source shortest paths, minimum spanning trees, and general dynamic programming problems, such as the $0$-$1$ knapsack problem. In addition, we establish impossibility results for a wide range of algorithmic tasks, showing that standard MPNNs cannot learn them, and we derive more expressive MPNN-like architectures that overcome these limitations. Finally, we refine our analysis for the Bellman-Ford algorithm, yielding a substantially smaller required training set and significantly extending the recent work of Nerem et al. [2025] by allowing for a differentiable regularization loss. Empirical results largely support our theoretical findings.
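
Bellman-Ford is a natural test case for MPNNs because each relaxation round has the shape of a message-passing layer with min aggregation. The sketch below shows only that classical correspondence; it is not the paper's architecture or training setup, and the function name and example graph are hypothetical.

```python
import math


def bellman_ford_rounds(num_nodes, edges, source):
    """Single-source shortest paths written as synchronous message passing.

    edges: list of (u, v, weight) directed edges.
    In each round, node v aggregates the messages dist[u] + weight from its
    incoming edges with a min, mirroring one min-aggregation MPNN layer.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        updated = list(dist)  # synchronous update: read old state, write new
        for u, v, w in edges:
            updated[v] = min(updated[v], dist[u] + w)
        if updated == dist:  # converged early
            break
        dist = updated
    return dist


# Small example graph: the direct edge 0 -> 1 (cost 4) is beaten by 0 -> 2 -> 1.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
distances = bellman_ford_rounds(4, edges, source=0)
```

Because the number of rounds grows with graph size while the per-round update stays fixed, the question of generalizing from small training instances to arbitrary input sizes arises directly.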

中文标题/摘要

标题：图神经网络能学习哪些算法？

近年来，人们越来越关注神经架构学习执行离散算法的能力，这一研究方向通常被称为神经算法推理。其目标是将算法推理能力整合到更大的神经网络流程中。许多这样的架构基于（消息传递）图神经网络（MPNNs），因为它们具有置换等变性，并且能够处理稀疏性和可变大小的输入。然而，现有工作要么主要是经验性的、缺乏形式化保证，要么仅关注表达能力，没有回答这些架构何时以及如何泛化到有限训练集之外的问题。在本文中，我们提出了一个通用的理论框架，刻画了MPNNs能够从由小型实例组成的训练集中学习算法、并在任意大小的输入上可证明地逼近其行为的充分条件。我们的框架适用于一大类算法，包括单源最短路径、最小生成树和一般的动态规划问题，如0-1背包问题。此外，我们为一系列算法任务建立了不可能性结果，表明标准MPNNs无法学习它们，并推导出更具表达能力的类MPNN架构以克服这些限制。最后，我们针对Bellman-Ford算法细化了分析，使所需的训练集显著减小，并通过允许使用可微正则化损失，显著扩展了Nerem等人[2025]的近期工作。实验结果在很大程度上支持我们的理论发现。

Summary / 总结

This paper studies when message-passing graph neural networks (MPNNs) can learn discrete algorithms and generalize beyond the training set. The authors propose a theoretical framework that characterizes sufficient conditions under which MPNNs learn an algorithm from a training set of small instances and provably approximate its behavior on inputs of arbitrary size. The framework covers single-source shortest paths, minimum spanning trees, and general dynamic programming problems such as 0-1 knapsack. The study also proves impossibility results for standard MPNNs and derives more expressive MPNN-like architectures that overcome them. A refined analysis of the Bellman-Ford algorithm yields a substantially smaller required training set and allows a differentiable regularization loss. Empirical results largely support the theoretical findings.

本文探讨了消息传递图神经网络（MPNN）学习并泛化离散算法的条件。作者提出了一种理论框架，刻画了MPNN从小型训练实例中学习算法并在任意大小的输入上可证明地逼近其行为的充分条件。该框架适用于多种算法，包括最短路径和动态规划问题。研究还证明了标准MPNN的不可能性结果，并提出了更具表达能力的架构。针对Bellman-Ford算法的细化分析显著减小了所需的训练集规模，并允许使用可微正则化损失。实验结果在很大程度上支持理论发现。

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Authors: Gengsheng Li, Jinghan He, Shijie Wang, Dan Zhang, Ruiqi Liu, Renrui Zhang, Zijun Yao, Junfeng Fang, Haiyun Guo, Jinqiao Wang

First: 2026-02-13T17:07:42+00:00 · Latest: 2026-02-13T17:07:42+00:00

Abstract

Self-play bootstraps LLM reasoning through an iterative Challenger-Solver loop: the Challenger is trained to generate questions that target the Solver's capabilities, and the Solver is optimized on the generated data to expand its reasoning skills. However, existing frameworks like R-Zero often exhibit non-sustained improvement, where early gains degrade as self-play continues. We identify a key failure mode, Diversity Illusion, where the Solver's training signals appear diverse yet collapse into recurring underlying patterns. It manifests as (1) Local Diversity Illusion, where diversity is enforced only within-batch, inducing cross-iteration mode cycling; and (2) Surface Diversity Illusion, where questions vary superficially but require near-identical reasoning skills. To mitigate them, we propose R-Diverse with two aligned innovations: Memory-Augmented Penalty (MAP), which uses a persistent memory bank to discourage recycling across iterations, and Skill-Aware Measurement (SAM), which evaluates diversity by the reasoning skills exercised rather than surface variation of questions. Across 10 math and general reasoning benchmarks, R-Diverse sustains gains over more iterations and consistently outperforms prior self-play methods. Code is available at https://github.com/Gengsheng-Li/R-Diverse.

中文标题/摘要

标题：R-Diverse：缓解LLM自博弈训练中的多样性幻觉

自博弈通过迭代的挑战者-求解者（Challenger-Solver）循环来自举提升LLM的推理能力：挑战者被训练生成针对求解者能力的问题，而求解者则在生成的数据上进行优化以扩展其推理技能。然而，现有的框架如R-Zero常常表现出非持续的改进，早期的收益随着自博弈的继续而退化。我们识别出一个关键的失败模式，即多样性幻觉：求解者的训练信号看似多样，实际上却坍缩为反复出现的底层模式。它表现为（1）局部多样性幻觉，即多样性仅在批次内被强制执行，导致跨迭代的模式循环；（2）表面多样性幻觉，即问题在表面上有所变化，但所需的推理技能几乎相同。为了缓解这些问题，我们提出了R-Diverse，其中包含两项相互配合的创新：记忆增强惩罚（MAP），使用持久的记忆库来抑制跨迭代的重复；以及技能感知度量（SAM），根据所运用的推理技能而非问题的表面变化来评估多样性。在10个数学和通用推理基准测试中，R-Diverse在更多迭代中保持了收益，并且始终优于之前的自博弈方法。代码可在 https://github.com/Gengsheng-Li/R-Diverse 获取。

Summary / 总结

The paper addresses the issue of Diversity Illusion in self-play training of large language models (LLMs), where the apparent diversity in training signals collapses into recurring patterns. To mitigate this, the authors propose R-Diverse, which introduces Memory-Augmented Penalty (MAP) to discourage recycling of training data across iterations and Skill-Aware Measurement (SAM) to evaluate diversity based on the reasoning skills exercised. Experiments on 10 math and general reasoning benchmarks show that R-Diverse sustains performance gains over more iterations and outperforms previous self-play methods.

论文针对LLM自博弈训练中改进不可持续的问题，特别是多样性幻觉现象，提出了R-Diverse：通过基于持久记忆库的记忆增强惩罚（MAP）抑制跨迭代的重复，并通过技能感知度量（SAM）根据推理技能而非问题的表面变化来评估多样性。在10个数学和通用推理基准测试中，R-Diverse在更多迭代中保持了性能提升，并优于之前的自博弈方法。

Barron-Wiener-Laguerre models

Authors: Rahul Manavalan, Filip Tronarp

First: 2026-02-13T17:02:48+00:00 · Latest: 2026-02-13T17:02:48+00:00

Abstract

We propose a probabilistic extension of Wiener-Laguerre models for causal operator learning. Classical Wiener-Laguerre models parameterize stable linear dynamics using orthonormal Laguerre bases and apply a static nonlinear map to the resulting features. While structurally efficient and interpretable, they provide only deterministic point estimates. We reinterpret the nonlinear component through the lens of Barron function approximation, viewing two-layer networks, random Fourier features, and extreme learning machines as discretizations of integral representations over parameter measures. This perspective naturally admits Bayesian inference on the nonlinear map and yields posterior predictive uncertainty. By combining Laguerre-parameterized causal dynamics with probabilistic Barron-type nonlinear approximators, we obtain a structured yet expressive class of causal operators equipped with uncertainty quantification. The resulting framework bridges classical system identification and modern measure-based function approximation, providing a principled approach to time-series modeling and nonlinear systems identification.
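
The structure described in the abstract can be written out in symbols. The notation below is an assumption introduced here for illustration (the abstract fixes no symbols): $u$ is the input signal, $\ell_k$ the Laguerre filters, $z_k$ the resulting features, $f$ the static nonlinearity, $\sigma$ an activation, and $\mu$ a measure over parameters $(a, w, b)$. The integral form is the standard Barron-type representation, not a formula quoted from the paper.

```latex
% Wiener-Laguerre structure: causal Laguerre filters applied to the input u,
% followed by a static nonlinear map f on the filter outputs.
z_k(t) = (\ell_k * u)(t), \quad k = 1, \dots, n, \qquad
y(t) = f\bigl(z_1(t), \dots, z_n(t)\bigr)

% Barron-type integral representation of f over a parameter measure \mu,
% and its N-term discretization, i.e. a two-layer network of width N.
f(z) = \int a \, \sigma(w^\top z + b) \, \mathrm{d}\mu(a, w, b), \qquad
f_N(z) = \frac{1}{N} \sum_{i=1}^{N} a_i \, \sigma(w_i^\top z + b_i)
```

Under this reading, random Fourier features and extreme learning machines correspond to sampling $(w_i, b_i)$ once and learning only the outer weights $a_i$, which is what makes Bayesian inference over the nonlinear map tractable as a linear-in-parameters problem.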

中文标题/摘要

标题：Barron-Wiener-Laguerre模型

我们提出了Wiener-Laguerre模型的一种概率扩展，用于因果算子学习。经典的Wiener-Laguerre模型使用标准正交的Laguerre基来参数化稳定的线性动力学，并对所得特征应用静态非线性映射。虽然结构上高效且可解释，但它们仅提供确定性的点估计。我们从Barron函数逼近的视角重新解释非线性部分，将两层网络、随机傅里叶特征和极限学习机视为参数测度上积分表示的离散化。这种视角自然地允许对非线性映射进行贝叶斯推断，并给出后验预测不确定性。通过结合Laguerre参数化的因果动力学和概率化的Barron型非线性逼近器，我们得到了一类具备不确定性量化、结构化且表达能力强的因果算子。由此形成的框架连接了经典系统辨识与现代基于测度的函数逼近，为时间序列建模和非线性系统辨识提供了一种有原则的方法。

Summary / 总结

The research aims to extend classical Wiener-Laguerre models by incorporating probabilistic Barron function approximation to enable uncertainty quantification in causal operator learning. The method involves using orthonormal Laguerre bases to parameterize stable linear dynamics and applying a probabilistic nonlinear map, which is approximated by two-layer networks or random Fourier features. Key findings show that this approach provides structured yet flexible models with posterior predictive uncertainty, bridging classical system identification techniques with modern function approximation methods.

研究旨在通过引入概率Barron函数逼近来扩展经典的Wiener-Laguerre模型，以在因果算子学习中实现不确定性量化。方法使用标准正交的Laguerre基来参数化稳定的线性动力学，并应用一个概率非线性映射，该映射通过两层网络或随机傅里叶特征进行近似。关键发现表明，这种方法提供了结构化且灵活的模型，并具有后验预测不确定性，将经典的系统辨识技术与现代函数逼近方法相结合。

Mathematics and Machine Creativity: A Survey on Bridging Mathematics with AI

Authors: Shizhe Liang, Wei Zhang, Tianyang Zhong, Tianming Liu

First: 2024-12-21T08:58:36+00:00 · Latest: 2026-02-13T17:01:27+00:00

Comments: This article is withdrawn due to internal authorship and supervisory considerations that require clarification before the work can proceed in its current form. After further review, I believe it is appropriate to pause and formally resolve these matters to ensure full compliance with institutional and collaborative research policies

Abstract

This paper presents a comprehensive overview on the applications of artificial intelligence (AI) in mathematical research, highlighting the transformative role AI has begun to play in this domain. Traditionally, AI advancements have heavily relied on theoretical foundations provided by mathematics and statistics. However, recent developments in AI, particularly in reinforcement learning (RL) and large language models (LLMs), have demonstrated the potential for AI to contribute back to mathematics by offering flexible algorithmic frameworks and powerful inductive reasoning capabilities that support various aspects of mathematical research. This survey aims to establish a bridge between AI and mathematics, providing insights into the mutual benefits and fostering deeper interdisciplinary understanding. In particular, we argue that while current AI and LLMs may struggle with complex deductive reasoning, their "inherent creativity", the ability to generate outputs at high throughput based on recognition of shallow patterns, holds significant potential to support and inspire mathematical research. This creative capability, often overlooked, could be the key to unlocking new perspectives and methodologies in mathematics. Furthermore, we address the lack of cross-disciplinary communication: mathematicians may not fully comprehend the latest advances in AI, while AI researchers frequently prioritize benchmark performance over real-world applications in frontier mathematical research. This paper seeks to close that gap, offering a detailed exploration of AI fundamentals, its strengths, and its emerging applications in the mathematical sciences.

中文标题/摘要

标题：数学与机器创造力：人工智能与数学融合综述

本文综述了人工智能（AI）在数学研究中的应用，强调了AI在这一领域开始发挥的变革性作用。传统上，AI的进步很大程度上依赖于数学和统计学提供的理论基础。然而，最近AI的发展，特别是在强化学习（RL）和大型语言模型（LLMs）方面，展示了AI反哺数学的潜力，通过提供灵活的算法框架和强大的归纳推理能力，支持数学研究的各个方面。本文旨在建立AI与数学之间的桥梁，提供相互利益的见解，并促进更深层次的跨学科理解。特别地，我们认为虽然当前的AI和LLMs在复杂演绎推理方面存在困难，但它们的“内在创造力”，即基于浅层模式识别生成大量输出的能力，具有支持和激发数学研究的潜力。这种创造能力往往被忽视，可能是解锁数学新视角和方法的关键。此外，我们还讨论了跨学科沟通的缺乏：数学家可能无法完全理解AI的最新进展，而AI研究人员则经常优先考虑基准性能而非前沿数学研究的实际应用。本文旨在弥合这一差距，详细探讨AI的基本原理、优势及其在数学科学中的新兴应用。

Summary / 总结

This paper surveys the application of artificial intelligence (AI) in mathematical research, emphasizing the transformative role of AI, especially in reinforcement learning and large language models, in supporting mathematical research through flexible algorithmic frameworks and inductive reasoning. The study highlights the potential of AI's 'inherent creativity' in generating outputs based on pattern recognition, which could inspire new perspectives and methodologies in mathematics. It also addresses the need for better cross-disciplinary communication between mathematicians and AI researchers to leverage AI's strengths in real-world mathematical applications.

本文综述了人工智能在数学研究中的应用，强调了人工智能，尤其是强化学习和大型语言模型，在通过灵活的算法框架和归纳推理支持数学研究方面所发挥的变革性作用。研究指出，人工智能的“内在创造力”，即基于模式识别生成大量输出的能力，有可能激发数学研究中的新视角和方法论。此外，本文还强调了数学家与人工智能研究人员之间更好的跨学科沟通需求，以便充分利用人工智能在数学实际应用中的优势。

Reasoning about Intent for Ambiguous Requests

Authors: Irina Saparina, Mirella Lapata

First: 2025-11-13T16:18:45+00:00 · Latest: 2026-02-13T16:55:57+00:00

Abstract

Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address this, we propose generating multiple interpretation-answer pairs in a single structured response to ambiguous requests. Our models are trained with reinforcement learning and customized reward functions using multiple valid answers as supervision. Experiments on conversational question answering and semantic parsing demonstrate that our method achieves higher coverage of valid answers than baseline approaches. Human evaluation confirms that predicted interpretations are highly aligned with their answers. Our approach promotes transparency with explicit interpretations, achieves efficiency by requiring only one generation step, and supports downstream applications through its structured output format.

中文标题/摘要

标题：关于模糊请求意图推理

大型语言模型经常通过隐式地选择一种解释来回应模糊请求。意图误解会令用户沮丧并带来安全风险。为了解决这个问题，我们提出了一种在单一结构化响应中生成多个解释-答案对的方法。我们的模型通过强化学习和使用多个有效答案定制的奖励函数进行训练。在对话式问答和语义解析实验中，我们的方法在有效答案覆盖率方面优于基线方法。人类评估证实，预测的解释与答案高度一致。我们的方法通过明确的解释促进透明度，通过单步生成实现高效性，并通过结构化输出支持下游应用。

Summary / 总结

The paper addresses the tendency of large language models to commit implicitly to one interpretation of an ambiguous request, which can frustrate users and create safety risks. It proposes generating multiple interpretation-answer pairs in a single structured response, with models trained by reinforcement learning and customized reward functions that use multiple valid answers as supervision. Experiments on conversational question answering and semantic parsing show higher coverage of valid answers than baseline approaches, and human evaluation confirms that the predicted interpretations are highly aligned with their answers. The approach makes interpretations explicit, needs only one generation step, and produces structured output for downstream applications.

论文针对大型语言模型在处理模糊请求时倾向于隐式地只选择一种解释、从而导致误解和安全风险的问题，提出在单一结构化响应中生成多个解释-答案对，并使用强化学习和定制的奖励函数进行训练。实验表明，该方法比基线方法覆盖了更多的有效答案；人类评估证实，预测的解释与其对应的答案高度一致。该方法提升了透明度和效率，并通过结构化输出支持下游应用。

Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models

Authors: Kun Feng, Shaocheng Lan, Yuchen Fang, Wenchao He, Lintao Ma, Xingyu Lu, Kan Ren

First: 2025-09-30T06:02:26+00:00 · Latest: 2026-02-13T16:54:37+00:00

Abstract

Inherent temporal heterogeneity, such as varying sampling densities and periodic structures, has posed substantial challenges in zero-shot generalization for Time Series Foundation Models (TSFMs). Existing TSFMs predominantly rely on massive parameterization to absorb such heterogeneity, as their static tokenization and positional encoding schemes entangle diverse temporal patterns into a fixed representation space, encouraging memorization rather than adaptation. To address this limitation, we propose Kairos, a flexible and parameter-efficient TSFM that decouples temporal heterogeneity from model capacity through a novel tokenization perspective. Kairos introduces a dynamic patching tokenizer and a mixture-of-size encoding that adapt observational granularity to local information density, enabling fine-grained temporal abstraction without increasing model width or depth. In addition, we design a multi-granularity positional embedding based on dynamic rotary encodings, which conditions on instance-level spectral features and temporal structure induced by dynamic patching tokenization, allowing robust modeling of diverse temporal dependencies. Trained on a novel Predictability-Stratified Time-Series (PreSTS) corpus, Kairos achieves superior zero-shot performance with substantially fewer parameters on two mainstream benchmarks, GIFT-Eval and Time-Series-Library. The project page is at https://foundation-model-research.github.io/Kairos .

中文标题/摘要

标题：Kairos: 向自适应和参数高效时间序列基础模型迈进

内在的时间异质性，如变化的采样密度和周期结构，对时间序列基础模型（TSFMs）的零样本泛化构成了重大挑战。现有的TSFMs主要依赖大规模参数化来吸收这种异质性，因为它们静态的分词和位置编码方案将多种时间模式纠缠在一个固定的表示空间中，助长了记忆而非适应。为了解决这一局限性，我们提出了Kairos，一种灵活且参数高效的TSFM，它通过新颖的分词视角将时间异质性与模型容量解耦。Kairos引入了动态分块分词器和大小混合编码，根据局部信息密度调整观测粒度，无需增加模型宽度或深度即可实现细粒度的时间抽象。此外，我们设计了一种基于动态旋转编码的多粒度位置嵌入，它以实例级的频谱特征和由动态分块分词诱导的时间结构为条件，从而能够对多种时间依赖关系进行稳健建模。Kairos在新构建的可预测性分层时间序列（PreSTS）语料库上进行训练，在两个主流基准GIFT-Eval和Time-Series-Library上以显著更少的参数实现了更优的零样本性能。项目页面位于 https://foundation-model-research.github.io/Kairos 。

Summary / 总结

Kairos is designed to address the challenge of zero-shot generalization in Time Series Foundation Models (TSFMs) by decoupling temporal heterogeneity from model capacity. It introduces a dynamic patching tokenizer and a mixture-of-size encoding to adapt to varying sampling densities and periodic structures, enabling fine-grained temporal abstraction without increasing model size. Kairos also employs a multi-granularity positional embedding based on dynamic rotary encodings, which conditions on instance-level spectral features and temporal structure, allowing for robust modeling of diverse temporal dependencies. On mainstream benchmarks, Kairos achieves superior zero-shot performance with significantly fewer parameters compared to existing models.

Kairos旨在通过将时间异质性与模型容量解耦来解决时间序列基础模型（TSFMs）在零样本泛化中的挑战。它引入了一种动态分块分词器和大小混合编码，以适应不同的采样密度和周期结构，从而在不增加模型大小的情况下实现精细的时间抽象。Kairos还使用基于动态旋转编码的多粒度位置嵌入，增强了模型处理各种时间依赖性的能力。实验结果表明，Kairos在GIFT-Eval和Time-Series-Library基准测试中表现出色，并且参数量显著较少。

Universal Transformation of One-Class Classifiers for Unsupervised Anomaly Detection

Authors: Declan McIntosh, Alexandra Branzan Albu

First: 2026-02-13T16:54:12+00:00 · Latest: 2026-02-13T16:54:12+00:00

Comments: 6 figures, 9 pages main paper, 15 pages total with supplemental

Abstract

Detecting anomalies in images and video is an essential task for multiple real-world problems, including industrial inspection, computer-assisted diagnosis, and environmental monitoring. Anomaly detection is typically formulated as a one-class classification problem, where the training data consists solely of nominal values, leaving methods built on this assumption susceptible to training label noise. We present a dataset folding method that transforms an arbitrary one-class classifier-based anomaly detector into a fully unsupervised method. This is achieved by making a set of key weak assumptions: that anomalies are uncommon in the training dataset and generally heterogeneous. These assumptions enable us to utilize multiple independently trained instances of a one-class classifier to filter the training dataset for anomalies. This transformation requires no modifications to the underlying anomaly detector; the only changes are algorithmically selected data subsets used for training. We demonstrate that our method can transform a wide variety of one-class classifier anomaly detectors for both images and videos into unsupervised ones. Our method creates the first unsupervised logical anomaly detectors by transforming existing methods. We also demonstrate that our method achieves state-of-the-art performance for unsupervised anomaly detection on the MVTec AD, ViSA, and MVTec Loco AD datasets. As improvements to one-class classifiers are made, our method directly transfers those improvements to the unsupervised domain, linking the domains.
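
The idea of using several independently trained one-class models to filter a contaminated training set can be sketched with a toy example. Everything here is an assumption made for illustration and should not be read as the paper's procedure: the fold assignment, the `drop_fraction` parameter, and the stand-in "detector" (distance from the median of its training split) are all hypothetical.

```python
import statistics


def fold_filter(samples, num_folds=3, drop_fraction=0.2):
    """Score each sample with a detector fit on the other folds, then drop
    the highest-scoring fraction as suspected anomalies.

    The detector is a toy one-class model: a sample's anomaly score is its
    distance from the median of the detector's training split.
    """
    scores = [0.0] * len(samples)
    for fold in range(num_folds):
        held_out = [i for i in range(len(samples)) if i % num_folds == fold]
        train = [samples[i] for i in range(len(samples)) if i % num_folds != fold]
        center = statistics.median(train)  # "train" this fold's detector
        for i in held_out:
            scores[i] = abs(samples[i] - center)
    num_drop = int(len(samples) * drop_fraction)
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:num_drop])
    return [x for i, x in enumerate(samples) if i not in dropped]


# Mostly nominal values near 1.0, contaminated by two dissimilar anomalies.
samples = [1.0, 1.1, 0.9, 9.0, 1.2, 0.8, 1.05, -7.0, 0.95, 1.15]
kept = fold_filter(samples)
```

The sketch relies on the same two weak assumptions the abstract names: anomalies are uncommon, so each detector still learns mostly nominal structure, and they are heterogeneous, so one anomaly in a training split does not teach the detector to accept a different one.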

中文标题/摘要

标题：一类分类器的通用转换用于无监督异常检测

在图像和视频中检测异常是多个现实世界问题的关键任务，包括工业检查、计算机辅助诊断和环境监测。异常检测通常被形式化为一类分类问题，其中训练数据仅由正常值组成，这使得基于这一假设的方法容易受到训练标签噪声的影响。我们提出了一种数据折叠方法，将任意一类分类器为基础的异常检测器转换为完全无监督的方法。这通过做出一组关键的弱假设得以实现：异常在训练数据集中不常见且通常异质性较强。这些假设使我们能够利用多个独立训练的一类分类器实例来过滤训练数据集中的异常。这种转换不需要对底层异常检测器进行任何修改；唯一的变化是用于训练的算法选择的数据子集。我们证明了我们的方法可以将各种一类分类器异常检测器，无论是针对图像还是视频，都转换为无监督的。我们的方法通过转换现有方法创建了第一个无监督逻辑异常检测器。我们还证明了我们的方法在MVTec AD、ViSA和MVTec Loco AD数据集上的无监督异常检测性能达到了最先进的水平。随着一类分类器的改进，我们的方法可以直接将这些改进转移到无监督领域，连接了这些领域。

Summary / 总结

The paper addresses the challenge of unsupervised anomaly detection in images and videos by transforming one-class classifiers into fully unsupervised methods. It assumes anomalies are rare and heterogeneous, using multiple instances of a one-class classifier to filter the training dataset. This method, requiring no changes to the anomaly detector, demonstrates state-of-the-art performance on multiple datasets, linking improvements in one-class classifiers to the unsupervised domain.

论文通过将一类分类器转换为完全无监督的方法，解决了图像和视频中的无监督异常检测问题。该方法假设异常值稀少且异质性高，利用多个此类分类器实例过滤训练数据中的异常值。此方法无需对分类器进行更改，已在多个数据集上展示了最先进的性能，并将一类分类器的改进直接转移到无监督领域。

Post-hoc Probabilistic Vision-Language Models

Authors: Anton Baumann, Rui Li, Marcus Klasson, Santeri Mentu, Shyamgopal Karthik, Zeynep Akata, Arno Solin, Martin Trapp

Venue: ICLR 2026

First: 2024-12-08T18:16:13+00:00 · Latest: 2026-02-13T16:49:09+00:00

Comments: Published at ICLR 2026. Project page: https://aaltoml.github.io/BayesVLM/

Abstract

Vision-language models (VLMs), such as CLIP and SigLIP, have found remarkable success in classification, retrieval, and generative tasks. For this, VLMs deterministically map images and text descriptions to a joint latent space in which their similarity is assessed using the cosine similarity. However, a deterministic mapping of inputs fails to capture uncertainties over concepts arising from domain shifts when used in downstream tasks. In this work, we propose post-hoc uncertainty estimation in VLMs that does not require additional training. Our method leverages a Bayesian posterior approximation over the last layers in VLMs and analytically quantifies uncertainties over cosine similarities. We demonstrate its effectiveness for uncertainty quantification and support set selection in active learning. Compared to baselines, we obtain improved and well-calibrated predictive uncertainties, interpretable uncertainty estimates, and sample-efficient active learning. Our results show promise for safety-critical applications of large-scale models.

中文标题/摘要

标题：事后概率视觉-语言模型

视觉-语言模型（VLMs），如CLIP和SigLIP，在分类、检索和生成任务中取得了显著的成功。为此，VLMs将图像和文本描述确定性地映射到一个联合潜在空间，并在该空间中使用余弦相似度评估它们的相似性。然而，对输入的确定性映射在用于下游任务时，无法捕捉由领域偏移引起的概念不确定性。在本文中，我们提出了一种不需要额外训练的VLMs事后不确定性估计方法。该方法利用VLMs最后几层上的贝叶斯后验近似，并以解析方式量化余弦相似度的不确定性。我们展示了其在不确定性量化和主动学习支持集选择中的有效性。与基线相比，我们获得了更好且校准良好的预测不确定性、可解释的不确定性估计以及样本高效的主动学习。我们的结果显示了该方法在大规模模型的安全关键应用中的前景。

Summary / 总结

This work addresses the limitations of deterministic mappings in vision-language models (VLMs) by proposing a post-hoc method for uncertainty estimation. The method uses a Bayesian posterior approximation to quantify uncertainties over cosine similarities in the latent space of VLMs. The study demonstrates that this approach improves predictive uncertainties, provides interpretable uncertainty estimates, and enhances sample-efficient active learning compared to baseline methods. This is particularly promising for safety-critical applications of large-scale models.

该研究针对视觉-语言模型（VLMs）中确定性映射的局限性，提出了一种事后（post-hoc）不确定性估计方法。该方法通过贝叶斯后验近似来量化VLMs潜在空间中余弦相似度的不确定性。研究结果表明，与基线方法相比，这种方法能够改善预测不确定性的质量，提供可解释的不确定性估计，并提高主动学习的样本效率，在大规模模型的安全关键应用中显示出潜力。

Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach

Authors: Subreena, Mohammad Amzad Hossain, Mirza Raquib, Saydul Akbar Murad, Farida Siddiqi Prity, Muhammad Hanif, Nick Rahimi

First: 2026-01-31T17:37:16+00:00 · Latest: 2026-02-13T16:48:00+00:00

Abstract

Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address these challenges, we first build a new Bangladeshi banknote dataset that includes both controlled and real-world scenarios, ensuring a more comprehensive and diverse representation. Next, to enhance the dataset's robustness, we incorporate four additional datasets, including public benchmarks, to cover various complexities and improve the model's generalization. To overcome the limitations of current recognition models, we propose a novel hybrid CNN architecture that combines MobileNetV3-Large and EfficientNetB0 for efficient feature extraction. This is followed by an effective multilayer perceptron (MLP) classifier to improve performance while keeping computational costs low, making the system suitable for resource-constrained devices. The experimental results show that the proposed model achieves 97.95% accuracy on controlled datasets, 92.84% on complex backgrounds, and 94.98% accuracy when combining all datasets. The model's performance is thoroughly evaluated using five-fold cross-validation and seven metrics: accuracy, precision, recall, F1-score, Cohen's Kappa, MCC, and AUC. Additionally, explainable AI methods like LIME and SHAP are incorporated to enhance transparency and interpretability.

中文标题/摘要

标题：稳健且实时的孟加拉国货币识别：基于双流MobileNet和EfficientNet的方法

准确的货币识别对于辅助技术至关重要，特别是对于依赖他人识别纸币的视障人士而言。这种依赖使他们面临欺诈和剥削的风险。为了解决这些挑战，我们首先构建了一个新的孟加拉国纸币数据集，其中包括控制和现实场景，确保更全面和多样的表示。接下来，为了增强数据集的稳健性，我们整合了四个额外的数据集，包括公共基准数据集，以涵盖各种复杂性并提高模型的泛化能力。为了克服现有识别模型的局限性，我们提出了一种新颖的混合CNN架构，结合了MobileNetV3-Large和EfficientNetB0进行高效的特征提取。随后，通过有效的多层感知器（MLP）分类器提高性能，同时保持计算成本低，使系统适用于资源受限的设备。实验结果表明，所提模型在控制数据集上的准确率为97.95%，在复杂背景下的准确率为92.84%，在结合所有数据集时的准确率为94.98%。使用五折交叉验证和七项指标：准确率、精确率、召回率、F1分数、Cohen's Kappa、MCC和AUC，对模型的性能进行了全面评估。此外，还整合了可解释AI方法如LIME和SHAP，以增强透明度和可解释性。

Summary / 总结

The research aims to develop a robust and real-time currency recognition system for Bangladeshi banknotes to assist visually impaired individuals. The study introduces a hybrid CNN architecture combining MobileNetV3-Large and EfficientNetB0 for feature extraction and an MLP classifier for improved performance. Experimental results show the model achieves high accuracy across different scenarios, with 97.95% on controlled datasets, 92.84% on complex backgrounds, and 94.98% when combining all datasets. The model is evaluated using five-fold cross-validation and seven metrics, and explainable AI methods are used to enhance transparency and interpretability.

研究旨在为孟加拉国的视障人士开发一个稳健且实时的货币识别系统。为此，创建了一个结合了控制和现实场景的新数据集，并整合了四个额外的数据集以提高泛化能力。提出了一种结合MobileNetV3-Large和EfficientNetB0的混合CNN架构，用于高效特征提取，随后使用MLP分类器以提高性能。该模型在控制数据集上的准确率为97.95%，在复杂背景下的准确率为92.84%，在结合所有数据集时的准确率为94.98%。使用五折交叉验证和七个指标来评估模型性能，并结合了解释性AI方法以增强透明度和可解释性。
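Several of the metrics listed in the abstract above (accuracy, precision, recall, F1-score, Cohen's Kappa, MCC) can be computed directly from a confusion matrix. The sketch below is a minimal illustration for the binary case only. The paper's task is multi-class banknote recognition, so the binary setting, the function name, and the example counts are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from binary confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

For a multi-class problem, the same quantities are usually obtained from the full confusion matrix, with precision, recall, and F1 averaged across classes.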

Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks

Authors: Yongzhong Xu

First: 2026-02-11T03:57:46+00:00 · Latest: 2026-02-13T16:47:38+00:00

Comments: 15 pages, 6 figures

Abstract

We investigate the geometric structure of learning dynamics in overparameterized transformer models through carefully controlled modular arithmetic tasks. Our primary finding is that despite operating in high-dimensional parameter spaces ($d=128$), transformer training trajectories rapidly collapse onto low-dimensional execution manifolds of dimension $3$--$4$. This dimensional collapse is robust across random seeds and moderate task difficulties, though the orientation of the manifold in parameter space varies between runs. We demonstrate that this geometric structure underlies several empirically observed phenomena: (1) sharp attention concentration emerges as saturation along routing coordinates within the execution manifold, (2) SGD commutators are preferentially aligned with the execution subspace (up to $10\times$ random baseline) early in training, with $>92\%$ of non-commutativity confined to orthogonal staging directions and this alignment decreasing as training converges, and (3) sparse autoencoders capture auxiliary routing structure but fail to isolate execution itself, which remains distributed across the low-dimensional manifold. Our results suggest a unifying geometric framework for understanding transformer learning, where the vast majority of parameters serve to absorb optimization interference while core computation occurs in a dramatically reduced subspace. These findings have implications for interpretability, training curriculum design, and understanding the role of overparameterization in neural network learning.

中文标题/摘要

标题：Transformer学习动力学中的低维执行流形：来自模算术任务的证据

我们通过精心控制的模算术任务研究了过参数化Transformer模型学习动力学的几何结构。我们的主要发现是，尽管在高维参数空间（$d=128$）中运行，Transformer的训练轨迹会迅速坍缩到维度为$3$--$4$的低维执行流形上。这种维度坍缩在不同随机种子和中等任务难度下均保持稳健，但流形在参数空间中的方向在不同运行之间有所不同。我们证明这种几何结构是若干经验现象的基础：(1) 尖锐的注意力集中表现为执行流形内沿路由坐标的饱和；(2) 在训练早期，SGD交换子优先与执行子空间对齐（最高可达随机基线的$10\times$），其中$>92\%$的非交换性局限于正交的暂存（staging）方向，且这种对齐随训练收敛而减弱；(3) 稀疏自编码器能够捕获辅助路由结构，但无法分离出执行本身，后者仍分布在低维流形上。我们的结果提出了一个理解Transformer学习的统一几何框架：绝大多数参数用于吸收优化干扰，而核心计算发生在大幅缩减的子空间中。这些发现对可解释性、训练课程设计以及理解过参数化在神经网络学习中的作用具有启示意义。

Summary / 总结

The study investigates the geometry of learning dynamics in overparameterized transformer models using modular arithmetic tasks. Despite a high-dimensional parameter space ($d=128$), training trajectories rapidly collapse onto low-dimensional execution manifolds of dimension $3$--$4$. The authors relate this structure to sharp attention concentration, the preferential alignment of SGD commutators with the execution subspace early in training, and the failure of sparse autoencoders to isolate execution. The findings suggest that most parameters absorb optimization interference while core computation occurs in a much smaller subspace.

研究使用模算术任务探讨了Transformer模型学习动力学的几何结构。尽管在高维空间中训练，模型的轨迹迅速收敛到维度为3-4的低维流形。这一结构解释了注意力集中、SGD交换子与执行子空间的偏好对齐以及稀疏自编码器无法隔离核心计算等现象。研究结果表明，Transformer学习主要发生在一个高度压缩的子空间中，大多数参数用于缓解优化干扰，而核心计算则发生在这一低维流形中。
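The abstract above refers to SGD commutators and their alignment with a subspace. One common reading, assumed here, is that the commutator of two mini-batch updates is the difference between applying them in the two possible orders. The toy sketch below uses a diagonal quadratic loss per batch, for which the commutator has the closed form $\eta^2 a_A a_B (c_B - c_A)$ per coordinate, and then measures how much of its squared norm lies in a chosen orthonormal subspace. The loss, the function names, and the subspace are illustrative assumptions and not the paper's definitions.

```python
def sgd_step(theta, batch, lr):
    """One gradient step on the toy loss 0.5 * sum_i a_i * (theta_i - c_i)^2.

    `batch` is a pair (curvatures a, centers c); this loss is hypothetical.
    """
    curvature, center = batch
    return [t - lr * a * (t - c) for t, a, c in zip(theta, curvature, center)]


def sgd_commutator(theta, batch_a, batch_b, lr):
    """Difference between updating with A then B and with B then A."""
    a_then_b = sgd_step(sgd_step(theta, batch_a, lr), batch_b, lr)
    b_then_a = sgd_step(sgd_step(theta, batch_b, lr), batch_a, lr)
    return [x - y for x, y in zip(a_then_b, b_then_a)]


def subspace_fraction(vec, basis):
    """Fraction of squared norm of `vec` inside span(basis).

    `basis` must be a list of orthonormal vectors.
    """
    total = sum(x * x for x in vec)
    if total == 0.0:
        return 0.0
    inside = sum(sum(x * b for x, b in zip(vec, row)) ** 2 for row in basis)
    return inside / total


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]
    batch_a = ([1.0, 2.0, 0.0], [1.0, 0.0, 0.0])
    batch_b = ([1.0, 2.0, 0.0], [0.0, 1.0, 0.0])
    comm = sgd_commutator(theta, batch_a, batch_b, lr=0.1)
    print("commutator:", comm)
    print("fraction in first two coordinates:",
          subspace_fraction(comm, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

In practice the subspace would be estimated from the training trajectory, for example from the leading principal directions of parameter checkpoints, rather than chosen by hand as in this example.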

Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning

Authors: Mominul Rubel, Adam Meyers, Gabriel Nicolosi

Venue: Transactions on Machine Learning Research, 2025

First: 2025-09-10T16:49:20+00:00 · Latest: 2026-02-13T16:32:41+00:00

Comments: The published version is available at https://openreview.net/forum?id=LPKt5vd7yz

Abstract

We introduce the Fourier Learning Machine (FLM), a neural network (NN) architecture designed to represent a multidimensional nonharmonic Fourier series. The FLM uses a simple feedforward structure with cosine activation functions to learn the frequencies, amplitudes, and phase shifts of the series as trainable parameters. This design allows the model to create a problem-specific spectral basis adaptable to both periodic and nonperiodic functions. Unlike previous Fourier-inspired NN models, the FLM is the first architecture able to represent a multidimensional Fourier series with a complete set of basis functions in separable form, doing so by using a standard Multilayer Perceptron-like architecture. A one-to-one correspondence between the Fourier coefficients and amplitudes and phase-shifts is demonstrated, allowing for the translation between a full, separable basis form and the cosine phase-shifted one. Additionally, we evaluate the performance of FLMs on several scientific computing problems, including benchmark Partial Differential Equations (PDEs) and a family of Optimal Control Problems (OCPs). Computational experiments show that the performance of FLMs is comparable, and often superior, to that of established architectures like SIREN and vanilla feedforward NNs.

中文标题/摘要

标题：傅里叶学习机：基于非谐傅里叶级数的神经网络科学机器学习

我们介绍了傅里叶学习机（FLM），这是一种用于表示多维非谐傅里叶级数的神经网络（NN）架构。FLM 使用简单的前馈结构和余弦激活函数来学习级数的频率、振幅和相位移作为可训练参数。这种设计使模型能够创建特定于问题的谱基，适用于周期性和非周期性函数。与之前的傅里叶启发式 NN 模型不同，FLM 是第一个能够以分量形式表示多维傅里叶级数的架构，通过使用类似于标准多层感知机的架构实现。证明了傅里叶系数与振幅和相位移之间的一一对应关系，允许在完整的分量基形式和余弦相位移形式之间进行转换。此外，我们在几个科学计算问题上评估了FLM的表现，包括基准偏微分方程（PDE）和一系列最优控制问题（OCP）。计算实验表明，FLM 的性能与已建立的架构（如SIREN和标准前馈神经网络）相当，甚至更优。

Summary / 总结

The research introduces the Fourier Learning Machine (FLM), a neural network designed to represent multidimensional nonharmonic Fourier series. FLM uses a simple feedforward structure with cosine activation functions to learn frequencies, amplitudes, and phase shifts as trainable parameters. Experiments show that FLMs perform comparably to, and often better than, established architectures like SIREN and vanilla feedforward networks on scientific computing problems, including PDEs and OCPs.

Fourier Learning Machine (FLM) 使用简单的前馈神经网络结构和余弦激活函数来表示多维非谐傅里叶级数，通过学习频率、振幅和相位偏移作为可训练参数，使模型能够适应周期性和非周期性函数。在偏微分方程（PDE）和最优控制问题（OCP）等科学计算问题上的实验表明，FLM 的性能与 SIREN 和纯前馈网络等现有架构相当，有时甚至更优。
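The abstract above mentions a one-to-one correspondence between Fourier coefficients and the amplitudes and phase shifts of cosine units. In one dimension this follows from the identity $A\cos(\omega x - \varphi) = A\cos\varphi\,\cos(\omega x) + A\sin\varphi\,\sin(\omega x)$, so $a = A\cos\varphi$ and $b = A\sin\varphi$. The sketch below checks that identity numerically for a few nonharmonic frequencies. It covers only the 1-D case, and the function names and the sign convention for the phase are assumptions for illustration; the paper's multidimensional separable construction is not reproduced here.

```python
import math


def to_amp_phase(a, b):
    """Map coefficients (a, b) of a*cos(wx) + b*sin(wx) to (A, phi)."""
    return math.hypot(a, b), math.atan2(b, a)


def to_coeffs(amp, phase):
    """Inverse map: (A, phi) of A*cos(wx - phi) back to (a, b)."""
    return amp * math.cos(phase), amp * math.sin(phase)


def eval_sincos(x, terms):
    """Evaluate sum of a*cos(w*x) + b*sin(w*x) over (w, a, b) terms."""
    return sum(a * math.cos(w * x) + b * math.sin(w * x) for w, a, b in terms)


def eval_cosine_units(x, units):
    """Evaluate sum of A*cos(w*x - phi) over (w, A, phi) units.

    This mirrors a single hidden layer with cosine activations: w is the
    input weight, -phi the bias, and A the output weight.
    """
    return sum(amp * math.cos(w * x - phi) for w, amp, phi in units)


if __name__ == "__main__":
    # Nonharmonic: the frequencies are not integer multiples of one another.
    terms = [(1.0, 0.5, -1.2), (math.sqrt(2.0), 2.0, 0.3), (2.7, -0.8, 0.0)]
    units = [(w, *to_amp_phase(a, b)) for w, a, b in terms]
    for x in (-1.0, 0.0, 0.4, 3.1):
        print(x, eval_sincos(x, terms), eval_cosine_units(x, units))
```

Because the frequencies are ordinary trainable weights, nothing restricts them to integer multiples of a base frequency, which is what makes the series nonharmonic.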

Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models

Authors: Ziru Niu, Hai Dong, A. K. Qin

Venue: ICLR 2026 poster

First: 2025-08-03T08:55:18+00:00 · Latest: 2026-02-13T16:25:33+00:00

Comments: Accepted by ICLR 2026 (poster)

Abstract

Federated Learning (FL) is a privacy-preserving machine learning framework facilitating collaborative training across distributed clients. However, its performance is often compromised by data heterogeneity among participants, which can result in local models with limited generalization capability. Traditional model-homogeneous approaches address this issue primarily by regularizing local training procedures or dynamically adjusting client weights during aggregation. Nevertheless, these methods become unsuitable in scenarios involving clients with heterogeneous model architectures. In this paper, we propose a model-heterogeneous FL framework that enhances clients' generalization performance on unseen data without relying on parameter aggregation. Instead of model parameters, clients share feature distribution statistics (mean and covariance) with the server. Then each client trains a variational transposed convolutional neural network using Gaussian latent variables sampled from these distributions, and uses it to generate synthetic data. By fine-tuning local models with the synthetic data, clients achieve a significant improvement in generalization ability. Experimental results demonstrate that our approach not only attains higher generalization accuracy compared to existing model-heterogeneous FL frameworks, but also reduces communication costs and memory consumption.

中文标题/摘要

标题：使用生成模型弥合异构联邦客户端泛化差距

联邦学习（FL）是一种隐私保护的机器学习框架，可以在分布式客户端之间协作训练。然而，由于参与者之间数据异构性的影响，其性能常常受到限制，导致局部模型泛化能力有限。传统模型同质方法主要通过正则化局部训练过程或在聚合时动态调整客户端权重来解决这一问题。然而，这些方法在涉及具有异构模型架构的客户端的场景中变得不合适。在本文中，我们提出了一种模型异构FL框架，该框架在不依赖参数聚合的情况下增强客户端在未见过的数据上的泛化性能。客户端与服务器共享特征分布统计（均值和协方差），然后每个客户端使用从这些分布中采样的高斯潜在变量训练变分转置卷积神经网络，并使用该网络生成合成数据。通过使用合成数据微调局部模型，客户端能够显著提高泛化能力。实验结果表明，与现有的模型异构FL框架相比，我们的方法不仅在泛化准确性上更高，而且降低了通信成本和内存消耗。

Summary / 总结

This paper addresses the generalization gap in federated learning due to data heterogeneity among clients. It proposes a model-heterogeneous FL framework where clients share feature distribution statistics with the server, and then generate synthetic data using variational transposed convolutional neural networks. This approach improves generalization performance without relying on model parameter aggregation, leading to higher generalization accuracy and reduced communication costs and memory consumption.

本文针对联邦学习中由于客户端数据异质性导致的泛化能力差距问题，提出了一种模型异质的联邦学习框架。该框架中，客户端共享特征分布统计而非模型参数，每个客户端使用变分转置卷积神经网络生成合成数据，并用这些数据微调本地模型，从而提高泛化能力。实验结果表明，该方法在泛化准确性方面优于现有方法，并且减少了通信和内存消耗。
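The framework described above has clients share feature statistics (mean and covariance) and then sample Gaussian latent variables from those distributions. The sketch below illustrates only those two steps in plain Python: computing the statistics from a small feature matrix and drawing samples through a Cholesky factor. The function names and toy data are assumptions for illustration, and the variational transposed convolutional generator and the fine-tuning step are not shown.

```python
import math
import random


def mean_and_cov(features):
    """Sample mean and unbiased covariance of a list of feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in features)
            / (n - 1) for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(cov):
    """Lower-triangular L with L @ L^T == cov (cov must be positive definite)."""
    d = len(cov)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_latents(mean, cov, n, seed=0):
    """Draw n samples from N(mean, cov) as mean + L @ z with z ~ N(0, I)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    d = len(mean)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        samples.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                        for i in range(d)])
    return samples


if __name__ == "__main__":
    # Hypothetical 2-D features extracted by one client's local model.
    client_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    mu, sigma = mean_and_cov(client_features)
    print("mean:", mu, "cov:", sigma)
    print("latent samples:", sample_latents(mu, sigma, n=3))
```

Sharing only a mean vector and a covariance matrix is independent of each client's model architecture, which is consistent with the abstract's point that no parameter aggregation is needed.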

Backdoor Attacks on Contrastive Continual Learning for IoT Systems

Authors: Alfous Tim, Kuniyilh Simi D

First: 2026-02-13T16:17:25+00:00 · Latest: 2026-02-13T16:17:25+00:00

Abstract

Internet of Things (IoT) systems increasingly depend on continual learning to adapt to non-stationary environments. These environments can include factors such as sensor drift, changing user behavior, device aging, and adversarial dynamics. Contrastive continual learning (CCL) combines contrastive representation learning with incremental adaptation, enabling robust feature reuse across tasks and domains. However, the geometric nature of contrastive objectives, when paired with replay-based rehearsal and stability-preserving regularization, introduces new security vulnerabilities. Notably, backdoor attacks can exploit embedding alignment and replay reinforcement, enabling the implantation of persistent malicious behaviors that endure through updates and deployment cycles. This paper provides a comprehensive analysis of backdoor attacks on CCL within IoT systems. We formalize the objectives of embedding-level attacks, examine persistence mechanisms unique to IoT deployments, and develop a layered taxonomy tailored to IoT. Additionally, we compare vulnerabilities across various learning paradigms and evaluate defense strategies under IoT constraints, including limited memory, edge computing, and federated aggregation. Our findings indicate that while CCL is effective for enhancing adaptive IoT intelligence, it may also elevate long-lived representation-level threats if not adequately secured.

中文标题/摘要

标题：物联网系统中对比连续学习的后门攻击

物联网(IoT)系统越来越多地依赖连续学习以适应非平稳环境。这些环境可能包括传感器漂移、用户行为变化、设备老化和对抗动态等因素。对比连续学习(CCL)结合了对比表示学习与增量适应，能够在任务和领域之间实现稳健的特征重用。然而，对比目标的几何性质与基于重演的重演和保持稳定性的正则化相结合，引入了新的安全漏洞。值得注意的是，后门攻击可以利用嵌入对齐和重演强化，植入持久的恶意行为，这些行为能够持续到更新和部署周期。本文对IoT系统中CCL的后门攻击进行了全面分析。我们形式化了嵌入级攻击的目标，研究了IoT部署特有的持久性机制，并开发了一种针对IoT的分层分类法。此外，我们比较了各种学习范式下的漏洞，并在IoT限制条件下评估了防御策略，包括有限的内存、边缘计算和联邦聚合。我们的研究结果表明，虽然CCL能够增强适应性IoT智能，但如果未得到充分保护，也可能增加长期存在的表示级威胁。

Summary / 总结

The paper analyzes backdoor attacks on contrastive continual learning (CCL) in IoT systems, motivated by the need to secure adaptive learning in non-stationary environments. The authors formalize the objectives of embedding-level attacks, examine persistence mechanisms specific to IoT deployments, develop a layered taxonomy, compare vulnerabilities across learning paradigms, and evaluate defense strategies under IoT constraints such as limited memory, edge computing, and federated aggregation. They conclude that while CCL enhances IoT adaptability, it may also elevate long-lived representation-level threats if not adequately secured.

本文研究了物联网系统中对比连续学习(CCL)的后门攻击，动机在于确保物联网的自适应智能在非平稳环境中不受威胁。作者采用分层分类法分析了嵌入层攻击和持久机制，并在物联网约束条件下评估了防御策略。主要发现表明，当CCL与回放强化和稳定性保持正则化结合时，可能会遭受持久的恶意行为，强调了采取稳健安全措施的重要性。

PromptDepthAnything++: Accurate 4K Metric Depth Estimation via Pattern-Agnostic Prompting

Authors: Haotong Lin, Sida Peng, Qinglin Yang, Peishan Yang, Jiaming Sun, Ruizhen Hu, Kai Xu, Hujun Bao, Bingyi Kang, Xiaowei Zhou

First: 2024-12-18T16:32:12+00:00 · Latest: 2026-02-13T16:17:04+00:00

Comments: Project page: https://PromptDA.github.io/

Abstract

Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. To further extend our method to work with any prompt depth points, we propose a new prompting mechanism, which serializes the input depth points into tokens and uses self-attention to enhance image tokens from depth foundation models. Our approach sets a new state of the art on 8 zero-shot depth benchmarks and benefits downstream applications, including 3D reconstruction and generalized robotic grasping. The code is available at https://github.com/DepthAnything/PromptDA .

中文标题/摘要

标题：PromptDepthAnything++: 无模式提示实现4K分辨率度量深度估计

提示在释放语言和视觉基础模型特定任务能力方面起着关键作用。我们首次将提示引入深度基础模型，创建了一种新的度量深度估计范式，称为Prompt Depth Anything。具体而言，我们使用低成本LiDAR作为提示来引导Depth Anything模型生成准确的度量深度输出，最高可达4K分辨率。我们的方法集中在多尺度深度解码器中结合LiDAR的简洁提示融合设计。为了解决有限数据集中的训练挑战，该数据集包含LiDAR深度和精确GT深度，我们提出了一种可扩展的数据管道，包括合成数据LiDAR模拟和真实数据伪GT深度生成。为了进一步使我们的方法能够处理任何提示深度点，我们提出了一种新的提示机制，将输入深度点序列化为令牌，并使用自注意力增强来自深度基础模型的图像令牌。我们的方法在8个零样本深度基准测试中设立了新的最先进水平，并有利于下游应用，包括三维重建和通用机器人抓取。代码可在https://github.com/DepthAnything/PromptDA 获取。

Summary / 总结

This paper introduces PromptDepthAnything++, a method for accurate 4K metric depth estimation using pattern-agnostic prompting. The approach uses a low-cost LiDAR as a prompt to guide the Depth Anything model, integrating the LiDAR at multiple scales within the depth decoder. To handle arbitrary prompt depth points, the input points are serialized into tokens and self-attention is used to enhance the image tokens. The authors propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation to address training challenges. The method achieves state-of-the-art results on 8 zero-shot depth benchmarks and benefits downstream applications such as 3D reconstruction and robotic grasping.

该研究引入了Prompt Depth Anything++，一种使用无模式提示进行准确4K度量深度估计的方法。该方法使用低成本LiDAR作为提示来引导Depth Anything模型，并在深度解码器中以多尺度方式集成LiDAR。作者提出了一种可扩展的数据管道，包括合成数据LiDAR模拟和真实数据伪GT深度生成，以解决训练挑战。该方法在8个零样本深度基准上达到了最先进的结果，并且有利于下游应用，如3D重建和机器人抓取。

LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting

Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Joshua Han, Zihang Xu, Songyuan Sui, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Alfredo Costilla Reyes, Daochen Zha, Xia Hu

First: 2024-06-20T07:09:19+00:00 · Latest: 2026-02-13T16:15:55+00:00

Abstract

Time Series Forecasting (TSF) has long been a challenge in time series analysis. Inspired by the success of Large Language Models (LLMs), researchers are now developing Large Time Series Models (LTSMs), universal transformer-based models that use autoregressive prediction, to improve TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datasets. Recent endeavors have studied and evaluated various design choices aimed at enhancing LTSM training and generalization capabilities. However, these design choices are typically studied and evaluated in isolation and are not benchmarked collectively. In this work, we introduce LTSM-Bundle, a comprehensive toolbox and benchmark for training LTSMs, spanning pre-processing techniques, model configurations, and dataset configurations. It modularizes and benchmarks LTSMs along multiple dimensions, encompassing prompting strategies, tokenization approaches, training paradigms, base model selection, data quantity, and dataset diversity. Furthermore, we combine the most effective design choices identified in our study. Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performance compared to state-of-the-art LTSMs and traditional TSF methods on benchmark datasets.

中文标题/摘要

标题：LTSM-Bundle：用于时间序列预测的大语言模型工具箱和基准

时间序列预测（TSF）长期以来一直是时间序列分析中的一个挑战。受大型语言模型（LLMs）成功的启发，研究人员现在正在开发大型时间序列模型（LTSMs）——基于自回归预测的通用转换器模型，以提高TSF。然而，使用异构时间序列数据训练LTSMs带来了独特的挑战，包括数据集中的不同频率、维度和模式。最近的研究已经研究和评估了各种旨在增强LTSM训练和泛化能力的设计选择。然而，这些设计选择通常是在孤立的情况下研究和评估的，并没有集体基准化。在本文中，我们介绍了LTSM-Bundle，这是一个全面的工具箱和基准，用于训练LTSMs，涵盖了预处理技术、模型配置和数据集配置。它从多个维度模块化和基准化了LTSMs，包括提示策略、分词方法、训练范式、基础模型选择、数据量和数据集多样性。此外，我们结合了我们在研究中确定的最有效的设计选择。实证结果表明，这种组合在基准数据集上实现了优于最先进的LTSMs和传统TSF方法的零样本和少量样本性能。

Summary / 总结

The research aims to improve Time Series Forecasting (TSF) using Large Time Series Models (LTSMs), which are transformer-based models designed for autoregressive prediction. The study introduces LTSM-Bundle, a comprehensive toolbox and benchmark that modularizes and evaluates various design choices for LTSM training, including pre-processing techniques, model configurations, and dataset settings. Experimental results show that combining the most effective design choices outperforms state-of-the-art LTSMs and traditional TSF methods in zero-shot and few-shot settings on benchmark datasets.

研究旨在通过开发LTSM-Bundle工具箱和基准来解决训练大型时间序列模型（LTSM）进行时间序列预测（TSF）所面临的挑战。它涵盖了各种预处理技术、模型配置和数据集设置，以增强LTSM的训练和泛化能力。研究结合了最有效的设计选择，并在基准数据集上展示了这种组合在零样本和少量样本设置中优于最先进的LTSM和传统TSF方法的性能。
