MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images
Authors: Ankan Deria, Komal Kumar, Adinath Madhavrao Dukre, Eran Segal, Salman Khan, Imran Razzak
First: 2026-02-06T18:59:59+00:00 · Latest: 2026-02-06T18:59:59+00:00
Comments: 21 pages, 6 figures and 4 tables
Abstract
Multimodal large language models (MLLMs) have rapidly advanced, yet their adoption in medicine remains limited by gaps in domain coverage, modality alignment, and grounded reasoning. In this work, we introduce MedMO, a medical foundation model built upon a generalized MLLM architecture and trained exclusively on large-scale, domain-specific data. MedMO follows a multi-stage training recipe: (i) cross-modal pretraining to align heterogeneous visual encoders with a medical language backbone; (ii) instruction tuning on multi-task supervision that spans captioning, VQA, report generation, retrieval, and grounded disease localization with bounding boxes; and (iii) reinforcement learning with verifiable rewards that combine factuality checks with a box-level GIoU reward to strengthen spatial grounding and step-by-step reasoning in complex clinical scenarios. MedMO consistently outperforms strong open-source medical MLLMs across multiple modalities and tasks. On VQA benchmarks, MedMO achieves an average accuracy improvement of +13.7% over the baseline and performs within 1.9% of the SOTA Fleming-VL. For text-based QA, it attains +6.9% over the baseline and +14.5% over Fleming-VL. In medical report generation, MedMO delivers significant gains in both semantic and clinical accuracy. Moreover, it exhibits strong grounding capability, achieving an IoU improvement of +40.4% over the baseline and +37.0% over Fleming-VL, underscoring its robust spatial reasoning and localization performance. Evaluations across radiology, ophthalmology, and pathology-microscopy confirm MedMO's broad cross-modality generalization. We release two versions of MedMO: 4B and 8B. The project page is available at https://genmilab.github.io/MedMO-Page
中文标题/摘要
标题:MedMO:基于多模态大型语言模型的医学图像理解
多模态大型语言模型(MLLMs)已迅速发展,但在医学领域的应用受限于领域覆盖范围、模态对齐和基于事实的推理方面的差距。本文介绍了一种名为MedMO的医学基础模型,该模型基于通用MLLM架构,并仅在大规模、领域特定的数据上进行训练。MedMO采用多阶段训练方法:(i)跨模态预训练,将异构视觉编码器与医学语言骨干对齐;(ii)在涉及图像字幕、VQA、报告生成、检索和带有边界框的基于事实的疾病定位的多任务监督下进行指令调优;(iii)结合事实检查和边界框级别的GIoU奖励的强化学习,以增强复杂临床场景中的空间对齐和逐步推理。MedMO在多个模态和任务上均优于强大的开源医学MLLMs。在VQA基准测试中,MedMO的平均准确率提高了13.7%,并在Fleming-VL基准测试中仅落后1.9%。对于基于文本的问答,它比基线提高了6.9%,比Fleming-VL提高了14.5%。在医学报告生成方面,MedMO在语义和临床准确性方面取得了显著进步。此外,它还展示了强大的空间对齐能力,与基线相比提高了40.4%,与Fleming-VL相比提高了37.0%,突显了其稳健的空间推理和定位性能。放射学、眼科和病理学-显微镜学领域的评估证实了MedMO的跨模态泛化能力。我们发布了MedMO的两个版本:4B和8B。项目详情请参见https://genmilab.github.io/MedMO-Page
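The MedMO abstract above mentions a box-level GIoU reward for grounding. The sketch below shows the standard Generalized IoU computation for two axis-aligned boxes; the `(x1, y1, x2, y2)` box format and the `giou_reward` rescaling to [0, 1] are assumptions made for illustration and are not taken from the paper.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    if union <= 0.0 or hull <= 0.0:
        return 0.0  # degenerate boxes: no meaningful overlap signal

    iou = inter / union
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (hull - union) / hull


def giou_reward(pred_box, gt_box):
    """Hypothetical reward: rescale GIoU from [-1, 1] to [0, 1]."""
    return (giou(pred_box, gt_box) + 1.0) / 2.0
```

Unlike plain IoU, GIoU stays informative when the predicted and reference boxes do not overlap: the value becomes more negative as the boxes move apart, which gives a graded signal even for badly misplaced predictions.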
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
Authors: Yuchen Yan, Liang Jiang, Jin Jiang, Shuaicheng Li, Zujie Wen, Zhiqiang Zhang, Jun Zhou, Jian Shao, Yueting Zhuang, Yongliang Shen
First: 2026-02-06T18:59:27+00:00 · Latest: 2026-02-06T18:59:27+00:00
Comments: Project Page: https://zju-real.github.io/InftyThink-Plus Code: https://github.com/ZJU-REAL/InftyThink-Plus
Abstract
Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but this paradigm suffers from quadratic cost, context length limits, and degraded reasoning due to lost-in-the-middle effects. Iterative reasoning mitigates these issues by periodically summarizing intermediate thoughts, yet existing methods rely on supervised learning or fixed heuristics and fail to optimize when to summarize, what to preserve, and how to resume reasoning. We propose InftyThink+, an end-to-end reinforcement learning framework that optimizes the entire iterative reasoning trajectory, building on model-controlled iteration boundaries and explicit summarization. InftyThink+ adopts a two-stage training scheme with supervised cold-start followed by trajectory-level reinforcement learning, enabling the model to learn strategic summarization and continuation decisions. Experiments on DeepSeek-R1-Distill-Qwen-1.5B show that InftyThink+ improves accuracy by 21% on AIME24 and outperforms conventional long chain-of-thought reinforcement learning by a clear margin, while also generalizing better to out-of-distribution benchmarks. Moreover, InftyThink+ significantly reduces inference latency and accelerates reinforcement learning training, demonstrating improved reasoning efficiency alongside stronger performance.
中文标题/摘要
标题:InftyThink+: 通过强化学习实现有效高效的无限期推理
大型推理模型通过扩展推理时的链式思考来实现强大的性能,但这种范式会遭受二次成本、上下文长度限制以及由于中间迷失而导致的推理退化。迭代推理通过定期总结中间想法来缓解这些问题,但现有方法依赖于监督学习或固定启发式方法,并不能优化何时总结、保留什么以及如何继续推理。我们提出了一种名为InftyThink+的端到端强化学习框架,该框架优化了整个迭代推理轨迹,基于模型控制的迭代边界和显式总结。InftyThink+采用两阶段训练方案,先进行监督冷启动,然后进行轨迹级强化学习,使模型能够学习战略性的总结和继续决策。实验表明,InftyThink+在DeepSeek-R1-Distill-Qwen-1.5B上的AIME24准确率提高了21%,在常规长链式思考强化学习中表现出明显的优势,同时在离分布基准上也表现出更好的泛化能力。此外,InftyThink+显著减少了推理延迟并加速了强化学习训练,展示了更强的推理效率和性能。
Summary / 总结
InftyThink+ is an end-to-end reinforcement learning framework that optimizes iterative reasoning for infinite-horizon problems, addressing the limitations of large reasoning models. It uses a two-stage training scheme to learn strategic summarization and continuation decisions, improving accuracy by 21% on AIME24 and outperforming conventional methods. InftyThink+ also reduces inference latency and accelerates training, enhancing reasoning efficiency.
InftyThink+ 是一个端到端的强化学习框架,优化了迭代推理过程,通过周期性总结中间想法来解决大型推理模型的局限性。它在AIME24上的准确率提高了21%,并优于传统的长链推理学习,同时减少了推理延迟并加速了训练。实验表明InftyThink+在分布外基准上具有更好的泛化能力。
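The control flow of iterative reasoning with summarization, as described in the InftyThink+ entry above, can be sketched as a loop in which each round sees only the question and a compact summary of earlier rounds. This is a toy illustration: `toy_step` stands in for a model call, the fixed chunk size replaces the learned decision of when to summarize, and all names are hypothetical.

```python
def iterative_reason(question, step_fn, init_summary, max_iters=16):
    """Run reasoning rounds, carrying only a summary between rounds.

    step_fn(question, summary) -> (new_summary, done)
    Returns the final summary and the number of rounds used.
    """
    summary = init_summary
    for n in range(1, max_iters + 1):
        summary, done = step_fn(question, summary)
        if done:
            return summary, n
    return summary, max_iters


def toy_step(question, summary):
    """Toy stand-in for one reasoning round: add up to three numbers.

    The summary keeps only what is needed to resume: the position reached
    and the partial sum. Earlier working is discarded, so the per-round
    context stays bounded regardless of the length of the question.
    """
    pos, total = summary
    chunk = question[pos:pos + 3]
    total += sum(chunk)
    pos += len(chunk)
    return (pos, total), pos >= len(question)


final_summary, rounds = iterative_reason(list(range(1, 11)), toy_step, (0, 0))
```

In the approach the abstract describes, the model itself chooses the iteration boundaries and the summary content, and reinforcement learning optimizes the whole trajectory; the loop above only shows the data flow between rounds.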
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
Authors: Kaiyi Huang, Yukun Huang, Yu Li, Jianhong Bai, Xintao Wang, Zinan Lin, Xuefei Ning, Jiwen Yu, Pengfei Wan, Yu Wang, Xihui Liu
First: 2026-02-06T18:59:24+00:00 · Latest: 2026-02-06T18:59:24+00:00
Comments: Project website: https://karine-huang.github.io/CineScene/
Abstract
Cinematic video production requires control over scene-subject composition and camera movement, but live-action shooting remains costly due to the need for constructing physical sets. To address this, we introduce the task of cinematic video generation with decoupled scene context: given multiple images of a static environment, the goal is to synthesize high-quality videos featuring dynamic subjects while preserving the underlying scene consistency and following a user-specified camera trajectory. We present CineScene, a framework that leverages implicit 3D-aware scene representation for cinematic video generation. Our key innovation is a novel context conditioning mechanism that injects 3D-aware features implicitly: by encoding scene images into visual representations through VGGT, CineScene injects spatial priors into a pretrained text-to-video generation model through additional context concatenation, enabling camera-controlled video synthesis with consistent scenes and dynamic subjects. To further enhance the model's robustness, we introduce a simple yet effective random-shuffling strategy for the input scene images during training. To address the lack of training data, we construct a scene-decoupled dataset with Unreal Engine 5, containing paired videos of scenes with and without dynamic subjects, panoramic images representing the underlying static scene, along with their camera trajectories. Experiments show that CineScene achieves state-of-the-art performance in scene-consistent cinematic video generation, handling large camera movements and demonstrating generalization across diverse environments.
中文标题/摘要
标题:CineScene:隐式3D作为有效的场景表示以生成电影级视频
电影级视频制作需要对场景-主体组成和摄像机运动进行控制,但由于需要构建物理场景,现场拍摄仍然成本高昂。为了解决这一问题,我们提出了分解场景上下文的电影级视频生成任务:给定静态环境的多张图像,目标是合成高质量的视频,其中包含动态主体,同时保持场景一致性并遵循用户指定的摄像机轨迹。我们提出了CineScene框架,该框架利用隐式3D感知场景表示进行电影级视频生成。我们的主要创新是一种新颖的上下文条件机制,以隐式方式注入3D感知特征:通过VGGT将场景图像编码为视觉表示,CineScene通过上下文连接将空间先验注入预训练的文本到视频生成模型中,从而实现受摄像机控制的视频合成,具有一致的场景和动态主体。为了进一步增强模型的鲁棒性,我们在训练过程中引入了一种简单而有效的输入场景图像随机打乱策略。为了解决训练数据不足的问题,我们使用Unreal Engine 5构建了一个场景分解数据集,包含场景及其动态主体的配对视频,全景图像代表底层静态场景,以及它们的摄像机轨迹。实验表明,CineScene在场景一致的电影级视频生成方面达到了最先进的性能,能够处理大范围的摄像机运动,并在多种环境中表现出良好的泛化能力。
Summary / 总结
CineScene addresses the challenge of generating high-quality cinematic videos with dynamic subjects while preserving scene consistency and following user-specified camera movements. It uses an implicit 3D-aware scene representation and a novel context conditioning mechanism that injects spatial priors into a pretrained text-to-video model. The framework also includes a random-shuffling strategy for training robustness. Experiments demonstrate that CineScene outperforms existing methods in scene-consistent video generation, handling large camera movements and generalizing across various environments.
CineScene通过引入一种场景解耦的电影视频生成框架来解决现场拍摄的高成本问题。该框架利用隐式的3D感知场景表示和一种新颖的上下文条件机制,实现具有一致场景和动态主体的摄像机控制视频合成。通过在训练过程中引入简单的随机洗牌策略来增强模型的鲁棒性。实验表明,CineScene在生成具有场景一致性的电影视频方面优于现有方法,即使面对大范围的摄像机运动和多样化的环境也能表现出色。
Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine
Authors: Reza E. Fazel, Arash Bakhtiary, Siavash A. Bigdeli
First: 2026-02-06T18:56:17+00:00 · Latest: 2026-02-06T18:56:17+00:00
Comments: 22 pages, 5 figures, 5 tables
Abstract
Addressing class imbalance is a central challenge in credit card fraud detection, as it directly impacts predictive reliability in real-world financial systems. To overcome this, the study proposes an enhanced workflow based on the Explainable Boosting Machine (EBM), a transparent, state-of-the-art implementation of the GA2M algorithm, optimized through systematic hyperparameter tuning, feature selection, and preprocessing refinement. Rather than relying on conventional sampling techniques that may introduce bias or cause information loss, the optimized EBM achieves an effective balance between accuracy and interpretability, enabling precise detection of fraudulent transactions while providing actionable insights into feature importance and interaction effects. Furthermore, the Taguchi method is employed to optimize both the sequence of data scalers and model hyperparameters, ensuring robust, reproducible, and systematically validated performance improvements. Experimental evaluation on benchmark credit card data yields an ROC-AUC of 0.983, surpassing prior EBM baselines (0.975) and outperforming Logistic Regression, Random Forest, XGBoost, and Decision Tree models. These results highlight the potential of interpretable machine learning and data-driven optimization for advancing trustworthy fraud analytics in financial systems.
中文标题/摘要
标题:使用优化的可解释提升机提高信用卡欺诈检测
信用卡欺诈检测中的类不平衡是一个核心挑战,因为它直接影响到实际金融系统中的预测可靠性。为了解决这一问题,研究提出了一种基于透明的GA2M算法实现——可解释提升机(EBM)的增强工作流,通过系统性的超参数调优、特征选择和预处理改进来优化EBM。与可能引入偏差或造成信息丢失的传统抽样技术不同,优化后的EBM在准确性和可解释性之间实现了有效平衡,能够精确检测欺诈交易,并提供有关特征重要性和交互效应的可操作见解。此外,采用田口方法优化数据缩放器的顺序和模型超参数,确保了稳健、可重复和系统验证的性能改进。在基准信用卡数据上的实验评估显示,ROC-AUC为0.983,超过了先前的EBM基线(0.975),并优于逻辑回归、随机森林、XGBoost和决策树模型。这些结果突显了可解释机器学习和数据驱动优化在金融系统中可信欺诈分析中的潜力。
Summary / 总结
The study addresses the challenge of class imbalance in credit card fraud detection by proposing an optimized Explainable Boosting Machine (EBM) workflow. This workflow includes hyperparameter tuning, feature selection, and preprocessing refinement, and uses the Taguchi method to optimize data scalers and model hyperparameters. The optimized EBM achieves an ROC-AUC of 0.983, surpassing previous EBM baselines and outperforming other models like Logistic Regression, Random Forest, XGBoost, and Decision Tree. This demonstrates the effectiveness of interpretable machine learning and data-driven optimization in enhancing fraud detection accuracy and reliability.
研究通过提出优化的可解释提升机(EBM)工作流来解决信用卡欺诈检测中的类别不平衡问题。该工作流包括系统性的超参数调整、特征选择和预处理改进,并使用田口方法优化数据缩放器和模型超参数。优化后的EBM实现了0.983的ROC-AUC,超过了之前的EBM基线,并优于逻辑回归、随机森林、XGBoost和决策树等其他模型。这表明可解释的机器学习和数据驱动的优化在提升金融系统中的欺诈检测方面具有有效性。
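The fraud-detection entry above reports results as ROC-AUC, a common choice under class imbalance because it measures ranking quality rather than raw accuracy. A minimal pure-Python sketch of the pairwise (Mann-Whitney) definition follows; the labels and scores are made-up illustration data, not values from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = fraud in this illustration).
    scores: iterable of model scores, higher meaning more likely positive.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: two legitimate and two fraudulent transactions.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The double loop is quadratic in the sample count and is meant only to make the definition concrete; library implementations compute the same quantity from sorted ranks.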
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
Authors: Shenyuan Gao, William Liang, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo, Chen-Hsuan Lin, Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K. R. Zentner, George Kurian, Suneel Indupuru, Pooya Jannaty, Jinwei Gu, Jun Zhang, Jitendra Malik, Pieter Abbeel, Ming-Yu Liu, Yuke Zhu, Joel Jang, Linxi "Jim" Fan
First: 2026-02-06T18:49:43+00:00 · Latest: 2026-02-06T18:49:43+00:00
Comments: Project page: https://dreamdojo-world.github.io/
Abstract
Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As an endeavor towards this end, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our data mixture represents the largest video dataset to date for world model pretraining, spanning a wide range of daily scenarios with diverse objects and skills. To address the scarcity of action labels, we introduce continuous latent actions as unified proxy actions, enhancing interaction knowledge transfer from unlabeled videos. After post-training on small-scale target robot data, DreamDojo demonstrates a strong understanding of physics and precise action controllability. We also devise a distillation pipeline that accelerates DreamDojo to a real-time speed of 10.81 FPS and further improves context consistency. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning. Systematic evaluation on multiple challenging out-of-distribution (OOD) benchmarks verifies the significance of our method for simulating open-world, contact-rich tasks, paving the way for general-purpose robot world models.
中文标题/摘要
标题:DreamDojo:来自大规模人类视频的通用机器人世界模型
能够在多种环境中模拟动作结果将彻底改变大规模通用代理的开发。然而,建模这些世界动力学,尤其是灵巧的机器人任务,由于数据覆盖有限和动作标签稀缺,提出了重大挑战。为此,我们引入了DreamDojo,一种基础世界模型,从44000小时的主观人类视频中学习多样化的交互和灵巧控制。我们的数据混合代表了迄今为止用于世界模型预训练的最大视频数据集,涵盖了广泛的生活场景,涉及多种物体和技能。为了解决动作标签稀缺的问题,我们引入了连续潜在动作作为统一的代理动作,增强了从未标记视频中转移交互知识的能力。在小型目标机器人数据上进行后训练后,DreamDojo展示了对物理的深刻理解和精确的动作可控性。我们还设计了一种蒸馏管道,将DreamDojo加速到每秒10.81帧,并进一步提高了上下文一致性。我们的工作基于生成世界模型支持了多种重要应用,包括实时远程操作、策略评估和基于模型的规划。在多个具有挑战性的分布外(OOD)基准上的系统评估验证了我们的方法对于模拟开放世界、接触丰富的任务的重要性,为通用机器人世界模型铺平了道路。
Summary / 总结
DreamDojo is a foundation world model that learns from 44,000 hours of human videos to simulate action outcomes in various environments. It addresses the challenge of limited data and scarce action labels by introducing continuous latent actions as proxy actions. After post-training on robot data, DreamDojo shows strong physics understanding and precise action control. The model is accelerated to real-time speed and improves context consistency. It enables applications like live teleoperation and model-based planning, and its effectiveness is verified through systematic evaluation on OOD benchmarks.
DreamDojo 是一个基于 44,000 小时人类视频的学习世界模型,用于模拟多样交互和灵巧控制,解决数据有限和动作标签稀缺的问题。经过小型机器人数据的微调后,DreamDojo 展现出强大的物理理解能力和精确的动作控制。通过蒸馏管道,模型加速到实时速度并提高了上下文一致性。它支持实时远程操作、策略评估和基于模型的规划等应用,并在多个 OOD 基准测试中进行了系统评估,证明了其在模拟复杂机器人任务方面的有效性。
Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches
Authors: Saber Omidi, Rene Akupan Ebunle, Se Young Yoon
First: 2026-02-06T18:42:01+00:00 · Latest: 2026-02-06T18:42:01+00:00
Comments: 10 pages, 9 figures. Preprint; manuscript under journal review
Abstract
This paper presents the design and implementation of data-driven optimal derivative feedback controllers for an active magnetic levitation system. A direct, model-free control design method based on the reinforcement learning framework is compared with an indirect optimal control design derived from a numerically identified mathematical model of the system. For the direct model-free approach, a policy iteration procedure is proposed, which adds an iteration layer called the epoch loop to gather multiple sets of process data, providing a more diverse dataset and helping reduce learning biases. This direct control design method is evaluated against a comparable optimal control solution designed from a plant model obtained through the combined Dynamic Mode Decomposition with Control (DMDc) and Prediction Error Minimization (PEM) system identification. Results show that while both controllers can stabilize and improve the performance of the magnetic levitation system when compared to controllers designed from a nominal model, the direct model-free approach consistently outperforms the indirect solution when multiple epochs are allowed. The iterative refinement of the optimal control law over the epoch loop provides the direct approach a clear advantage over the indirect method, which relies on a single set of system data to determine the identified model and control.
中文标题/摘要
标题:主动磁悬浮系统最优导数反馈控制设计与实现:基于数据驱动方法的实验研究
本文介绍了主动磁悬浮系统中基于数据驱动方法的最优导数反馈控制器的设计与实现。基于强化学习框架提出了一种直接的、无需模型的控制设计方法,并将其与从系统数值识别数学模型中推导出的间接最优控制设计进行了比较。对于直接的无需模型的方法,提出了一种策略迭代程序,该程序通过称为epoch循环的迭代层来收集多组过程数据,提供更丰富的数据集并帮助减少学习偏差。该直接控制设计方法与通过结合动态模式分解与控制(DMDc)和预测误差最小化(PEM)系统识别方法获得的植物模型设计的最优控制解决方案进行了比较。结果表明,当允许多个epoch时,直接的无需模型的方法在稳定和改善磁悬浮系统性能方面始终优于间接方法。epoch循环中最优控制律的迭代优化为直接方法提供了明显的优势,而间接方法则依赖于单组系统数据来确定识别模型和控制。
Summary / 总结
This paper investigates data-driven optimal derivative feedback controllers for an active magnetic levitation system. It compares a direct, model-free control design using a policy iteration procedure with an indirect optimal control design based on a numerically identified system model. The direct approach, which iteratively gathers multiple sets of process data, outperforms the indirect method in stabilizing and improving the system's performance, especially when multiple epochs are allowed, due to the iterative refinement of the optimal control law.
本文设计并实现了用于主动磁悬浮系统的数据驱动最优导数反馈控制器。研究比较了使用策略迭代程序的直接、无模型控制设计方法与基于系统数值模型的间接最优控制设计。直接方法通过多个周期收集多样化的数据,优于单次数据确定模型和控制的间接方法,尤其是在允许多个周期时,由于策略迭代程序对最优控制律的逐步优化,直接方法在稳定和提升系统性能方面表现更优。
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Authors: Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe
First: 2026-01-26T18:46:56+00:00 · Latest: 2026-02-06T18:38:32+00:00
Comments: Blog post: https://ssundaram21.github.io/soar/
Abstract
Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: A self-improvement framework designed to surface these pedagogical signals through meta-RL. A teacher copy of the model proposes synthetic problems for a student copy, and is rewarded with its improvement on a small subset of hard problems. Critically, SOAR grounds the curriculum in measured student progress rather than intrinsic proxy rewards. Our study on the hardest subsets of mathematical benchmarks (0/128 success) reveals three core findings. First, we show that it is possible to realize bi-level meta-RL that unlocks learning under sparse, binary rewards by sharpening a latent capacity of pretrained models to generate useful stepping stones. Second, grounded rewards outperform intrinsic reward schemes used in prior LLM self-play, reliably avoiding the instability and diversity collapse modes they typically exhibit. Third, analyzing the generated questions reveals that structural quality and well-posedness are more critical for learning progress than solution correctness. Our results suggest that the ability to generate useful stepping stones does not require the preexisting ability to actually solve the hard problems, paving a principled path to escape reasoning plateaus without additional curated data.
中文标题/摘要
标题:教学模型自我教学:边缘可学习性中的推理
模型能否学会突破自身的学习瓶颈?在初始成功率低且训练信号少的数据集上,强化学习方法对大型推理模型的微调会停滞不前。我们探讨了一个基本问题:预训练的语言模型能否利用潜在知识为它无法解决的问题生成自动课程?为此,我们设计了SOAR:一种自我改进框架,通过元强化学习揭示这些教学信号。教师模型副本为学生模型副本提出合成问题,并根据其在一小部分难题上的改进获得奖励。关键的是,SOAR将课程建立在可测量的学生进步上,而不是内在的代理奖励。在数学基准中最难的子集(0/128成功率)上进行的研究揭示了三个核心发现。首先,我们展示了通过增强预训练模型生成有用阶梯的能力,实现双层元强化学习的可能性,从而在稀疏的二元奖励下解锁学习。其次,基于奖励的表现优于先前LLM自我博弈中使用的内在奖励方案,可靠地避免了它们通常表现出的不稳定性及多样性崩溃模式。第三,分析生成的问题表明,结构质量和良好定义比解的正确性对学习进步更为关键。我们的结果表明,生成有用阶梯的能力不需要预先具备解决难题的能力,为在不增加额外策划数据的情况下逃离推理瓶颈提供了一条原则性的路径。
Summary / 总结
The research aims to enable models to improve themselves by generating synthetic problems, which can help them overcome learning plateaus. The study introduces SOAR, a self-improvement framework using meta-reinforcement learning where a teacher model proposes problems to a student model, rewarding the teacher based on the student's progress. Key findings include the realization of bi-level meta-RL under sparse rewards, the superiority of grounded rewards over intrinsic ones, and the importance of structural quality in generated questions for learning progress. This suggests that models can generate useful stepping stones without needing to solve the problems themselves, providing a path to escape reasoning plateaus.
研究旨在探索预训练语言模型是否能够自动生成课程以提高其解决初始无法解决的问题的能力。研究引入了SOAR框架,使用元强化学习,其中教师模型向学生模型提出合成问题,学生模型的进步决定了奖励。关键发现包括在稀疏奖励下实现元级元RL,基于实际进步的奖励比内在奖励更有效,以及生成问题的结构质量和良好定义比正确解更为重要。这些结果表明,生成有用的阶梯石不需要预先具备解决这些问题的能力,从而为摆脱推理瓶颈提供了一条有原则的路径,无需额外的定制数据。
Forecast Aware Deep Reinforcement Learning for Efficient Electricity Load Scheduling in Dairy Farms
Authors: Nawazish Ali, Rachael Shaw, Karl Mason
First: 2026-01-12T22:41:26+00:00 · Latest: 2026-02-06T18:36:06+00:00
Abstract
Dairy farming is an energy-intensive sector that relies heavily on grid electricity. With increasing renewable energy integration, sustainable energy management has become essential for reducing grid dependence and supporting the United Nations Sustainable Development Goal 7 on affordable and clean energy. However, the intermittent nature of renewables poses challenges in balancing supply and demand in real time. Intelligent load scheduling is therefore crucial to minimize operational costs while maintaining reliability. Reinforcement Learning has shown promise in improving energy efficiency and reducing costs. However, most RL-based scheduling methods assume complete knowledge of future prices or generation, which is unrealistic in dynamic environments. Moreover, standard PPO variants rely on fixed clipping or KL divergence thresholds, often leading to unstable training under variable tariffs. To address these challenges, this study proposes a Deep Reinforcement Learning framework for efficient load scheduling in dairy farms, focusing on battery storage and water heating under realistic operational constraints. The proposed Forecast Aware PPO incorporates short-term forecasts of demand and renewable generation using hour-of-day and month-based residual calibration, while the PID KL PPO variant employs a proportional-integral-derivative controller to adaptively regulate KL divergence for stable policy updates. Trained on real-world dairy farm data, the method achieves electricity costs up to 1% lower than PPO, 4.8% lower than DQN, and 1.5% lower than SAC. For battery scheduling, PPO reduces grid imports by 13.1%, demonstrating scalability and effectiveness for sustainable energy management in modern dairy farming.
中文标题/摘要
标题:基于天气预报的深度强化学习在奶牛场高效电力负荷调度中的应用
奶牛养殖是一个能源密集型行业,高度依赖电网电力。随着可再生能源的不断整合,可持续能源管理变得至关重要,以减少对电网的依赖并支持联合国可持续发展目标7(可负担和清洁能源)。然而,可再生能源的间歇性性质给实时平衡供需带来了挑战。因此,智能负荷调度对于在保持可靠性的同时降低运营成本至关重要。强化学习在提高能源效率和降低成本方面显示出潜力。然而,大多数基于强化学习的调度方法假设对未来价格或发电量有完全的了解,这在动态环境中是不现实的。此外,标准的PPO变体依赖于固定的剪切或KL散度阈值,通常在变动的电价下导致训练不稳定。为了解决这些挑战,本研究提出了一种基于深度强化学习的框架,用于在奶牛场进行高效的负荷调度,重点关注在现实操作约束下的电池存储和热水加热。所提出的Forecast Aware PPO结合了基于小时和月份的残差校准的短期需求和可再生能源发电量的预测,而PID KL PPO变体则使用比例积分微分控制器来调节KL散度,以实现稳定的策略更新。该方法在实际奶牛场数据上训练,与PPO相比,电费降低了1%,与DQN相比降低了4.8%,与SAC相比降低了1.5%。对于电池调度,PPO减少了13.1%的电网进口,证明了其在现代奶牛场可持续能源管理中的可扩展性和有效性。
Summary / 总结
This study addresses the challenges of efficient electricity load scheduling in dairy farms by proposing a Deep Reinforcement Learning framework, specifically a Forecast Aware PPO and a PID KL PPO variant, which incorporate short-term forecasts of demand and renewable generation. The method achieves up to 1% lower electricity cost compared to PPO, 4.8% compared to DQN, and 1.5% compared to SAC, and reduces grid imports by 13.1% for battery scheduling, showcasing its effectiveness in sustainable energy management.
该研究针对乳牛场等能源密集型产业面临的电网电力调度效率问题,提出了一种深度强化学习框架,特别是Forecast Aware PPO和PID KL PPO,以整合短期预测并调节KL发散以实现稳定的策略更新。该方法在实际数据上训练,相比PPO可降低高达1%的电费,相比DQN降低4.8%,相比SAC降低1.5%,并且通过电池调度减少了13.1%的电网进口量,展示了其在现代乳牛场可持续能源管理中的有效性和可扩展性。
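The PID KL PPO variant in the entry above uses a proportional-integral-derivative controller to regulate KL divergence. The abstract does not say which quantity the controller drives, so the sketch below assumes it adjusts a KL penalty coefficient `beta` toward a target KL; the class name, gains, and clamping bounds are all hypothetical.

```python
class PIDKLController:
    """PID controller that nudges a KL penalty coefficient toward a target KL.

    Assumption for illustration: the PPO loss includes a term beta * KL,
    so raising beta discourages large policy updates and lowering it
    permits larger ones.
    """

    def __init__(self, target_kl, kp, ki, kd, beta=1.0,
                 beta_min=1e-4, beta_max=100.0):
        self.target_kl = target_kl
        self.kp, self.ki, self.kd = kp, ki, kd
        self.beta = beta
        self.beta_min, self.beta_max = beta_min, beta_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_kl):
        """Update beta from the KL measured after the latest policy update."""
        error = measured_kl - self.target_kl    # positive: policy moved too far
        self.integral += error                  # accumulated error (I term)
        derivative = error - self.prev_error    # change in error (D term)
        self.prev_error = error

        delta = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so the coefficient stays positive and bounded.
        self.beta = min(self.beta_max, max(self.beta_min, self.beta + delta))
        return self.beta
```

A training loop would call `update` once per policy update with the measured mean KL between the old and new policies, then use the returned coefficient in the next update. Compared with a fixed threshold, the integral term removes persistent offset from the target and the derivative term damps sudden swings, which is the usual motivation for PID control.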
Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics
Authors: Zuyuan Zhang, Sizhe Tang, Tian Lan
First: 2026-02-06T18:35:41+00:00 · Latest: 2026-02-06T18:35:41+00:00
Abstract
Non-Markovian dynamics are commonly found in real-world environments due to long-range dependencies, partial observability, and memory effects. The Bellman equation, the central pillar of reinforcement learning (RL), becomes only approximately valid under non-Markovian dynamics. Existing work often focuses on practical algorithm designs and offers limited theoretical treatment of key questions, such as which dynamics can in fact be captured by the Bellman framework and how to inspire new algorithm classes with optimal approximations. In this paper, we present a novel topological viewpoint on temporal-difference (TD) based RL. We show that TD errors can be viewed as 1-cochains in the topological space of state transitions, while Markov dynamics are then interpreted as topological integrability. This novel view enables us to obtain a Hodge-type decomposition of TD errors into an integrable component and a topological residual, through a Bellman-de Rham projection. We further propose HodgeFlow Policy Search (HFPS), which fits a potential network to minimize the non-integrable projection residual in RL, achieving stability/sensitivity guarantees. In numerical evaluations, HFPS is shown to significantly improve RL performance under non-Markovian dynamics.
中文标题/摘要
标题:协链视角下的时差信号在超越马尔可夫动力学学习中的应用
非马尔可夫动力学在现实环境中普遍存在,由于长程依赖、部分可观测性和记忆效应。贝尔曼方程作为强化学习(RL)的核心支柱,在非马尔可夫动力学下仅近似有效。现有工作往往侧重于实用算法的设计,对关键问题如贝尔曼框架能捕捉哪些动力学以及如何启发新的最优逼近算法类提供有限的理论处理。在本文中,我们提出了基于时差(TD)的RL的一种新颖拓扑视角。我们表明,TD误差可以被视为状态转换拓扑空间中的1-协链,而马尔可夫动力学则被解释为拓扑可积性。这种新颖的观点使我们能够通过贝尔曼-德·拉姆投影获得TD误差的霍奇型分解,分解为可积分分量和拓扑残差。我们进一步提出了霍奇流策略搜索(HFPS),通过拟合势能网络最小化RL中的非可积投影残差,从而获得稳定性和敏感性保证。在数值评估中,HFPS 显示出在非马尔可夫动力学下显著提高RL性能。
Summary / 总结
This paper introduces a topological perspective on temporal-difference (TD) based reinforcement learning (RL) to address the limitations of the Bellman equation under non-Markovian dynamics. The authors view TD errors as 1-cochains in the topological space of state transitions, decomposing them into an integrable component and a topological residual. They propose HodgeFlow Policy Search (HFPS) to fit a potential network that minimizes the non-integrable projection residual, leading to improved RL performance in non-Markovian environments.
本文提出了一种新的拓扑视角来处理基于时差(TD)的强化学习(RL)中的非马尔可夫动态。作者将TD误差视为状态转换拓扑空间中的1-上同调,并提出HodgeFlow策略搜索(HFPS)来拟合一个潜在网络,以最小化非可积投影残差,从而在非马尔可夫动态下提高RL性能。
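The decomposition of TD errors into an integrable component and a residual, described in the entry above, has a generic least-squares analogue on a finite transition graph. The block below is that textbook graph version under assumed notation ($E$ for the set of observed transitions, $\delta$ for the TD-error 1-cochain, $\Phi$ for a potential on states); it is not the paper's Bellman-de Rham projection, whose exact definition the abstract does not give.

```latex
% Assumed notation: \delta is a 1-cochain on edges (s \to s'),
% \Phi is a 0-cochain (potential) on states, d is the graph coboundary.
\begin{aligned}
(d\Phi)(s \to s') &= \Phi(s') - \Phi(s), \\
\Phi^{\star} &\in \arg\min_{\Phi} \sum_{(s \to s') \in E}
  \bigl(\delta(s \to s') - (d\Phi)(s \to s')\bigr)^{2}, \\
\delta &= d\Phi^{\star} + \rho, \qquad d^{*}\rho = 0 .
\end{aligned}
```

Here $d\Phi^{\star}$ is the integrable part and $\rho$ is the least-squares residual, which is orthogonal to every potential difference. For example, on a directed 3-cycle with edge values $(f_1, f_2, f_3)$, the integrable cochains are exactly those summing to zero around the cycle, so $\rho$ is the constant cochain with value $(f_1 + f_2 + f_3)/3$ on each edge.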
Reliable Mislabel Detection for Video Capsule Endoscopy Data
Authors: Julia Werner, Julius Oexle, Oliver Bause, Maxime Le Floch, Franz Brinkmann, Hannah Tolle, Jochen Hampe, Oliver Bringmann
First: 2026-02-06T18:33:12+00:00 · Latest: 2026-02-06T18:33:12+00:00
Abstract
The classification performance of deep neural networks relies strongly on access to large, accurately annotated datasets. In medical imaging, however, obtaining such datasets is particularly challenging since annotations must be provided by specialized physicians, which severely limits the pool of annotators. Furthermore, class boundaries can often be ambiguous or difficult to define, which further complicates machine learning-based classification. In this paper, we address this problem by introducing a framework for mislabel detection in medical datasets. The framework is validated on the two largest publicly available datasets for Video Capsule Endoscopy, an important imaging procedure for examining the gastrointestinal tract based on a video stream of low-resolution images. In addition, potentially mislabeled samples identified by our pipeline were reviewed and re-annotated by three experienced gastroenterologists. Our results show that the proposed framework successfully detects incorrectly labeled data and results in an improved anomaly detection performance after cleaning the datasets compared to current baselines.
中文标题/摘要
标题:视频胶囊内镜数据的可靠误标检测
深度神经网络的分类性能强烈依赖于能够访问到大量准确标注的数据集。然而,在医学成像领域,获取这样的数据集特别具有挑战性,因为注释必须由专门的医生提供,这严重限制了注释者的范围。此外,类别边界往往模糊或难以定义,这进一步复杂化了基于机器学习的分类。在本文中,我们旨在解决这一问题,并介绍一种医学数据中误标检测的框架。该框架在两个最大的公开可用的视频胶囊内镜数据集上进行了验证,这是基于低分辨率图像视频流的重要成像程序,用于检查消化道。此外,通过我们的管道识别出的可能误标样本由三位有经验的胃肠病学家进行了审查和重新注释。我们的结果表明,所提出的框架成功地检测了错误标注的数据,并且在清理数据集后,异常检测性能优于当前基线。
Summary / 总结
This paper addresses the challenge of obtaining large, accurately annotated datasets in medical imaging, particularly for Video Capsule Endoscopy. The authors introduce a framework for detecting mislabeled data using deep neural networks and validate it on two large public datasets. After re-annotating potentially mislabeled samples, the framework improved the performance of anomaly detection compared to existing methods.
本文旨在解决医疗成像中获取大量准确标注数据的难题,特别是在视频胶囊内镜检查方面。作者提出了一种检测错误标注数据的框架,使用深度神经网络。该框架在两个大型公共数据集上进行了验证,潜在的错误标注样本由胃肠病专家重新标注。结果表明,所提出的方法能够有效识别错误标签,从而在清理数据集后相比现有方法提高了异常检测性能。
When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Authors: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu, Xiaoxia Wu
First: 2026-02-06T18:28:54+00:00 · Latest: 2026-02-06T18:28:54+00:00
Abstract
Speculative decoding can significantly accelerate LLM serving, yet most deployments today disentangle speculator training from serving, treating speculator training as a standalone offline modeling problem. We show that this decoupled formulation introduces substantial deployment and adaptation lag: (1) high time-to-serve, since a speculator must be trained offline for a considerable period before deployment; (2) delayed utility feedback, since the true end-to-end decoding speedup is only known after training and cannot be inferred reliably from acceptance rate alone due to model-architecture and system-level overheads; and (3) domain-drift degradation, as the target model is repurposed to new domains and the speculator becomes stale and less effective.
To address these issues, we present Aurora, a unified training-serving system that closes the loop by continuously learning a speculator directly from live inference traces. Aurora reframes online speculator learning as an asynchronous reinforcement-learning problem: accepted tokens provide positive feedback, while rejected speculator proposals provide implicit negative feedback that we exploit to improve sample efficiency. Our design integrates an SGLang-based inference server with an asynchronous training server, enabling hot-swapped speculator updates without service interruption. Crucially, Aurora supports day-0 deployment: a speculator can be served immediately and rapidly adapted to live traffic, improving system performance while providing immediate utility feedback. Across experiments, Aurora achieves a 1.5x day-0 speedup on recently released frontier models (e.g., MiniMax M2.1 229B and Qwen3-Coder-Next 80B). Aurora also adapts effectively to distribution shifts in user traffic, delivering an additional 1.25x speedup over a well-trained but static speculator on widely used models (e.g., Qwen3 and Llama3).
中文标题/摘要
标题:当RL遇到自适应推测训练:一个统一的训练-服务系统
推测解码可以显著加速大语言模型的服务,但大多数部署将推测训练与服务分离,将推测训练视为独立的离线建模问题。我们展示了这种分离的表述引入了重大的部署和适应滞后:(1)高上线时间,因为推测器必须在部署前经过长时间的离线训练;(2)延迟的效用反馈,因为只有在训练后才能知道端到端的解码加速效果,而不能仅通过接受率来可靠地推断,因为存在模型架构和系统层面的开销;(3)领域漂移退化,当目标模型被重新用于新领域时,推测器变得过时且效果较差。
为了解决这些问题,我们提出了Aurora,一个统一的训练-服务系统,通过连续从实时推理跟踪中学习推测器来闭合回路。Aurora将在线推测学习重新定义为异步强化学习问题:接受的令牌提供正反馈,而被拒绝的推测器提案提供隐含的负反馈,我们利用这种反馈来提高样本效率。我们的设计集成了基于SGLang的推理服务器和异步训练服务器,使推测器更新可以在不中断服务的情况下进行热插拔。至关重要的是,Aurora支持零日部署:推测器可以立即上线并快速适应实时流量,从而提高系统性能并提供即时的效用反馈。在实验中,Aurora在最近发布的前沿模型(如MiniMax M2.1 229B和Qwen3-Coder-Next 80B)上实现了1.5倍的零日加速。Aurora还能够有效适应用户流量分布的变化,在广泛使用的模型(如Qwen3和Llama3)上,与训练有素但静态的推测器相比,额外实现了1.25倍的加速。
Summary / 总结
The paper addresses the limitations of disentangled speculator training in LLM serving by introducing Aurora, a unified training-serving system. Aurora uses reinforcement learning to continuously learn a speculator directly from live inference traces, providing immediate utility feedback and enabling rapid adaptation. Experiments show that Aurora achieves a 1.5x day-0 speedup on frontier models and an additional 1.25x speedup on widely used models when adapting to distribution shifts.
论文解决了语言模型中推测解码的部署挑战,通常需要离线训练并引入延迟和适应问题。为此,作者提出了Aurora,这是一种统一的训练-服务系统,可以直接从实时推理痕迹中学习推测器,并使用强化学习。Aurora在前沿模型上实现了1.5倍的初始速度提升,在广泛使用的模型上通过适应用户流量的变化,额外实现了1.25倍的速度提升。
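The accept/reject feedback described in the Aurora abstract can be sketched as follows. This is a minimal illustration, assuming greedy exact-match verification; the function names and the reward values (+1.0 / -1.0) are hypothetical and are not taken from the paper, which does not spell out its training objective here.

```python
from typing import List, Tuple


def label_draft_tokens(draft: List[int], target: List[int]) -> Tuple[List[int], List[int]]:
    """Split a drafted block into accepted tokens and the first rejected one.

    `draft` holds the speculator's proposals; `target` holds the tokens the
    target model would emit at the same positions (greedy verification,
    an assumption made for this sketch).
    """
    accepted: List[int] = []
    rejected: List[int] = []
    for proposed, verified in zip(draft, target):
        if proposed == verified:
            accepted.append(proposed)
        else:
            rejected.append(proposed)  # the first mismatch ends verification
            break
    return accepted, rejected


def feedback(draft: List[int], target: List[int],
             reward_accept: float = 1.0, reward_reject: float = -1.0):
    """Turn one verification step into (token, reward) training signals."""
    accepted, rejected = label_draft_tokens(draft, target)
    return [(t, reward_accept) for t in accepted] + [(t, reward_reject) for t in rejected]
```

Tokens drafted after the first mismatch are never verified, so they carry no signal in this sketch.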
Continuous-time reinforcement learning: ellipticity enables model-free value function approximation
Authors: Wenlong Mou
First: 2026-02-06T18:25:33+00:00 · Latest: 2026-02-06T18:25:33+00:00
Abstract
We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with function approximation that learn value and advantage functions directly from data, without unrealistic structural assumptions on the dynamics.
Leveraging the ellipticity of the diffusions, we establish a new class of Hilbert-space positive definiteness and boundedness properties for the Bellman operators. Based on these properties, we propose the Sobolev-prox fitted $q$-learning algorithm, which learns value and advantage functions by iteratively solving least-squares regression problems. We derive oracle inequalities for the estimation error, governed by (i) the best approximation error of the function classes, (ii) their localized complexity, (iii) exponentially decaying optimization error, and (iv) numerical discretization error. These results identify ellipticity as a key structural property that renders reinforcement learning with function approximation for Markov diffusions no harder than supervised learning.
中文标题/摘要
标题:连续时间强化学习:椭圆性使无模型价值函数近似成为可能
我们研究了控制连续时间马尔可夫扩散过程的离策强化学习,其中观测和动作是离散时间的。我们考虑了无需对动力学进行不切实际的结构假设的无模型算法,这些算法直接从数据中学习价值函数和优势函数。利用扩散过程的椭圆性,我们建立了贝尔曼算子的新一类希尔伯特空间正定性和有界性性质。基于这些性质,我们提出了Sobolev-近邻拟合$q$学习算法,该算法通过迭代求解最小二乘回归问题来学习价值函数和优势函数。我们推导了估计误差的oracle不等式,由(i)函数类的最佳逼近误差,(ii)其局部复杂性,(iii)指数衰减的优化误差,以及(iv)数值离散误差控制。这些结果表明,椭圆性是关键的结构特性,使得对于马尔可夫扩散过程的函数近似强化学习与监督学习一样简单。
Summary / 总结
The study focuses on off-policy reinforcement learning for continuous-time Markov diffusion processes. It leverages ellipticity to establish a new class of Hilbert-space positive definiteness and boundedness properties for the Bellman operators and proposes the Sobolev-prox fitted $q$-learning algorithm, which iteratively solves least-squares regression problems to learn value and advantage functions. The key results are oracle inequalities for the estimation error, which identify ellipticity as the structural property that makes reinforcement learning with function approximation for Markov diffusions no harder than supervised learning.
研究旨在通过直接从数据中学习价值和优势函数,解决连续时间马尔可夫扩散过程的离散时间观测和动作的离策增强学习问题,而不假设特定的动力学。利用扩散过程的椭圆性,作者提出了Sobolev-prox fitted $q$-学习算法,该算法通过迭代求解最小二乘回归问题来学习价值和优势函数。关键发现包括估计误差的oracle不等式,这些误差受最佳逼近误差、局部复杂性、优化误差和数值离散误差的影响,突出了椭圆性在使增强学习与函数逼近与监督学习一样具有挑战性方面的重要性。
EigenTrack: Spectral Activation Feature Tracking for Hallucination and Out-of-Distribution Detection in LLMs and VLMs
Authors: Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi
Venue: ICASSP 2026
First: 2025-09-19T08:05:28+00:00 · Latest: 2026-02-06T18:25:25+00:00
Comments: 5 pages, submitted to ICASSP 2026, September 2025
Abstract
Large language models (LLMs) offer broad utility but remain prone to hallucination and out-of-distribution (OOD) errors. We propose EigenTrack, an interpretable real-time detector that uses the spectral geometry of hidden activations, a compact global signature of model dynamics. By streaming covariance-spectrum statistics such as entropy, eigenvalue gaps, and KL divergence from random baselines into a lightweight recurrent classifier, EigenTrack tracks temporal shifts in representation structure that signal hallucination and OOD drift before surface errors appear. Unlike black- and grey-box methods, it needs only a single forward pass without resampling. Unlike existing white-box detectors, it preserves temporal context, aggregates global signals, and offers interpretable accuracy-latency trade-offs.
中文标题/摘要
标题:EigenTrack:用于大语言模型和视觉语言模型幻觉及分布外检测的光谱激活特征跟踪
大语言模型(LLMs)提供了广泛的应用,但仍容易出现幻觉和分布外(OOD)错误。我们提出EigenTrack,一种可解释的实时检测器,利用隐藏激活的光谱几何,这是一种模型动态的紧凑全局签名。通过将协方差光谱统计量,如熵、特征值间隙和与随机基线的KL散度流式传输到一个轻量级递归分类器中,EigenTrack 跟踪表示结构的时序变化,这些变化在表面错误出现之前就预示了幻觉和OOD漂移。与黑盒和灰盒方法不同,它只需要一次前向传播而无需重新采样。与现有的白盒检测器不同,它保留了时序上下文,聚合了全局信号,并提供了可解释的准确率-延迟权衡。
Summary / 总结
EigenTrack is an interpretable real-time detector designed to identify hallucination and out-of-distribution errors in large language models and vision-language models. It leverages the spectral geometry of hidden activations to track temporal shifts in representation structure. By streaming covariance-spectrum statistics into a lightweight recurrent classifier, EigenTrack can detect these errors before surface-level mistakes become apparent. This method requires only a single forward pass and offers interpretable accuracy-latency trade-offs, distinguishing it from both black- and grey-box methods, as well as existing white-box detectors.
EigenTrack 是一种可解释的实时检测器,用于识别大型语言模型和视觉-语言模型中的幻觉和分布外错误。它通过跟踪隐藏激活的谱几何来识别这些错误,这些错误在表面错误出现之前就信号化了。该方法涉及将协方差谱统计量流式传输到轻量级递归分类器中,并不需要额外的前向传递或采样,提供了可解释的准确率-延迟权衡。
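To make the covariance-spectrum statistics in the EigenTrack abstract concrete, here is a minimal sketch of two of them (spectral entropy and the leading eigenvalue gap) computed from one layer's hidden activations. The function name, the input layout, and the exact normalization are assumptions; the KL-divergence feature and the recurrent classifier are not shown.

```python
import numpy as np


def spectral_features(hidden: np.ndarray, eps: float = 1e-12) -> dict:
    """Compute spectrum statistics of a (tokens, dim) activation matrix, dim >= 2."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(hidden.shape[0] - 1, 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # sorted in descending order
    eigvals = np.clip(eigvals, 0.0, None)          # remove tiny negative round-off
    p = eigvals / (eigvals.sum() + eps)            # normalized spectrum
    entropy = float(-(p * np.log(p + eps)).sum())  # low when one direction dominates
    gap = float(eigvals[0] - eigvals[1])           # gap between the top two eigenvalues
    return {"entropy": entropy, "gap": gap}
```

Activations that collapse onto one direction give an entropy near zero, while an isotropic cloud in `dim` dimensions gives an entropy near `log(dim)`.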
WAFT: Warping-Alone Field Transforms for Optical Flow
Authors: Yihan Wang, Jia Deng
First: 2025-06-26T17:47:59+00:00 · Latest: 2026-02-06T18:23:11+00:00
Abstract
We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces the cost volume with high-resolution warping, achieving better accuracy with lower memory cost. This design challenges the conventional wisdom that constructing cost volumes is necessary for strong performance. WAFT is a simple and flexible meta-architecture with minimal inductive biases and minimal reliance on custom designs. Compared with existing methods, WAFT ranks 1st on the Spring, Sintel, and KITTI benchmarks and achieves the best zero-shot generalization on KITTI, while being 1.3-4.1x faster than existing methods that have competitive accuracy (e.g., 1.3x faster than Flowformer++ and 4.1x faster than CCMR+). Code and model weights are available at https://github.com/princeton-vl/WAFT.
中文标题/摘要
标题:WAFT:光学流的扭曲独立场变换
我们引入了扭曲独立场变换(WAFT),这是一种简单而有效的光学流方法。WAFT 类似于 RAFT,但用高分辨率扭曲替代了代价体,以较低的内存成本实现了更好的准确性。这一设计挑战了构建代价体是实现高性能所必需的传统智慧。WAFT 是一种简单且灵活的元架构,具有最少的归纳偏见和对定制设计的依赖。与现有方法相比,WAFT 在 Spring、Sintel 和 KITTI 基准测试中排名第一,在 KITTI 上实现了最佳的零样本泛化,同时比具有竞争力准确度的现有方法(例如,比 Flowformer++ 快 1.3 倍,比 CCMR+ 快 4.1 倍)快 1.3-4.1 倍。代码和模型权重可在 https://github.com/princeton-vl/WAFT 获取。
Summary / 总结
WAFT is a novel method for optical flow that uses high-resolution warping instead of a cost volume, achieving better accuracy with lower memory cost. It ranks first on the Spring, Sintel, and KITTI benchmarks and shows the best zero-shot generalization on KITTI. It is also 1.3-4.1 times faster than existing methods of competitive accuracy.
WAFT 是一种新颖的光流方法,使用高分辨率的扭曲代替成本体积,实现更高的准确性和更低的内存成本。它在 Spring、Sintel 和 KITTI 基准测试中排名第一,并且在 KITTI 上展示了最佳的零样本泛化能力。与其它方法相比,WAFT 在保持竞争力的同时快了 1.3-4.1 倍。
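The warping primitive that WAFT uses in place of a cost volume can be illustrated with a short backward-warping routine. This is only a sketch of generic bilinear warping under assumed array layouts (`feat2` as height x width x channels, `flow` as per-pixel `(dx, dy)`); it is not the released WAFT code, and the recurrent update around it is omitted.

```python
import numpy as np


def warp(feat2: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Sample frame-2 features at (x + dx, y + dy) for every frame-1 pixel.

    feat2: (H, W, C) features of the second frame.
    flow:  (H, W, 2) current flow estimate, stored as (dx, dy).
    Sampling is bilinear and coordinates are clamped to the image border.
    """
    H, W, _ = feat2.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * feat2[y0, x0] + wx * feat2[y0, x1]
    bottom = (1 - wx) * feat2[y1, x0] + wx * feat2[y1, x1]
    return (1 - wy) * top + wy * bottom
```

The output has the same size as the input feature map, so memory grows linearly with resolution rather than with the number of pixel pairs a full cost volume would store.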
From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
Authors: Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias
First: 2026-02-06T18:17:37+00:00 · Latest: 2026-02-06T18:17:37+00:00
Abstract
Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow an agent to not only predict future states but understand the underlying governing dynamics. While previous "AI Physicist" approaches have successfully recovered such laws, they typically rely on strong, domain-specific priors that effectively "bake in" the physics. Conversely, Vafa et al. recently showed that generic Transformers fail to acquire these world models, achieving high predictive accuracy without capturing the underlying physical laws. We bridge this gap by systematically introducing three minimal inductive biases. We show that ensuring spatial smoothness (by formulating prediction as continuous regression) and stability (by training with noisy contexts to mitigate error accumulation) enables generic Transformers to surpass prior failures and learn a coherent Keplerian world model, successfully fitting ellipses to planetary trajectories. However, true physical insight requires a third bias: temporal locality. By restricting the attention window to the immediate past -- imposing the simple assumption that future states depend only on the local state rather than a complex history -- we force the model to abandon curve-fitting and discover Newtonian force representations. Our results demonstrate that simple architectural choices determine whether an AI becomes a curve-fitter or a physicist, marking a critical step toward automated scientific discovery.
中文标题/摘要
标题:从开普勒到牛顿:归纳偏见引导Transformer学习世界模型
通用人工智能架构能否超越预测,发现支配宇宙的物理定律?真正的智能依赖于“世界模型”——因果抽象,使智能体不仅能预测未来状态,还能理解其背后的动力学。虽然先前的“AI物理学家”方法成功地恢复了这些定律,但它们通常依赖于强的、特定领域的先验知识,实际上“内置”了物理学。相反,Vafa等人最近表明,通用Transformer无法获得这些世界模型,在实现高预测准确性的同时未能捕捉到背后的物理定律。我们通过系统地引入三个最小的归纳偏见来弥合这一差距。我们证明,通过将预测公式化为连续回归以确保空间平滑性,并通过使用噪声上下文进行训练以减轻误差累积以确保稳定性,可以使通用Transformer超越先前的失败,学习一个连贯的开普勒世界模型,成功拟合行星轨迹的椭圆。然而,真正的物理洞察需要第三个偏见:时间局部性。通过限制注意力窗口到最近的过去——施加一个简单的假设,即未来状态仅依赖于局部状态而不是复杂的历史——迫使模型放弃曲线拟合,发现牛顿力的表示。我们的结果表明,简单的架构选择决定了AI是成为曲线拟合者还是物理学家,标志着自动科学发现的关键一步。
Summary / 总结
The study asks whether general-purpose AI architectures can go beyond prediction to discover physical laws. It introduces three inductive biases into generic Transformers: spatial smoothness, stability, and temporal locality. The first two are enough for the Transformer to learn a coherent Keplerian world model that fits ellipses to planetary trajectories. Only with temporal locality, which restricts attention to the immediate past, does the model discover Newtonian force representations, indicating that simple architectural choices determine whether an AI becomes a physicist or a curve-fitter.
研究探讨了通用AI架构是否能在预测之外发现物理定律。研究引入了三种归纳偏置到通用Transformer中:空间平滑性、稳定性和时间局部性。这些偏置使Transformer能够学习一个连贯的开普勒世界模型,并拟合行星轨迹的椭圆。然而,时间局部性对于捕捉牛顿力表示至关重要,表明简单的架构选择显著影响AI是成为曲线拟合器还是物理学家。
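The temporal-locality bias (restricting attention to the immediate past) can be sketched as a banded causal attention mask. The function names and the `window` parameter are illustrative assumptions; the abstract does not state the window size or how the restriction is implemented.

```python
import numpy as np


def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: query i may attend to key j only if i - window < j <= i."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)


def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over keys, with disallowed positions forced to zero weight."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With `window=2`, each prediction sees only the current and previous state, the smallest history from which a velocity-like quantity can be formed.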
Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs
Authors: Samir Abdaljalil, Parichit Sharma, Erchin Serpedin, Hasan Kurban
First: 2026-02-06T18:16:09+00:00 · Latest: 2026-02-06T18:16:09+00:00
Abstract
Hallucinations in large language models remain a persistent challenge, particularly in multilingual and generative settings where factual consistency is difficult to maintain. While recent models show strong performance on English-centric benchmarks, their behavior across languages, tasks, and hallucination types is not yet well understood. In this work, we introduce Halluverse-M^3, a dataset designed to enable systematic analysis of hallucinations across multiple languages, multiple generation tasks, and multiple hallucination categories. Halluverse-M^3 covers four languages, English, Arabic, Hindi, and Turkish, and supports two generation tasks: question answering and dialogue summarization. The dataset explicitly distinguishes between entity-level, relation-level, and sentence-level hallucinations. Hallucinated outputs are constructed through a controlled editing process and validated by human annotators, ensuring clear alignment between original content and hallucinated generations. Using this dataset, we evaluate a diverse set of contemporary open-source and proprietary language models on fine-grained hallucination detection. Our results show that question answering is consistently easier than dialogue summarization, while sentence-level hallucinations remain challenging even for the strongest models. Performance is highest in English and degrades in lower-resource languages, with Hindi exhibiting the lowest detection accuracy. Overall, Halluverse-M^3 provides a realistic and challenging benchmark for studying hallucinations in multilingual, multi-task settings. The dataset is released at https://huggingface.co/datasets/sabdalja/HalluVerse-M3 to support future research on hallucination detection and mitigation.
中文标题/摘要
标题:Halluverse-M^3:大规模语言模型中幻觉的多任务多语言基准
大规模语言模型中的幻觉仍然是一个持续的挑战,尤其是在多语言和生成环境中,保持事实一致性非常困难。尽管最近的模型在以英语为中心的基准测试中表现出色,但它们在不同语言、任务和幻觉类型中的行为尚未得到充分理解。在本研究中,我们引入了Halluverse-M^3数据集,旨在支持对多种语言、多种生成任务和多种幻觉类别的幻觉进行系统分析。Halluverse-M^3涵盖了四种语言:英语、阿拉伯语、印地语和土耳其语,并支持两种生成任务:问答和对话总结。数据集明确区分了实体级、关系级和句子级的幻觉。幻觉输出通过受控编辑过程构建,并由人类注释员验证,确保原始内容与幻觉生成之间有明确的对齐。使用此数据集,我们对一系列当代开源和专有语言模型进行了细粒度幻觉检测评估。结果显示,问答任务始终比对话总结任务更容易,而句子级幻觉即使对于最强的模型也仍然具有挑战性。性能在英语中最高,在低资源语言中下降,印地语的检测准确性最低。总体而言,Halluverse-M^3为研究多语言、多任务环境中的幻觉提供了一个现实且具有挑战性的基准。我们发布了该数据集以支持未来关于幻觉检测和缓解的研究\footnote{https://huggingface.co/datasets/sabdalja/HalluVerse-M3}。
Summary / 总结
This study introduces Halluverse-M^3, a benchmark for evaluating hallucinations in large language models across multiple languages and tasks. The dataset includes four languages (English, Arabic, Hindi, and Turkish) and two generation tasks (question answering and dialogue summarization). It distinguishes between entity-level, relation-level, and sentence-level hallucinations and evaluates contemporary language models, showing that question answering is easier than dialogue summarization, and performance is highest in English and lowest in Hindi. The benchmark provides a realistic challenge for future research on hallucination detection and mitigation.
研究旨在解决大型语言模型中的幻觉问题,特别是在多语言和生成性设置中的挑战。引入了Halluverse-M^3数据集,以系统地分析多语言、任务和类型中的幻觉。该数据集涵盖了英语、阿拉伯语、印地语和土耳其语,并包括问答和对话总结任务。主要发现表明,问答比对话总结更容易,而句子级别的幻觉尤其困难。性能在英语中最高,在印地语中最低,表明需要在低资源语言中改进模型。
Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy
Authors: Sumit Kumar, Suraj Jaiswal, Parampreet Singh, Vipul Arora
First: 2026-02-06T18:15:36+00:00 · Latest: 2026-02-06T18:15:36+00:00
Comments: Under Review at Transactions of Audio Speech and Language Processing
Abstract
The advancement of machine learning in audio analysis has opened new possibilities for technology-enhanced music education. This paper introduces a framework for automatic singing mistake detection in the context of music pedagogy, supported by a newly curated dataset. The dataset comprises synchronized teacher-learner vocal recordings, with annotations marking different types of mistakes made by learners. Using this dataset, we develop different deep learning models for mistake detection and benchmark them. To compare the efficacy of mistake detection systems, a new evaluation methodology is proposed. Experiments indicate that the proposed learning-based methods are superior to rule-based methods. A systematic study of errors and a cross-teacher study reveal insights into music pedagogy that can be utilised for various music applications. This work sets out new directions of research in music pedagogy. The code and dataset are publicly available.
中文标题/摘要
标题:歌唱错误的自动检测与分析在音乐教学中的应用
音频分析中机器学习的进步为技术增强的音乐教育开辟了新的可能性。本文介绍了一种在音乐教学背景下自动检测歌唱错误的框架,该框架依托于一个新编纂的数据集。该数据集包含同步的教师和学习者歌唱录音,并对学习者所犯的不同类型的错误进行了标注。利用该数据集,我们开发了不同的深度学习模型进行错误检测,并进行了基准测试。为了比较错误检测系统的有效性,提出了一种新的评估方法。实验表明,提出的基于学习的方法优于基于规则的方法。系统性地研究错误和跨教师研究揭示了可用于各种音乐应用的音乐教学见解。这项工作为音乐教学研究开辟了新的研究方向。代码和数据集已公开。
Summary / 总结
This paper presents a framework for automatically detecting singing mistakes in music pedagogy using a newly curated dataset of synchronized teacher-learner vocal recordings. Deep learning models are developed and benchmarked, and a new evaluation methodology is proposed to compare the performance of mistake detection systems. The results show that learning-based methods outperform rule-based methods. The study provides insights into music pedagogy and sets new research directions. The code and dataset are publicly available.
该论文提出了一种利用新收集的数据集自动检测音乐教学中歌唱错误的框架。开发并基准测试了深度学习模型,实验表明这些模型优于基于规则的方法。通过系统研究错误和跨教师分析,获得了关于音乐教学的见解,为该领域的研究开辟了新方向。
Seeing Beyond Redundancy: Task Complexity's Role in Vision Token Specialization in VLLMs
Authors: Darryl Hannan, John Cooper, Dylan White, Yijing Watkins
First: 2026-02-06T18:13:01+00:00 · Latest: 2026-02-06T18:13:01+00:00
Comments: 25 pages
Abstract
Vision capabilities in vision large language models (VLLMs) have consistently lagged behind their linguistic capabilities. In particular, numerous benchmark studies have demonstrated that VLLMs struggle when fine-grained visual information or spatial reasoning is required. However, we do not yet understand exactly why VLLMs struggle so much with these tasks relative to others. Some works have focused on visual redundancy as an explanation, where high-level visual information is uniformly spread across numerous tokens and specific, fine-grained visual information is discarded. In this work, we investigate this premise in greater detail, seeking to better understand exactly how various types of visual information are processed by the model and what types of visual information are discarded. To do so, we introduce a simple synthetic benchmark dataset that is specifically constructed to probe various visual features, along with a set of metrics for measuring visual redundancy, allowing us to better understand the nuances of their relationship. Then, we explore fine-tuning VLLMs on a number of complex visual tasks to better understand how redundancy and compression change based upon the complexity of the data that a model is trained on. We find that there is a connection between task complexity and visual compression, implying that having a sufficient ratio of high complexity visual data is crucial for altering the way that VLLMs distribute their visual representation and consequently improving their performance on complex visual tasks. We hope that this work will provide valuable insights for training the next generation of VLLMs.
中文标题/摘要
标题:超越冗余:任务复杂性在VLLMs视觉标记专业化中的作用
视觉能力在视觉大型语言模型(VLLMs)中一直落后于语言能力。特别是,许多基准研究表明,VLLMs在需要精细视觉信息或空间推理的任务中表现不佳。然而,我们尚未完全理解为什么VLLMs在这些任务上表现如此糟糕。一些研究将视觉冗余作为解释,认为高层视觉信息均匀分布在大量标记中,而具体的精细视觉信息被丢弃。在本研究中,我们更详细地探讨了这一假设,试图更好地理解模型如何处理各种类型的视觉信息以及哪些类型的视觉信息被丢弃。为此,我们引入了一个简单的合成基准数据集,专门用于探测各种视觉特征,以及一套衡量视觉冗余的指标,使我们能够更好地理解它们之间的细微关系。然后,我们探索在多种复杂视觉任务上微调VLLMs,以更好地理解冗余和压缩如何根据模型训练的数据复杂性而变化。我们发现任务复杂性与视觉压缩之间存在联系,表明有足够的高复杂度视觉数据的比例对于改变VLLMs的视觉表示方式并相应提高其在复杂视觉任务上的表现至关重要。我们希望这项工作能为训练下一代VLLMs提供有价值的见解。
Summary / 总结
This study investigates why vision large language models (VLLMs) struggle with complex visual tasks, focusing on visual redundancy. The researchers developed a synthetic dataset to measure visual redundancy and explored fine-tuning VLLMs on various complex visual tasks. They found that task complexity influences visual compression, suggesting that a higher proportion of complex visual data is necessary to improve VLLMs' performance on intricate visual tasks.
研究探讨了为什么视觉大型语言模型(VLLMs)在复杂视觉任务上表现不佳,重点关注视觉冗余的作用。研究人员引入了一个合成数据集来衡量视觉冗余,并对VLLMs进行了复杂视觉任务的微调。他们发现任务复杂性影响视觉压缩,表明需要更多的复杂视觉数据来提高VLLMs在复杂视觉任务上的表现。
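The abstract on vision token specialization mentions metrics for visual redundancy without defining them. One simple proxy, shown purely as an assumption and not as the paper's metric, is the mean pairwise cosine similarity among a model's vision tokens: values near 1 mean the tokens carry nearly the same information.

```python
import numpy as np


def mean_pairwise_cosine(tokens: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of (n, dim) token vectors."""
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = sim.shape[0]
    # Subtract the diagonal (self-similarity) before averaging.
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))
```

Comparing this value before and after fine-tuning on a complex visual task would be one hypothetical way to observe the change in token specialization that the summary describes.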
PANC: Prior-Aware Normalized Cut for Object Segmentation
Authors: Juan Gutiérrez, Victor Gutiérrez-Garcia, José Luis Blanco-Murillo
First: 2026-02-06T18:07:20+00:00 · Latest: 2026-02-06T18:07:20+00:00
Abstract
Fully unsupervised segmentation pipelines naively seek the most salient object, should this be present. As a result, most of the methods reported in the literature deliver non-deterministic partitions that are sensitive to initialization, seed order, and threshold heuristics.
We propose PANC, a weakly supervised spectral segmentation framework that uses a minimal set of annotated visual tokens to produce stable, controllable, and reproducible object masks. From the TokenCut approach, we augment the token-token affinity graph with a handful of priors coupled to anchor nodes. By manipulating the graph topology, we bias the spectral eigenspace toward partitions that are consistent with the annotations. Our approach preserves the global grouping enforced by dense self-supervised visual features, trading annotated tokens for significant gains in reproducibility, user control, and segmentation quality.
Using 5 to 30 annotations per dataset, our training-free method achieves state-of-the-art performance among weakly and unsupervised approaches on standard benchmarks (e.g., DUTS-TE, ECSSD, MS COCO). It also excels in domains where dense labels are costly or intra-class differences are subtle. We report strong and reliable results on homogeneous, fine-grained, and texture-limited domains, achieving 96.8% (+14.43% over SotA), 78.0% (+0.2%), and 78.8% (+0.37%) average mean intersection-over-union (mIoU) on CrackForest (CFD), CUB-200-2011, and HAM10000 datasets, respectively. For multi-object benchmarks, the framework showcases explicit, user-controllable semantic segmentation.
中文标题/摘要
标题:PANC:先验归一化切分用于对象分割
完全无监督的分割管道通常会寻找最显眼的对象,如果存在的话。因此,文献中报道的大多数方法提供的分割结果是非确定性的,对初始化、种子顺序和阈值启发式方法敏感。
我们提出了一种弱监督的谱分割框架PANC,该框架使用少量注释的视觉标记来生成稳定、可控和可重复的对象掩码。从TokenCut方法出发,我们通过将少量先验与锚节点结合来增强标记-标记亲和图。通过操纵图的拓扑结构,我们偏向于与注释一致的谱特征空间。我们的方法保留了由密集自监督视觉特征强制执行的全局分组,用注释的标记换取更高的可重复性、用户控制和分割质量。
使用每数据集5到30个注释,我们的无需训练方法在标准基准(如DUTS-TE、ECSSD、MS COCO)上实现了弱监督和无监督方法的最新性能。在密集标签成本高或类内差异细微的领域,它表现出色。我们在同质、细粒度和纹理受限领域报告了强而可靠的结果,分别在CrackForest (CFD)、CUB-200-2011和HAM10000数据集上实现了96.8%(+14.43%超过SotA)、78.0%(+0.2%)和78.8%(+0.37%)的平均交并比(mIoU)。对于多对象基准,该框架展示了明确、用户可控的语义分割。
Summary / 总结
The paper proposes PANC, a weakly supervised spectral segmentation framework that uses a minimal set of annotated visual tokens to produce stable and controllable object masks. By augmenting the token-token affinity graph with priors, PANC biases the spectral eigenspace toward partitions consistent with the annotations, while preserving global grouping enforced by dense self-supervised visual features. PANC achieves state-of-the-art performance on standard benchmarks with 5 to 30 annotations per dataset, and demonstrates strong results on homogeneous, fine-grained, and texture-limited domains, achieving 96.8% mIoU on CrackForest, 78.0% on CUB-200-2011, and 78.8% on HAM10000.
PANC 是一种弱监督光谱分割框架,使用少量标注的视觉标记来生成稳定且可控的对象掩码。通过在标记-标记亲和图中添加先验信息,PANC 将谱空间偏向于与标注一致的分割,从而提高可重复性和分割质量。在标准基准上,PANC 使用每数据集 5 到 30 个标注达到最佳性能,分别在 CrackForest、CUB-200-2011 和 HAM10000 数据集上报告 96.8%、78.0% 和 78.8% 的平均交并比 (mIoU)。
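The idea of biasing a normalized cut with annotated tokens can be sketched on a toy graph. The construction below (a single extra anchor node tied to the annotated foreground tokens, a thresholded cosine affinity, and the parameter names `tau`, `eps`, `prior_weight`) is an assumption made for illustration; the abstract does not give PANC's exact graph construction.

```python
import numpy as np


def anchored_ncut(features, fg_tokens, tau=0.5, eps=1e-3, prior_weight=1.0):
    """Bipartition tokens with a normalized cut on a graph that includes one anchor node.

    features:  (n, dim) token features.
    fg_tokens: indices of tokens annotated as foreground (hypothetical prior).
    Returns a boolean mask of tokens that land on the anchor's side of the cut.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = f.shape[0]
    W = np.full((n + 1, n + 1), eps)                  # weak links keep the graph connected
    W[:n, :n] = np.where(f @ f.T > tau, 1.0, eps)     # thresholded cosine affinity
    W[n, fg_tokens] = prior_weight                    # tie the anchor to annotated tokens
    W[fg_tokens, n] = prior_weight
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n + 1) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                   # eigenvalues in ascending order
    v = d_inv_sqrt * vecs[:, 1]                       # second-smallest generalized eigenvector
    return (v[:n] * v[n]) > 0                         # same sign as the anchor = foreground
```

Because the mask is read off relative to the anchor's sign, the output does not depend on the arbitrary sign of the eigenvector, which is one way annotations can make the partition reproducible.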
A first realization of reinforcement learning-based closed-loop EEG-TMS
Authors: Dania Humaidan, Jiahua Xu, Jing Chen, Christoph Zrenner, David Emanuel Vetter, Laura Marzetti, Paolo Belardinelli, Timo Roine, Risto J. Ilmoniemi, Gian Luca Romani, Ulf Zieman
First: 2026-02-06T17:58:26+00:00 · Latest: 2026-02-06T17:58:26+00:00
Abstract
Background: Transcranial magnetic stimulation (TMS) is a powerful tool to investigate neurophysiology of the human brain and treat brain disorders. Traditionally, therapeutic TMS has been applied in a one-size-fits-all approach, disregarding inter- and intra-individual differences. Brain state-dependent EEG-TMS, such as coupling TMS with a pre-specified phase of the sensorimotor mu-rhythm, enables the induction of differential neuroplastic effects depending on the targeted phase. But this approach is still user-dependent as it requires defining an a-priori target phase. Objectives: To present a first realization of a machine-learning-based, closed-loop real-time EEG-TMS setup to identify user-independently the individual mu-rhythm phase associated with high- vs. low-corticospinal excitability states. Methods: We applied EEG-TMS to 25 participants targeting the supplementary motor area-primary motor cortex network and used a reinforcement learning algorithm to identify the mu-rhythm phase associated with high- vs. low-corticospinal excitability. We employed linear mixed effects models and Bayesian analysis to determine effects of reinforcement learning on corticospinal excitability indexed by motor evoked potential amplitude, and functional connectivity indexed by the imaginary part of resting-state EEG coherence. Results: Reinforcement learning effectively identified the mu-rhythm phase associated with high- vs. low-excitability states, and their repetitive stimulation resulted in long-term increases vs. decreases in functional connectivity in the stimulated sensorimotor network. Conclusions: We demonstrated for the first time the feasibility of closed-loop EEG-TMS in humans, a critical step towards individualized treatment of brain disorders.
中文标题/摘要
标题:基于强化学习的闭环EEG-TMS首次实现
背景:经颅磁刺激(TMS)是一种强大的工具,用于研究人类大脑的神经生理学和治疗脑部疾病。传统上,治疗性TMS采用一刀切的方法,忽视了个体差异。依赖大脑状态的EEG-TMS,如将TMS与感觉运动mu节律的预设相位耦合,能够根据目标相位诱导不同的神经可塑性效应。但这种方法仍然依赖于用户,因为它需要预先定义目标相位。目标:展示一种基于机器学习的闭环实时EEG-TMS设置,以独立于用户的方式识别与高-低皮质脊髓兴奋性状态相关的mu节律相位。方法:我们对25名参与者应用EEG-TMS,目标是补充运动区-初级运动皮层网络,并使用强化学习算法识别与高-低皮质脊髓兴奋性相关的mu节律相位。我们使用线性混合效应模型和贝叶斯分析来确定强化学习对皮质脊髓兴奋性(由运动诱发的电位幅度)和功能连接(由静息态EEG相干性的虚部)的影响。结果:强化学习有效地识别了与高-低兴奋性状态相关的mu节律相位,并且其重复刺激导致刺激的运动感觉网络的功能连接在长期中增加或减少。结论:我们首次展示了人类闭环EEG-TMS的可行性,这是实现脑部疾病个性化治疗的关键步骤。
Summary / 总结
The study aimed to develop a machine-learning-based, closed-loop EEG-TMS system to identify individual mu-rhythm phases associated with high- and low-corticospinal excitability. Using reinforcement learning, the system successfully identified these phases in 25 participants, and repetitive stimulation led to increased functional connectivity for high-excitability states and decreased connectivity for low-excitability states.
研究旨在开发一种基于机器学习的闭环EEG-TMS系统,以识别与高和低皮质脊髓兴奋性相关的个体mu-节奏相位。通过强化学习,系统成功地在25名参与者中识别了这些相位,并且重复刺激导致高兴奋性相位的功能连接增加,而低兴奋性相位的功能连接减少,表明个性化脑刺激治疗的潜力。
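Searching for the mu-rhythm phase with the highest excitability can be framed as a bandit problem over phase bins, with motor evoked potential (MEP) amplitude as the reward. The sketch below uses a plain epsilon-greedy bandit and a simulated cosine-shaped response; both are assumptions for illustration, since the abstract does not name the reinforcement learning algorithm or the response model used in the study.

```python
import numpy as np


def find_high_excitability_phase(measure_mep, n_bins=4, n_trials=400, epsilon=0.2, seed=0):
    """Epsilon-greedy search for the phase bin with the largest mean MEP amplitude."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_bins)
    means = np.zeros(n_bins)
    for t in range(n_trials):
        if t < n_bins:
            arm = t                                   # stimulate each phase bin once
        elif rng.random() < epsilon:
            arm = int(rng.integers(n_bins))           # explore a random bin
        else:
            arm = int(np.argmax(means))               # exploit the current best bin
        reward = measure_mep(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return int(np.argmax(means)), means


def simulated_mep(preferred_bin, n_bins=4, noise=0.3, seed=1):
    """Hypothetical response: MEP amplitude peaks at `preferred_bin`, plus Gaussian noise."""
    rng = np.random.default_rng(seed)

    def measure(arm):
        phase_gap = 2 * np.pi * (arm - preferred_bin) / n_bins
        return 1.0 + np.cos(phase_gap) + noise * rng.normal()

    return measure
```

The bin with the smallest estimated mean plays the role of the low-excitability phase in this toy setup.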
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
Authors: Min-Seop Kwak, Junho Kim, Sangdoo Yun, Dongyoon Han, Taekyung Kim, Seungryong Kim, Jin-Hwa Kim
First: 2025-06-13T16:19:00+00:00 · Latest: 2026-02-06T17:56:06+00:00
Comments: Project page at https://cvlab-kaist.github.io/MoAI
Abstract
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between generated images and geometry, we propose cross-modal attention instillation, where attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach achieves synergistic effects, facilitating geometrically robust image synthesis as well as well-defined geometry prediction. We further introduce proximity-based mesh conditioning to integrate depth and normal cues, interpolating between point clouds and preventing erroneously predicted geometry from influencing the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis on both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. Project page is available at https://cvlab-kaist.github.io/MoAI.
中文标题/摘要
标题:通过跨模态注意力灌输实现对齐的新视角图像和几何合成
我们提出了一种基于扩散的框架,通过扭曲和修复方法实现对齐的新视角图像和几何生成。与需要密集姿态图像或仅限于领域内视角的姿势嵌入生成模型的先前方法不同,我们的方法利用现成的几何预测器从参考图像中预测部分几何,并将新视角合成公式化为图像和几何的修复任务。为了确保生成的图像和几何之间的准确对齐,我们提出了跨模态注意力蒸馏,其中在训练和推理期间将图像扩散分支的注意力图注入到并行的几何扩散分支中。这种多任务方法实现了协同效应,促进了几何上稳健的图像合成以及几何预测的明确性。我们进一步引入基于邻近的网格条件,整合深度和法线线索,插值点云并过滤错误预测的几何对生成过程的影响。实验证明,我们的方法在多种未见过的场景中实现了高保真度的新视角图像和几何外推合成,具有竞争力的插值重建质量,并生成几何对齐的彩色点云,实现全面的3D完成。项目页面可在https://cvlab-kaist.github.io/MoAI/获取。
Summary / 总结
The research introduces a diffusion-based framework for aligned novel view image and geometry synthesis using a warping-and-inpainting approach. Unlike previous methods that rely on dense posed images or pose-embedded generative models, this method leverages off-the-shelf geometry predictors and formulates novel-view synthesis as an inpainting task. The key innovation is cross-modal attention instillation, which ensures accurate alignment between generated images and geometry by injecting attention maps from the image diffusion branch into the geometry diffusion branch. The method also includes proximity-based mesh conditioning to integrate depth and normal cues, improving the quality of geometrically aligned colored point clouds. Experiments show high-fidelity extrapolative view synthesis and competitive reconstruction quality across various unseen scenes.
研究提出了一种基于扩散的框架,使用变形和修复的方法进行对齐的新视角图像和几何生成。不同于需要密集姿态图像或姿态嵌入生成模型的先前方法,该方法使用现成的几何预测器从参考图像中预测部分几何,并将新视角合成视为图像和几何的修复任务。关键创新是跨模态注意力蒸馏,通过将图像扩散分支的注意力图注入平行的几何扩散分支来确保生成的图像和几何之间的准确对齐。该方法还引入了基于邻近的网格条件,以整合深度和法线线索,提高几何对齐的彩色点云质量。实验结果显示了高保真度的视角外推合成和在各种未见场景下的竞争性重建质量。
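Injecting image-branch attention maps into the geometry branch can be sketched as follows: the attention weights are computed once from the image branch's queries and keys, then reused to aggregate the geometry branch's values. Function and argument names are hypothetical, and the sketch uses a single attention head without the surrounding diffusion model.

```python
import numpy as np


def attention_map(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Row-normalized scaled dot-product attention weights for (n, d) queries and keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def shared_attention(q_img, k_img, v_img, v_geo):
    """Aggregate image and geometry values with the same image-derived attention map."""
    attn = attention_map(q_img, k_img)   # computed from the image branch only
    return attn @ v_img, attn @ v_geo    # both outputs follow the same correspondences
```

Since both branches gather information from the same source locations, the generated geometry is tied to the pixels it belongs to, which is the alignment property the abstract emphasizes.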
Parameter-free Dynamic Regret: Time-varying Movement Costs, Delayed Feedback, and Memory
Authors: Emmanuel Esposito, Andrew Jacobsen, Hao Qiu, Mengxiao Zhang
First: 2026-02-06T17:50:22+00:00 · Latest: 2026-02-06T17:50:22+00:00
Abstract
In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $\lambda_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(1+P_T)(T+\sum_t \lambda_t)})$ regret, where $P_T$ is the path length of the comparator sequence over $T$ rounds. This recovers the optimal guarantees for both static and dynamic regret in standard OCO as a special case where $\lambda_t=0$ for all rounds. To demonstrate the versatility of our results, we consider two applications: OCO with delayed feedback and OCO with time-varying memory. We show that both problems can be translated into time-varying movement costs, establishing a novel reduction specifically for the delayed feedback setting that is of independent interest. A crucial observation is that the first-order dependence on movement costs in our regret bound plays a key role in enabling optimal comparator-adaptive dynamic regret guarantees in both settings.
中文标题/摘要
标题:无参数动态遗憾:时间变化的移动成本、延迟反馈和记忆
在本文中,我们研究了具有移动成本的无约束在线凸优化(OCO)中的动态遗憾。具体而言,我们通过允许移动成本系数$λ_t$在时间上任意变化,将标准设置进行了推广。我们的主要贡献是一个新颖的算法,它为这种设置建立了第一个比较器自适应动态遗憾界,保证了$\widetilde{\mathcal{O}}(\sqrt{(1+P_T)(T+\sum_t λ_t)})$遗憾,其中$P_T$是比较器序列在$T$轮中的路径长度。当$λ_t=0$对所有轮次时,这恢复了标准OCO中静态和动态遗憾的最优保证。为了展示我们结果的通用性,我们考虑了两个应用:具有延迟反馈的OCO和具有时间变化记忆的OCO。我们证明了这两个问题都可以转化为时间变化的移动成本,建立了特别针对延迟反馈设置的新型归约,这具有独立兴趣。一个关键观察是,我们遗憾界中对移动成本的一次性依赖在两个设置中实现最优比较器自适应动态遗憾保证中起着关键作用。
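The special cases mentioned in the dynamic regret abstract follow by direct substitution into the stated bound. In the display below, $R_T$ and the comparator sequence $u_{1:T}$ are notation assumed for illustration, and any factors hidden by $\widetilde{\mathcal{O}}$ in the abstract remain hidden here.

```latex
\begin{align*}
\text{general:}\quad
  & R_T(u_{1:T}) = \widetilde{\mathcal{O}}\Big(\sqrt{(1+P_T)\big(T+\textstyle\sum_{t}\lambda_t\big)}\Big),\\
\lambda_t = 0 \text{ for all } t:\quad
  & R_T(u_{1:T}) = \widetilde{\mathcal{O}}\Big(\sqrt{(1+P_T)\,T}\Big),\\
\lambda_t = 0 \text{ and } P_T = 0:\quad
  & R_T(u) = \widetilde{\mathcal{O}}\big(\sqrt{T}\big).
\end{align*}
```

The second line is the dynamic regret rate for standard OCO, and the third is the static case, where the comparator does not move and the path length is zero.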
Supercharging Simulation-Based Inference for Bayesian Optimal Experimental Design
Authors: Samuel Klein, Willie Neiswanger, Daniel Ratner, Michael Kagan, Sean Gasiorowski
First: 2026-02-06T17:50:00+00:00 · Latest: 2026-02-06T17:50:00+00:00
Abstract
Bayesian optimal experimental design (BOED) seeks to maximize the expected information gain (EIG) of experiments. This requires a likelihood estimate, which in many settings is intractable. Simulation-based inference (SBI) provides powerful tools for this regime. However, existing work explicitly connecting SBI and BOED is restricted to a single contrastive EIG bound. We show that the EIG admits multiple formulations which can directly leverage modern SBI density estimators, encompassing neural posterior, likelihood, and ratio estimation. Building on this perspective, we define a novel EIG estimator using neural likelihood estimation. Further, we identify optimization as a key bottleneck of gradient-based EIG maximization and show that a simple multi-start parallel gradient ascent procedure can substantially improve reliability and performance. With these innovations, our SBI-based BOED methods match or outperform existing state-of-the-art approaches by up to $22\%$ across standard BOED benchmarks.
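The multiple formulations of the EIG referred to in the BOED abstract are standard identities obtained from Bayes' rule. With design $d$, parameters $\theta$, and outcome $y$ (notation assumed here), each form matches one family of SBI density estimators:

```latex
\begin{align*}
\mathrm{EIG}(d)
  &= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\big[\log p(\theta\mid y,d) - \log p(\theta)\big]
     && \text{(posterior form)}\\
  &= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\big[\log p(y\mid\theta,d) - \log p(y\mid d)\big]
     && \text{(likelihood form)}\\
  &= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\big[\log r(\theta,y;d)\big],
     \quad r(\theta,y;d) = \frac{p(y\mid\theta,d)}{p(y\mid d)} = \frac{p(\theta\mid y,d)}{p(\theta)}
     && \text{(ratio form)}
\end{align*}
```

A learned posterior, likelihood, or ratio estimator can be substituted for the corresponding exact density. How the paper's estimator handles the marginal $p(y\mid d)$ is not stated in the abstract.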
中文标题/摘要
标题:基于模拟推断加速贝叶斯最优实验设计
贝叶斯最优实验设计(BOED)旨在最大化实验的信息增益(EIG)。这需要一个似然估计,而在许多情况下,这种估计是不可处理的。基于模拟的推断(SBI)为此类情况提供了强大的工具。然而,现有将SBI与BOED直接连接的工作仅限于单一的对比EIG界。我们表明,EIG有多种形式,可以直接利用现代SBI密度估计器,包括神经后验、似然和比率估计。基于这一视角,我们定义了一个新的EIG估计器,使用神经似然估计。此外,我们识别出梯度优化是基于梯度的EIG最大化的关键瓶颈,并表明简单的多启动并行梯度上升程序可以显著提高可靠性和性能。借助这些创新,我们的基于SBI的BOED方法能够在标准的BOED基准测试中匹配或超越现有最佳方法,最多提高22%。
Summary / 总结
The paper aims to enhance Bayesian optimal experimental design (BOED) by leveraging simulation-based inference (SBI) to estimate the expected information gain (EIG). The authors propose a novel EIG estimator using neural likelihood estimation and address the optimization bottleneck with a multi-start parallel gradient ascent procedure. Their methods match or outperform existing state-of-the-art approaches by up to 22% on standard BOED benchmarks.
该论文通过利用基于模拟的推断(SBI)来解决最大化贝叶斯最优实验设计(BOED)中的预期信息增益(EIG)的挑战。作者提出了一种使用神经似然估计的新EIG估计器,并引入了一种多启动并行梯度上升程序以克服优化瓶颈。他们的方法在标准的BOED基准测试中达到或超越现有最先进的方法,最多提高22%。
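The multi-start idea from the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: `objective` is a hypothetical multimodal stand-in for an EIG surface over a one-dimensional design `d`, not the paper's estimator, and the starts are run in a plain loop rather than in parallel.

```python
import math

def objective(d):
    # Hypothetical multimodal stand-in for an EIG surface (not the paper's estimator).
    return math.sin(3.0 * d) - 0.1 * d * d

def gradient(d):
    # Analytic derivative of the toy objective above.
    return 3.0 * math.cos(3.0 * d) - 0.2 * d

def ascend(d0, lr=0.05, steps=500):
    # Plain gradient ascent from a single starting design.
    d = d0
    for _ in range(steps):
        d += lr * gradient(d)
    return d

def multi_start_ascent(starts, lr=0.05, steps=500):
    # Run gradient ascent from every start and keep the best local optimum.
    finals = [ascend(d0, lr, steps) for d0 in starts]
    best = max(finals, key=objective)
    return best, finals

best, finals = multi_start_ascent([-4.0, -2.0, 0.0, 2.0, 4.0])
```

A single start such as `ascend(4.0)` settles in a poor local optimum of this toy surface, while taking the best over several starts recovers the highest peak. This is the reliability effect the abstract attributes to multi-start optimization.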
FeNN-DMA: A RISC-V SoC for SNN acceleration
Authors: Zainab Aizaz, James C. Knight, Thomas Nowotny
First: 2025-11-01T22:59:54+00:00 · Latest: 2026-02-06T17:45:41+00:00
Abstract
Spiking Neural Networks (SNNs) are a promising, energy-efficient alternative to standard Artificial Neural Networks (ANNs) and are particularly well-suited to spatio-temporal tasks such as keyword spotting and video classification. However, SNNs have a much lower arithmetic intensity than ANNs and are therefore not well-matched to standard accelerators like GPUs and TPUs. Field Programmable Gate Arrays (FPGAs) are designed for such memory-bound workloads, and here we present a novel, fully-programmable RISC-V-based system-on-chip (FeNN-DMA), tailored to simulating SNNs on modern UltraScale+ FPGAs. We show that FeNN-DMA has comparable resource usage and energy requirements to state-of-the-art fixed-function SNN accelerators, yet it supports more complex neuron models and network topologies, and can simulate up to 16 thousand neurons and 256 million synapses per core. Using this functionality, we demonstrate state-of-the-art classification accuracy on the Spiking Heidelberg Digits, Neuromorphic MNIST and Braille tactile classification tasks.
中文标题/摘要
标题:FeNN-DMA:一种用于SNN加速的RISC-V SoC
脉冲神经网络(SNNs)是一种有前景的、能效高的替代标准人工神经网络(ANNs)的选择,特别适合空间-时间任务,如关键词识别和视频分类。然而,SNNs的算术强度远低于ANNs,因此不太适合标准加速器如GPU和TPU。现场可编程门阵列(FPGAs)设计用于此类内存受限的工作负载,这里我们介绍了一种新型的、完全可编程的基于RISC-V的系统级芯片(FeNN-DMA),专门用于在现代UltraScale+ FPGA上模拟SNNs。我们展示了FeNN-DMA在资源使用和能耗方面与最先进的固定功能SNN加速器相当,但它支持更复杂的神经元模型和网络拓扑,每个核心可以模拟多达16000个神经元和2.56亿个突触。利用这种功能,我们在Spiking Heidelberg Digits、Neuromorphic MNIST和Braille触觉分类任务上展示了最先进的分类精度。
Summary / 总结
FeNN-DMA is a RISC-V SoC designed to accelerate Spiking Neural Networks (SNNs), which are energy-efficient for spatio-temporal tasks. Motivated by the need for efficient SNN simulation, FeNN-DMA is tailored for modern UltraScale+ FPGAs and can simulate up to 16 thousand neurons and 256 million synapses per core, achieving comparable resource usage and energy requirements to fixed-function SNN accelerators while supporting more complex neuron models and network topologies. It demonstrates state-of-the-art classification accuracy on various tasks including Spiking Heidelberg Digits, Neuromorphic MNIST, and Braille tactile classification.
研究旨在开发一种高效的SNN加速器以处理时空任务。团队设计了FeNN-DMA,这是一种针对现代FPGA的RISC-V SoC,可以每核心模拟多达16000个神经元和2.56亿个突触。该系统在资源使用和能耗方面与固定功能SNN加速器相当,但支持更复杂的神经元模型和网络拓扑结构,并在Spiking Heidelberg Digits、Neuromorphic MNIST和Braille触觉分类任务上实现了最先进的分类精度。
Sample Complexity of Causal Identification with Temporal Heterogeneity
Authors: Ameya Rathod, Sujay Belsare, Salvik Krishna Nautiyal, Dhruv Laad, Ponnurangam Kumaraguru
First: 2026-02-06T17:44:00+00:00 · Latest: 2026-02-06T17:44:00+00:00
Abstract
Recovering a unique causal graph from observational data is an ill-posed problem because multiple generating mechanisms can lead to the same observational distribution. This problem becomes solvable only by exploiting specific structural or distributional assumptions. While recent work has separately utilized time-series dynamics or multi-environment heterogeneity to constrain this problem, we integrate both as complementary sources of heterogeneity. This integration yields unified necessary identifiability conditions and enables a rigorous analysis of the statistical limits of recovery under thin versus heavy-tailed noise. In particular, temporal structure is shown to effectively substitute for missing environmental diversity, possibly achieving identifiability even under insufficient heterogeneity. Extending this analysis to heavy-tailed (Student's t) distributions, we demonstrate that while geometric identifiability conditions remain invariant, the sample complexity diverges significantly from the Gaussian baseline. Explicit information-theoretic bounds quantify this cost of robustness, establishing the fundamental limits of covariance-based causal graph recovery methods in realistic non-stationary systems. This work shifts the focus from whether causal structure is identifiable to whether it is statistically recoverable in practice.
中文标题/摘要
标题:因果识别中的样本复杂性与时间异质性
从观察数据中恢复唯一的因果图是一个病态问题,因为多个生成机制可能导致相同的观察分布。只有通过利用特定的结构或分布假设,该问题才变得可解。虽然近期的工作分别利用时间序列动力学或多环境异质性来约束该问题,但我们将其整合为互补的异质性来源。这种整合产生了统一的必要可识别条件,并使在薄尾噪声与重尾噪声下恢复统计极限的严格分析成为可能。特别是,时间结构被证明可以有效替代缺失的环境多样性,可能在异质性不足的情况下实现可识别性。将这种分析扩展到重尾(学生t)分布,我们证明几何可识别条件保持不变,但样本复杂性与高斯基线显著不同。明确的信息论界线量化了这种稳健性的成本,确立了基于协方差的因果图恢复方法在现实非平稳系统中的基本极限。这项工作将重点从因果结构是否可识别转移到其在实践中是否可统计恢复。
DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks
Authors: Nghiem T. Diep, Hien Dang, Tuan Truong, Tan Dinh, Huy Nguyen, Nhat Ho
First: 2025-10-05T19:27:48+00:00 · Latest: 2026-02-06T17:31:48+00:00
Comments: Nghiem T. Diep, Hien Dang, and Tuan Truong contributed equally to this work
Abstract
Parameter-efficient fine-tuning (PEFT) methods have become the standard paradigm for adapting large-scale models. Among these techniques, Weight-Decomposed Low-Rank Adaptation (DoRA) has been shown to improve both the learning capacity and training stability of the Low-Rank Adaptation (LoRA) method by explicitly decomposing pre-trained weights into magnitude and directional components. In this work, we propose DoRAN, a new technique designed to stabilize training and boost the sample efficiency of DoRA. Our framework introduces two key components: (i) the injection of learnable noise into the denominator of DoRA weight decomposition, which serves as an adaptive regularizer to mitigate instabilities and improve the estimation rate of low-rank matrices; and (ii) the replacement of static low-rank matrices with auxiliary networks that generate them dynamically, enabling parameter coupling between the query and value projection matrices, leading to improved sample efficiency both theoretically and empirically. Comprehensive experiments on vision and language benchmarks show that DoRAN consistently outperforms LoRA, DoRA, and other PEFT baselines, underscoring the effectiveness of combining noise-based regularization with network-based parameter generation.
中文标题/摘要
标题:DoRAN:通过噪声注入和辅助网络稳定分解低秩适应
参数高效微调(PEFT)方法已成为适应大规模模型的标准范式。在这些技术中,分解低秩适应(DoRA)通过显式地将预训练权重分解为幅度和方向分量,已被证明能够提高低秩适应(LoRA)方法的学习能力和训练稳定性。在本文中,我们提出了一种新的技术DoRAN,旨在稳定训练并提高DoRA的样本效率。我们的框架引入了两个关键组件:(i) 在DoRA权重分解的分母中注入可学习的噪声,作为自适应正则化器,以减轻不稳定性并提高低秩矩阵的估计率;(ii) 用生成动态低秩矩阵的辅助网络替换静态低秩矩阵,使查询和值投影矩阵之间的参数耦合成为可能,从而在理论上和实验上都提高了样本效率。在视觉和语言基准上的全面实验表明,DoRAN在性能上始终优于LoRA、DoRA和其他PEFT基线,证明了结合基于噪声的正则化与基于网络的参数生成的有效性。
Summary / 总结
DoRAN is a method designed to stabilize the training and enhance the sample efficiency of DoRA, a parameter-efficient fine-tuning technique. It injects learnable noise into the denominator of the DoRA weight decomposition and replaces static low-rank matrices with auxiliary networks that generate them dynamically, leading to better performance on vision and language benchmarks compared to LoRA, DoRA, and other PEFT baselines.
DoRAN 是一种旨在稳定训练并提高 DoRA 样本效率的技术,DoRA 是一种改进 LoRA 学习能力和训练稳定性的方法。它引入了可学习的噪声注入和辅助网络以动态生成低秩矩阵,从而在各种基准测试中表现出色,优于 LoRA、DoRA 及其他参数高效微调方法。
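A minimal sketch of the magnitude and direction decomposition may help make the "noise in the denominator" idea concrete. The DoRA form used here is the standard one: merged weight = m * (W0 + BA) / column norm of (W0 + BA). The scalar `tau` added to the norm is only an assumed illustration of denominator noise; the exact DoRAN parameterization and its auxiliary networks are not specified in the abstract and are not reproduced.

```python
import math

def matmul(a, b):
    # Dense matrix product for small list-of-lists matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def column_norms(w):
    # Euclidean norm of each column.
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]

def decomposed_weight(w0, b, a, m, tau=0.0):
    # v = W0 + B A is the direction before normalization.
    delta = matmul(b, a)
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))]
         for i in range(len(w0))]
    norms = column_norms(v)
    # tau = 0 gives plain DoRA; tau > 0 is an assumed stand-in for the
    # learnable noise that DoRAN adds to the denominator.
    return [[m[j] * v[i][j] / (norms[j] + tau) for j in range(len(v[0]))]
            for i in range(len(v))]

w0 = [[1.0, 2.0], [3.0, 4.0]]   # frozen pre-trained weight
b = [[0.1], [0.2]]              # low-rank factor B (rank 1)
a = [[1.0, -1.0]]               # low-rank factor A
m = [2.0, 0.5]                  # per-column magnitudes
plain = decomposed_weight(w0, b, a, m)
noisy = decomposed_weight(w0, b, a, m, tau=0.5)
```

With `tau = 0` each column of the result has norm exactly `m[j]`. A positive `tau` keeps the denominator away from zero and shrinks each column slightly, which is one way such a term can act as a regularizer.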
Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks
Authors: Haotian Li, Shijun Yang, Weizhen Qi, Silei Zhao, Rui Hua, Mingzhu Song, Xiaojian Yang, Chao Peng
First: 2026-01-26T07:27:47+00:00 · Latest: 2026-02-06T17:29:24+00:00
Abstract
Conventional agent systems often struggle in open-ended environments where task distributions continuously drift and external supervision is scarce. Their reliance on static toolsets or offline training lags behind these dynamics, leaving the system's capability boundaries rigid and unknown. To address this, we propose the In-Situ Self-Evolving paradigm. This approach treats sequential task interactions as a continuous stream of experience, enabling the system to distill short-term execution feedback into long-term, reusable capabilities without access to ground-truth labels. Within this framework, we identify tool evolution as the critical pathway for capability expansion, which provides verifiable, binary feedback signals. Within this framework, we develop Yunjue Agent, a system that iteratively synthesizes, optimizes, and reuses tools to navigate emerging challenges. To optimize evolutionary efficiency, we further introduce a Parallel Batch Evolution strategy. Empirical evaluations across five diverse benchmarks under a zero-start setting demonstrate significant performance gains over proprietary baselines. Additionally, complementary warm-start evaluations confirm that the accumulated general knowledge can be seamlessly transferred to novel domains. Finally, we propose a novel metric to monitor evolution convergence, serving as a function analogous to training loss in conventional optimization. We open-source our codebase, system traces, and evolved tools to facilitate future research in resilient, self-evolving intelligence.
中文标题/摘要
标题:云阙代理技术报告:一种完全可复现的零起点就地自我演化代理系统用于开放性任务
传统的代理系统在任务分布持续漂移且外部监督稀缺的开放性环境中往往难以应对。它们依赖于静态工具集或离线训练,无法跟上这些动态变化,使系统的能效边界僵化且未知。为解决这一问题,我们提出了就地自我演化范式。该方法将顺序任务交互视为连续的经验流,使系统能够通过短期执行反馈提炼出长期可重用的能力,而无需访问真实标签。在此框架下,我们将工具演化视为能力扩展的关键路径,提供可验证的二元反馈信号。在此框架下,我们开发了云阙代理系统,该系统通过迭代合成、优化和重用工具来应对新兴挑战。为了优化演化效率,我们进一步引入了并行批处理演化策略。在零起点设置下的五个不同基准上的实证评估表明,与专有基线相比,性能显著提升。此外,补充的温启动评估证实了积累的一般知识可以无缝转移到新领域。最后,我们提出了一种新的度量标准来监控演化收敛,类似于传统优化中的训练损失函数。我们开源了我们的代码库、系统跟踪和演化工具,以促进未来在韧性自我演化智能方面的研究。
Summary / 总结
The paper addresses the limitations of conventional agent systems in open-ended environments by proposing the In-Situ Self-Evolving paradigm. This paradigm treats task interactions as a continuous stream of experience, allowing the system to evolve tools and capabilities without ground-truth labels. The authors developed Yunjue Agent, which iteratively synthesizes, optimizes, and reuses tools, and introduced a Parallel Batch Evolution strategy to improve evolutionary efficiency. Empirical evaluations across five benchmarks under a zero-start setting showed significant performance gains over proprietary baselines, and warm-start evaluations indicated that the accumulated general knowledge transfers to new domains. A novel metric is proposed to monitor evolution convergence, playing a role analogous to training loss in conventional optimization. The codebase, system traces, and evolved tools are open-sourced for future research.
论文提出了一种基于现场自我演化的范式,以解决传统代理系统在开放环境中遇到的局限性。该范式将任务交互视为连续的经验流,使系统能够在没有地面真实标签的情况下进化工具和能力。作者开发了Yunjue代理,该代理通过迭代合成、优化和重用工具来应对新兴挑战。他们引入了并行批次进化策略以提高效率。在五个基准上的实证评估显示,该系统在性能上显著优于专有基线,并且可以将通用知识无缝转移到新领域。还提出了一种新的度量标准来监控进化收敛,类似于传统优化中的训练损失。代码库和工具已开源,以促进未来的研究。
FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
Authors: Zhuokun Chen, Jianfei Cai, Bohan Zhuang
First: 2026-02-05T04:57:21+00:00 · Latest: 2026-02-06T17:20:17+00:00
Abstract
Generating long-form content, such as minute-long videos and extended texts, is increasingly important for modern generative models. Block diffusion improves inference efficiency via KV caching and block-wise causal inference and has been widely adopted in diffusion language models and video generation. However, in long-context settings, block diffusion still incurs substantial overhead from repeatedly computing attention over a growing KV cache. We identify an underexplored property of block diffusion: cross-step redundancy of attention within a block. Our analysis shows that attention outputs from tokens outside the current block remain largely stable across diffusion steps, while block-internal attention varies significantly. Based on this observation, we propose FlashBlock, a cached block-external attention mechanism that reuses stable attention output, reducing attention computation and KV cache access without modifying the diffusion process. Moreover, FlashBlock is orthogonal to sparse attention and can be combined as a complementary residual reuse strategy, substantially improving model accuracy under aggressive sparsification. Experiments on diffusion language models and video generation demonstrate up to 1.44$\times$ higher token throughput and up to 1.6$\times$ reduction in attention time, with negligible impact on generation quality. Project page: https://caesarhhh.github.io/FlashBlock/.
中文标题/摘要
标题:FlashBlock:高效长上下文块扩散中的注意力缓存
生成长形式内容,如一分钟的视频和扩展文本,对于现代生成模型来说越来越重要。块扩散通过键值缓存和块级因果推理来提高推理效率,并已在扩散语言模型和视频生成中广泛采用。然而,在长上下文设置中,块扩散仍然会因反复计算不断增长的键值缓存而产生大量开销。我们发现块扩散的一个未被充分探索的特性:块内步骤之间的注意力交叉冗余。我们的分析表明,在扩散步骤中,当前块外部的令牌的注意力输出保持相对稳定,而块内部的注意力则显著变化。基于这一观察,我们提出了FlashBlock,这是一种缓存块外部注意力机制,通过重用稳定的注意力输出来减少注意力计算和键值缓存访问,而不修改扩散过程。此外,FlashBlock 与稀疏注意力是正交的,可以作为补充的残差重用策略,显著提高在激进稀疏化下的模型准确性。在扩散语言模型和视频生成上的实验表明,FlashBlock 可以将令牌吞吐量提高多达 1.44 倍,并将注意力时间减少多达 1.6 倍,对生成质量的影响可以忽略不计。项目页面:https://caesarhhh.github.io/FlashBlock/
Summary / 总结
The paper addresses the challenge of efficient long-form content generation by proposing FlashBlock, a cached block-external attention mechanism. FlashBlock reduces the computational overhead of block diffusion by reusing stable attention outputs from tokens outside the current block, thereby decreasing attention computation and KV cache access. Experiments show up to 1.44 times higher token throughput and up to 1.6 times reduction in attention time, with negligible impact on generation quality.
FlashBlock利用当前块外部令牌的注意力输出在扩散步骤间的稳定性来解决长上下文块扩散的低效率问题,提出了一种缓存块外部注意力机制,重用这些稳定的输出,从而减少注意力计算和KV缓存访问。实验结果显示,吞吐量最高可提高1.44倍,注意力时间最多减少1.6倍,且对生成质量影响甚微。
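One way to see why a block-external attention result can be cached separately is that softmax attention over a concatenated key/value set splits exactly into two partial results that are merged through their log-sum-exp normalizers. The sketch below shows only this identity for a fixed query. How FlashBlock actually stores and reuses the external part across diffusion steps is not described in the abstract, so the function names and the merge rule here are assumptions for illustration.

```python
import math

def partial_attention(q, keys, values):
    # Softmax attention of one query over a subset of keys/values.
    # Returns the normalized output and the log-sum-exp of the scores.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    mx = max(scores)
    weights = [math.exp(s - mx) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]
    return out, mx + math.log(z)

def merge(out_a, lse_a, out_b, lse_b):
    # Combine two partial attention results into attention over the union.
    mx = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - mx), math.exp(lse_b - mx)
    return [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]

q = [0.3, -0.2]
ext_k, ext_v = [[1.0, 0.0], [0.5, 0.5]], [[1.0, 2.0], [3.0, 4.0]]   # outside the block
int_k, int_v = [[0.0, 1.0]], [[5.0, 6.0]]                           # inside the block

cached_out, cached_lse = partial_attention(q, ext_k, ext_v)   # computed once, then reused
block_out, block_lse = partial_attention(q, int_k, int_v)     # recomputed each step
merged = merge(cached_out, cached_lse, block_out, block_lse)
full, _ = partial_attention(q, ext_k + int_k, ext_v + int_v)
```

For a fixed query, `merged` equals `full` up to floating-point error. Reusing `cached_out` across diffusion steps, when the query changes slightly, is an approximation that relies on the cross-step stability the abstract reports.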
A Cycle-Consistent Graph Surrogate for Full-Cycle Left Ventricular Myocardial Biomechanics
Authors: Siyu Mu, Wei Xuan Chan, Choon Hwai Yap
First: 2026-02-06T17:14:38+00:00 · Latest: 2026-02-06T17:14:38+00:00
Abstract
Image-based patient-specific simulation of left ventricular (LV) mechanics is valuable for understanding cardiac function and supporting clinical intervention planning, but conventional finite-element analysis (FEA) is computationally intensive. Current graph-based surrogates do not have full-cycle prediction capabilities, and physics-informed neural networks often struggle to converge on complex cardiac geometries. We present CardioGraphFENet (CGFENet), a unified graph-based surrogate for rapid full-cycle estimation of LV myocardial biomechanics, supervised by a large FEA simulation dataset. The proposed model integrates (i) a global--local graph encoder to capture mesh features with weak-form-inspired global coupling, (ii) a gated recurrent unit-based temporal encoder conditioned on the target volume-time signal to model cycle-coherent dynamics, and (iii) a cycle-consistent bidirectional formulation for both loading and inverse unloading within a single framework. These strategies enable high fidelity with respect to traditional FEA ground truths and produce physiologically plausible pressure-volume loops that match FEA results when coupled with a lumped-parameter model. In particular, the cycle-consistency strategy enables a significant reduction in FEA supervision with only minimal loss in accuracy.
中文标题/摘要
标题:一种循环一致的图代理模型用于全循环左心室心肌生物力学
基于图像的患者特异性左心室(LV)力学仿真对于理解心脏功能和支持临床干预计划具有重要价值,但传统的有限元分析(FEA)计算量大。当前的图基代理模型不具备全循环预测能力,而基于物理的神经网络在复杂心脏几何结构上往往难以收敛。我们提出了一种名为CardioGraphFENet(CGFENet)的统一图基代理模型,用于快速估计LV心肌生物力学的全循环,该模型由大规模FEA仿真数据监督。所提出的模型整合了(i)全局-局部图编码器以捕获网格特征并采用弱形式启发式的全局耦合,(ii)基于门控循环单元的时序编码器,该编码器根据目标容积-时间信号建模循环一致的动力学,以及(iii)循环一致的双向公式,用于在单一框架内同时进行加载和逆卸载。这些策略使得模型能够与传统的FEA基准结果保持高度一致,并且在与简化参数模型耦合时产生生理上合理的压力-容积环,这些环与FEA结果相符。特别是,循环一致性策略使得FEA监督显著减少,同时仅轻微损失准确性。
Summary / 总结
The research aims to develop a computationally efficient method for simulating left ventricular (LV) mechanics using a graph-based surrogate model, addressing the limitations of conventional finite-element analysis (FEA) and existing graph-based surrogates. The method, CardioGraphFENet (CGFENet), combines a global-local graph encoder, a gated recurrent unit-based temporal encoder, and a cycle-consistent bidirectional formulation to achieve full-cycle biomechanical predictions. Key experimental findings show that CGFENet can produce high-fidelity results comparable to traditional FEA and generate physiologically plausible pressure-volume loops, even with reduced FEA supervision.
研究旨在通过图基代理模型开发一种计算高效的左心室(LV)力学模拟方法,解决传统有限元分析(FEA)和现有图基模型的局限性。方法CardioGraphFENet(CGFENet)结合了全局-局部图编码器、门控循环单元基时间编码器和双向循环一致性形式,实现了全周期生物力学预测。关键发现包括与FEA地面真实值的高度一致性,并能够生成生理上合理的压力-容积环路,即使在减少FEA监督的情况下也能保持较高的准确性。
Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents
Authors: Syemin Park, Soobin Park, Youn-kyung Lim
First: 2025-07-08T09:39:02+00:00 · Latest: 2026-02-06T17:07:11+00:00
Comments: Accepted to ACM Transactions on Computer-Human Interaction (TOCHI)
Abstract
Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, balance similarities and differences among characters, and intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.
中文标题/摘要
标题:Constella:基于LLM的多智能体支持故事创作者的关联性角色创作
构建角色群组并关注其关系动态是大多数长篇故事创作的关键方面。然而,我们的形成性研究(N=14)表明,作家在构想能够影响现有角色的新角色、平衡角色间的相似性和差异性以及细致描绘角色关系方面存在困难。基于这些观察,我们设计了Constella,一种基于LLM的多智能体工具,以支持故事创作者的关联性角色创作过程。Constella建议相关角色(FRIENDS DISCOVERY功能)、同时揭示多个角色的内心世界(JOURNALS功能),并通过角色间的回应展现关系(COMMENTS功能)。我们的为期7-8天的部署研究(N=11)表明,Constella使创作者能够构建由相关角色组成的广阔社区,促进了角色思想和情感的比较,并加深了创作者对角色关系的理解。最后,我们讨论了多智能体互动如何帮助分散创作者对角色群组的注意力和努力。
Summary / 总结
The paper addresses the challenge of creating interconnected characters in long-form storywriting, which writers often find difficult. It introduces Constella, an LLM-based tool that suggests related characters, reveals their inner thoughts, and shows relationships through inter-character responses. The study with 11 writers over 7-8 days found that Constella helped in creating expansive character communities, comparing characters' thoughts, and deepening understanding of relationships.
研究针对长篇故事创作中构建相互关联的角色这一挑战,发现作家常感到困难。该研究引入了Constella,一种基于LLM的工具,可以建议相关角色、揭示角色的内心世界,并通过角色间的互动展示关系。在为期7-8天的11名作家参与的研究中,发现Constella有助于创建广泛的角色社区,促进了角色间的比较,并加深了作家对角色关系的理解。
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
Authors: Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J. Zico Kolter, Jeff Schneider
First: 2025-06-09T14:48:19+00:00 · Latest: 2026-02-06T17:05:39+00:00
Abstract
Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step sampling while generating higher-reward action trajectories through decoupled training and noise-free reward signals. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long horizon planning benchmarks demonstrate that our approach can achieve a 9.7% improvement over previous state-of-the-art while offering up to 142x speedup over diffusion counterparts in inference time.
中文标题/摘要
标题:通过奖励意识一致性轨迹蒸馏加速离线RL中的扩散计划者
尽管扩散模型在决策任务中取得了显著成果,但其缓慢的推理速度仍然是一个关键限制。虽然一致性模型提供了一种潜在的解决方案,但现有应用要么在行为克隆下难以提供最优演示,要么依赖于在演员-评论家框架下多个网络的复杂并发训练。在本文中,我们提出了一种新的离线强化学习中的一致性蒸馏方法,该方法直接将奖励优化纳入蒸馏过程。我们的方法通过解耦训练和无噪声奖励信号实现了单步采样,同时生成更高奖励的动作轨迹。在Gym MuJoCo、FrankaKitchen和长时规划基准上的实证评估表明,我们的方法可以比之前最先进的方法提高9.7%的性能,同时在推理时间上比扩散模型快多达142倍。
Summary / 总结
This work addresses the slow inference speed of diffusion models in decision-making tasks by proposing a novel reward-aware consistency trajectory distillation method for offline reinforcement learning. The approach integrates reward optimization directly into the distillation process, enabling single-step sampling and higher-reward action trajectories through decoupled training and noise-free reward signals. Experiments on the Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks show a 9.7% improvement over the previous state of the art and up to 142x faster inference than diffusion counterparts.
该研究通过提出一种新的基于奖励感知的一致性轨迹蒸馏方法,解决决策任务中扩散模型的缓慢推理速度问题,适用于离线强化学习。该方法通过解耦训练和无噪声奖励信号提高单步采样并生成更高奖励的动作轨迹。实验结果显示,该方法在各种基准测试上比之前最先进的方法提高了9.7%,并在推理时间上实现了高达142倍的加速。
TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code
Authors: Jiangping Huang, Wenguang Ye, Weisong Sun, Jian Zhang, Mingyue Zhang, Yang Liu
First: 2026-02-06T16:59:48+00:00 · Latest: 2026-02-06T16:59:48+00:00
Abstract
Large Language Models (LLMs) often generate code with subtle but critical bugs, especially for complex tasks. Existing automated repair methods typically rely on superficial pass/fail signals, offering limited visibility into program behavior and hindering precise error localization. In addition, without a way to learn from prior failures, repair processes often fall into repetitive and inefficient cycles. To overcome these challenges, we present TraceCoder, a collaborative multi-agent framework that emulates the observe-analyze-repair process of human experts. The framework first instruments the code with diagnostic probes to capture fine-grained runtime traces, enabling deep insight into its internal execution. It then conducts causal analysis on these traces to accurately identify the root cause of the failure. This process is further enhanced by a novel Historical Lesson Learning Mechanism (HLLM), which distills insights from prior failed repair attempts to inform subsequent correction strategies and prevent recurrence of similar mistakes. To ensure stable convergence, a Rollback Mechanism enforces that each repair iteration constitutes a strict improvement toward the correct solution. Comprehensive experiments across multiple benchmarks show that TraceCoder achieves up to a 34.43\% relative improvement in Pass@1 accuracy over existing advanced baselines. Ablation studies verify the significance of each system component, with the iterative repair process alone contributing a 65.61\% relative gain in accuracy. Furthermore, TraceCoder significantly outperforms leading iterative methods in terms of both accuracy and cost-efficiency.
中文标题/摘要
标题:TraceCoder:一种基于跟踪的多智能体框架,用于自动调试LLM生成的代码
大型语言模型(LLMs)经常生成带有细微但关键错误的代码,尤其是在处理复杂任务时。现有的自动修复方法通常依赖于浅层的通过/失败信号,这限制了对程序行为的可见性,阻碍了精确的错误定位。此外,没有从先前失败中学习的方法,修复过程往往陷入重复且低效的循环中。为克服这些挑战,我们提出了TraceCoder,这是一种协作的多智能体框架,模拟了人类专家的观察-分析-修复过程。该框架首先通过诊断探针对代码进行仪器化,以捕获细粒度的运行时跟踪,从而深入了解其内部执行情况。然后,通过对这些跟踪进行因果分析,准确地识别出失败的根本原因。这一过程进一步通过一种新颖的历史教训学习机制(HLLM)得到增强,该机制从先前失败的修复尝试中提炼出见解,以指导后续的纠正策略并防止类似错误的重复发生。为了确保稳定收敛,回滚机制确保每次修复迭代都严格向正确解决方案改进。在多个基准测试中的全面实验表明,TraceCoder在通过@1准确率上相对于现有高级基线实现了高达34.43%的相对改进。消融研究验证了每个系统组件的重要性,仅迭代修复过程就贡献了65.61%的相对准确率提升。此外,TraceCoder在准确性和成本效率方面均显著优于领先的迭代方法。
Summary / 总结
TraceCoder is a trace-driven multi-agent framework designed to automate the debugging of code generated by Large Language Models (LLMs). It uses diagnostic probes to capture detailed runtime traces, conducts causal analysis to pinpoint the root cause of errors, and incorporates a Historical Lesson Learning Mechanism to avoid repeating past mistakes. Experiments show that TraceCoder improves Pass@1 accuracy by up to 34.43% compared to existing methods and significantly outperforms leading iterative methods in terms of both accuracy and cost-efficiency.
TraceCoder 是一个基于追踪的多代理框架,旨在自动化调试由大型语言模型(LLMs)生成的代码。它使用诊断探针捕获详细的运行时追踪,进行因果分析以确定错误的根本原因,并结合历史教训学习机制从过去的失败中吸取教训。实验表明,TraceCoder 的 Pass@1 准确率相比现有方法提高了最多 34.43%,并且在准确性和成本效率方面显著优于领先的迭代方法。
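The rollback rule described in the abstract, where a repair is kept only if it is a strict improvement, can be sketched as a small loop. All names below (`repair_with_rollback`, `propose`, `score`, `lessons`) are hypothetical, the "programs" are toy integers, and the score stands in for the number of passing tests. This is not TraceCoder's implementation.

```python
def repair_with_rollback(initial, propose, score, max_iters=5):
    # Keep the best program seen so far; reject any candidate that is not
    # strictly better, and record it so later proposals can avoid it.
    best, best_score = initial, score(initial)
    lessons = []
    for _ in range(max_iters):
        candidate = propose(best, lessons)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        else:
            lessons.append(candidate)  # roll back and remember the failed attempt
    return best, best_score, lessons

# Toy usage: a program is an integer and its score is its own value.
patches = iter([1, 0, 2, 2, 3])

def toy_propose(current, lessons):
    # Hypothetical proposer that replays a fixed sequence of patches.
    return next(patches)

def toy_score(program):
    return program

result = repair_with_rollback(0, toy_propose, toy_score)
```

In this toy run the candidates `0` and the second `2` are rejected and stored in `lessons`, so the accepted sequence of scores never decreases.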
Downscaling Neural Network for Coastal Simulations
Authors: Zhi-Song Liu, Markus Büttner, Matthew Scarborough, Eirik Valseth, Vadym Aizinger, Bernhard Kainz, Andreas Rupp
First: 2024-08-29T14:16:13+00:00 · Latest: 2026-02-06T16:58:24+00:00
Abstract
Learning the fine-scale details of a coastal ocean simulation from a coarse representation is a challenging task. For real-world applications, high-resolution simulations are necessary to advance understanding of many coastal processes, specifically, to predict flooding resulting from tsunamis and storm surges. We propose a Downscaling Neural Network for Coastal Simulation (DNNCS) for spatiotemporal enhancement to learn the high-resolution numerical solution. Given images of coastal simulations produced on low-resolution computational meshes using low polynomial order discontinuous Galerkin discretizations and a coarse temporal resolution, the proposed DNNCS learns to produce high-resolution free surface elevation and velocity visualizations in both time and space. To model the dynamic changes over time and space, we propose grid-aware spatiotemporal attention to project the temporal features to the spatial domain for non-local feature matching. The coordinate information is also utilized via positional encoding. For the final reconstruction, we use the spatiotemporal bilinear operation to interpolate the missing frames and then expand the feature maps to the frequency domain for residual mapping. Besides data-driven losses, the proposed physics-informed loss guarantees gradient consistency and momentum changes, leading to a 24% reduction in root-mean-square error compared to the model trained with only data-driven losses. To train the proposed model, we propose a coastal simulation dataset and use it for model optimization and evaluation. Our method shows superior downscaling quality and fast computation compared to the state-of-the-art methods.
中文标题/摘要
标题:用于海岸模拟的下尺度神经网络
从粗略表示中学习海岸海洋模拟的精细尺度细节是一项具有挑战性的任务。为了实际应用,高分辨率模拟对于推进对许多海岸过程的理解是必要的,特别是预测海啸和风暴潮引起的洪水。我们提出了一种用于时空增强的海岸模拟下尺度神经网络(DNNCS),以学习高分辨率数值解。给定使用低多项式阶数间断Galerkin离散化和粗略时间分辨率在低分辨率计算网格上生成的海岸模拟图像,所提出的DNNCS学习在时间和空间上生成高分辨率自由表面高程和速度可视化。为了建模时间和空间上的动态变化,我们提出了一种网格感知的时空注意力机制,将时间特征投影到空间域进行非局部特征匹配。坐标信息也通过位置编码加以利用。在最终重建中,我们使用时空双线性操作插值缺失帧,然后将特征图扩展到频域进行残差映射。除了数据驱动的损失外,所提出的物理信息损失保证了梯度一致性并改变了动量变化,导致与仅使用数据驱动损失训练的模型相比,均方根误差减少了24%。为了训练所提出的模型,我们提出了一组海岸模拟数据集,并用于模型优化和评估。与最先进的方法相比,我们的方法在下尺度质量和计算速度上表现出色。
Summary / 总结
The research aims to enhance the resolution of coastal ocean simulations to better predict flooding from tsunamis and storm surges. The proposed Downscaling Neural Network for Coastal Simulation (DNNCS) learns high-resolution free surface elevation and velocity fields from low-resolution simulations using grid-aware spatiotemporal attention and positional encoding. Adding a physics-informed loss reduces the root-mean-square error by 24% compared to the same model trained with only data-driven losses. The method also offers superior downscaling quality and faster computation than existing methods.
研究旨在通过提高沿海海洋模拟的分辨率,更好地预测来自海啸和风暴潮的洪水。作者提出了一种沿海模拟降尺度神经网络(DNNCS),使用网格感知时空注意力和位置编码来增强低分辨率模拟。该模型在仅使用数据驱动损失的模型基础上,将均方根误差降低了24%。该方法还提供了比最先进的方法更好的降尺度质量和更快的计算速度。
RFDM: Residual Flow Diffusion Model for Efficient Causal Video Editing
Authors: Mohammadreza Salehi, Mehdi Noroozi, Luca Morreale, Ruchika Chavhan, Malcolm Chadwick, Alberto Gil Ramos, Abhinav Mehrotra
First: 2026-02-06T16:56:30+00:00 · Latest: 2026-02-06T16:56:30+00:00
Abstract
Instructional video editing applies edits to an input video using only text prompts, enabling intuitive natural-language control. Despite rapid progress, most methods still require fixed-length inputs and substantial compute. Meanwhile, autoregressive video generation enables efficient variable-length synthesis, yet remains under-explored for video editing. We introduce a causal, efficient video editing model that edits variable-length videos frame by frame. For efficiency, we start from a 2D image-to-image (I2I) diffusion model and adapt it to video-to-video (V2V) editing by conditioning the edit at time step t on the model's prediction at t-1. To leverage videos' temporal redundancy, we propose a new I2I diffusion forward process formulation that encourages the model to predict the residual between the target output and the previous prediction. We call this Residual Flow Diffusion Model (RFDM), which focuses the denoising process on changes between consecutive frames. Moreover, we propose a new benchmark that better ranks state-of-the-art methods for editing tasks. Trained on paired video data for global/local style transfer and object removal, RFDM surpasses I2I-based methods and competes with fully spatiotemporal (3D) V2V models, while matching the compute of image models and scaling independently of input video length. More content can be found in: https://smsd75.github.io/RFDM_page/
中文标题/摘要
标题:RFDM:残差流扩散模型用于高效的因果视频编辑
指令视频编辑仅使用文本提示对输入视频进行编辑,实现直观的自然语言控制。尽管取得了快速进展,但大多数方法仍然需要固定长度的输入和大量计算。同时,自回归视频生成能够实现高效的可变长度合成,但在视频编辑方面的应用尚未得到充分探索。我们提出了一种因果高效的视频编辑模型,能够逐帧编辑可变长度的视频。为了提高效率,我们从2D图像到图像(I2I)扩散模型出发,并通过在时间步t的编辑条件化于模型在t-1步的预测来适应视频到视频(V2V)编辑。为了利用视频的时间冗余性,我们提出了一种新的I2I扩散前向过程公式,鼓励模型预测目标输出与先前预测之间的残差。我们称之为残差流扩散模型(RFDM),该模型将去噪过程集中在连续帧之间的变化上。此外,我们还提出了一种新的基准测试,更好地评估最先进的方法在编辑任务中的表现。RFDM在配对视频数据上训练,用于全局/局部风格转换和对象去除,超越了基于I2I的方法,并与完全时空(3D)V2V模型竞争,同时与图像模型的计算量相当,并且与输入视频长度无关地扩展。有关更多信息,请参见:https://smsd75.github.io/RFDM_page/
Summary / 总结
The research aims to develop an efficient causal video editing model that can apply text-based edits to variable-length videos. The method adapts a 2D image-to-image (I2I) diffusion model to video-to-video (V2V) editing by conditioning each frame's edit on the model's previous prediction and training the model to predict the residual between the target output and that prediction. The resulting Residual Flow Diffusion Model (RFDM) surpasses I2I-based methods and is competitive with fully spatiotemporal (3D) V2V models, while matching the compute of image models and scaling independently of input video length.
研究旨在使用文本提示实现高效因果视频编辑,解决固定长度输入和高计算需求的问题。方法包括将2D图像到图像的扩散模型适应为视频到视频编辑模型,重点预测连续帧之间的残差变化。残差流扩散模型(RFDM)通过在前一帧预测的基础上进行编辑预测,并利用时间冗余性来提高效率。实验结果表明,RFDM 在性能上超越了基于图像的方法,并且与3D视频到视频模型具有可竞争的性能,同时在输入视频长度上具有独立的可扩展性,计算效率与图像模型相当。
T-STAR: A Context-Aware Transformer Framework for Short-Term Probabilistic Demand Forecasting in Dock-Based Shared Micro-Mobility
Authors: Jingyi Cheng, Gonçalo Homem de Almeida Correia, Oded Cats, Shadi Sharif Azadeh
First: 2026-02-06T16:53:02+00:00 · Latest: 2026-02-06T16:53:02+00:00
Comments: This work has been submitted to Transportation Research Part C
Abstract
Reliable short-term demand forecasting is essential for managing shared micro-mobility services and ensuring responsive, user-centered operations. This study introduces T-STAR (Two-stage Spatial and Temporal Adaptive contextual Representation), a novel transformer-based probabilistic framework designed to forecast station-level bike-sharing demand at a 15-minute resolution. T-STAR addresses key challenges in high-resolution forecasting by disentangling consistent demand patterns from short-term fluctuations through a hierarchical two-stage structure. The first stage captures coarse-grained hourly demand patterns, while the second stage improves prediction accuracy by incorporating high-frequency, localized inputs, including recent fluctuations and real-time demand variations in connected metro services, to account for temporal shifts in short-term demand. Time series transformer models are employed in both stages to generate probabilistic predictions. Extensive experiments using Washington D.C.'s Capital Bikeshare data demonstrate that T-STAR outperforms existing methods in both deterministic and probabilistic accuracy. The model exhibits strong spatial and temporal robustness across stations and time periods. A zero-shot forecasting experiment further highlights T-STAR's ability to transfer to previously unseen service areas without retraining. These results underscore the framework's potential to deliver granular, reliable, and uncertainty-aware short-term demand forecasts, which enable seamless integration to support multimodal trip planning for travelers and enhance real-time operations in shared micro-mobility services.
中文标题/摘要
标题:T-STAR:面向有桩式共享微移动性短时概率需求预测的上下文感知变换器框架
可靠的短时需求预测对于管理共享微移动性服务并确保响应式、用户中心的操作至关重要。本研究引入了T-STAR(两阶段空间和时间自适应上下文表示),这是一种基于变换器的新型概率框架,旨在以15分钟分辨率预测站点级别的共享单车需求。T-STAR通过分层的两阶段结构解决高分辨率预测中的关键挑战,通过分离一致的需求模式和短期波动。第一阶段捕捉粗粒度的小时需求模式,第二阶段通过结合高频、局部输入,包括最近的波动和连接地铁服务的实时需求变化,提高预测准确性,以考虑短期需求的时间变化。在两个阶段中使用时间序列变换器模型生成概率预测。使用华盛顿特区的Capital Bikeshare数据进行的大量实验表明,T-STAR在确定性和概率准确性方面均优于现有方法。该模型在各个站点和时间周期内表现出强大的空间和时间鲁棒性。零样本预测实验进一步突显了T-STAR在无需重新训练的情况下向未见过的服务区域转移的能力。这些结果强调了该框架在提供细粒度、可靠和不确定性感知的短时需求预测方面的潜力,这有助于无缝集成以支持旅行者的多模式行程规划,并增强共享微移动性服务的实时操作。
Summary / 总结
T-STAR is a transformer-based framework designed for short-term probabilistic demand forecasting in bike-sharing services. It addresses high-resolution forecasting challenges by using a two-stage structure to capture both coarse-grained hourly patterns and fine-grained short-term fluctuations. Experiments show that T-STAR outperforms existing methods in terms of both deterministic and probabilistic accuracy, and it demonstrates strong robustness across different stations and time periods. Additionally, T-STAR can effectively forecast in new areas without retraining, highlighting its potential for multimodal trip planning and real-time operations in shared micro-mobility services.
T-STAR 是一种基于变换器的框架,用于共享单车服务的短期概率需求预测。它通过使用两阶段的层次结构来捕捉粗粒度的小时模式和细粒度的短期波动,以应对高分辨率的预测挑战。实验表明,T-STAR 在确定性和概率准确性方面都优于现有方法,并且具有强大的空间和时间鲁棒性,能够在新的区域进行预测而无需重新训练。
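The abstract reports probabilistic accuracy without naming the metric used. As a hedged illustration only, the sketch below scores quantile forecasts with the pinball (quantile) loss, one common way to evaluate probabilistic forecasts. The function names, the quantile levels, and the toy 15-minute demand values are assumptions made for this example and are not taken from the paper.

```python
def pinball_loss(actual, predicted, quantile):
    """Pinball loss of a single quantile forecast (lower is better)."""
    error = actual - predicted
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(quantile * error, (quantile - 1.0) * error)


def mean_pinball_loss(actuals, quantile_forecasts):
    """Average pinball loss over all time steps and quantile levels.

    quantile_forecasts maps a quantile level to one forecast per time step.
    """
    total, count = 0.0, 0
    for quantile, forecasts in quantile_forecasts.items():
        for actual, predicted in zip(actuals, forecasts):
            total += pinball_loss(actual, predicted, quantile)
            count += 1
    return total / count


# Hypothetical departures at one station over four 15-minute intervals.
observed = [3, 5, 2, 6]
forecasts = {
    0.1: [1, 3, 1, 4],
    0.5: [3, 4, 2, 5],
    0.9: [5, 7, 4, 8],
}
score = mean_pinball_loss(observed, forecasts)
```

A lower `score` means the predicted quantiles track the observed demand more closely; the asymmetric weighting is what lets a single number reward well-calibrated upper and lower quantiles.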
Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
Authors: Meng Lou, Stanley Yu, Yizhou Yu
First: 2026-02-06T16:50:38+00:00 · Latest: 2026-02-06T16:50:38+00:00
Abstract
Adapting pre-trained vision models using parameter-efficient fine-tuning (PEFT) remains challenging, as it aims to achieve performance comparable to full fine-tuning using a minimal number of trainable parameters. When applied to complex dense prediction tasks, existing methods exhibit limitations, including input-agnostic modeling and redundant cross-layer representations. To this end, we propose AdaRoute, a new adapter-style method featuring a simple mixture-of-experts (MoE) architecture. Specifically, we introduce shared expert centers, where each expert is a trainable parameter matrix. During a feedforward pass, each AdaRoute module in the network dynamically generates weight matrices tailored for the current module via a simple dynamic parameter routing mechanism, which selectively aggregates parameter matrices in the corresponding expert center. Dynamic weight matrices in AdaRoute modules facilitate low-rank adaptation in an input-dependent manner, thus generating more customized and powerful feature representations. Moreover, since AdaRoute modules across multiple network layers share the same expert center, they improve feature diversity by promoting implicit cross-layer feature interaction. Extensive experiments demonstrate the superiority of AdaRoute on diverse vision tasks, including semantic segmentation, object detection and instance segmentation, and panoptic segmentation. Code will be available at: https://bit.ly/3NZcr0H.
中文标题/摘要
标题:参数作为专家:使用动态参数路由适应视觉模型
使用参数高效微调(PEFT)适应预训练视觉模型仍然具有挑战性,因为它旨在使用最少的可训练参数实现与完全微调相当的性能。当应用于复杂的密集预测任务时,现有方法存在输入无关建模和跨层冗余表示的局限性。为此,我们提出了一种新的适配器风格方法AdaRoute,其特点是具有简单混合专家(MoE)架构。具体来说,我们引入了共享专家中心,其中每个专家是一个可训练参数矩阵。在网络的前向传递过程中,每个AdaRoute模块通过简单的动态参数路由机制动态生成针对当前模块的权重矩阵,该机制选择性地聚合相应专家中心中的参数矩阵。AdaRoute模块中的动态权重矩阵以输入依赖的方式促进低秩适应,从而生成更定制和强大的特征表示。此外,由于多个网络层的AdaRoute模块共享相同的专家中心,它们通过促进隐式的跨层特征交互来提高特征多样性。广泛的实验表明,AdaRoute在包括语义分割、对象检测和实例分割以及全景分割在内的多种视觉任务上具有优越性。代码将在以下链接获取:https://bit.ly/3NZcr0H.
Summary / 总结
The paper addresses the challenge of parameter-efficient fine-tuning for vision models, proposing AdaRoute, a new adapter-style method with a mixture-of-experts architecture. Each AdaRoute module dynamically generates weight matrices by routing parameters from shared expert centers, enabling input-dependent low-rank adaptation. Experiments show AdaRoute outperforms existing methods on various vision tasks such as semantic segmentation, object detection, and panoptic segmentation.
论文针对视觉模型参数高效微调的挑战,提出了AdaRoute,一种具有混合专家架构的新适配器方法。每个AdaRoute模块通过从共享专家中心路由参数动态生成权重矩阵,实现输入依赖的低秩适应,并促进跨层特征交互。实验结果表明,AdaRoute在语义分割、对象检测和全景分割等多种视觉任务上优于现有方法。
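The routing idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the softmax gating, the ReLU bottleneck, the residual connection, and every name and dimension below are assumptions. The abstract only states that each module aggregates parameter matrices from a shared expert center in an input-dependent way to form low-rank adapter weights.

```python
import math
import random


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mix_experts(expert_center, gates):
    """Gate-weighted sum of the expert matrices in one expert center."""
    rows, cols = len(expert_center[0]), len(expert_center[0][0])
    return [
        [sum(g * expert[r][c] for g, expert in zip(gates, expert_center))
         for c in range(cols)]
        for r in range(rows)
    ]


def adapter_forward(x, router, down_center, up_center):
    """Input-dependent low-rank adapter with a residual connection."""
    gates = softmax(matvec(router, x))        # one gate per expert
    w_down = mix_experts(down_center, gates)  # rank x dim
    w_up = mix_experts(up_center, gates)      # dim x rank
    hidden = [max(0.0, h) for h in matvec(w_down, x)]
    update = matvec(w_up, hidden)
    return [xi + ui for xi, ui in zip(x, update)], gates


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, rank, num_experts = 4, 2, 3  # hypothetical sizes
router = random_matrix(num_experts, dim, rng)
down_center = [random_matrix(rank, dim, rng) for _ in range(num_experts)]
up_center = [random_matrix(dim, rank, rng) for _ in range(num_experts)]

features = [0.5, -1.0, 2.0, 0.25]
adapted, gates = adapter_forward(features, router, down_center, up_center)
```

Passing the same `down_center` and `up_center` to adapters in several layers, each with its own `router`, would mimic the shared expert center the abstract mentions.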
Zero-shot Generalizable Graph Anomaly Detection with Mixture of Riemannian Experts
Authors: Xinyu Zhao, Qingyun Sun, Jiayi Luo, Xingcheng Fu, Jianxin Li
First: 2026-02-06T16:46:30+00:00 · Latest: 2026-02-06T16:46:30+00:00
Abstract
Graph Anomaly Detection (GAD) aims to identify irregular patterns in graph data, and recent works have explored zero-shot generalist GAD to enable generalization to unseen graph datasets. However, existing zero-shot GAD methods largely ignore intrinsic geometric differences across diverse anomaly patterns, substantially limiting their cross-domain generalization. In this work, we reveal that anomaly detectability is highly dependent on the underlying geometric properties and that embedding graphs from different domains into a single static curvature space can distort the structural signatures of anomalies. To address the challenge that a single curvature space cannot capture geometry-dependent graph anomaly patterns, we propose GAD-MoRE, a novel framework for zero-shot Generalizable Graph Anomaly Detection with a Mixture of Riemannian Experts architecture. Specifically, to ensure that each anomaly pattern is modeled in the Riemannian space where it is most detectable, GAD-MoRE employs a set of specialized Riemannian expert networks, each operating in a distinct curvature space. To align raw node features with curvature-specific anomaly characteristics, we introduce an anomaly-aware multi-curvature feature alignment module that projects inputs into parallel Riemannian spaces, enabling the capture of diverse geometric characteristics. Finally, to facilitate better generalization beyond seen patterns, we design a memory-based dynamic router that adaptively assigns each input to the most compatible expert based on historical reconstruction performance on similar anomalies. Extensive experiments in the zero-shot setting demonstrate that GAD-MoRE significantly outperforms state-of-the-art generalist GAD baselines, and even surpasses strong competitors that are few-shot fine-tuned with labeled data from the target domain.
中文标题/摘要
标题:基于黎曼专家混合的零样本泛化图异常检测
图异常检测(GAD)旨在识别图数据中的不规则模式,最近的研究探索了零样本泛化GAD以使模型能够泛化到未见过的图数据集。然而,现有的零样本GAD方法大多忽略了不同异常模式内部几何差异,极大地限制了其跨域泛化能力。本文揭示了异常检测能力高度依赖于潜在的几何属性,并且将来自不同领域的图嵌入到单一静态曲率空间中会扭曲异常结构特征。为了解决单一曲率空间无法捕捉几何依赖的图异常模式的挑战,我们提出了GAD-MoRE,一种基于黎曼专家混合的零样本泛化图异常检测的新框架。具体而言,为了确保每个异常模式在最能检测其异常的黎曼空间中建模,GAD-MoRE 使用了一组专门的黎曼专家网络,每个网络在不同的曲率空间中运行。为了使原始节点特征与曲率特定的异常特征对齐,我们引入了一种异常感知的多曲率特征对齐模块,将输入投影到平行的黎曼空间中,从而能够捕捉到多种几何特征。最后,为了促进对未见过模式的更好泛化,我们设计了一种基于历史重建性能的记忆动态路由器,根据与相似异常的历史重建性能,自适应地将每个输入分配给最兼容的专家。在零样本设置下的广泛实验表明,GAD-MoRE 显著优于最先进的泛化GAD基线,并且甚至超过了使用目标领域标记数据进行少量样本微调的强大竞争对手。
Summary / 总结
This work addresses zero-shot generalizable graph anomaly detection by proposing GAD-MoRE, which uses a Mixture of Riemannian Experts to model each anomaly pattern in the curvature space where it is most detectable. The framework adds an anomaly-aware multi-curvature feature alignment module and a memory-based dynamic router to improve generalization. In the zero-shot setting, GAD-MoRE outperforms state-of-the-art generalist GAD baselines and even surpasses competitors that are few-shot fine-tuned with labeled target-domain data.
该研究提出了一种使用混合黎曼专家框架GAD-MoRE来解决零样本泛化图异常检测的问题,该框架能够在最适合检测异常模式的黎曼空间中建模不同的异常模式。方法包括一个异常感知的多曲率特征对齐模块和一个基于历史重建性能的记忆动态路由器,以增强泛化能力。实验表明,GAD-MoRE在零样本设置中显著优于现有的一般性图异常检测方法,甚至超过了使用目标领域标注数据进行少量样本微调的强竞争者。
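One plausible reading of the memory-based dynamic router is sketched below: store past inputs together with the expert that handled them and the reconstruction error it achieved, then send a new input to the expert with the lowest average error among its nearest stored neighbours. The memory layout, the Euclidean similarity, the neighbourhood size `k`, and the expert labels are all assumptions for illustration; the abstract does not specify them.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def route(query, memory, k=3):
    """Pick the expert with the lowest mean error among the k nearest entries.

    memory is a list of (embedding, expert_id, reconstruction_error) tuples.
    """
    nearest = sorted(memory, key=lambda item: euclidean(query, item[0]))[:k]
    totals, counts = {}, {}
    for _, expert_id, error in nearest:
        totals[expert_id] = totals.get(expert_id, 0.0) + error
        counts[expert_id] = counts.get(expert_id, 0) + 1
    return min(totals, key=lambda e: totals[e] / counts[e])


# Hypothetical memory: two regions of the embedding space, three experts.
memory = [
    ([0.0, 0.0], "hyperbolic", 0.10),
    ([0.1, 0.1], "euclidean", 0.50),
    ([0.0, 0.2], "hyperbolic", 0.20),
    ([5.0, 5.0], "spherical", 0.05),
    ([5.1, 4.9], "euclidean", 0.40),
    ([4.9, 5.2], "spherical", 0.10),
]
near_origin = route([0.1, 0.0], memory)
far_corner = route([5.0, 5.1], memory)
```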
Designing a Robust, Bounded, and Smooth Loss Function for Improved Supervised Learning
Authors: Soumi Mahato, Lineesh M. C
First: 2026-02-06T16:46:29+00:00 · Latest: 2026-02-06T16:46:29+00:00
Abstract
The loss function is crucial to machine learning, especially in supervised learning frameworks. It is a fundamental component that controls the behavior and general efficacy of learning algorithms. However, despite their widespread use, traditional loss functions have significant drawbacks when dealing with high-dimensional and outlier-sensitive datasets, which frequently result in reduced performance and slower convergence during training. In this work, we develop a robust, bounded, and smooth (RoBoS-NN) loss function to resolve the aforementioned hindrances. The generalization ability of the loss function has also been theoretically analyzed to rigorously justify its robustness. Moreover, we implement RoBoS-NN loss in the framework of a neural network (NN) to forecast time series and present a new robust algorithm named $\mathcal{L}_{\text{RoBoS}}$-NN. To assess the potential of $\mathcal{L}_{\text{RoBoS}}$-NN, we conduct experiments on multiple real-world datasets. In addition, we infuse outliers into datasets to evaluate the performance of $\mathcal{L}_{\text{RoBoS}}$-NN in more challenging scenarios. Numerical results show that $\mathcal{L}_{\text{RoBoS}}$-NN outperforms the other benchmark models in terms of accuracy measures.
中文标题/摘要
标题:设计稳健、有界且平滑的损失函数以提高监督学习效果
损失函数对机器学习至关重要,尤其是在监督学习框架中。它是控制学习算法行为和整体有效性的基本组件。然而,尽管传统损失函数被广泛使用,但在处理高维和异常值敏感的数据集时,它们存在显著的缺点,这通常会导致性能降低和训练收敛速度减慢。在本工作中,我们开发了一种稳健、有界且平滑(RoBoS-NN)损失函数以解决上述问题。我们还从理论上分析了该损失函数的泛化能力,以严格证明其稳健性。此外,我们在神经网络(NN)框架中实现了RoBoS-NN损失函数,用于预测时间序列,并提出了一种新的稳健算法$\mathcal{L}_{\text{RoBoS}}$-NN。为了评估$\mathcal{L}_{\text{RoBoS}}$-NN的潜力,我们在多个真实数据集上进行了实验。此外,我们向数据集中注入异常值,以评估$\mathcal{L}_{\text{RoBoS}}$-NN在更具挑战性场景中的性能。数值结果表明,$\mathcal{L}_{\text{RoBoS}}$-NN在准确性指标上优于其他基准模型。
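The abstract does not give the RoBoS formula, so the sketch below does not attempt to reproduce it. Instead it uses the well-known Welsch loss as a stand-in to show what "robust, bounded, and smooth" buys in practice: the loss behaves like the squared loss for small residuals but saturates for outliers. The `scale` parameter and the toy residuals are assumptions for this example.

```python
import math


def squared_loss(residual):
    return 0.5 * residual ** 2


def welsch_loss(residual, scale=1.0):
    """Welsch loss: smooth everywhere and bounded above by scale**2 / 2."""
    return 0.5 * scale ** 2 * (1.0 - math.exp(-((residual / scale) ** 2)))


# Three ordinary residuals plus one injected outlier.
residuals = [0.1, -0.2, 0.05, 25.0]
mean_squared = sum(squared_loss(r) for r in residuals) / len(residuals)
mean_welsch = sum(welsch_loss(r) for r in residuals) / len(residuals)
```

With the outlier present, `mean_squared` is dominated by the single large residual, while `mean_welsch` stays below `scale**2 / 2`, so one corrupted point cannot dictate the gradient signal.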
AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
Authors: Alisia Lupidi, Bhavul Gauri, Thomas Simon Foster, Bassel Al Omari, Despoina Magka, Alberto Pepe, Alexis Audran-Reiss, Muna Aghamelu, Nicolas Baldwin, Lucia Cipolina-Kun, Jean-Christophe Gagnon-Audet, Chee Hau Leow, Sandra Lefdal, Hossam Mossalam, Abhinav Moudgil, Saba Nazir, Emanuel Tewolde, Isabel Urrego, Jordi Armengol Estape, Amar Budhiraja, Gaurav Chaurasia, Abhishek Charnalia, Derek Dunfield, Karen Hambardzumyan, Daniel Izcovich, Martin Josifoski, Ishita Mediratta, Kelvin Niu, Parth Pathak, Michael Shvartsman, Edan Toledo, Anton Protopopov, Roberta Raileanu, Alexander Miller, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach
First: 2026-02-06T16:45:02+00:00 · Latest: 2026-02-06T16:45:02+00:00
Comments: 49 pages, 14 figures, 10 tables
Abstract
LLM agents hold significant promise for advancing scientific research. To accelerate this progress, we introduce AIRS-Bench (the AI Research Science Benchmark), a suite of 20 tasks sourced from state-of-the-art machine learning papers. These tasks span diverse domains, including language modeling, mathematics, bioinformatics, and time series forecasting. AIRS-Bench tasks assess agentic capabilities over the full research lifecycle -- including idea generation, experiment analysis and iterative refinement -- without providing baseline code. The AIRS-Bench task format is versatile, enabling easy integration of new tasks and rigorous comparison across different agentic frameworks. We establish baselines using frontier models paired with both sequential and parallel scaffolds. Our results show that agents exceed human SOTA in four tasks but fail to match it in sixteen others. Even when agents surpass human benchmarks, they do not reach the theoretical performance ceiling for the underlying tasks. These findings indicate that AIRS-Bench is far from saturated and offers substantial room for improvement. We open-source the AIRS-Bench task definitions and evaluation code to catalyze further development in autonomous scientific research.
中文标题/摘要
标题:AIRS-Bench:前沿人工智能研究科学代理的一套任务集
大语言模型代理在推进科学研究方面具有巨大潜力。为了加速这一进程,我们引入了AIRS-Bench(人工智能研究科学基准),这是一个包含20项任务的套件,这些任务源自最新的机器学习论文。这些任务涵盖了语言建模、数学、生物信息学和时间序列预测等多个领域。AIRS-Bench任务评估代理在整个研究生命周期中的能力,包括创意生成、实验分析和迭代改进,而不提供基线代码。AIRS-Bench任务格式灵活,便于新任务的集成和不同代理框架之间的严格比较。我们使用前沿模型与顺序和并行支架相结合来建立基线。结果显示,代理在四项任务中超过了人类SOTA,但在其他十六项任务中未能达到。即使代理超越了人类基准,它们也没有达到底层任务的理论性能上限。这些发现表明,AIRS-Bench远未饱和,提供了巨大的改进空间。我们开源了AIRS-Bench任务定义和评估代码,以促进自主科学研究的发展。
Summary / 总结
AIRS-Bench is a suite of 20 tasks, sourced from state-of-the-art machine learning papers, that evaluates AI agents on scientific research across domains such as language modeling, mathematics, bioinformatics, and time series forecasting. The tasks assess agents throughout the research lifecycle, from idea generation to iterative refinement, without providing baseline code. Baselines built from frontier models with sequential and parallel scaffolds exceed the human state of the art on four tasks but fall short on the other sixteen, and even the successful runs do not reach the theoretical performance ceiling. The authors open-source the task definitions and evaluation code to promote further development in autonomous scientific research.
AIRS-Bench 是一个包含 20 个任务的基准套件,这些任务来自最前沿的机器学习论文,涵盖了语言建模、数学、生物信息学和时间序列预测等多个领域。它评估代理在整个研究生命周期中的能力,从想法生成到迭代改进。研究表明,代理在四个任务中超越了人类,但在十六个任务中表现不佳,表明仍有很大的改进空间。AIRS-Bench 任务设计灵活,便于新任务的集成和不同框架之间的比较。
Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping
Authors: Chao Zhou, Tianyi Wei, Yiling Chen, Wenbo Zhou, Nenghai Yu
First: 2026-02-06T16:39:10+00:00 · Latest: 2026-02-06T16:39:10+00:00
Abstract
While modern text-to-image models excel at prompt-based generation, they often lack the fine-grained control necessary for specific user requirements like spatial layouts or subject appearances. Multi-condition control addresses this, yet its integration into Diffusion Transformers (DiTs) is bottlenecked by the conventional "concatenate-and-attend" strategy, which suffers from quadratic computational and memory overhead as the number of conditions scales. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. To this end, we propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies. Specifically, Position-Aligned Attention (PAA) linearizes spatial control by enforcing localized patch alignment, while Keyword-Scoped Attention (KSA) prunes irrelevant subject-driven interactions via semantic-aware masking. To facilitate efficient learning, we further introduce a Conditional Sensitivity-Aware Sampling (CSAS) strategy that reweights the training objective towards critical denoising phases, drastically accelerating convergence and enhancing conditional fidelity. Empirically, PKA delivers a 10.0$\times$ inference speedup and a 5.1$\times$ VRAM saving, providing a scalable and resource-friendly solution for high-fidelity multi-conditioned generation.
中文标题/摘要
标题:重新思考多条件DiTs:通过位置对齐和关键词限定消除冗余注意力
尽管现代文本到图像模型在基于提示的生成方面表现出色,但在实现特定用户需求(如空间布局或主题外观)的精细控制方面仍显不足。多条件控制解决了这一问题,但将其集成到扩散变换器(DiTs)中受到传统“连接和注意”策略的瓶颈限制,该策略随着条件数量的增加而遭受二次计算和内存开销。我们的分析表明,这种跨模态交互在空间上或语义上是冗余的。为此,我们提出了位置对齐和关键词限定注意力(PKA),这是一种高效框架,旨在消除这些冗余。具体而言,位置对齐注意力(PAA)通过局部补丁对齐来线性化空间控制,而关键词限定注意力(KSA)通过语义感知掩码消除无关的主题驱动交互。为了促进高效学习,我们进一步引入了一种条件敏感性感知采样(CSAS)策略,该策略重新加权训练目标,以关键去噪阶段为中心,大幅加速收敛并提高条件保真度。实验证明,PKA 提供了 10.0 倍的推理速度提升和 5.1 倍的 VRAM 节省,为高保真多条件生成提供了一种可扩展且资源友好的解决方案。
Summary / 总结
The research aims to improve the fine-grained control of text-to-image generation by addressing the inefficiencies in Multi-condition Diffusion Transformers (DiTs). The authors propose Position-aligned and Keyword-scoped Attention (PKA), which includes Position-Aligned Attention (PAA) and Keyword-Scoped Attention (KSA) to reduce redundant cross-modal interactions. Additionally, a Conditional Sensitivity-Aware Sampling (CSAS) strategy is introduced to enhance training efficiency. The results show a 10.0x inference speedup and a 5.1x VRAM saving, making multi-condition generation more scalable and resource-friendly.
本文提出了一种名为Position-aligned and Keyword-scoped Attention (PKA)的新框架,通过Position-Aligned Attention (PAA)和Keyword-Scoped Attention (KSA)来消除冗余的跨模态交互,并通过Conditional Sensitivity-Aware Sampling (CSAS)加速训练。该方法实现了10.0倍的推理加速和5.1倍的VRAM节省,提供了一种更高效的多条件生成解决方案。
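To make the cost argument concrete, the sketch below counts query-key pairs under the "concatenate-and-attend" strategy and under a position-aligned scheme in which each image patch attends to only the condition patch at the same location. It is a rough counting model, not the paper's method: it assumes every spatial condition shares the image's patch grid, and all function names are hypothetical. Keyword-Scoped Attention is not modelled here.

```python
def concat_attention_pairs(num_patches, num_conditions):
    """Query-key pairs when image and all condition tokens attend jointly."""
    total_tokens = num_patches * (1 + num_conditions)
    return total_tokens * total_tokens


def position_aligned_mask(num_patches):
    """mask[i][j] is True when image patch i may attend to condition patch j."""
    return [[i == j for j in range(num_patches)] for i in range(num_patches)]


def aligned_attention_pairs(num_patches, num_conditions):
    """Image self-attention plus one aligned key per patch per condition."""
    image_pairs = num_patches * num_patches
    aligned = sum(row.count(True) for row in position_aligned_mask(num_patches))
    return image_pairs + num_conditions * aligned


patches = 1024  # hypothetical 32 x 32 patch grid
for conditions in (1, 2, 4):
    full = concat_attention_pairs(patches, conditions)
    aligned = aligned_attention_pairs(patches, conditions)
    print(conditions, full, aligned)
```

Under these assumptions the joint strategy grows with the square of the number of conditions, while the aligned variant adds only `num_patches` extra pairs per condition.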