Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions
Authors: Tobias Schmidt, Steffen Schneider, Matthias Bethge
Venue: NeurIPS 2025
First: 2025-10-24T17:59:46+00:00 · Latest: 2025-10-24T17:59:46+00:00
Comments: Accepted at NeurIPS 2025. The last two authors contributed equally.
Code is available at https://github.com/dynamical-inference/ebc
Abstract
We propose Equivariance by Contrast (EbC) to learn equivariant embeddings
from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$, where $g$ is drawn
from a finite group acting on the data. Our method jointly learns a latent
space and a group representation in which group actions correspond to
invertible linear maps -- without relying on group-specific inductive biases.
We validate our approach on the infinite dSprites dataset with structured
transformations defined by the finite group $G:= (R_m \times \mathbb{Z}_n
\times \mathbb{Z}_n)$, combining discrete rotations and periodic translations.
The resulting embeddings exhibit high-fidelity equivariance, with group
operations faithfully reproduced in latent space. On synthetic data, we further
validate the approach on the non-abelian orthogonal group $O(n)$ and the
general linear group $GL(n)$. We also provide a theoretical proof for
identifiability. While broad evaluation across diverse group types on
real-world data remains future work, our results constitute the first
successful demonstration of general-purpose encoder-only equivariant learning
from group action observations alone, including non-trivial non-abelian groups
and a product group motivated by modeling affine equivariances in computer
vision.
Summary / 总结
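The paper introduces Equivariance by Contrast (EbC), which learns equivariant embeddings from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$ without group-specific inductive biases. The method is validated on the infinite dSprites dataset with structured transformations and on synthetic data for the non-abelian groups $O(n)$ and $GL(n)$, showing high-fidelity equivariance. A theoretical proof of identifiability is also provided.
The paper proposes a method called Equivariance by Contrast (EbC) for learning equivariant embeddings from observation pairs; it learns a latent space and a group representation without group-specific inductive biases. The method is validated on the dSprites dataset with structured transformations and further on synthetic data for non-abelian groups, demonstrating high-fidelity equivariance in latent space. Theoretical identifiability is also proven.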
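The equivariance property described in this entry, where a group action on the data corresponds to an invertible linear map in latent space, can be illustrated with a toy check. The sketch below is not the EbC method (which learns both the encoder and the representation): it assumes a fixed stand-in encoder (the discrete Fourier transform) and the cyclic group $\mathbb{Z}_n$ acting by cyclic shifts, and the function names `shift`, `encode`, `rho`, and `equivariance_error` are hypothetical.

```python
import cmath

def shift(y, k):
    """Action of g = k in Z_n on the data: cyclically shift y by k positions."""
    n = len(y)
    return [y[(t - k) % n] for t in range(n)]

def encode(y):
    """Stand-in encoder (an assumption, not a learned network): the DFT of y."""
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * cmath.pi * j * t / n) for t in range(n))
            for j in range(n)]

def rho(k, n):
    """Diagonal entries of the latent-space map for g = k; every entry has
    modulus 1, so the map is linear and invertible."""
    return [cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)]

def equivariance_error(y, k):
    """Largest entry-wise gap between encode(g . y) and rho(g) encode(y)."""
    n = len(y)
    lhs = encode(shift(y, k))
    rhs = [r * z for r, z in zip(rho(k, n), encode(y))]
    return max(abs(a - b) for a, b in zip(lhs, rhs))
```

For any input vector and shift, `equivariance_error` should be zero up to floating-point error, and `rho` composes like the group itself (`rho(1, n)` times `rho(2, n)` equals `rho(3, n)` entry-wise).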
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Authors: Christy Li, Josep Lopez Camuñas, Jake Thomas Touchet, Jacob Andreas, Agata Lapedriza, Antonio Torralba, Tamar Rott Shaham
Venue: NeurIPS 2025
First: 2025-10-24T17:59:02+00:00 · Latest: 2025-10-24T17:59:02+00:00
Comments: 32 pages, 10 figures, NeurIPS 2025
Abstract
When a vision model performs image recognition, which visual attributes drive
its predictions? Detecting unintended reliance on specific visual features is
critical for ensuring model robustness, preventing overfitting, and avoiding
spurious correlations. We introduce an automated framework for detecting such
dependencies in trained vision models. At the core of our method is a
self-reflective agent that systematically generates and tests hypotheses about
visual attributes that a model may rely on. This process is iterative: the
agent refines its hypotheses based on experimental outcomes and uses a
self-evaluation protocol to assess whether its findings accurately explain
model behavior. When inconsistencies arise, the agent self-reflects over its
findings and triggers a new cycle of experimentation. We evaluate our approach
on a novel benchmark of 130 models designed to exhibit diverse visual attribute
dependencies across 18 categories. Our results show that the agent's
performance consistently improves with self-reflection, with a significant
performance increase over non-reflective baselines. We further demonstrate that
the agent identifies real-world visual attribute dependencies in
state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object
detector.
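Chinese title / abstract (translated)
Title: A Self-Reflective Agent for Automated Detection of Visual Attribute Reliance
When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is essential for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We propose an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about the visual attributes a model may rely on. The process is iterative: the agent refines its hypotheses based on experimental results and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent reflects on its findings and triggers a new round of experiments. We evaluate the method on a new benchmark of 130 models designed to exhibit diverse visual attribute dependencies across 18 categories. The results show that the agent's performance improves steadily with self-reflection, with a significant gain over non-reflective baselines. We further show that the agent identifies real-world visual attribute dependencies in state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object detector.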
Summary / 总结
The research aims to identify which visual attributes drive predictions in vision models to ensure robustness and prevent overfitting. It introduces an automated framework with a self-reflective agent that iteratively generates and tests hypotheses about visual attributes. The agent's performance improves with self-reflection, showing a significant increase over non-reflective baselines. The agent successfully identifies real-world visual attribute dependencies in state-of-the-art models like CLIP and YOLOv8.
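The research aims to identify the visual attributes that drive a vision model's predictions, in order to ensure robustness and prevent overfitting. It introduces an automated framework built around a self-reflective agent that iteratively generates and tests hypotheses about visual attributes. The agent's performance improves markedly with self-reflection and exceeds that of non-reflective baselines. The agent identifies real-world visual attribute dependencies in advanced models such as CLIP and YOLOv8.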
Causal Climate Emulation with Bayesian Filtering
Authors: Sebastian Hickman, Ilija Trajkovic, Julia Kaltenborn, Francis Pelletier, Alex Archibald, Yaniv Gurwicz, Peer Nowack, David Rolnick, Julien Boussard
First: 2025-06-11T16:00:55+00:00 · Latest: 2025-10-24T17:57:09+00:00
Comments: 37 pages, 26 figures
Abstract
Traditional models of climate change use complex systems of coupled equations
to simulate physical processes across the Earth system. These simulations are
highly computationally expensive, limiting our predictions of climate change
and analyses of its causes and effects. Machine learning has the potential to
quickly emulate data from climate models, but current approaches are not able
to incorporate physically-based causal relationships. Here, we develop an
interpretable climate model emulator based on causal representation learning.
We derive a novel approach including a Bayesian filter for stable long-term
autoregressive emulation. We demonstrate that our emulator learns accurate
climate dynamics, and we show the importance of each one of its components on a
realistic synthetic dataset and data from two widely deployed climate models.
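Chinese title / abstract (translated)
Title: Causal Climate Emulation Based on Bayesian Filtering
Traditional climate change models use complex systems of coupled equations to simulate physical processes in the Earth system. These simulations are highly computationally intensive, which limits our predictions of climate change and our analysis of its causes and effects. Machine learning has the potential to emulate climate model data quickly, but current methods cannot incorporate physically based causal relationships. Here, we develop an interpretable climate model emulator based on causal representation learning. We propose a new approach that includes a Bayesian filter for stable long-term autoregressive emulation. We show that our emulator learns accurate climate dynamics, and we demonstrate the importance of each component on a realistic synthetic dataset and on data from two widely deployed climate models.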
Summary / 总结
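The research aims to develop a computationally efficient emulator for climate models that incorporates causal relationships. The method includes a novel Bayesian filter for stable long-term autoregressive emulation. Key findings show that the emulator accurately learns climate dynamics and that each of its components matters, both on a realistic synthetic dataset and on data from two widely deployed climate models.
The research aims to develop a computationally efficient climate model emulator by incorporating causal relationships. The method includes a novel Bayesian filter for stable long-term autoregressive emulation. Key findings show that the emulator accurately learns climate dynamics, and results on synthetic data and on data from two widely used climate models underline the importance of each component.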
BachVid: Training-Free Video Generation with Consistent Background and Character
Authors: Han Yan, Xibin Song, Yifu Wang, Hongdong Li, Pan Ji, Chao Ma
First: 2025-10-24T17:56:37+00:00 · Latest: 2025-10-24T17:56:37+00:00
Comments: Project page: https://wolfball.github.io/bachvid
Abstract
Diffusion Transformers (DiTs) have recently driven significant progress in
text-to-video (T2V) generation. However, generating multiple videos with
consistent characters and backgrounds remains a significant challenge. Existing
methods typically rely on reference images or extensive training, and often
only address character consistency, leaving background consistency to
image-to-video models. We introduce BachVid, the first training-free method
that achieves consistent video generation without needing any reference images.
Our approach is based on a systematic analysis of DiT's attention mechanism and
intermediate features, revealing its ability to extract foreground masks and
identify matching points during the denoising process. Our method leverages
this finding by first generating an identity video and caching the intermediate
variables, and then injecting these cached variables into corresponding positions
in newly generated videos, ensuring both foreground and background consistency
across multiple videos. Experimental results demonstrate that BachVid achieves
robust consistency in generated videos, offering a novel and efficient solution
that relies on neither reference images nor additional training.
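Chinese title / abstract (translated)
Title: BachVid: Training-Free Video Generation with Consistent Background and Character
Diffusion Transformers (DiTs) have recently made significant progress in text-to-video (T2V) generation. However, generating multiple videos with consistent characters and backgrounds remains a major challenge. Existing methods usually depend on reference images or extensive training, and often address only character consistency, leaving background consistency to image-to-video models. We introduce BachVid, the first training-free method that achieves consistent video generation without any reference images. Our method builds on a systematic analysis of the DiT attention mechanism and intermediate features, which reveals the model's ability to extract foreground masks and identify matching points during denoising. Using this finding, our method first generates an identity video and caches the intermediate variables, then injects the cached variables into the corresponding positions of newly generated videos, ensuring foreground and background consistency across multiple videos. Experiments show that BachVid achieves consistent generated videos without additional training, providing a novel and efficient solution that relies on neither reference images nor additional training.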
Summary / 总结
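BachVid is a training-free method for generating multiple videos with consistent characters and backgrounds. It builds on the observation that the attention mechanism and intermediate features of Diffusion Transformers can extract foreground masks and identify matching points; intermediate variables from an identity video are cached and injected into newly generated videos to keep them consistent. Experimental results show that BachVid achieves robust consistency in generated videos without needing reference images or additional training.
BachVid is a training-free method for generating videos with consistent characters and backgrounds. It uses the attention mechanism and intermediate features of diffusion transformers to extract foreground masks and identify matching points, then caches this information and reuses it in newly generated videos to ensure consistency across multiple videos. Experiments show that BachVid achieves consistent generated videos without reference images or additional training.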
A Knowledge-Graph Translation Layer for Mission-Aware Multi-Agent Path Planning in Spatiotemporal Dynamics
Authors: Edward Holmberg, Elias Ioup, Mahdi Abdelguerfi
First: 2025-10-24T17:55:55+00:00 · Latest: 2025-10-24T17:55:55+00:00
Comments: 10 pages, 10 figures, conference submission
Abstract
The coordination of autonomous agents in dynamic environments is hampered by
the semantic gap between high-level mission objectives and low-level planner
inputs. To address this, we introduce a framework centered on a Knowledge Graph
(KG) that functions as an intelligent translation layer. The KG's two-plane
architecture compiles declarative facts into per-agent, mission-aware
"worldviews" and physics-aware traversal rules, decoupling mission semantics
from a domain-agnostic planner. This allows complex, coordinated paths to be
modified simply by changing facts in the KG. A case study involving Autonomous
Underwater Vehicles (AUVs) in the Gulf of Mexico visually demonstrates the
end-to-end process and quantitatively proves that different declarative
policies produce distinct, high-performing outcomes. This work establishes the
KG not merely as a data repository, but as a powerful, stateful orchestrator
for creating adaptive and explainable autonomous systems.
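Chinese title / abstract (translated)
Title: A Knowledge-Graph Translation Layer for Mission-Aware Multi-Agent Path Planning in Spatiotemporal Dynamics
The coordination of autonomous agents in dynamic environments is hindered by the semantic gap between high-level mission objectives and low-level planner inputs. To address this, we propose a framework centered on a Knowledge Graph (KG) that acts as an intelligent translation layer. The KG's two-plane architecture compiles declarative facts into per-agent, mission-aware "worldviews" and physics-aware traversal rules, decoupling mission semantics from a domain-agnostic planner. Complex, coordinated paths can therefore be modified simply by changing facts in the KG. A case study with Autonomous Underwater Vehicles (AUVs) in the Gulf of Mexico visually demonstrates the end-to-end process and shows quantitatively that different declarative policies produce distinct, high-performing outcomes. This work establishes the KG not merely as a data repository but as a powerful, stateful orchestrator for creating adaptive and explainable autonomous systems.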
Summary / 总结
The paper addresses the challenge of coordinating autonomous agents in dynamic environments by introducing a Knowledge Graph (KG) as a translation layer. The KG's two-plane architecture translates high-level mission objectives into mission-aware and physics-aware rules for each agent, decoupling mission semantics from the planner. The study demonstrates this framework using AUVs in the Gulf of Mexico, showing that different declarative policies lead to distinct, high-performing paths. This work positions the KG as a powerful orchestrator for adaptive and explainable autonomous systems.
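The paper addresses the coordination of autonomous agents in dynamic environments by introducing a Knowledge Graph (KG) as an intelligent translation layer. The KG's two-plane architecture turns high-level mission objectives into mission-aware and physics-aware rules for each agent, decoupling mission semantics from a domain-agnostic planner. A case study with Autonomous Underwater Vehicles (AUVs) in the Gulf of Mexico shows that different declarative policies yield high-performing results. The KG is positioned as a powerful, stateful orchestrator for creating adaptive and explainable autonomous systems.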
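The idea of compiling declarative facts into per-agent traversal rules for a domain-agnostic planner can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's architecture: the fact schema (`agent`, `rule`, `cell`), the function names `compile_costs` and `plan`, and the grid world are all hypothetical, and the planner is plain Dijkstra.

```python
import heapq

def compile_costs(facts, agent):
    """Compile declarative facts into a per-cell traversal-cost function for one agent.
    The fact schema here is hypothetical."""
    avoid = {f["cell"] for f in facts if f["agent"] == agent and f["rule"] == "avoid"}
    return lambda cell: float("inf") if cell in avoid else 1.0

def plan(start, goal, size, cost):
    """Domain-agnostic Dijkstra on a size x size 4-connected grid.
    The planner sees only the cost function, never the mission facts."""
    frontier = [(0.0, start, [start])]
    seen = set()
    while frontier:
        dist, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in seen and cost(nxt) != float("inf"):
                heapq.heappush(frontier, (dist + cost(nxt), nxt, path + [nxt]))
    return None  # goal unreachable under the current facts

# Changing a single fact changes one agent's path; the planner code is untouched.
facts = [{"agent": "auv1", "rule": "avoid", "cell": (0, 1)}]
path_auv1 = plan((0, 0), (0, 2), 3, compile_costs(facts, "auv1"))
path_auv2 = plan((0, 0), (0, 2), 3, compile_costs(facts, "auv2"))
```

In this sketch `path_auv2` takes the direct route through `(0, 1)`, while `path_auv1` detours around it, which mirrors the claim that paths can be modified by editing facts alone.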
Mechanistic Interpretability for Neural TSP Solvers
Authors: Reuben Narad, Leonard Boussioux, Michael Wagner
First: 2025-10-24T17:54:19+00:00 · Latest: 2025-10-24T17:54:19+00:00
Abstract
Neural networks have advanced combinatorial optimization, with
Transformer-based solvers achieving near-optimal solutions on the Traveling
Salesman Problem (TSP) in milliseconds. However, these models operate as black
boxes, providing no insight into the geometric patterns they learn or the
heuristics they employ during tour construction. We address this opacity by
applying sparse autoencoders (SAEs), a mechanistic interpretability technique,
to a Transformer-based TSP solver, representing the first application of
activation-based interpretability methods to operations research models. We
train a pointer network with reinforcement learning on 100-node instances, then
fit an SAE to the encoder's residual stream to discover an overcomplete
dictionary of interpretable features. Our analysis reveals that the solver
naturally develops features mirroring fundamental TSP concepts: boundary
detectors that activate on convex-hull nodes, cluster-sensitive features
responding to locally dense regions, and separator features encoding geometric
partitions. These findings provide the first model-internal account of what
neural TSP solvers compute before node selection, demonstrate that geometric
structure emerges without explicit supervision, and suggest pathways toward
transparent hybrid systems that combine neural efficiency with algorithmic
interpretability. Interactive feature explorer:
https://reubennarad.github.io/TSP_interp
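Chinese title / abstract (translated)
Title: Mechanistic Interpretability for Neural TSP Solvers
Neural networks have advanced combinatorial optimization, with Transformer-based solvers finding near-optimal solutions to the Traveling Salesman Problem (TSP) in milliseconds. However, these models operate as black boxes and offer no insight into the geometric patterns they learn or the heuristics they use during tour construction. We address this opacity by applying sparse autoencoders (SAEs) to a Transformer-based TSP solver, the first application of activation-based interpretability methods to operations research models. We train a pointer network with reinforcement learning on 100-node instances, then fit an SAE to the encoder's residual stream to discover an overcomplete dictionary of interpretable features. Our analysis shows that the solver naturally develops features that reflect fundamental TSP concepts: boundary detectors that activate on convex-hull nodes, cluster-sensitive features that respond to locally dense regions, and separator features that encode geometric partitions. These findings provide the first model-internal account of what neural TSP solvers compute before node selection, show that geometric structure emerges without explicit supervision, and point toward transparent hybrid systems that combine neural efficiency with algorithmic interpretability. Interactive feature explorer: https://reubennarad.github.io/TSP_interp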
Summary / 总结
This paper addresses the lack of interpretability in neural network-based solvers for the Traveling Salesman Problem (TSP) by applying sparse autoencoders (SAEs) to a Transformer-based TSP solver. The authors train a pointer network using reinforcement learning and then fit an SAE to the encoder's residual stream to discover interpretable features. Key findings include the development of boundary detectors, cluster-sensitive features, and separator features, which mirror fundamental TSP concepts and suggest that geometric structure emerges without explicit supervision. This work provides insights into the internal computations of neural TSP solvers and paves the way for transparent hybrid systems combining neural efficiency with interpretability. Interactive feature explorer: https://reubennarad.github.io/TSP_interp
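This paper addresses the lack of interpretability of neural networks for the Traveling Salesman Problem (TSP) by applying sparse autoencoders (SAEs) to a Transformer-based TSP solver. The authors train a pointer network with reinforcement learning and fit an SAE to the encoder's residual stream to discover interpretable features. Key findings include boundary detectors, cluster-sensitive features, and separator features, which reflect fundamental TSP concepts and indicate that geometric structure can emerge without explicit supervision. The work offers insights for transparent hybrid systems that combine neural efficiency with interpretability. Interactive feature explorer: https://reubennarad.github.io/TSP_interp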
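For readers unfamiliar with sparse autoencoders, the following minimal sketch shows the forward pass and a common training objective (squared reconstruction error plus an L1 sparsity penalty) for an overcomplete dictionary. It is a generic illustration, not the authors' implementation: the ReLU encoder, the L1 coefficient, and the names `sae_forward` and `sae_loss` are assumptions, and plain Python lists are used only to keep the example self-contained.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Matrix-vector product, with W stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector x into sparse features h, then reconstruct it."""
    h = relu([a + b for a, b in zip(matvec(W_enc, x), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(W_dec, h), b_dec)]
    return h, x_hat

def sae_loss(x, h, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty on feature activations
    (a common SAE objective; the coefficient is an assumed value)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in h)
    return recon + l1_coeff * sparsity

# Toy overcomplete dictionary: 2-dimensional activations, 4 features.
W_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
W_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
h, x_hat = sae_forward([0.5, -2.0], W_enc, [0.0] * 4, W_dec, [0.0] * 2)
loss = sae_loss([0.5, -2.0], h, x_hat)
```

In this toy setup only two of the four features fire for the given input, and the reconstruction is exact, so the loss consists of the sparsity term alone.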
Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics
Authors: Reece Keller, Alyn Kirsch, Felix Pei, Xaq Pitkow, Leo Kozachkov, Aran Nayebi
First: 2025-05-30T18:21:40+00:00 · Latest: 2025-10-24T17:52:29+00:00
Comments: 17 pages, 7 figures
Abstract
Autonomy is a hallmark of animal intelligence, enabling adaptive and
intelligent behavior in complex environments without relying on external reward
or task structure. Existing reinforcement learning approaches to exploration in
reward-free environments, including a class of methods known as model-based
intrinsic motivation, exhibit inconsistent exploration patterns and do not
converge to an exploratory policy, thus failing to capture robust autonomous
behaviors observed in animals. Moreover, systems neuroscience has largely
overlooked the neural basis of autonomy, focusing instead on experimental
paradigms where animals are motivated by external reward rather than engaging
in ethological, naturalistic and task-independent behavior. To bridge these
gaps, we introduce a novel model-based intrinsic drive explicitly designed
after the principles of autonomous exploration in animals. Our method
(3M-Progress) achieves animal-like exploration by tracking divergence between
an online world model and a fixed prior learned from an ecological niche. To
the best of our knowledge, we introduce the first autonomous embodied agent
that predicts brain data entirely from self-supervised optimization of an
intrinsic goal -- without any behavioral or neural training data --
demonstrating that 3M-Progress agents capture the explainable variance in
behavioral patterns and whole-brain neural-glial dynamics recorded from
autonomously behaving larval zebrafish, thereby providing the first
goal-driven, population-level model of neural-glial computation. Our findings
establish a computational framework connecting model-based intrinsic motivation
to naturalistic behavior, providing a foundation for building artificial agents
with animal-like autonomy.
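Chinese title / abstract (translated)
Title: Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics
Autonomy is a hallmark of animal intelligence, enabling animals to behave adaptively and intelligently in complex environments without relying on external reward or task structure. Existing reinforcement learning approaches to exploration in reward-free environments, including a class of methods known as model-based intrinsic motivation, exhibit inconsistent exploration patterns and do not converge to an exploratory policy, and so fail to capture the robust autonomous behaviors observed in animals. Moreover, systems neuroscience has largely overlooked the neural basis of autonomy, focusing instead on experimental paradigms driven by external reward rather than on ethological, naturalistic behavior. To bridge these gaps, we propose a new model-based intrinsic drive that explicitly follows the principles of autonomous exploration in animals. Our method (3M-Progress) achieves animal-like exploration by tracking the divergence between an online world model and a fixed prior learned from an ecological niche. To the best of our knowledge, we introduce the first autonomous embodied agent that predicts brain data entirely from self-supervised optimization of an intrinsic goal, without any behavioral or neural training data, showing that 3M-Progress agents capture the explainable variance in behavioral patterns and whole-brain neural-glial dynamics recorded from autonomously behaving zebrafish, and thereby providing the first goal-driven, population-level model of neural-glial computation. Our findings establish a computational framework linking model-based intrinsic motivation to naturalistic behavior, laying a foundation for building agents with animal-like autonomy.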
Summary / 总结
The research aims to develop autonomous agents that can explore complex environments without external rewards, similar to animal behavior. The method, 3M-Progress, uses a model-based intrinsic motivation approach that tracks the divergence between an online world model and a fixed prior learned from an ecological niche. The key experimental finding is that 3M-Progress agents can predict brain data from autonomously behaving larval zebrafish, capturing both behavioral patterns and whole-brain neural-glial dynamics, thus providing a goal-driven model of neural-glial computation.
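The research aims to develop agents that explore complex environments autonomously without external reward, similar to animal behavior. The method, 3M-Progress, is a model-based intrinsic motivation approach that tracks the divergence between an online world model and a fixed prior learned from an ecological niche. The key experimental finding is that 3M-Progress agents predict brain data from autonomously behaving larval zebrafish, capturing behavioral patterns and whole-brain neural-glial dynamics, and thus provide a goal-driven model of neural-glial computation.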
Multimodal Datasets with Controllable Mutual Information
Authors: Raheem Karim Hashmani, Garrett W. Merz, Helen Qu, Mariel Pettee, Kyle Cranmer
First: 2025-10-24T17:44:40+00:00 · Latest: 2025-10-24T17:44:40+00:00
Comments: 15 pages, 4 figures, 1 table. Our code is publicly available at
https://github.com/RKHashmani/MmMi-Datasets
Abstract
We introduce a framework for generating highly multimodal datasets with
explicitly calculable mutual information between modalities. This enables the
construction of benchmark datasets that provide a novel testbed for systematic
studies of mutual information estimators and multimodal self-supervised
learning techniques. Our framework constructs realistic datasets with known
mutual information using a flow-based generative model and a structured causal
framework for generating correlated latent variables.
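Chinese title / abstract (translated)
Title: Multimodal Datasets with Controllable Mutual Information
We introduce a framework for generating multimodal datasets with explicitly calculable mutual information. This makes it possible to build benchmark datasets that provide a new testbed for systematic studies of mutual information estimators and multimodal self-supervised learning techniques. Our framework constructs realistic datasets with known mutual information using a flow-based generative model and a structured causal framework.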
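One standard way to obtain a dataset with known mutual information, offered here only as background and not as the paper's construction, is to start from jointly Gaussian latents: for two scalar Gaussians with correlation $\rho$, the mutual information is $I = -\tfrac{1}{2}\ln(1-\rho^2)$ nats, and this value is unchanged when each variable is passed through its own invertible map (which is what a normalizing flow applies). The helper names `gaussian_mi` and `rho_for_mi` below are hypothetical.

```python
import math

def gaussian_mi(rho):
    """Mutual information (in nats) between two jointly Gaussian scalars
    with correlation coefficient rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho ** 2)

def rho_for_mi(target_nats):
    """Inverse map: the non-negative correlation that yields a target MI in nats."""
    if target_nats < 0.0:
        raise ValueError("mutual information cannot be negative")
    return math.sqrt(1.0 - math.exp(-2.0 * target_nats))
```

Under these assumptions, choosing a target value and calling `rho_for_mi` gives the latent correlation to sample with, so the mutual information of the resulting dataset is known by construction rather than estimated.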
Federated Unlearning Made Practical: Seamless Integration via Negated Pseudo-Gradients
Authors: Alessio Mora, Carlo Mazzocca, Rebecca Montanari, Paolo Bellavista
First: 2025-04-08T09:05:33+00:00 · Latest: 2025-10-24T17:44:21+00:00
Abstract
The right to be forgotten is a fundamental principle of privacy-preserving
regulations and extends to Machine Learning (ML) paradigms such as Federated
Learning (FL). While FL enhances privacy by enabling collaborative model
training without sharing private data, trained models still retain the
influence of training data. Federated Unlearning (FU) methods recently proposed
often rely on impractical assumptions for real-world FL deployments, such as
storing client update histories or requiring access to a publicly available
dataset. To address these constraints, this paper introduces a novel method
that leverages negated Pseudo-gradients Updates for Federated Unlearning (PUF).
Our approach only uses standard client model updates, which are employed during
regular FL rounds, and interprets them as pseudo-gradients. When a client needs
to be forgotten, we apply the negation of their pseudo-gradients, appropriately
scaled, to the global model. Unlike state-of-the-art mechanisms, PUF seamlessly
integrates with FL workflows, incurs no additional computational and
communication overhead beyond standard FL rounds, and supports concurrent
unlearning requests. We extensively evaluated the proposed method on two
well-known benchmark image classification datasets (CIFAR-10 and CIFAR-100) and
a real-world medical imaging dataset for segmentation (ProstateMRI), using
three different neural architectures: two residual networks and a vision
transformer. The experimental results across various settings demonstrate that
PUF achieves state-of-the-art forgetting effectiveness and recovery time,
without relying on any additional assumptions.
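Chinese title / abstract (translated)
Title: Federated Unlearning Made Practical: Seamless Integration via Negated Pseudo-Gradients
The right to be forgotten is a fundamental principle of privacy-preserving regulations and extends to Machine Learning (ML) paradigms such as Federated Learning (FL). Although FL enhances privacy by enabling collaborative model training without sharing private data, trained models still retain the influence of their training data. Recently proposed Federated Unlearning (FU) methods often rely on assumptions that are hard to meet in real-world FL deployments, such as storing client update histories or requiring access to a public dataset. To address these constraints, this paper introduces a novel method that uses negated Pseudo-gradients Updates for Federated Unlearning (PUF). Our method uses only the standard client model updates employed in regular FL rounds and interprets them as pseudo-gradients. When a client needs to be forgotten, we negate and appropriately scale its pseudo-gradients and apply them to the global model. Unlike existing mechanisms, PUF integrates seamlessly into FL workflows, incurs no additional computational or communication overhead, and supports concurrent unlearning requests. We extensively evaluated the proposed method with three neural architectures (two residual networks and a vision transformer) on two well-known benchmark image classification datasets (CIFAR-10 and CIFAR-100) and a real-world medical imaging dataset (ProstateMRI). Experimental results across various settings show that PUF achieves state-of-the-art forgetting effectiveness and recovery time without relying on any additional assumptions.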
Summary / 总结
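This paper addresses the challenge of Federated Unlearning by introducing a novel method called PUF (negated Pseudo-gradients Updates for Federated Unlearning). PUF uses only the standard client model updates from regular Federated Learning rounds and interprets them as pseudo-gradients. When a client needs to be forgotten, the negation of its pseudo-gradients, appropriately scaled, is applied to the global model. The method integrates seamlessly with FL workflows, incurs no computational or communication overhead beyond standard FL rounds, and supports concurrent unlearning requests. Experimental results on CIFAR-10, CIFAR-100, and the ProstateMRI medical imaging dataset show that PUF achieves state-of-the-art forgetting effectiveness and recovery time without requiring additional assumptions.
This paper addresses the challenge of Federated Unlearning (FU) by introducing a new method called PUF (Pseudo-gradients Updates for Federated Unlearning). PUF takes the standard client model updates from regular Federated Learning (FL) rounds and treats them as pseudo-gradients. When a client needs to be unlearned, the negation of its pseudo-gradients is applied with appropriate scaling. The method integrates seamlessly into FL workflows, adds no extra overhead, and supports concurrent unlearning requests. Experiments on benchmark datasets and a real-world medical imaging dataset show that PUF achieves state-of-the-art forgetting effectiveness and recovery time across various settings, without any additional assumptions.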
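The mechanism described above, treating client updates as pseudo-gradients and applying their negation to forget a client, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the sign convention (pseudo-gradient = global weights minus client weights), the plain averaging server step, the `scale` parameter, and all function names are hypothetical, and models are flat lists of floats.

```python
def pseudo_gradient(global_w, client_w):
    """Interpret a client update as a pseudo-gradient.
    Assumed convention: global weights minus the client's locally trained weights."""
    return [g - c for g, c in zip(global_w, client_w)]

def fedavg_step(global_w, client_ws, server_lr=1.0):
    """Regular FL round: descend along the average pseudo-gradient."""
    n = len(client_ws)
    pgs = [pseudo_gradient(global_w, cw) for cw in client_ws]
    avg = [sum(col) / n for col in zip(*pgs)]
    return [g - server_lr * a for g, a in zip(global_w, avg)]

def unlearn_step(global_w, forget_pg, scale=1.0):
    """Unlearning round: apply the negated, scaled pseudo-gradient of the
    client to forget, i.e. ascend where a regular round would descend.
    The value of `scale` is an assumption of this sketch."""
    return [g + scale * p for g, p in zip(global_w, forget_pg)]

# Toy run with two clients and a two-parameter model.
w0 = [1.0, 2.0]
clients = [[0.0, 2.0], [2.0, 4.0]]
w1 = fedavg_step(w0, clients)                       # regular round
pg_client0 = pseudo_gradient(w0, clients[0])        # update to be forgotten
w2 = unlearn_step(w1, pg_client0, scale=0.5)        # push away from client 0
```

The unlearning step reuses exactly the quantity a client already sends during a regular round, which is consistent with the abstract's point that no stored histories or public datasets are needed.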
DynamicPAE: Generating Scene-Aware Physical Adversarial Examples in Real-Time
Authors: Jin Hu, Xianglong Liu, Jiakai Wang, Junkai Zhang, Xianqi Yang, Haotong Qin, Yuqing Ma, Ke Xu
First: 2024-12-11T03:00:15+00:00 · Latest: 2025-10-24T17:42:08+00:00
Abstract
Physical adversarial examples (PAEs) are regarded as whistle-blowers of
real-world risks in deep-learning applications, thus worth further
investigation. However, current PAE generation studies show limited adaptive
attacking ability to diverse and varying scenes, revealing the urgent
requirement of dynamic PAEs that are generated in real time and conditioned on
the observation from the attacker. The key challenge in generating dynamic PAEs
is learning the sparse relation between PAEs and the observation of attackers
under the noisy feedback of attack training. To address the challenge, we
present DynamicPAE, the first generative framework that enables scene-aware
real-time physical attacks. Specifically, to address the noisy feedback problem
that obfuscates the exploration of scene-related PAEs, we introduce the
residual-guided adversarial pattern exploration technique. Residual-guided
training, which relaxes the attack training with a reconstruction task, is
proposed to enrich the feedback information, thereby achieving a more
comprehensive exploration of PAEs. To address the alignment problem between the
trained generator and the real-world scenario, we introduce the
distribution-matched attack scenario alignment, consisting of the
conditional-uncertainty-aligned data module and the skewness-aligned objective
re-weighting module. The former aligns the training environment with the
incomplete observation of the real-world attacker. The latter facilitates
consistent stealth control across different attack targets with the skewness
controller. Extensive digital and physical evaluations demonstrate the superior
attack performance of DynamicPAE, attaining a 2.07$\times$ boost (58.8%
average AP drop under attack) on representative object detectors (e.g., DETR)
over state-of-the-art static PAE generating methods. Overall, our work opens
the door to end-to-end modeling of dynamic PAEs.
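Chinese title / abstract (translated)
Title: DynamicPAE: Generating Scene-Aware Physical Adversarial Examples in Real-Time
Physical adversarial examples (PAEs) are regarded as whistle-blowers of real-world risks in deep-learning applications and therefore merit further study. However, current PAE generation studies show limited ability to adapt attacks to diverse and changing scenes, revealing an urgent need for dynamic PAEs that are generated in real time and conditioned on the attacker's observation. The key challenge in generating dynamic PAEs is learning the sparse relation between PAEs and the attacker's observation under the noisy feedback of attack training. To address this challenge, we present DynamicPAE, the first generative framework that enables scene-aware real-time physical attacks. Specifically, to address the noisy feedback problem, which obscures the exploration of scene-related PAEs, we introduce the residual-guided adversarial pattern exploration technique. Residual-guided training relaxes attack training with a reconstruction task, enriching the feedback information and achieving a more comprehensive exploration of PAEs. To address the alignment problem between the trained generator and the real-world scenario, we introduce distribution-matched attack scenario alignment, consisting of a conditional-uncertainty-aligned data module and a skewness-aligned objective re-weighting module. The former aligns the training environment with the incomplete observation of the real-world attacker. The latter uses a skewness controller to provide consistent stealth control across different attack targets. Extensive digital and physical evaluations show the superior attack performance of DynamicPAE: on representative object detectors (e.g., DETR) it attains a 58.8% average AP drop, a 2.07$\times$ boost over state-of-the-art static PAE generation methods. Overall, our work opens the door to end-to-end modeling of dynamic PAEs.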
WorldGrow: Generating Infinite 3D World
Authors: Sikuang Li, Chen Yang, Jiemin Fang, Taoran Yi, Jia Lu, Jiazhong Cen, Lingxi Xie, Wei Shen, Qi Tian
First: 2025-10-24T17:39:52+00:00 · Latest: 2025-10-24T17:39:52+00:00
Comments: Project page: https://world-grow.github.io/ Code:
https://github.com/world-grow/WorldGrow
Abstract
We tackle the challenge of generating infinitely extendable 3D worlds --
large, continuous environments with coherent geometry and realistic appearance.
Existing methods face key challenges: 2D-lifting approaches suffer from
geometric and appearance inconsistencies across views, 3D implicit
representations are hard to scale up, and current 3D foundation models are
mostly object-centric, limiting their applicability to scene-level generation.
Our key insight is leveraging strong generation priors from pre-trained 3D
models for structured scene block generation. To this end, we propose
WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our
method features three core components: (1) a data curation pipeline that
extracts high-quality scene blocks for training, making the 3D structured
latent representations suitable for scene generation; (2) a 3D block inpainting
mechanism that enables context-aware scene extension; and (3) a coarse-to-fine
generation strategy that ensures both global layout plausibility and local
geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset,
WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely
supporting infinite scene generation with photorealistic and structurally
consistent outputs. These results highlight its capability for constructing
large-scale virtual environments and potential for building future world
models.
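Chinese title / abstract (translated)
Title: WorldGrow: Generating Infinitely Extendable 3D Worlds
We tackle the challenge of generating infinitely extendable 3D worlds -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale, and current 3D foundation models are mostly object-centric, which limits their use for scene-level generation. Our key insight is to use the strong generation priors of pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method has three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric and textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction while supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and its potential for building future world models.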
Summary / 总结
WorldGrow addresses the challenge of generating infinite 3D worlds with coherent geometry and realistic appearance. It proposes a hierarchical framework with three core components: a data curation pipeline for training, a 3D block inpainting mechanism for context-aware scene extension, and a coarse-to-fine generation strategy ensuring global layout and local fidelity. WorldGrow outperforms existing methods on the 3D-FRONT dataset, achieving state-of-the-art geometry reconstruction and supporting infinite, photorealistic scene generation.
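WorldGrow addresses the challenge of generating infinite 3D worlds with coherent geometry and realistic appearance. It proposes a hierarchical framework with three core components: a data curation pipeline, a 3D block inpainting mechanism, and a coarse-to-fine generation strategy. Evaluation on the 3D-FRONT dataset shows that WorldGrow outperforms existing methods in geometry reconstruction and supports infinite scene generation with realistic and structurally consistent outputs.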
Reinforcement Learning with Action Chunking
Authors: Qiyang Li, Zhiyuan Zhou, Sergey Levine
Venue: NeurIPS 2025
First: 2025-07-10T17:48:03+00:00 · Latest: 2025-10-24T17:37:23+00:00
Comments: The Thirty-Ninth Annual Conference on Neural Information Processing
Systems (NeurIPS 2025); 36 pages, 17 figures
Abstract
We present Q-chunking, a simple yet effective recipe for improving
reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks.
Our recipe is designed for the offline-to-online RL setting, where the goal is
to leverage an offline prior dataset to maximize the sample-efficiency of
online learning. Effective exploration and sample-efficient learning remain
central challenges in this setting, as it is not obvious how the offline data
should be utilized to acquire a good exploratory policy. Our key insight is
that action chunking, a technique popularized in imitation learning where
sequences of future actions are predicted rather than a single action at each
timestep, can be applied to temporal difference (TD)-based RL methods to
mitigate the exploration challenge. Q-chunking adopts action chunking by
directly running RL in a 'chunked' action space, enabling the agent to (1)
leverage temporally consistent behaviors from offline data for more effective
online exploration and (2) use unbiased $n$-step backups for more stable and
efficient TD learning. Our experimental results demonstrate that Q-chunking
exhibits strong offline performance and online sample efficiency, outperforming
prior best offline-to-online methods on a range of long-horizon, sparse-reward
manipulation tasks.
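Chinese title / abstract (translated)
Title: Reinforcement Learning with Action Chunking
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms on long-horizon, sparse-reward tasks. Our recipe targets the offline-to-online RL setting, where the goal is to use an offline prior dataset to maximize the sample efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, because it is not obvious how offline data should be used to obtain a good exploratory policy. Our key insight is that action chunking, a technique popular in imitation learning that predicts sequences of future actions, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. By running RL directly in a "chunked" action space, Q-chunking enables the agent to (1) use temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results show that Q-chunking delivers strong offline performance and online sample efficiency, outperforming the previous best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.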
Summary / 总结
Q-chunking is a method designed to improve reinforcement learning for long-horizon, sparse-reward tasks by leveraging offline data to enhance exploration and sample efficiency. It achieves this by applying action chunking, which predicts sequences of future actions, to temporal difference-based RL methods. Experimental results show that Q-chunking performs well both offline and online, outperforming previous methods on manipulation tasks with sparse rewards.
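Q-chunking is a method that aims to improve sample efficiency on long-horizon, sparse-reward tasks by exploiting action chunks in offline data. By running RL in a "chunked" action space, the technique supports more effective exploration and more stable TD learning. Experimental results show that Q-chunking outperforms previous methods in both offline performance and online sample efficiency on manipulation tasks.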
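The $n$-step backup in a chunked action space can be made concrete with a small worked example. The sketch below computes the TD target for one action chunk of length $h$: the discounted sum of the $h$ rewards collected while executing the chunk, plus the discounted bootstrap value at the state reached afterwards. Because the critic is conditioned on the whole chunk that was actually executed, those $h$ rewards come from the very actions being evaluated, which is the intuition behind calling the backup unbiased. The function name, argument names, and the scalar `next_q` bootstrap are assumptions for illustration, not the authors' code.

```python
def chunked_td_target(rewards, next_q, gamma=0.99):
    """TD target for a chunk of h actions executed from state s_t.

    rewards: the h rewards r_t, ..., r_{t+h-1} observed while executing the chunk
    next_q:  the critic's estimate at s_{t+h} for the next chunk (assumed given)
    Returns  sum_i gamma^i * r_{t+i} + gamma^h * next_q.
    """
    h = len(rewards)
    discounted_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    return discounted_return + (gamma ** h) * next_q
```

For instance, with rewards `[0.0, 0.0, 1.0]`, a bootstrap value of `2.0`, and `gamma=0.5`, the target is `0.25 * 1.0 + 0.125 * 2.0 = 0.5`; the sparse reward at the end of the chunk reaches the start of the chunk in a single backup instead of three.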
Quantum Temporal Fusion Transformer
Authors: Krishnakanta Barik, Goutam Paul
First: 2025-08-06T03:21:20+00:00 · Latest: 2025-10-24T17:37:07+00:00
Abstract
The Temporal Fusion Transformer (TFT), proposed by Lim et al. and published
in the International Journal of Forecasting (2021), is a
state-of-the-art attention-based deep neural network architecture specifically
designed for multi-horizon time series forecasting. It has demonstrated
significant performance improvements over existing benchmarks. In this work, we
introduce the Quantum Temporal Fusion Transformer (QTFT), a quantum-enhanced
hybrid quantum-classical architecture that extends the capabilities of the
classical TFT framework. The core idea of this work is inspired by the
foundational studies "The Power of Quantum Neural Networks" by Amira
Abbas et al. and "Quantum Vision Transformers" by El Amine
Cherrat et al., published in Nature Computational Science
(2021) and Quantum (2024), respectively. A key advantage of our
approach lies in its foundation on a variational quantum algorithm, enabling
implementation on current noisy intermediate-scale quantum (NISQ) devices
without strict requirements on the number of qubits or circuit depth. Our
results demonstrate that QTFT trains successfully on the forecasting datasets
and accurately predicts future values. In particular, our experimental results
on two different datasets show that the model outperforms its classical
counterpart in terms of both training and test loss.
These results indicate the prospect of using quantum computing to boost deep
learning architectures in complex machine learning tasks.
中文标题/摘要
标题:量子时间融合变换器
Lim 等人提出的《时间融合变换器》(TFT),发表于《国际预测杂志》(2021),是一种基于注意力机制的先进深度神经网络架构,专门设计用于多时间尺度时间序列预测。它在现有基准上展示了显著的性能提升。在此工作中,我们引入了量子时间融合变换器 (QTFT),这是一种量子增强的量子-经典混合架构,扩展了经典 TFT 框架的能力。本文的核心思想受到 Amira Abbas 等人关于《量子神经网络的力量》和 El Amine Cherrat 等人关于《量子视觉变换器》的研究的启发,分别发表于《自然计算科学》(2021) 和《量子》(2024)。我们方法的一个关键优势在于其基于可变量子算法的基础,这使得它可以在当前的嘈杂中等规模量子 (NISQ) 设备上实现,而无需对量子比特数量或电路深度有严格要求。我们的结果表明,QTFT 成功地在预测数据集上进行了训练,并能够准确预测未来值。特别是,我们在两个不同数据集上的实验结果表明,该模型在训练损失和测试损失方面均优于其经典对应物。这些结果表明,使用量子计算来增强复杂机器学习任务中的深度学习架构的前景。
Summary / 总结
The Quantum Temporal Fusion Transformer (QTFT) is a quantum-enhanced hybrid architecture that builds upon the Temporal Fusion Transformer (TFT) for multi-horizon time series forecasting. Inspired by quantum neural network studies, QTFT uses a variational quantum algorithm, making it feasible on current NISQ devices. Experimental results show that QTFT outperforms the classical TFT in terms of training and test loss on two datasets, suggesting potential for quantum computing in enhancing deep learning models.
研究引入了量子时间融合变换器(QTFT),这是一种基于时间融合变换器(TFT)的量子增强混合架构,用于多时间尺度时间序列预测。QTFT 利用变量子算法,使其适用于当前的 NISQ 设备。实验结果表明,QTFT 在两个不同数据集上的训练和测试损失均优于经典 TFT,表明量子计算在复杂机器学习任务中的潜在优势。
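For readers unfamiliar with variational quantum algorithms, the following is a minimal classical simulation of the general idea: a parameterized circuit whose parameter is tuned by gradient descent on a measured expectation value. It is a single-qubit toy using the standard parameter-shift rule, not the QTFT architecture, and every name in it is an illustrative assumption.

```python
import math


def expectation_z(theta):
    """<Z> after applying RY(theta) to |0>.

    The resulting state has amplitudes (cos(theta/2), sin(theta/2)),
    so <Z> = cos^2(theta/2) - sin^2(theta/2) = cos(theta).
    """
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0 ** 2 - amp1 ** 2


def parameter_shift_grad(theta):
    """Gradient of <Z> from two shifted circuit evaluations."""
    shift = math.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))


# Gradient descent that drives <Z> towards its minimum of -1.
theta = 0.3
for _ in range(200):
    theta -= 0.2 * parameter_shift_grad(theta)
```

The gradient is obtained purely from circuit evaluations, which is what makes this style of training feasible on hardware where the state itself cannot be inspected.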
A Gravity-informed Spatiotemporal Transformer for Human Activity Intensity Prediction
Authors: Yi Wang, Zhenghong Wang, Fan Zhang, Chaogui Kang, Sijie Ruan, Di Zhu, Chengling Tang, Zhongfu Ma, Weiyu Zhang, Yu Zheng, Philip S. Yu, Yu Liu
First: 2025-06-16T16:32:51+00:00 · Latest: 2025-10-24T17:36:52+00:00
Comments: IEEE TPAMI 2025. 18 pages, 14 figures
Abstract
Human activity intensity prediction is crucial to many location-based
services. Despite tremendous progress in modeling dynamics of human activity,
most existing methods overlook physical constraints of spatial interaction,
leading to uninterpretable spatial correlations and over-smoothing.
To address these limitations, this work proposes a physics-informed deep
learning framework, the Gravity-informed Spatiotemporal Transformer
(Gravityformer), which integrates the universal law of gravitation to refine
transformer attention. Specifically, it (1) estimates two spatially explicit
mass parameters based on spatiotemporal embedding features, (2) models
spatial interaction in an end-to-end neural network using the proposed
adaptive gravity model to learn the physical constraint, and (3) utilizes the
learned spatial interaction to guide transformer attention and mitigate its
over-smoothing. Moreover, a parallel spatiotemporal graph convolution
transformer is proposed for achieving a balance between coupled spatial and
temporal learning. Systematic experiments on six real-world large-scale
activity datasets demonstrate the quantitative and qualitative superiority of
our model over state-of-the-art benchmarks. Additionally, the learned gravity
attention matrix can be disentangled and interpreted based on geographical
laws, and it also improves generalization in zero-shot cross-region
inference. This work provides a novel insight into integrating
physical laws with deep learning for spatiotemporal prediction.
中文标题/摘要
标题:基于引力信息的空间时间变换器对人体活动强度预测
人体活动强度预测对于许多基于位置的服务至关重要。尽管在建模人体活动动力学方面取得了巨大进展,但大多数现有方法忽略了空间交互的物理约束,导致空间相关性不可解释和过度平滑现象。为了解决这些局限性,本文提出了一种基于物理的深度学习框架,即通过将万有引力定律整合到变换器注意机制中来改进变换器注意机制的引力信息空间时间变换器(Gravityformer)。具体而言,它(1)基于时空嵌入特征估计两个空间显式质量参数,(2)使用提出的自适应引力模型在端到端神经网络中建模空间交互以学习物理约束,(3)利用学习到的空间交互来引导和缓解变换器注意中的过度平滑现象。此外,还提出了一种并行的空间时间图卷积变换器以实现空间和时间学习的平衡。系统实验在六个大规模真实世界活动数据集上证明了我们的模型在与最先进的基准相比的定量和定性优越性。此外,学习到的引力注意矩阵不仅可以基于地理定律进行分离和解释,还可以在零样本跨区域推理中提高泛化能力。本文为将物理定律与空间时间预测的深度学习集成提供了新的见解。
Summary / 总结
This work addresses the limitations of existing methods in human activity intensity prediction by proposing a physics-informed deep learning framework called Gravity-informed Spatiotemporal Transformer (Gravityformer). It integrates the universal law of gravitation to refine transformer attention, estimating two spatially explicit mass parameters and modeling spatial interaction using an adaptive gravity model. Experiments on six real-world datasets show that Gravityformer outperforms state-of-the-art benchmarks both quantitatively and qualitatively, and the learned gravity attention matrix can be disentangled and interpreted based on geographical laws, improving generalization in cross-region inference.
本文提出了一种基于物理定律的深度学习框架——引力感知时空变换器(Gravityformer),以解决现有方法在人类活动强度预测中的局限性。该框架通过引入万有引力定律来细化变压器注意力,估计空间显式质量参数,并使用自适应引力模型建模空间交互。在六个真实世界的大规模活动数据集上的实验结果表明,Gravityformer在定量和定性上均优于最先进的基准模型,且学习到的引力注意力矩阵可以根据地理规律进行拆解和解释,从而提高跨区域推理的一般性。
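The three steps in the abstract can be sketched roughly as follows. The abstract does not give the exact formulation, so this is an assumption-heavy illustration: the gravity term is taken to be the classic mass_out[i] * mass_in[j] / distance^beta, and it is injected into attention as an additive log-bias before the softmax. The function names, the `beta` exponent, and the zero self-interaction are all hypothetical choices.

```python
import math


def gravity_matrix(mass_out, mass_in, coords, beta=2.0, eps=1e-6):
    """Pairwise interaction strength between regions (assumed form)."""
    n = len(coords)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-interaction left at zero in this sketch
            d = math.dist(coords[i], coords[j])
            g[i][j] = mass_out[i] * mass_in[j] / (d ** beta + eps)
    return g


def gravity_biased_attention(scores, gravity, eps=1e-9):
    """Row-wise softmax over attention scores plus a log-gravity bias."""
    out = []
    for s_row, g_row in zip(scores, gravity):
        logits = [s + math.log(g + eps) for s, g in zip(s_row, g_row)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


# Three regions on a line: 0 and 1 are close, 2 is far away.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
g = gravity_matrix([1.0] * 3, [1.0] * 3, coords)
attn = gravity_biased_attention([[0.0] * 3 for _ in range(3)], g)
```

With uniform raw scores, region 0 attends far more to its neighbour than to the distant region, so the attention rows stay spatially distinct instead of flattening out, which is the intuition behind using a physical prior against over-smoothing.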
A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection
Authors: Gaku Morio, Harri Rowlands, Dominik Stammbach, Christopher D. Manning, Peter Henderson
Venue: NeurIPS 2025
First: 2025-10-24T17:34:28+00:00 · Latest: 2025-10-24T17:34:28+00:00
Comments: Forthcoming in NeurIPS 2025 Datasets and Benchmarks Track
Abstract
Companies spend large amounts of money on public relations campaigns to
project a positive brand image. However, sometimes there is a mismatch between
what they say and what they do. Oil & gas companies, for example, are accused
of "greenwashing" with imagery of climate-friendly initiatives. Understanding
the framing, and changes in framing, at scale can help better understand the
goals and nature of public relations campaigns. To address this, we introduce a
benchmark dataset of expert-annotated video ads obtained from Facebook and
YouTube. The dataset provides annotations for 13 framing types for more than 50
companies or advocacy groups across 20 countries. Our dataset is especially
designed for the evaluation of vision-language models (VLMs), distinguishing it
from past text-only framing datasets. Baseline experiments show some promising
results, while leaving room for improvement for future work: GPT-4.1 can detect
environmental messages with 79% F1 score, while our best model only achieves
46% F1 score on identifying framing around green innovation. We also identify
challenges that VLMs must address, such as implicit framing, handling videos of
various lengths, or implicit cultural backgrounds. Our dataset contributes to
research in multimodal analysis of strategic communication in the energy
sector.
中文标题/摘要
标题:一种多模态基准,用于石油与天然气广告框架和潜在绿色漂洗检测
公司花费大量资金进行公共关系活动以塑造积极的品牌形象。然而,有时他们的言行并不一致。例如,石油和天然气公司因使用气候友好型项目的图像而被指责进行“绿色漂洗”。理解大规模的框架及其变化有助于更好地了解公共关系活动的目标和性质。为了解决这一问题,我们引入了一个由专家标注的视频广告基准数据集,这些数据来自Facebook和YouTube。该数据集为来自20个国家的50多家公司或倡导组织提供了超过13种框架类型的注释。我们的数据集特别设计用于评估视觉-语言模型(VLMs),使其不同于以往仅基于文本的框架数据集。基线实验显示了一些有希望的结果,但仍为未来的工作留下了改进的空间:GPT-4.1可以以79%的F1分数检测环境信息,而我们最好的模型在识别围绕绿色创新的框架方面只能达到46%的F1分数。我们还指出了视觉-语言模型必须解决的挑战,如隐含的框架、处理不同长度的视频或隐含的文化背景。我们的数据集为能源领域战略沟通的多模态分析研究做出了贡献。
Summary / 总结
The research aims to understand the framing and changes in public relations campaigns, particularly in the oil and gas industry, to detect potential greenwashing. A benchmark dataset of expert-annotated video ads from Facebook and YouTube is introduced, providing annotations for 13 framing types across 50+ companies or advocacy groups. Baseline experiments show that while some models can detect environmental messages with high accuracy, identifying framing around green innovation remains challenging, indicating areas for improvement in vision-language models.
研究旨在通过一个多模态基准数据集理解石油和天然气广告中的框架及其潜在的绿色漂洗问题。该数据集包含来自Facebook和YouTube的专家标注视频广告,涵盖了20个国家超过50家公司或倡导组织的13种框架类型。基线实验表明,虽然一些模型可以高精度地检测环境信息,但在识别绿色创新框架方面仍存在挑战。数据集强调了模型需要解决隐含框架和处理不同长度的视频以及多元文化背景的需求。
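The 79% and 46% figures above are F1 scores. As a reminder of what that metric measures, here is a minimal binary F1 computation for a single framing label; the abstract does not say how scores are averaged across labels, so the per-label binary setup and the function name `f1_score` are assumptions.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Five ads, annotated (y_true) and predicted (y_pred) for one framing type.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```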
CityAQVis: Integrated ML-Visualization Sandbox Tool for Pollutant Estimation in Urban Regions Using Multi-Source Data (Software Article)
Authors: Brij Bidhin Desai, Yukta Arvind Rajapur, Aswathi Mundayatt, Jaya Sreevalsan-Nair
First: 2025-09-13T18:16:29+00:00 · Latest: 2025-10-24T17:31:44+00:00
Comments: 19 pages, 10 figures, 2 tables
Abstract
Urban air pollution poses significant risks to public health, environmental
sustainability, and policy planning. Effective air quality management requires
predictive tools that can integrate diverse datasets and communicate complex
spatial and temporal pollution patterns. There is a gap in interactive tools
with seamless integration of forecasting and visualization of spatial
distributions of air pollutant concentrations. We present CityAQVis, an
interactive machine learning (ML) sandbox tool designed to predict and visualize
pollutant concentrations at the ground level using multi-source data, which
includes satellite observations, meteorological parameters, population density,
elevation, and nighttime lights. While traditional air quality visualization
tools often lack forecasting capabilities, CityAQVis enables users to build and
compare predictive models, visualizing the model outputs and offering insights
into pollution dynamics at the ground level. The pilot implementation of the
tool is tested through case studies predicting nitrogen dioxide (NO2)
concentrations in metropolitan regions, highlighting its adaptability to
various pollutants. Through an intuitive graphical user interface (GUI), the
user can perform comparative visualizations of the spatial distribution of
surface-level pollutant concentration in two different urban scenarios. Our
results highlight the potential of ML-driven visual analytics to improve
situational awareness and support data-driven decision-making in air quality
management.
中文标题/摘要
标题:CityAQVis:基于多源数据的城市区域污染物估算集成机器学习-可视化沙箱工具(软件文章)
城市空气污染对公众健康、环境可持续性和政策规划构成了重大风险。有效的空气质量管理需要能够整合多种数据集并传达复杂的空间和时间污染模式的预测工具。目前缺乏能够无缝集成预测和空间污染物浓度可视化功能的交互式工具。我们介绍了CityAQVis,这是一种交互式的机器学习ML沙箱工具,旨在使用多源数据(包括卫星观测、气象参数、人口密度、海拔和夜间灯光)预测和可视化地表污染物浓度。传统的空气质量可视化工具通常缺乏预测能力,而CityAQVis则使用户能够构建和比较预测模型,可视化模型输出并提供地表污染动态的见解。通过案例研究预测大都市地区二氧化氮(NO2)浓度的初步实施表明,该工具具有适应各种污染物的能力。通过直观的图形用户界面(GUI),用户可以在两种不同的城市场景中进行地表污染物浓度空间分布的比较可视化。我们的结果突显了基于机器学习的可视化分析在提高情况意识和支持基于数据的空气质量管理决策方面的潜力。
Summary / 总结
CityAQVis is an interactive machine learning sandbox tool designed to predict and visualize ground-level pollutant concentrations using multi-source data such as satellite observations, meteorological parameters, and population density. It fills the gap in traditional visualization tools by offering forecasting capabilities and an intuitive GUI for comparative visualizations. The tool was tested in case studies predicting nitrogen dioxide concentrations in metropolitan regions, demonstrating its adaptability to various pollutants and potential to enhance situational awareness and data-driven decision-making in air quality management.
CityAQVis 是一个交互式的机器学习沙盒工具,用于使用卫星观测、气象参数和人口密度等多源数据预测和可视化地表污染物浓度。它弥补了传统可视化工具缺乏预测能力的空白,通过案例研究展示了其在改善空气质量管理中的情况意识和数据驱动决策支持方面的有效性。
Fixed-Point RNNs: Interpolating from Diagonal to Dense
Authors: Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto
Venue: NeurIPS 2025 Spotlight
First: 2025-03-13T18:50:22+00:00 · Latest: 2025-10-24T17:27:32+00:00
Comments: NeurIPS 2025 (Spotlight)
Abstract
Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as
Mamba have become promising alternatives to softmax-attention as sequence
mixing layers in Transformer architectures. Current models, however, do not
exhibit the full state-tracking expressivity of RNNs because they rely on
channel-wise (i.e. diagonal) sequence mixing. In this paper, we investigate
parameterizations of a large class of dense linear RNNs as fixed-points of
parallelizable diagonal linear RNNs. The resulting models can naturally trade
expressivity for efficiency at a fixed number of parameters and achieve
state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$,
while matching performance on copying and other tasks.
中文标题/摘要
标题:定点RNN:从对角到密集的插值
线性循环神经网络(RNN)和状态空间模型(SSM),如Mamba,已成为Transformer架构中序列混合层的有前途的替代方案,这些模型在序列混合层中作为softmax-注意力的替代品。然而,当前的模型并未表现出RNN的完整状态跟踪表达能力,因为它们依赖于通道内(即对角线)的序列混合。在本文中,我们研究了一类密集线性RNN参数化为可并行化的对角线线性RNN的不动点。由此产生的模型可以在固定参数数量下自然地在表达能力和效率之间进行权衡,并在状态跟踪基准$A_5$和$S_5$上达到最先进的结果,同时在复制和其他任务上匹配性能。
Summary / 总结
This paper explores the use of fixed-point recurrent neural networks (RNNs) to enhance the expressivity of sequence mixing in Transformer architectures. By parameterizing dense linear RNNs as fixed-points of parallelizable diagonal RNNs, the authors achieve a balance between expressivity and efficiency. The models outperform existing methods on state-tracking benchmarks $A_5$ and $S_5$, while maintaining comparable performance on copying tasks and other benchmarks.
本文探讨了使用固定点递归神经网络(RNN)来增强Transformer架构中序列混合层的表达能力。通过将密集线性RNN参数化为可并行化的对角线RNN的固定点,这些模型可以在固定参数数量下平衡表达能力和效率。研究结果显示,这些模型在状态跟踪基准$A_5$和$S_5$上达到了最先进的效果,同时在复制和其他任务上保持了相当的性能。
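To make the "dense RNN as a fixed point of a diagonal RNN" idea concrete, here is a small sketch under a simplifying assumption: the dense transition matrix A is split into its diagonal and off-diagonal parts, and each iteration runs a diagonal recurrence whose input is corrected with the previous iterate. This splitting is an illustrative stand-in and not the parameterization studied in the paper; `dense_rnn` and `fixed_point_rnn` are hypothetical names.

```python
def dense_rnn(A, xs):
    """Sequential reference: h_t = A h_{t-1} + x_t with h_0 = 0."""
    n = len(A)
    h = [0.0] * n
    out = []
    for x in xs:
        h = [sum(A[i][j] * h[j] for j in range(n)) + x[i] for i in range(n)]
        out.append(h)
    return out


def fixed_point_rnn(A, xs, num_iters):
    """Iterate a diagonal recurrence until it reproduces the dense one.

    Each sweep uses only the diagonal of A on the new iterate; the
    off-diagonal coupling is read from the previous iterate. The sweep is
    written as a loop here, but a diagonal linear recurrence is the kind
    that can be evaluated with a parallel scan.
    """
    n, T = len(A), len(xs)
    hs = [[0.0] * n for _ in range(T)]  # initial guess for all time steps
    for _ in range(num_iters):
        new_hs = []
        prev_new = [0.0] * n  # h_{t-1} from the sweep being computed
        prev_old = [0.0] * n  # h_{t-1} from the previous sweep
        for t in range(T):
            h = [
                A[i][i] * prev_new[i]
                + sum(A[i][j] * prev_old[j] for j in range(n) if j != i)
                + xs[t][i]
                for i in range(n)
            ]
            new_hs.append(h)
            prev_new = h
            prev_old = hs[t]
        hs = new_hs
    return hs


A = [[0.5, 0.2], [-0.3, 0.4]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
reference = dense_rnn(A, xs)
approx = fixed_point_rnn(A, xs, num_iters=len(xs))
```

In this toy splitting, the first k time steps are exact after k sweeps, so running as many sweeps as there are time steps recovers the dense recurrence. Stopping earlier gives a cheaper approximation, which mirrors the expressivity-for-efficiency trade-off described in the abstract.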
Metropolis-Hastings Sampling for 3D Gaussian Reconstruction
Authors: Hyunjin Kim, Haebeom Jung, Jaesik Park
Venue: NeurIPS 2025
First: 2025-06-15T19:12:37+00:00 · Latest: 2025-10-24T17:23:51+00:00
Comments: NeurIPS 2025. Project Page: https://hjhyunjinkim.github.io/MH-3DGS
Abstract
We propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS)
that leverages comprehensive multi-view photometric error signals within a
unified Metropolis-Hastings approach. Vanilla 3DGS heavily relies on
heuristic-based density-control mechanisms (e.g., cloning, splitting, and
pruning), which can lead to redundant computations or premature removal of
beneficial Gaussians. Our framework overcomes these limitations by
reformulating densification and pruning as a probabilistic sampling process,
dynamically inserting and relocating Gaussians based on aggregated multi-view
errors and opacity scores. Guided by Bayesian acceptance tests derived from
these error-based importance scores, our method substantially reduces reliance
on heuristics, offers greater flexibility, and adaptively infers Gaussian
distributions without requiring predefined scene complexity. Experiments on
benchmark datasets, including Mip-NeRF360, Tanks and Temples and Deep Blending,
show that our approach reduces the number of Gaussians needed, achieving faster
convergence while matching or modestly surpassing the view-synthesis quality of
state-of-the-art models.
中文标题/摘要
标题:Metropolis-Hastings采样在3D高斯重建中的应用
我们提出了一种适应性采样框架,用于3D高斯点积(3DGS),该框架在统一的Metropolis-Hastings方法中利用了全面的多视图光度误差信号。传统的3DGS高度依赖于基于启发式的密度控制机制(例如克隆、分裂和修剪),这可能导致冗余计算或过早移除有益的高斯点。我们的框架通过将密度增加和修剪重新表述为概率采样过程,动态地根据多视图误差和透明度分数插入和重新定位高斯点,从而克服了这些限制。根据这些基于误差的重要性得分推导出的贝叶斯接受测试指导,我们的方法大大减少了对启发式的依赖,提供了更大的灵活性,并能够自适应地推断高斯分布,而无需预先定义场景复杂性。在基准数据集上的实验,包括Mip-NeRF360、Tanks and Temples和Deep Blending,表明我们的方法减少了所需的高斯点数量,实现了更快的收敛速度,同时匹配或适度超过了最先进的模型的视图合成质量。
Summary / 总结
The paper proposes an adaptive sampling framework for 3D Gaussian Splatting (3DGS) that uses a Metropolis-Hastings approach to integrate multi-view photometric errors and opacity scores. This method reformulates densification and pruning as probabilistic sampling, reducing reliance on heuristics and achieving faster convergence with comparable or slightly better view-synthesis quality on benchmark datasets like Mip-NeRF360, Tanks and Temples, and Deep Blending, while using fewer Gaussians.
论文提出了一种基于Metropolis-Hastings方法的自适应采样框架,用于3D Gaussian Splatting (3DGS),利用多视图光度误差。该方法将密度控制和修剪重新表述为概率采样过程,根据聚合的误差和透明度分数动态插入和重新定位高斯体。实验表明,这种方法减少了所需的高斯体数量,实现了更快的收敛速度,并且在视图合成质量上与最先进的模型相当或略好。
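The acceptance test at the heart of a Metropolis-Hastings sampler can be sketched generically as below. The example samples candidate locations in proportion to a positive importance score, using a uniform (symmetric) proposal. How the paper derives its scores from multi-view errors and opacity is not specified in the abstract, so the `scores` list and the function name are placeholders.

```python
import random


def mh_sample_indices(scores, num_steps, seed=0):
    """Visit candidate indices with frequency proportional to their score.

    scores must be strictly positive. The proposal is uniform over all
    candidates, hence symmetric, so the acceptance ratio is just the
    ratio of target scores.
    """
    rng = random.Random(seed)
    current = rng.randrange(len(scores))
    visits = [0] * len(scores)
    for _ in range(num_steps):
        proposal = rng.randrange(len(scores))
        accept_prob = min(1.0, scores[proposal] / scores[current])
        if rng.random() < accept_prob:
            current = proposal  # accept the move
        visits[current] += 1    # a rejected move re-counts the current state
    return visits


# Hypothetical per-location importance scores (higher means larger error).
scores = [1.0, 3.0, 6.0]
visits = mh_sample_indices(scores, num_steps=20000)
freqs = [v / sum(visits) for v in visits]
```

The design point is that no threshold on the score is needed: high-error locations are simply visited more often, which is the sense in which sampling can replace hand-tuned densification heuristics.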
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
Authors: Chunyu Miao, Henry Peng Zou, Yangning Li, Yankai Chen, Yibo Wang, Fangxin Wang, Yifan Li, Wooseong Yang, Bowei He, Xinni Zhang, Dianzhi Yu, Hanchen Yang, Hoang H Nguyen, Yue Zhou, Jie Yang, Jizhou Guo, Wenzhe Fan, Chin-Yuan Yeh, Panpan Meng, Liancheng Fang, Jinhu Qi, Wei-Chieh Huang, Zhengyao Gu, Yuwei Han, Langzhou He, Yuyao Yang, Yinghui Li, Hai-Tao Zheng, Xue Liu, Irwin King, Philip S. Yu
First: 2025-10-07T17:45:35+00:00 · Latest: 2025-10-24T17:20:26+00:00
Comments: Code and dataset are available at github.com/ChunyuMiao98/RECODE
Abstract
Large language models (LLMs) show promise in supporting scientific
research implementation, yet their ability to generate correct and executable
code remains limited. Existing works largely adopt one-shot settings, ignoring
the iterative and feedback-driven nature of realistic workflows of scientific
research development. To address this gap, we present RECODE-H, a benchmark of
102 tasks from research papers and repositories that evaluates LLM agents
through multi-turn interactions with LLM-simulated human feedback. It includes
structured instructions, unit tests, and a five-level feedback hierarchy to
reflect realistic researcher-agent collaboration. We further present
ReCodeAgent, a framework that integrates feedback into iterative code
generation. Experiments with leading LLMs, including GPT-5, Claude-Sonnet-4,
DeepSeek-V3.1, and Gemini 2.5, show substantial performance gains with richer
feedback, while also highlighting ongoing challenges in the generation of
complex research code. RECODE-H establishes a foundation for developing
adaptive, feedback-driven LLM agents in scientific research implementation.
中文标题/摘要
标题:RECODE-H:一种基于交互式人类反馈的科研代码开发基准
大型语言模型(LLMs)在支持科学研究实现方面显示出潜力,但它们生成正确且可执行代码的能力仍然有限。现有工作大多采用一次性设置,忽略了科学研究开发中迭代和反馈驱动的现实工作流程的性质。为了解决这一差距,我们提出了RECODE-H,这是一个包含102项任务的基准,这些任务来自研究论文和代码库,并通过与LLM模拟的人类反馈的多轮交互来评估LLM代理。它包括结构化指令、单元测试和五级反馈层次结构,以反映现实的科研人员-代理协作。我们还提出了ReCodeAgent框架,该框架将反馈整合到迭代代码生成中。使用包括GPT-5、Claude-Sonnet-4、DeepSeek-V3.1和Gemini 2.5在内的领先LLM进行的实验表明,随着反馈的丰富,性能有了显著提升,同时也指出了生成复杂科研代码的持续挑战。RECODE-H为开发适应性和反馈驱动的LLM代理奠定了基础,以支持科学研究的实现。
Summary / 总结
RECODE-H is a benchmark for evaluating large language models (LLMs) in scientific research code development, focusing on multi-turn interactions and human feedback. It includes 102 tasks with structured instructions, unit tests, and a five-level feedback hierarchy. Experiments with leading LLMs show significant performance improvements with richer feedback, but also highlight ongoing challenges in generating complex research code.
RECODE-H 是一个基准,用于评估大型语言模型(LLMs)在科学研究代码开发中的表现,侧重于多轮交互和人类反馈。它包含102个任务,具有结构化指令、单元测试和五级反馈层次结构。实验表明,随着反馈的丰富,领先LLM的表现有了显著提升,但也揭示了生成复杂研究代码的持续挑战。
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Authors: Adibvafa Fallahpour, Andrew Magnuson, Purav Gupta, Shihao Ma, Jack Naimer, Arnav Shah, Haonan Duan, Omar Ibrahim, Hani Goodarzi, Chris J. Maddison, Bo Wang
First: 2025-05-29T15:49:27+00:00 · Latest: 2025-10-24T17:16:49+00:00
Comments: 28 pages, 4 figures, 4 tables
Abstract
Unlocking deep and interpretable biological reasoning from complex genomic
data remains a major AI challenge limiting scientific progress. While current
DNA foundation models excel at representing sequences, they struggle with
multi-step reasoning and lack transparent, biologically meaningful
explanations. BioReason addresses this by tightly integrating a DNA foundation
model with a large language model (LLM), enabling the LLM to directly interpret
and reason over genomic information. Through supervised fine-tuning and
reinforcement learning, BioReason learns to produce logical, biologically
coherent deductions. It achieves major performance gains, boosting KEGG-based
disease pathway prediction accuracy from 86% to 98% and improving variant
effect prediction by an average of 15% over strong baselines. BioReason can
reason over unseen biological entities and explain its decisions step by step,
offering a transformative framework for interpretable, mechanistic AI in
biology. All data, code, and checkpoints are available at
https://github.com/bowang-lab/BioReason
中文标题/摘要
标题:BioReason:在DNA-LLM模型中激励多模态生物推理
从复杂基因组数据中解锁深层次和可解释的生物推理仍然是限制科学进步的主要AI挑战。尽管当前的DNA基础模型在表示序列方面表现出色,但在多步推理方面存在困难,并缺乏透明且具有生物学意义的解释。BioReason 通过紧密集成DNA基础模型和大型语言模型(LLM),使LLM能够直接解释和推理基因组信息来解决这一问题。通过监督微调和强化学习,BioReason 学会生成逻辑且生物学上连贯的推断。它实现了显著的性能提升,将基于KEGG的疾病通路预测准确性从86%提高到98%,并平均提高了15%的变体效应预测性能,超过强大的基线。BioReason 可以推理未知的生物实体,并逐步解释其决策,为生物学中的可解释、机制性AI提供变革性的框架。所有数据、代码和检查点均可在 https://github.com/bowang-lab/BioReason 获取。
Summary / 总结
BioReason aims to enhance the interpretability and reasoning capabilities of DNA foundation models by integrating them with large language models. Through supervised fine-tuning and reinforcement learning, BioReason improves the accuracy of disease pathway prediction and variant effect prediction. It achieves a significant boost in KEGG-based disease pathway prediction accuracy from 86% to 98% and an average improvement of 15% in variant effect prediction over strong baselines. BioReason also provides step-by-step explanations for its reasoning, making it a transformative framework for interpretable AI in biology.
BioReason旨在通过将DNA基础模型与大型语言模型紧密结合,增强其解释能力和推理能力。通过监督微调和强化学习,BioReason提高了疾病通路预测和变异效应预测的准确性。它将KEGG基于的疾病通路预测准确性从86%提升到98%,并在变异效应预测上平均提高了15%。BioReason还能提供其推理过程的逐步解释,使其成为生物学中可解释AI的一个变革性框架。
Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer
Authors: Qiyi Tong, Olivia Nocentini, Marta Lagomarsino, Kuanqi Cai, Marta Lorenzini, Arash Ajoudani
First: 2025-10-13T08:19:56+00:00 · Latest: 2025-10-24T17:14:46+00:00
Abstract
Facial Landmark Detection (FLD) in thermal imagery is critical for
applications in challenging lighting conditions, but it is hampered by the lack
of rich visual cues. Conventional cross-modal solutions, like feature fusion or
image translation from RGB data, are often computationally expensive or
introduce structural artifacts, limiting their practical deployment. To address
this, we propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), a
novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer
from model compression to create both accurate and efficient thermal FLD
models. A central challenge during knowledge transfer is the profound modality
gap between RGB and thermal data, where traditional unidirectional distillation
fails to enforce semantic consistency across disparate feature spaces. To
overcome this, we introduce Dual-Injected Knowledge Distillation (DIKD), a
bidirectional mechanism designed specifically for this task. DIKD establishes a
connection between modalities: it not only guides the thermal student with rich
RGB features but also validates the student's learned representations by
feeding them back into the frozen teacher's prediction head. This closed-loop
supervision forces the student to learn modality-invariant features that are
semantically aligned with the teacher, ensuring a robust and profound knowledge
transfer. Experiments show that our approach sets a new state-of-the-art on
public thermal FLD benchmarks, notably outperforming previous methods while
drastically reducing computational overhead.
中文标题/摘要
标题:通过多级跨模态知识蒸馏在热像中进行轻量级面部特征点检测
在热像中进行面部特征点检测(FLD)对于在复杂光照条件下应用至关重要,但缺乏丰富的视觉线索使其受到限制。传统的跨模态解决方案,如特征融合或从RGB数据进行图像转换,通常计算成本高昂或引入结构伪影,限制了其实际部署。为了解决这一问题,我们提出了一种新颖的框架——多级跨模态知识蒸馏(MLCM-KD),该框架将高保真RGB到热像的知识转移与模型压缩解耦,从而创建出既准确又高效的热像FLD模型。知识转移中的一个主要挑战是RGB和热像数据之间巨大的模态差距,传统的单向蒸馏无法在不同的特征空间中强制执行语义一致性。为克服这一挑战,我们引入了一种专为此任务设计的双向机制——双注入知识蒸馏(DIKD)。DIKD在模态之间建立了联系:它不仅用丰富的RGB特征引导热像学生,还通过将学生学习的表示反馈到冻结教师的预测头来验证学生的表示。这种闭环监督迫使学生学习跨模态不变的特征,这些特征在语义上与教师对齐,确保了知识转移的稳健性和深度。实验表明,我们的方法在公共热像FLD基准测试中达到了新的最佳水平,显著优于先前的方法,同时大幅减少了计算开销。
Summary / 总结
The research aims to improve facial landmark detection (FLD) in thermal images, which is critical for applications in challenging lighting conditions. To address the lack of rich visual cues in thermal data and the computational cost of conventional cross-modal methods, the authors propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), which decouples high-fidelity RGB-to-thermal knowledge transfer from model compression. Its key component, Dual-Injected Knowledge Distillation (DIKD), is a bidirectional mechanism that enforces semantic consistency between RGB and thermal features, yielding thermal FLD models that are both accurate and efficient. Experiments show that this approach outperforms previous methods on public thermal FLD benchmarks while significantly reducing computational overhead.
论文针对热图像中面部特征点检测(FLD)的挑战,这种检测在低光照条件下至关重要但缺乏视觉线索。为克服传统跨模态解决方案的计算效率低下和结构缺陷,作者提出了多级跨模态知识蒸馏(MLCM-KD),该方法将高保真知识转移与模型压缩分离。关键创新是双向注入知识蒸馏(DIKD),确保RGB和热图像之间的语义一致性,从而实现准确且高效的热图像FLD模型。实验表明,该方法在公共热图像FLD基准测试中显著优于先前的方法,同时大幅减少了计算开销。
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
Authors: Yifan Xu, Xiao Liu, Xinghan Liu, Jiaqi Fu, Hanchen Zhang, Bohao Jing, Shudan Zhang, Yuting Wang, Wenyi Zhao, Yuxiao Dong
First: 2025-09-10T13:09:27+00:00 · Latest: 2025-10-24T17:13:05+00:00
Abstract
Building general-purpose graphical user interface (GUI) agents has become
increasingly promising with the progress in vision language models. However,
developing effective mobile GUI agents with reinforcement learning (RL) remains
challenging due to the heavy-tailed distribution of task difficulty and the
inefficiency of large-scale environment sampling. We present MobileRL, an
online agentic reinforcement learning framework that enhances GUI agents in
mobile environments. Its core component is the Difficulty-ADAptive GRPO (ADAGRPO)
algorithm. In ADAGRPO, we design difficulty-adaptive positive replay and
failure curriculum filtering to adapt the model to different task difficulties.
We introduce the shortest-path reward adjustment strategy to reshape rewards
concerning the task length in multi-turn agentic tasks. These strategies
jointly stabilize RL training, improve sample efficiency, and yield strong
performance across diverse mobile apps and tasks. We apply MOBILERL to two open
models (Qwen2.5-VL-7B-Instruct and GLM-4.1V-9B-Base). The resultant MOBILERL-9B
model achieves state-of-the-art results in terms of success rates on both
AndroidWorld (80.2%) and AndroidLab (53.6%). The MOBILERL framework is
open-sourced at: https://github.com/THUDM/MobileRL.
中文标题/摘要
标题:MobileRL:移动GUI代理的在线代理强化学习
随着视觉语言模型的进步,构建通用图形用户界面(GUI)代理变得越来越有前景。然而,由于任务难度的重尾分布和大规模环境采样的低效性,使用强化学习(RL)开发有效的移动GUI代理仍然具有挑战性。我们提出了一种在线代理强化学习框架MobileRL,以增强移动环境中的GUI代理。其核心组件是难度自适应GRPO(ADAGRPO)算法。在ADAGRPO中,我们设计了难度自适应正重播和失败课程筛选,以使模型适应不同的任务难度。我们引入了最短路径奖励调整策略,以在多轮代理任务中重新塑造与任务长度相关的奖励。这些策略共同稳定了RL训练,提高了样本效率,并在各种移动应用和任务中生成了强大的性能。我们将MOBILERL应用于两个开源模型(Qwen2.5-VL-7B-Instruct和GLM-4.1V-9B-Base)。MOBILERL-9B模型在AndroidWorld(80.2%)和AndroidLab(53.6%)的成功率方面均达到了最先进的结果。MOBILERL框架已开源:https://github.com/THUDM/MobileRL。
Summary / 总结
MobileRL is an online agentic reinforcement learning framework designed to enhance mobile GUI agents. It uses the Difficulty-ADAptive GRPO (ADAGRPO) algorithm, which includes difficulty-adaptive positive replay and failure curriculum filtering to adapt to varying task difficulties. The framework also introduces a shortest-path reward adjustment strategy to improve reward shaping in multi-turn tasks. Experimental results show that the MOBILERL-9B model achieves state-of-the-art success rates on AndroidWorld (80.2%) and AndroidLab (53.6%).
MobileRL 是一种在线智能体强化学习框架,旨在提升移动 GUI 代理。它使用了 Difficulty-ADAptive GRPO (ADAGRPO) 算法,该算法包括难度自适应正向重播和失败课程筛选,以适应不同的任务难度。框架还采用了最短路径奖励调整策略来改进多轮任务中的奖励塑造。实验结果显示,MOBILERL-9B 模型在 AndroidWorld (80.2%) 和 AndroidLab (53.6%) 上取得了最先进的成功率。
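Two ingredients named above can be sketched in a few lines. The first function shows the group-relative advantage normalization that GRPO-style methods apply to a group of rollouts for the same task. The second is a purely hypothetical stand-in for a length-aware reward: the abstract only says rewards are reshaped with respect to task length, so the ratio used here is an assumption, as are both function names.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def length_adjusted_reward(success, num_steps, shortest_steps):
    """Hypothetical shaping: full reward only for the shortest successful path."""
    if not success:
        return 0.0
    return min(1.0, shortest_steps / num_steps)


# Four rollouts of one task: two succeed (in 5 and 10 steps), two fail.
rewards = [
    length_adjusted_reward(True, 5, 5),
    length_adjusted_reward(True, 10, 5),
    length_adjusted_reward(False, 12, 5),
    length_adjusted_reward(False, 20, 5),
]
advantages = group_relative_advantages(rewards)
```

Under this shaping, the shorter successful rollout receives a larger advantage than the longer one, and both rank above the failures.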
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
Authors: Ying Xue, Jiaxi Jiang, Rayan Armani, Dominik Hollidt, Yi-Chi Liao, Christian Holz
Venue: ICCV 2025
First: 2025-10-24T17:11:50+00:00 · Latest: 2025-10-24T17:11:50+00:00
Comments: Accepted by ICCV 2025. Code: https://github.com/eth-siplab/GroupInertialPoser
Abstract
Tracking human full-body motion using sparse wearable inertial measurement
units (IMUs) overcomes the limitations of occlusion and instrumentation of the
environment inherent in vision-based approaches. However, purely IMU-based
tracking compromises translation estimates and accurate relative positioning
between individuals, as inertial cues are inherently self-referential and
provide no direct spatial reference for others. In this paper, we present a
novel approach for robustly estimating body poses and global translation for
multiple individuals by leveraging the distances between sparse wearable
sensors - both on each individual and across multiple individuals. Our method,
Group Inertial Poser, estimates these absolute distances between pairs of
sensors from ultra-wideband ranging (UWB) and fuses them with inertial
observations as input into structured state-space models to integrate temporal
motion patterns for precise 3D pose estimation. Our novel two-step optimization
further leverages the estimated distances for accurately tracking people's
global trajectories through the world. We also introduce GIP-DB, the first
IMU+UWB dataset for two-person tracking, which comprises 200 minutes of motion
recordings from 14 participants. In our evaluation, Group Inertial Poser
outperforms previous state-of-the-art methods in accuracy and robustness across
synthetic and real-world data, showing the promise of IMU+UWB-based multi-human
motion capture in the wild. Code, models, dataset:
https://github.com/eth-siplab/GroupInertialPoser
中文标题/摘要
标题:群体惯性摆动器:利用稀疏惯性测量单元和超宽带测距的多人姿态和全局平移估计
使用稀疏穿戴式惯性测量单元(IMU)跟踪人体全身运动,克服了基于视觉的方法中固有的遮挡和环境仪器化限制。然而,纯基于IMU的跟踪会牺牲平移估计的准确性以及个体之间的精确相对定位,因为惯性线索本质上是自参照的,无法直接提供其他个体的空间参考。在本文中,我们提出了一种新颖的方法,通过利用稀疏穿戴传感器之间的距离——既在每个个体内部,又跨多个个体——来稳健地估计多人的身体姿态和全局平移。我们的方法Group Inertial Poser从超宽带测距(UWB)中估计这些传感器对之间的绝对距离,并将它们与惯性观测值融合,作为结构化状态空间模型的输入,以整合时间上的运动模式,实现精确的3D姿态估计。我们的新颖两步优化进一步利用估计的距离准确跟踪人们的全球轨迹。我们还引入了GIP-DB,这是第一个用于两人跟踪的IMU+UWB数据集,包含14名参与者200分钟的运动记录。在我们的评估中,Group Inertial Poser在合成数据和真实世界数据中均表现出色,优于先前的最先进的方法,展示了基于IMU+UWB的多人运动捕捉在野外的潜力。代码、模型、数据集:https://github.com/eth-siplab/GroupInertialPoser
Summary / 总结
The research aims to improve the accuracy of multi-person pose estimation and global translation tracking using sparse inertial sensors and ultra-wideband (UWB) ranging. The method, Group Inertial Poser, estimates absolute distances between pairs of sensors from UWB ranging and fuses them with inertial observations in structured state-space models. This approach outperforms previous methods in accuracy and robustness on both synthetic and real-world data. The accompanying dataset, GIP-DB, includes 200 minutes of motion recordings from 14 participants.
研究旨在利用稀疏惯性传感器和超宽带(UWB)测距提高多人姿态和全局平移跟踪的准确性。方法Group Inertial Poser通过UWB估计传感器对之间的绝对距离,并将这些距离与惯性观测值融合,以整合时间上的运动模式。该方法在准确性和鲁棒性方面优于先前的方法,展示了IMU+UWB在野外多人动作捕捉中的潜力。
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Authors: Jonathan Bragg, Mike D'Arcy, Nishant Balepur, Dan Bareket, Bhavana Dalvi, Sergey Feldman, Dany Haddad, Jena D. Hwang, Peter Jansen, Varsha Kishore, Bodhisattwa Prasad Majumder, Aakanksha Naik, Sigal Rahamimov, Kyle Richardson, Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, Guy Wiener, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka, Brooke Vlahos, Peter Clark, Doug Downey, Yoav Goldberg, Ashish Sabharwal, Daniel S. Weld
First: 2025-10-24T17:10:26+00:00 · Latest: 2025-10-24T17:10:26+00:00
Abstract
AI agents hold the potential to revolutionize scientific productivity by
automating literature reviews, replicating experiments, analyzing data, and
even proposing new directions of inquiry; indeed, there are now many such
agents, ranging from general-purpose "deep research" systems to specialized
science-specific agents, such as AI Scientist and AIGS. Rigorous evaluation of
these agents is critical for progress. Yet existing benchmarks fall short on
several fronts: they (1) fail to provide holistic, product-informed measures of
real-world use cases such as science research; (2) lack reproducible agent
tools necessary for a controlled comparison of core agentic capabilities; (3)
do not account for confounding variables such as model cost and tool access;
(4) do not provide standardized interfaces for quick agent prototyping and
evaluation; and (5) lack comprehensive baseline agents necessary to identify
true advances. In response, we define principles and tooling for more
rigorously benchmarking agents. Using these, we present AstaBench, a suite that
provides the first holistic measure of agentic ability to perform scientific
research, comprising 2400+ problems spanning the entire scientific discovery
process and multiple scientific domains, and including many problems inspired
by actual user requests to deployed Asta agents. Our suite comes with the first
scientific research environment with production-grade search tools that enable
controlled, reproducible evaluation, better accounting for confounders.
Alongside, we provide a comprehensive suite of nine science-optimized classes
of Asta agents and numerous baselines. Our extensive evaluation of 57 agents
across 22 agent classes reveals several interesting findings, most importantly
that despite meaningful progress on certain individual aspects, AI remains far
from solving the challenge of science research assistance.
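Chinese title/abstract (translated into English)
Title: AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
AI agents have the potential to revolutionize the productivity of scientific research by automating literature reviews, replicating experiments, analyzing data, and even proposing new research directions; many such agents already exist, ranging from general-purpose "deep research" systems to specialized science-specific agents such as AI Scientist and AIGS. Rigorous evaluation of these agents is critical for progress. However, existing benchmarks fall short in several respects: they (1) fail to provide holistic, product-informed measures of real-world use cases such as scientific research; (2) lack reproducible agent tools, which prevents controlled comparison of core agentic capabilities; (3) do not account for confounding variables such as model cost and tool access; (4) do not provide standardized interfaces for rapid agent prototyping and evaluation; and (5) lack comprehensive baseline agents for identifying true advances. In response, we define principles and tooling for more rigorous benchmarking. Building on these, we present AstaBench, a suite that provides a holistic measure of the ability to perform scientific research, comprising more than 2,400 problems that span the entire scientific discovery process and multiple scientific domains, many of them inspired by actual user requests to deployed Asta agents. Our suite comes with the first production-grade scientific research environment, with tools for controlled and reproducible evaluation that better account for confounders. In addition, we provide a comprehensive suite of nine classes of science-optimized Asta agents and numerous baselines. Our extensive evaluation of 57 agents across 22 agent classes reveals several interesting findings, most importantly that, despite meaningful progress on certain aspects, AI remains far from solving the challenge of scientific research assistance.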
A Dynamic Knowledge Distillation Method Based on the Gompertz Curve
Authors: Han Yang, Guangjun Qin
First: 2025-10-24T17:07:27+00:00 · Latest: 2025-10-24T17:07:27+00:00
Comments: 15 pages, 2 figures
Abstract
This paper introduces a novel dynamic knowledge distillation framework,
Gompertz-CNN, which integrates the Gompertz growth model into the training
process to address the limitations of traditional knowledge distillation.
Conventional methods often fail to capture the evolving cognitive capacity of
student models, leading to suboptimal knowledge transfer. To overcome this, we
propose a stage-aware distillation strategy that dynamically adjusts the weight
of distillation loss based on the Gompertz curve, reflecting the student's
learning progression: slow initial growth, rapid mid-phase improvement, and
late-stage saturation. Our framework incorporates Wasserstein distance to
measure feature-level discrepancies and gradient matching to align backward
propagation behaviors between teacher and student models. These components are
unified under a multi-loss objective, where the Gompertz curve modulates the
influence of distillation losses over time. Extensive experiments on CIFAR-10
and CIFAR-100 using various teacher-student architectures (e.g., ResNet50 and
MobileNet_v2) demonstrate that Gompertz-CNN consistently outperforms
traditional distillation methods, achieving up to 8% and 4% accuracy gains on
CIFAR-10 and CIFAR-100, respectively.
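The stage-aware weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so the standard Gompertz form a * exp(-b * exp(-c * t)), the normalization of time to [0, 1], the parameter values, and the names `gompertz_weight` and `total_loss` are all assumptions made for illustration, not the authors' implementation. The Wasserstein and gradient-matching terms are collapsed here into a single placeholder `distill_loss`.

```python
import math


def gompertz_weight(step, total_steps, a=1.0, b=5.0, c=8.0):
    """Return a * exp(-b * exp(-c * t)), with t = step / total_steps in [0, 1].

    a is the upper asymptote, b shifts the curve along the time axis,
    and c sets the growth rate. All three defaults are illustrative guesses.
    """
    t = step / max(total_steps, 1)
    return a * math.exp(-b * math.exp(-c * t))


def total_loss(task_loss, distill_loss, step, total_steps):
    """Combine a task loss with a Gompertz-weighted distillation loss (hypothetical form)."""
    return task_loss + gompertz_weight(step, total_steps) * distill_loss


if __name__ == "__main__":
    # Print the schedule at a few points of a hypothetical 100-step run.
    for step in (0, 10, 20, 50, 100):
        print(step, round(gompertz_weight(step, 100), 4))
```

Under these assumptions the weight starts near zero, rises fastest around t = ln(b) / c (about 0.2 with the defaults, where the weight equals a / e), and then saturates toward a, which mirrors the slow, rapid, and saturating phases named in the abstract.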
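Chinese title/abstract (translated into English)
Title: A Dynamic Knowledge Distillation Method Based on the Gompertz Curve
This paper introduces Gompertz-CNN, a novel dynamic knowledge distillation framework that integrates the Gompertz growth model into the training process to address the limitations of traditional knowledge distillation. Traditional methods often fail to capture the evolving cognitive capacity of student models, which leads to poor knowledge transfer. To overcome this, we propose a stage-aware distillation strategy that dynamically adjusts the weight of the distillation loss according to the Gompertz curve, reflecting the student's learning progression: slow initial growth, rapid mid-phase improvement, and late-stage saturation. The framework uses Wasserstein distance to measure feature-level discrepancies and gradient matching to align the backward-propagation behavior of the teacher and student models. These components are unified under a multi-loss objective in which the Gompertz curve modulates the influence of the distillation losses over time. In extensive experiments on CIFAR-10 and CIFAR-100 with various teacher-student architectures (e.g., ResNet50 and MobileNet_v2), Gompertz-CNN consistently outperforms traditional distillation methods, achieving accuracy gains of up to 8% and 4% on CIFAR-10 and CIFAR-100, respectively.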
Summary / 总结
This paper proposes Gompertz-CNN, a dynamic knowledge distillation framework that uses the Gompertz growth model to address the failure of traditional methods to capture the evolving cognitive capacity of student models. Its stage-aware distillation strategy adjusts the weight of the distillation loss according to the Gompertz curve, and it adds Wasserstein distance to measure feature-level discrepancies and gradient matching to align backward-propagation behavior between teacher and student. Experiments on CIFAR-10 and CIFAR-100 show accuracy gains of up to 8% and 4%, respectively, over traditional distillation methods.
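This paper proposes Gompertz-CNN, a dynamic knowledge distillation framework based on the Gompertz curve, which uses the Gompertz growth model to address the limitations of traditional methods in capturing the evolving cognitive capacity of student models. The framework adopts a stage-aware distillation strategy based on the Gompertz curve to adjust the weight of the distillation loss, and includes Wasserstein distance and gradient matching to align feature-level discrepancies and backward-propagation behavior. Experimental results show that Gompertz-CNN outperforms traditional methods on CIFAR-10 and CIFAR-100, with accuracy gains of up to 8% and 4%, respectively.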