AI Daily Digest: 2026-05-18

🔥 HuggingFace Daily Papers

1. ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Ziyu Guo, Rain Liu, Xinyan Chen

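This paper tackles the efficiency and generalization challenges of modeling intermediate visual states in visual reasoning. It proposes ATLAS, a framework that uses a single discrete "functional token" to unify agentic operations and latent visual reasoning. The token internalizes the semantics of a visual operation, requires no visual supervision, and can be generated directly by a standard autoregressive language model. ATLAS avoids the cost of explicit image generation, is compatible with existing SFT and RL training pipelines, and needs no architectural changes. To mitigate the training instability caused by the sparsity of functional tokens during reinforcement learning, the authors further propose Latent-Anchored GRPO (LA-GRPO), which anchors the functional tokens with a statically weighted auxiliary objective and markedly improves gradient stability and convergence. Experiments show that ATLAS is efficient, generalizes well, and trains robustly on multi-step visual reasoning tasks.

PDF · arXiv · Project | ❤️ 17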
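As background for the LA-GRPO name: GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative advantage step. It does not reproduce LA-GRPO's auxiliary anchoring objective, whose exact form the summary does not give, and the function name and reward values are illustrative assumptions.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    # Population std is an assumption here; some implementations use the sample std.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four responses sampled for one prompt, scored 1 (correct) or 0 (wrong).
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# A group in which every response earns the same reward.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no policy gradient. This is one general way a sparse signal can starve parts of the policy of updates.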

2. RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Yanzuo Lu, Ronglai Zuo, Jiankang Deng

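This paper proposes RAVEN, a framework that addresses the quality degradation of causal autoregressive video diffusion models in long-horizon extrapolation, which arises because the history distribution seen in training differs from the one seen at inference. RAVEN repacks self-rollout trajectories into an interleaved sequence of clean history endpoints and noisy denoising states, so that training-time attention matches the extrapolation process at inference, and it supervises the history representations with a downstream chunk loss. The authors further design Consistency-Model Group Relative Policy Optimization (CM-GRPO), which models consistency sampling as a conditional Gaussian transition and applies online reinforcement learning directly to it, dropping the Euler-Maruyama auxiliary process that conventional flow-model RL relies on. Experiments show that RAVEN outperforms existing causal video distillation methods in generation quality, semantic consistency, and dynamic fidelity, and that combining it with CM-GRPO yields further gains.

PDF · arXiv · Code · Project | ❤️ 8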

3. Aligning Latent Geometry for Spherical Flow Matching in Image Generation

Tuna Han Salih Meral, Kaan Oktay, Hidir Yesiltepe

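This paper addresses a mismatch in latent-space flow matching for image generation: Euclidean linear paths deviate from the spherical distribution of the latents. It proposes an alignment method for spherical flow matching. By decomposing latents into radial and angular components, the authors find that the perceptual and semantic content after decoding is determined mainly by direction, while the radius contributes little. Accordingly, they project data latents onto a sphere of fixed radius, use the radial projection of Gaussian noise as the spherical prior, freeze the encoder and fine-tune the decoder, and replace linear interpolation with spherical linear interpolation (Slerp), so that the trajectory stays on the sphere throughout and the velocity target is purely angular. Experiments show consistent FID improvements on class-conditional ImageNet-256 generation. The method is compatible with multiple image tokenizers and existing diffusion architectures and needs no extra encoder or alignment objective.

PDF · arXiv · Project | ❤️ 4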
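To illustrate why Slerp keeps the path on the sphere while linear interpolation does not, here is a minimal pure-Python sketch. The helper names, the radius of 2.0, and the toy 3-dimensional vectors are assumptions for illustration. The paper's latents are high-dimensional, and its actual implementation is not shown in the summary.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def project_to_sphere(v, radius):
    """Rescale v so that it lies on the sphere of the given radius."""
    n = norm(v)
    return [radius * x / n for x in v]

def lerp(a, b, t):
    """Euclidean linear interpolation."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t):
    """Spherical linear interpolation between two vectors of equal norm."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    s = math.sin(omega)
    if s < 1e-8:
        # Near-parallel or near-antipodal inputs are ill-conditioned;
        # this sketch simply returns a copy of a.
        return list(a)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

radius = 2.0
# Stand-ins for a Gaussian noise sample and a data latent, both projected.
noise = project_to_sphere([1.0, 0.0, 0.0], radius)
data = project_to_sphere([0.0, 3.0, 0.0], radius)

mid_slerp = slerp(noise, data, 0.5)
mid_lerp = lerp(noise, data, 0.5)
print(norm(mid_slerp))  # stays at the sphere radius
print(norm(mid_lerp))   # falls inside the sphere
```

Because every Slerp point has the same norm, the time derivative of the path is orthogonal to the current point. This matches the summary's description of the velocity target as purely angular.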

4. FutureSim: Replaying World Events to Evaluate Adaptive Agents

Shashwat Goel, Nikhil Chandak, Arvindh Arun

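This paper proposes FutureSim, a benchmark framework that replays real-world events in chronological order to evaluate the long-term adaptive ability of AI agents in open, dynamic environments. FutureSim injects real news in time order along with progressively revealed questions, requiring agents to keep forecasting future events past their knowledge cutoff. In a three-month evaluation covering January to March 2026, frontier agents differed widely: the best model reached only 25% accuracy, and some models had Brier skill scores below the no-forecast baseline. Ablations show that the framework can support key research directions such as long-horizon test-time adaptation, search, memory, and reasoning under uncertainty. FutureSim offers a scalable, reproducible evaluation paradigm for measuring long-horizon, open-ended adaptation in the real world.

PDF · arXiv · Code · Project | ❤️ 4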
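For readers unfamiliar with the metric, the following sketch computes a Brier skill score for binary questions in its standard textbook form: one minus the ratio of the forecaster's Brier score to a reference forecaster's. Using a constant 0.5 forecast as the reference is an assumption made here for illustration. The summary does not say how FutureSim defines its no-forecast baseline or whether its questions are binary, and the probabilities below are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes, reference_probs):
    """1 - BS / BS_ref: positive beats the reference, negative is worse."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(reference_probs, outcomes)

outcomes = [1, 0, 1, 0]
reference = [0.5] * len(outcomes)        # assumed uninformative baseline

calibrated = [0.9, 0.1, 0.8, 0.2]        # confident and mostly right
overconfident = [0.1, 0.9, 0.2, 0.8]     # confident and wrong

good_bss = brier_skill_score(calibrated, outcomes, reference)
bad_bss = brier_skill_score(overconfident, outcomes, reference)
print(good_bss, bad_bss)
```

A negative score, as with the overconfident forecaster here, means the forecasts were less accurate than the uninformative reference.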

5. EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

Ruozhen He, Meng Wei, Ziyan Yang

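This paper addresses the difficulty of keeping entities (characters, objects, scenes) consistent in long-range multi-shot video generation and proposes EntityBench, described as the first benchmark aimed at entity consistency. The benchmark contains 2,491 shots derived from 140 episodes of real narrative media, spans easy, medium, and hard difficulty tiers, supports sequences of up to 50 shots with up to 13 cross-shot characters, 8 cross-shot locations, and 22 cross-shot objects, and defines entity recurrence gaps of up to 48 shots. It comes with a three-dimensional evaluation suite covering per-shot quality, prompt alignment, and cross-shot entity consistency, with a fidelity gate that ensures consistency is scored only on accurately identified entities. As a baseline, the authors design EntityMem, a memory-augmented system that stores verified visual representations of entities in a persistent memory bank before generation. Experiments show that the consistency of existing methods drops markedly as the entity recurrence distance grows, whereas EntityMem performs best on character fidelity (Cohen's d = +2.33) and on presence.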

PDF · arXiv
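To put the reported effect size in context, Cohen's d expresses the difference between two group means in units of their pooled standard deviation, so d = +2.33 means the means are 2.33 pooled standard deviations apart. The sketch below uses the common pooled-variance form for two independent samples. Which variant the paper uses is not stated in the summary, and the scores are invented for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (independent samples)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical fidelity scores for two methods.
method_scores = [2.0, 4.0, 6.0]
baseline_scores = [1.0, 3.0, 5.0]
d = cohens_d(method_scores, baseline_scores)
print(d)
```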

6. RefDecoder: Enhancing Visual Generation with Conditional Video Decoding

Xiang Fan, Yuheng Wang, Bohan Fang

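This paper addresses the loss of detail and structural inconsistency that arise in video generation when the decoder lacks conditional guidance, and proposes RefDecoder, a video VAE decoder conditioned on a reference image. Its core is a reference attention mechanism that processes high-fidelity reference-frame tokens, extracted by a lightweight image encoder, jointly with the denoised video latents at every upsampling stage. The method integrates plug-and-play into existing video generation systems (such as Wan 2.1 and VideoVAE+) without fine-tuning, improves PSNR by up to +2.1 dB on reconstruction benchmarks such as Inter4K and WebVid, and markedly improves subject consistency, background consistency, and overall quality on the VBench I2V evaluation. RefDecoder also generalizes well to tasks such as style transfer and video editing.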

PDF · arXiv
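As a reminder of what a PSNR gain means, the sketch below computes PSNR from mean squared error over flat lists of pixel values. The 8-bit peak value of 255 and the toy pixel data are assumptions for illustration, not the paper's evaluation setup.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [52.0, 55.0, 61.0, 59.0]
decoder_a = [56.0, 51.0, 65.0, 55.0]   # error of 4 on every pixel, MSE = 16
decoder_b = [56.0, 51.0, 61.0, 59.0]   # error of 4 on half the pixels, MSE = 8

gain_db = psnr(reference, decoder_b) - psnr(reference, decoder_a)
print(gain_db)
```

Because PSNR is logarithmic in MSE, halving the MSE adds about 3 dB. By the same relation, a 2.1 dB gain at a fixed peak value corresponds to the MSE falling to roughly 62% of its previous value (10^(-0.21) ≈ 0.62).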

7. VGGT-Ω

Jianyuan Wang, Minghao Chen, Shangzhan Zhang

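This paper proposes VGGT-Ω, an efficient and scalable neural reconstruction model for static and dynamic scenes. To address the limits of existing feed-forward reconstruction models in accuracy, training efficiency, and dynamic modeling, the authors make three key technical improvements: (1) a simplified network that uses a single dense prediction head with multi-task supervision and removes the high-resolution convolution layers; (2) learnable registers that aggregate the scene representation, together with a register attention mechanism that localizes cross-frame information exchange and replaces part of the global attention; (3) a high-quality annotation pipeline for dynamic scenes and a self-supervised learning protocol, which support training on large-scale labeled and unlabeled video. Experiments show that VGGT-Ω uses only 30% of the training memory of the original VGGT, can exploit 15 times more supervised data plus large amounts of unlabeled video, and improves camera pose estimation accuracy by 77% on benchmarks such as Sintel, making neural reconstruction more practical and more general.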

PDF · arXiv

8. Articraft: An Agentic System for Scalable Articulated 3D Asset Generation

Matt Zhou, Ruining Li, Xiaoyang Lyu

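This paper addresses the lack of large, diverse datasets that bottlenecks research on understanding articulated 3D objects and proposes Articraft, an agentic system based on large language models (LLMs) for generating articulated 3D assets at scale. Its core idea is to cast asset generation as a program-writing task, with a domain-specific SDK and a restricted execution environment (harness) that guide the LLM to write code that defines parts, composes geometry, specifies joints, and adds validation logic. The framework sidesteps distractions such as URDF authoring and complex environment management, which markedly improves generation quality. Experiments show that Articraft outperforms existing articulated-asset generators and general-purpose coding agents. The Articraft-10K dataset built with it (245 categories, more than 10,000 assets) supports articulated-model training as well as downstream uses such as robot simulation and virtual reality.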

PDF · arXiv

🔥 arXiv Daily Papers

🔬 OpenReview Recent Papers

1. Measuring the Intrinsic Dimension of Earth Representations

Arjun Rao, Marc Rußwurm, Konstantin Klemmer

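This paper presents what it describes as the first systematic study of the intrinsic dimension of geographic implicit neural representations (INRs), aiming to quantify how much information Earth observation representations carry and how it is distributed in space. Working with embedding spaces of 256 to 512 dimensions, the authors estimate the intrinsic dimension of INRs with geometric and probabilistic methods and find that it concentrates between 2 and 10 and is significantly affected by spatial resolution and input modality (for example satellite imagery or text). Experiments show that intrinsic dimension correlates strongly with downstream task performance and can effectively identify spatial artifacts in a model, providing an architecture-agnostic, label-free metric for unsupervised evaluation, model diagnosis, and pretraining strategy design.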
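To make the notion of intrinsic dimension concrete, here is a minimal sketch of one well-known estimator, TwoNN, in its maximum-likelihood form: the number of points divided by the sum of log ratios of second to first nearest-neighbor distances. The summary does not name the estimators the authors use, so TwoNN is an assumption chosen for illustration. The synthetic data (a 2-dimensional sheet linearly embedded in 5 dimensions) is likewise made up.

```python
import math
import random

def twonn_intrinsic_dimension(points):
    """TwoNN maximum-likelihood estimate: N / sum(log(r2 / r1)).

    r1 and r2 are each point's distances to its first and second nearest
    neighbors. Assumes no duplicate points (r1 > 0). Brute force, O(N^2).
    """
    log_ratio_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_ratio_sum += math.log(dists[1] / dists[0])
    return len(points) / log_ratio_sum

random.seed(0)
points = []
for _ in range(300):
    u, v = random.random(), random.random()
    # Five ambient coordinates, but only two degrees of freedom (u and v).
    points.append([u, v, u + v, u - v, 0.5 * u])

estimate = twonn_intrinsic_dimension(points)
print(estimate)  # expected to land near 2, far below the ambient dimension 5
```

The same gap between ambient and intrinsic dimension is what the summary reports at larger scale: embeddings of 256 to 512 dimensions with an estimated intrinsic dimension between 2 and 10.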

2. CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density

3. Hubble: a Model Suite to Advance the Study of LLM Memorization

4. Denoising Neural Reranker for Recommender Systems

5. Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling

6. TaskCraft: Automated Generation of Agentic Tasks

7. Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation

8. Trapped by simplicity: When Transformers fail to learn from noisy features

9. Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks

10. LFQA-E: Carefully Benchmarking Long-form QA Evaluation

11. HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

12. Are Deep Speech Denoising Models Robust to Adversarial Noise?

13. LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

14. Diffusion Transformers with Representation Autoencoders

15. Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism

📝 Official AI Blogs

1. The new AI-powered Google Finance is expanding to Europe.

2. See what happens when creative legends use AI to make ads for small businesses.

3. 5 gardening tips you can try right in Search

4. Early Indicators of Reward Hacking via Reasoning Interpolation

5. Reward Hacking Research Update

6. Pretraining Data Filtering for Open-Weight AI Safety

📬 TLDR AI Picks

1. one daily email

💬 Hacker News: Trending in AI

1. I don’t think AI will make your processes go faster

2. Every AI Subscription Is a Ticking Time Bomb for Enterprise

📰 TechCrunch AI News

1. If you’re giving a commencement speech in 2026, maybe don’t mention AI

2. TechCrunch Mobility: The AI skills arms race is coming for automotive

3. The haves and have nots of the AI gold rush

4. Research repository ArXiv will ban authors for a year if they let AI do all the work

5. OpenAI co-founder Greg Brockman takes charge of product strategy
