AI Daily Digest: 2026-05-11

May 11, 2026 · 7 min read · Pan

🔥 HuggingFace Daily Papers

1. Flow-OPD: On-Policy Distillation for Flow Matching Models

Zhen Fang, Wenxuan Huang, Yu Zeng

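Existing flow matching (FM) text-to-image models struggle with multi-task alignment: sparse rewards and the joint optimization of heterogeneous objectives cause gradient interference, which leads to a "seesaw effect" across metrics and widespread reward hacking. This paper proposes Flow-OPD, which it describes as the first unified post-training framework to bring on-policy distillation (OPD) to FM models. It uses a two-stage alignment strategy: first, domain-specialized teacher models are built through single-reward GRPO fine-tuning; then the policy is initialized with a flow-matching-based cold start, and knowledge is merged through on-policy sampling, task-routed labeling, and dense trajectory-level supervision. The authors further propose Manifold Anchor Regularization (MAR), which uses a task-agnostic teacher to supervise on the full data, stabilizing the generative manifold and mitigating the damage that pure reinforcement learning does to aesthetic quality. On Stable Diffusion 3.5 Medium, the GenEval score rises from 63 to 92 and OCR accuracy from 59 to 94.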

PDF · arXiv · Code · Project | ❤️ 71

2. LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang

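This paper proposes AutoTTS, a framework in which an environment-driven agent automatically discovers better test-time scaling (TTS) strategies to improve large language model (LLM) reasoning. Unlike conventional approaches that rely on hand-designed heuristics, AutoTTS shifts the research focus to building a learnable environment: the core is a discovery environment with a tractable control space and frequent, cheap feedback. Concretely, the authors cast width-depth TTS as a controller synthesis problem over pre-collected reasoning trajectories and probe signals, and introduce a β parameterization together with fine-grained execution-trace feedback, which substantially improves search efficiency and diagnosability. Experiments show that on mathematical reasoning benchmarks the discovered strategies beat strong hand-crafted baselines on the accuracy-compute trade-off and generalize across tasks and model scales; the entire discovery process took only 160 minutes and cost $39.90.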

PDF · arXiv · Code · Project | ❤️ 51

3. Normalizing Trajectory Models

Jiatao Gu, Tianrong Chen, Ying Shen

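This paper proposes Normalizing Trajectory Models (NTM) to address the drop in generation quality that diffusion models suffer at very few sampling steps (for example, 4), where the Gaussian noise assumption breaks down. NTM models each reverse step as a conditional normalizing flow with exactly computable likelihood, combining shallow invertible blocks within each step with a deep parallel predictor across the trajectory; it can be trained end to end or initialized from a pretrained flow matching model. Its exact trajectory likelihood also enables self-distillation: training only a lightweight denoiser on the model's own score is enough for high-quality four-step sampling. Experiments show that NTM matches or exceeds mainstream generative baselines on text-to-image generation with 4 sampling steps, and that it is the only one to retain exact likelihood estimation over the full generative trajectory.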

PDF · arXiv | ❤️ 8

4. SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

Tianfei Ren, Zhipeng Yan, Yiming Zhao

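This paper targets the difficulty of faithfully realizing complex visual intent in text-to-image generation and proposes SCOPE, a framework that addresses the failure to track semantic commitments across the generation lifecycle caused by a "Conceptual Rift". SCOPE maintains semantic commitments through an evolving structured specification and conditionally invokes retrieval, reasoning, and repair skills based on commitment state. To evaluate commitment-level intent realization, the authors build Gen-Arena, a human-annotated benchmark, and a strict metric, EGIP. Experiments show that SCOPE reaches an EGIP of 0.60 on Gen-Arena, significantly ahead of all baselines, and performs well on WISE-V (0.907) and MindBench (0.61), supporting the value of persistent commitment tracking for complex image generation.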

PDF · arXiv · Code · Project | ❤️ 7

5. Fast Byte Latent Transformer

Julie Kallini, Artidoro Pagnoni, Tomasz Limisiewicz

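This paper proposes the Fast Byte Latent Transformer (FBLT) to address the inference-speed bottleneck of byte-level language models (LMs), which stems from byte-by-byte autoregressive generation. The authors design three efficient generation methods: (1) BLT-Diffusion (BLT-D), which jointly trains a block-level diffusion objective with the standard next-byte prediction loss so that multiple bytes are generated in parallel per step; (2) BLT-Self-speculation (BLT-S), in which the local decoder "drafts" bytes past its usual boundary and the full model verifies them in a single pass; and (3) BLT-Diffusion+Verification (BLT-DV), which adds autoregressive verification after diffusion generation. Experiments show that all three cut memory-bandwidth cost during generation by more than 50%, significantly improving inference efficiency and generation quality while keeping the advantages of byte-level modeling.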

PDF · arXiv | ❤️ 5

6. Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

Shuhang Lin, Chuhao Zhou, Xiao Lin

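This paper addresses the limited reliability of answers in knowledge graph question answering (KGQA) and proposes Conformal Path Reasoning (CPR), a trustworthy framework. CPR applies query-level conformal calibration to path-level scores, which preserves exchangeability and yields path prediction sets. It also introduces a lightweight Residual Conformal Value Network (RCVNet) that, combined with PUCT-guided exploration, learns highly discriminative path nonconformity scores. Experiments on multiple benchmarks show that, compared with existing conformal methods, CPR raises empirical coverage by 34% and reduces average prediction-set size by 40%, improving answer compactness and usefulness while strictly satisfying the statistical coverage guarantee.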

PDF · arXiv | ❤️ 1
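
As a rough illustration of the conformal calibration step behind prediction sets like CPR's, here is a minimal split-conformal sketch in Python. It is not the paper's method: the query-level calibration, RCVNet, and PUCT search are omitted, and the function names, scores, and path labels are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of calibration nonconformity scores."""
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: keep every candidate
    return sorted(cal_scores)[rank - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate path whose nonconformity score is within the threshold."""
    return [path for path, score in candidate_scores.items() if score <= threshold]

# Hypothetical calibration scores of the true paths (lower = more conforming).
cal = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.50]
q_hat = conformal_threshold(cal, alpha=0.25)

# Hypothetical candidate paths for a new query.
candidates = {"path_a": 0.08, "path_b": 0.33, "path_c": 0.77}
kept = prediction_set(candidates, q_hat)
print(q_hat, kept)
```

With 9 calibration scores and alpha = 0.25, the threshold is the 8th smallest score, so the two low-score paths are kept and `path_c` is dropped.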

7. Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping

Maryam Maghsoudi, Shihab Shamma

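This paper proposes a new zero-shot method for decoding imagined speech, aiming to overcome the scarcity of imagined-speech data in non-invasive brain signals such as MEG and the difficulty of temporal alignment across subjects and sessions. The study collected paired MEG recordings from trained musicians while they listened to and imagined rhythmic melodic and speech stimuli, using their strong temporal consistency to make modeling more reliable. The method uses a three-stage decoding pipeline: first, a cross-condition model maps imagined MEG responses to listened ones; second, a contrastive word decoder is trained on listened data only and evaluated with fused semantic, acoustic, and phoneme embeddings; finally, imagined MEG from a new subject is mapped and fed to that decoder. Rank analysis shows that the decoded words are significantly above chance, supporting the feasibility of zero-shot imagined speech decoding.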

PDF · arXiv

8. EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

Wei Yu, Yunhang Qian

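This paper addresses two problems in event-based image reconstruction, namely that CNNs struggle to model global dependencies and that ViTs have high computational complexity (O(n²)), and proposes EmambaIR, an efficient visual state space model. Exploiting the spatiotemporal sparsity of event streams, the model introduces a cross-modal Top-k Sparse Attention Module (TSAM) and a Gated State Space Module (GSSM): TSAM performs pixel-level sparse interaction to produce highly discriminative fused features, while GSSM keeps linear complexity (O(n)) and strengthens temporal modeling through nonlinear gating. Experiments on six datasets across three tasks (motion deblurring, deraining, and HDR enhancement) show that EmambaIR significantly outperforms SOTA methods in reconstruction quality while greatly reducing GPU memory use and compute cost.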

PDF · arXiv

🔥 arXiv Daily Papers

📄 arXiv: cs.AI

1. GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

Wenjin Li, Jiaming Cui

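This paper addresses the weak performance of large language models (LLMs) on graph algorithm reasoning and proposes GraphDC, a multi-agent system based on divide and conquer. GraphDC recursively partitions the input graph into subgraphs, has dedicated agents reason over them locally in parallel, and uses a master agent to merge the subgraph results and cross-subgraph dependencies into a global solution. This hierarchical architecture significantly lowers the reasoning complexity faced by any single agent, eases the computational bottleneck, and improves robustness on large graphs. Experiments show that GraphDC significantly outperforms existing methods across a range of graph algorithm tasks, with stronger scalability and reliability on large graph instances in particular.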

2. More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

Xiao Wang

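This paper identifies an overlooked phenomenon in reasoning models, "length-driven position bias": on multiple-choice question answering, the longer the reasoning process (such as chain-of-thought, CoT), the stronger the model's preference for particular option positions (A/B/C/D), so the bias is not caused by shallow heuristics alone. The author systematically evaluates 13 reasoning configurations (spanning models from 7B to 671B) on MMLU, ARC-Challenge, and GPQA and finds that 12 show a significant positive partial correlation (r=0.11–0.41, p<0.05). Truncation experiments indicate the bias is causal: when continuing from a later point in the reasoning trace, the probability that the model switches to the position-preferred answer rises from 16% to 32%. Even in the 671B model, where overall bias is weak (PBS=0.019), the longest reasoning segments still reach 0.071, indicating that accuracy only suppresses the mechanism rather than eliminating it. The study also distinguishes the nature of position bias under direct answering from that under CoT, proposes interpretability tools such as PBS, and calls for MCQ evaluation to include position robustness in the audit standards for reasoning models.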

3. Fast and Effective Redistricting Optimization via Composite-Move Tabu Search

Hai Jin, Diansheng Guo

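This paper addresses spatial redistricting, a combinatorial optimization problem with strong practical constraints, and proposes a composite-move tabu search algorithm (CM-Tabu) to overcome a bottleneck of traditional methods: contiguity constraints shrink the neighborhood and make the search prone to local optima. By identifying articulation points and biconnected components, the method generates, in linear time, contiguity-preserving single-unit moves as well as composite moves such as coordinated moves or swaps of minimal unit sets, systematically enlarging the feasible neighborhood. Experiments show that CM-Tabu significantly outperforms traditional tabu search and other baselines in solution quality, run-to-run stability, and computational efficiency. In the Philadelphia case study, it consistently reaches the theoretical global optimum for the population-balance objective and effectively supports multi-objective trade-offs, making it practical for real decision-support settings.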
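
The contiguity check in this summary can be sketched with Tarjan's articulation-point algorithm: a unit can leave its district without disconnecting it only if it is not an articulation point of the district's induced subgraph. This is a minimal illustration under that assumption, not CM-Tabu itself; the adjacency map, unit names, and the `movable_units` helper are hypothetical, and composite moves are not shown.

```python
def articulation_points(adj, nodes):
    """Tarjan's algorithm on the subgraph induced by `nodes` (assumed connected)."""
    nodes = set(nodes)
    disc, low, cut = {}, {}, set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in nodes or v == parent:
                continue  # ignore other districts and the tree edge back to the parent
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u is a cut vertex if some child cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)
        # The DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            cut.add(u)

    if nodes:
        dfs(next(iter(nodes)), None)
    return cut

def movable_units(adj, district):
    """Units whose removal leaves the rest of the district connected."""
    return sorted(set(district) - articulation_points(adj, district))

# Hypothetical district: unit "a" hangs off "b"; "b", "c", "d" form a triangle.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
district = ["a", "b", "c", "d"]
print(movable_units(adj, district))
```

Here only `b` is an articulation point, so `a`, `c`, and `d` are candidates for a single-unit move. A real move would also need the unit to border the receiving district, and the recursive DFS is only suitable for small examples.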

📄 arXiv: cs.CL

1. Domain-level metacognitive monitoring in frontier LLMs: A 33-model atlas

Jon-Paul Cacioli

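This paper systematically evaluates the domain-level metacognitive monitoring ability of 33 frontier large language models across six MMLU domains (applied/professional knowledge, formal reasoning, natural science, and others), computing Type-2 AUROC from 1,500 questions (250 per domain) and verbalized confidence ratings (0–100). The results: every model with above-chance overall monitoring shows significant differences between domains; applied/professional knowledge is the easiest to monitor (mean AUROC=0.742), while formal reasoning and natural science are the hardest (together occupying the bottom two positions for 27 of the 33 models); the six-domain split is practically reasonable but is not a validated latent structure; monitoring profiles cluster significantly within some model families (Anthropic, Gemini, Qwen) but not others (DeepSeek, Gemma, OpenAI); Gemma 4 improves on Gemma 3 by 0.202 AUROC; and models that fail under a binary KEEP/WITHDRAW probe still behave normally with verbalized confidence, which highlights how format-specific probes are. The study reveals important domain heterogeneity that aggregate metrics hide and argues for domain screening at the benchmarking stage before deployment.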
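
Type-2 AUROC measures how well confidence separates correct from incorrect answers: it is the probability that a randomly chosen correct answer received higher confidence than a randomly chosen incorrect one, with ties counted as one half. A minimal sketch, with made-up confidence values and a hypothetical function name:

```python
def type2_auroc(confidences, correct):
    """AUROC of confidence as a detector of the model's own correctness."""
    pos = [c for c, ok in zip(confidences, correct) if ok]      # correct answers
    neg = [c for c, ok in zip(confidences, correct) if not ok]  # incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correct answer ranked above the incorrect one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical verbalized confidences (0-100) and correctness flags.
conf = [90, 80, 70, 60, 50, 40]
is_correct = [True, True, False, True, False, False]
auroc = type2_auroc(conf, is_correct)
print(round(auroc, 3))
```

A value of 0.5 means confidence carries no information about correctness; 1.0 means every correct answer was rated above every incorrect one. The pairwise loop is quadratic, which is fine for a few hundred questions per domain.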

2. VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

Jiacheng Xu, Heting Gao, Liufei Xie, Zhenchuan Yang, Lijiang Li, Yiting Chen, Bin Zhang, Meng Chen, Chaoyu Fu, Weifeng Zhao, Wenjiang Zhou

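This paper proposes VITA-QinYu, described as the first end-to-end expressive spoken language model (SLM) to support role-playing and singing generation. To model the personality, emotion, and performative expression in speech that go beyond textual content, the model adopts a hybrid speech-text paradigm and extends interleaved text-audio joint modeling with multi-codebook audio tokens, strengthening the representation of paralinguistic information while keeping the modalities decoupled. The authors build a high-quality synthetic dataset covering 15.8K hours of natural conversation, role-playing, and singing. Experiments show that VITA-QinYu leads comparable SLMs by 7 percentage points on objective role-playing evaluation, improves the subjective singing MOS (5-point scale) by 0.13, and sets new SOTA for accuracy and fluency on the C3 and URO dialogue benchmarks with margins of 1.38% and 4.98%, respectively. The code, models, and a demo system supporting streaming and full-duplex interaction have all been open-sourced.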

3. IntentGrasp: A Comprehensive Benchmark for Intent Understanding

Yuwei Yin, Chuyuan Li, Giuseppe Carenini

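This paper proposes IntentGrasp, a comprehensive benchmark for evaluating intent understanding. It covers 12 domains, draws on 49 high-quality open-source corpora, and contains 263K training samples plus two evaluation sets (All Set with 12.9K examples; Gem Set with 470 examples, which is more challenging and balanced). Experiments on 20 mainstream LLMs show that existing models average below 60% accuracy on All Set and below 25% on Gem Set; 17 of the models fall below even the random-guess baseline (15.2%) on Gem Set, whereas human performance reaches 81.1%. In response, the authors propose Intentional Fine-Tuning (IFT), which fine-tunes models on the IntentGrasp training set and raises F1 by more than 30 points on All Set and more than 20 points on Gem Set; leave-one-domain-out (Lodo) cross-domain validation further confirms strong generalization. The work provides a benchmark and a technical path toward smarter, more reliable, human-centered AI assistants.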

📄 arXiv: cs.LG

1. RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung

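This paper addresses the large memory cost of the KV cache in LLM inference and proposes RateQuant, a mixed-precision quantization method based on rate-distortion theory. Existing methods use the same bit-width for every attention head and ignore differences in head importance, while naively allocating bits by importance leads to "distortion model mismatch": the distortion decay rate β differs markedly between quantizers (3.6–5.3), and a distortion model reused across models ends up worse than uniform quantization. RateQuant fits a dedicated distortion model for each quantizer from a small calibration set and solves for the optimal bit allocation in closed form with the reverse water-filling algorithm. On Qwen3-8B at an average of 2.5 bits, it lowers KIVI's perplexity from 49.3 to 14.9 (a 70% reduction) and beats QuaRot by 6.6 PPL; calibration takes only 1.6 seconds and adds no inference overhead.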
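
To make the bit-allocation idea concrete, here is a minimal sketch of reverse water-filling under an assumed distortion model `D_h(b) = w_h * 2**(-beta * b)` for head `h`. The model form, the weights, and the single shared `beta` are assumptions for illustration; the paper fits its own per-quantizer models, and this sketch finds the water level by bisection rather than using a closed form. Bit-widths come out fractional, and rounding to integer widths is left out.

```python
import math

def allocate_bits(weights, total_bits, beta, iters=200):
    """Split `total_bits` across heads to minimize sum_h w_h * 2**(-beta * b_h).

    `weights` must be positive. Heads below the water level receive zero bits.
    """

    def bits_at(theta):
        # Optimality condition: w_h * 2**(-beta * b_h) == theta for active heads.
        return [max(0.0, math.log2(w / theta) / beta) for w in weights]

    # Bracket the water level: at `lo` every head gets at least total_bits,
    # at `hi` no head gets any bits.
    lo = min(weights) * 2.0 ** (-beta * total_bits)
    hi = max(weights)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if sum(bits_at(mid)) > total_bits:
            lo = mid  # budget exceeded: raise the water level
        else:
            hi = mid
    return bits_at(math.sqrt(lo * hi))

# Hypothetical per-head sensitivity weights; 10 bits total is 2.5 bits per head.
weights = [8.0, 2.0, 1.0, 1.0]
bits = allocate_bits(weights, total_bits=10.0, beta=4.0)
print([round(b, 3) for b in bits])
```

In this toy setting the most sensitive head receives about 3 bits and the two least sensitive about 2.25 each, while the total stays at the 10-bit budget.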

2. LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

Enshuai Zhou, Yifan Hao, Chao Wang, Rui Zhang, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Yunji Chen

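This paper addresses a bottleneck in long-context LLM inference, namely that KV cache memory grows linearly with sequence length, and proposes LKV, an end-to-end learnable KV cache eviction framework. LKV formulates KV compression as a differentiable optimization problem with two parts: LKV-H learns a task-driven global allocation of per-head budgets, and LKV-T estimates KV importance from the intrinsic query-key relationship without explicitly computing the attention matrix. The method drops the traditional heuristics that depend on statistical priors or fixed inductive biases and aligns the compression process directly with the downstream task objective. On the LongBench and RULER benchmarks, LKV reaches SOTA performance at high compression ratios; on LongBench in particular, it achieves near-lossless inference while retaining only 15% of the KV cache. Ablations further confirm that data-driven budget learning is the main driver of the gains.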
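
The eviction step that a framework like this ends with can be sketched as per-head top-k selection: each head keeps only its highest-scoring cached tokens, up to that head's budget. In LKV both the budgets and the importance scores are learned; here they are plain hypothetical inputs, and `evict_kv` is an illustrative name rather than anything from the paper.

```python
def evict_kv(scores_per_head, budgets):
    """Return, for each head, the sorted positions of the cached tokens to keep."""
    kept = []
    for scores, budget in zip(scores_per_head, budgets):
        k = min(budget, len(scores))
        # Rank positions by importance, highest first; ties favor the earlier token.
        ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
        kept.append(sorted(ranked[:k]))
    return kept

# Hypothetical importance scores for 4 cached tokens in 2 heads.
scores = [
    [0.9, 0.1, 0.5, 0.3],  # head 0
    [0.2, 0.8, 0.1, 0.7],  # head 1
]
budgets = [1, 3]  # head 0 keeps 1 token, head 1 keeps 3
kept = evict_kv(scores, budgets)
print(kept)
```

Uneven budgets are the point of the head-wise design: a head whose scores are concentrated on one token can be compressed hard, while a head with spread-out scores keeps more of its cache.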

3. A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence

Antoine Heranval (BioSP), Olivier Lopez (CREST), Didier Ngatcha (CREST), Daniel Nkameni (CREST)

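This paper addresses the growing risk of natural hazards such as soil subsidence under climate change, and the difficulty traditional actuarial models have with medium- and long-term climate scenarios, by proposing SwiGAN, a climate scenario generator based on a Wasserstein generative adversarial network (WGAN). Taking the French Soil Wetness Index (SWI) as its key variable, the model uses a conditional GAN architecture to learn the historical spatiotemporal evolution of SWI and generates high-fidelity, physically plausible time series and spatial maps out to 2050. Experiments show that SwiGAN significantly outperforms benchmark models on statistical properties, spatiotemporal continuity, and reproduction of extreme events. It can support dynamic drought-risk assessment, catastrophe bond pricing, and long-term solvency stress testing beyond the Solvency II framework, and the approach can be extended to other climate-related lines of insurance and to economic scenario generation.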

📄 arXiv: cs.CV

1. Visual Text Compression as Measure Transport

Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li

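This paper targets a core problem in visual text compression (VTC), the decoupling of compression ratio from downstream performance, and proposes a theoretical framework based on measure transport. Text and visual tokens are modeled as empirical probability measures, and the pushforward map induced by the ViT patch encoder is shown to decompose into a representation-precision loss (within-patch aggregation) and a coverage-bias loss (cross-patch fragmentation); both costs can be estimated with label-free downstream probes. On this basis the authors build an unsupervised routing criterion and a transport-aware foveation re-encoding mechanism. Across 24 NLP datasets, the method achieves a 70.8% dataset-level oracle match rate at zero labeling cost and improves task performance by 3.3% while reducing decoded tokens by 10.3% on average.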

2. Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

Yiwen Xu, Tariq M. Khan, Yang Song, Erik Meijering

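This paper is a systematic survey of research progress in edge deep learning for computer vision and medical diagnostics. It first lays out the core paradigm of edge deep learning, which combines edge computing with deep learning, and its technical advantages in low latency, privacy protection, and environment-adaptive decision-making. It then proposes a new taxonomy of edge hardware platforms based on performance and application scenario, and reviews key techniques for edge devices, including lightweight model design, model compression, and efficient inference. Using representative computer vision and medical imaging diagnosis cases, it illustrates practical usefulness and clinical value in real-world settings. Finally, it analyzes key challenges such as limited compute, model generalization, and data heterogeneity, and looks ahead to directions such as intelligent edge collaboration, adaptive learning, and trustworthy AI.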

3. HumanNet: Scaling Human-centric Video Learning to One Million Hours

Yufan Deng, Daquan Zhou

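This paper proposes HumanNet, a human-centric video dataset at the scale of one million hours, intended to break the data bottleneck for learning physical interaction in embodied intelligence. The dataset spans first-person and third-person views, covers fine-grained activities, human-object interaction, tool use, and long-horizon behavior, and provides action descriptions, hand/body signals, and multimodal annotations. Its core contribution is a systematic data curation paradigm built on human-centric filtering, temporal structuring, viewpoint diversity, and annotation enrichment. Experiments show that, on a fixed validation set, continued training of a Qwen vision-language model on 1,000 hours of first-person video from HumanNet outperforms a baseline that uses 100 hours of real robot data (Magic Cobot), supporting human video as a scalable, low-cost alternative data source for embodied learning.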

🔬 OpenReview Recent Papers

1. From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning

Ruilin Luo, Chufan Shi, Yizhen Zhang

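This paper focuses on the key role of visual attention during the cold-start stage of multimodal large reasoning models (MLRMs) and proposes the Visual Attention Score (VAS), a metric that quantifies how much a model attends to visual tokens. Experiments find that VAS is highly correlated with reasoning performance (r=0.9616), yet standard multimodal cold-start fails to raise VAS and instead exhibits "lazy attention localization", whereas text-only cold-start markedly strengthens visual attention. Building on this, the authors design a training-free, inference-time attention intervention and further propose the Attention-Guided Visual Anchoring and Reflection framework (AVAR), which combines visual-anchored data synthesis, an attention-guided optimization objective, and visual-anchored reward shaping. On Qwen2.5-VL-7B, AVAR yields an average gain of 7.0% across 7 benchmarks, and ablations confirm the incremental contribution of each component.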

2. One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

3. Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding

4. Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation

5. CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

6. Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)

7. Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning

8. P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context

9. Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations

10. Dual Randomized Smoothing: Beyond Global Noise Variance

11. Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI

12. APPLE: Toward General Active Perception via Reinforcement Learning

13. Equivariant Splitting: Self-supervised learning from incomplete data

14. Evaluating GFlowNet from partial episodes for stable and flexible policy-based training

15. Steer Away From Mode Collisions: Improving Composition In Diffusion Models

📝 Official AI Blogs

1. The new AI-powered Google Finance is expanding to Europe.

2. See what happens when creative legends use AI to make ads for small businesses.

3. 5 gardening tips you can try right in Search

4. Early Indicators of Reward Hacking via Reasoning Interpolation

5. Reward Hacking Research Update

6. Pretraining Data Filtering for Open-Weight AI Safety

7. Introducing Claude Opus 4.7 (Product, Apr 16, 2026): Our latest Opus model brings stronger performance across coding, agents, vision, and multi-step tasks, with greater thoroughness and consistency on the work that matters most.

8. Introducing Claude Design by Anthropic Labs (Product, Apr 17, 2026): Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more.

9. Project Glasswing (Announcements, Apr 7, 2026): A new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software.

📬 TLDR AI Picks

1. Wispr Flow

2. Start Free

3. Google shipped Gemini 3.1 Flash-Lite in General Availability

4. Akamai climbs to highest level since 2000

5. Nvidia embraces role of AI investor, pushing past $40 billion in equity bets this year

6. Why MistralAI Grows Faster Than OpenAI/Anthropic

7. Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

8. Useful memories become faulty when continuously updated by LLMs

9. Build a Realtime Speech Translation

10. The Anti-Singularity

💬 Hacker News AI Top Stories

1. Gmail registration now requires scanning a QR code and sending a text message

2. Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s

📰 TechCrunch AI News

1. There aren’t enough rockets for space data centers. Cowboy Space raised $275 million to build them.

2. Get ready for the whisper-filled office of the future

3. Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

4. We’re feeling cynical about xAI’s big deal with Anthropic

5. Voice AI in India is hard. Wispr Flow is betting on it anyway.