AI Tech Daily - 2026-05-28 | Recsys Frontier

date

May 28, 2026 04:31

📊 Today's Overview

AI coding and agent infrastructure dominated the news cycle. Cognition AI raised $1B at a $26B valuation, while Fireworks AI is reportedly in talks at $15B — the AI coding race is heating up fast. On the technical side, NVIDIA open-sourced Polar for GRPO training across agent tools, Hugging Face slashed RL bandwidth by 97% with delta weight sync, and MiniMax released M2.5 hitting 80.2% on SWE-Bench. Taiwan also publicly cracked down on suspected NVIDIA chip smuggling to China, adding geopolitical tension to the AI supply chain story.

🔥 Trend Insights

AI coding funding frenzy: Cognition raised $1B at a $26B valuation and Fireworks is reportedly in talks at $15B, signaling the market's bet on AI programming tools and inference infrastructure as the next big platform shift.

Agent RL infrastructure matures: NVIDIA's Polar framework, Hugging Face's delta weight sync, and OpenAI's self-improving tax agent all demonstrate that the tooling for training production agents is finally becoming practical and scalable.

On-device MoE goes mainstream: Meta's MobileMoE paper shows sub-billion MoE models can match dense models with 2-4x fewer FLOPs, while MiniMax's M2 series proves MoE works at scale — the architecture is winning at both ends.

🐦 X/Twitter Highlights

📈 Hot Topics & Trends

Cognition AI raises $1B at a $26B valuation, reporting $492M in annual revenue and 10x enterprise user growth – A co-founder says cloud agents have gone from niche to mainstream in the two years since Devin launched, with dozens of large enterprise customers, and describes Cognition as the world's largest independent agent lab. @cognition (Cognition AI, developer of Devin) | @swyx

MiniMax announces the end of the M2 series, with M3 coming soon – It also released the M2 technical report on arXiv. Core design choices: full attention only (hybrid sliding-window attention was rejected); a 128-expert MoE with top-8 routing (at 2B active params this lifts MATH from 19.6 to 24.1); a careful pipeline for agent behavior training (GitHub PR mining, Docker environments, test-based task rewards); and self-evolution (M2.7 handles 30-50% of daily RL iterations, and 100 rounds of scaffold optimization improve the internal eval by 30%). @MiniMax_AI | @rasbt (AI researcher/author)

Taiwan publicly cracks down for the first time on suspected smuggling of NVIDIA AI chips to China via Japan; Elon Musk says the US must make its own chips to keep its AI advantage – Musk says the US leads in AI in the near term, but that the geography of chip manufacturing determines long-term outcomes. He claims that 100% of advanced fabs are currently in Taiwan and that a Chinese invasion would cut off global supply. @unusual_whales (Independent financial media) | @Vivek4real_ (Independent blogger)

🔧 Tools & Products

SGLang v0.5.12.post1 released: fixes DeepSeek V4 garbled output, NIXL PD crash, Blackwell FA4 compatibility – 12 cherry-pick patches including fix for V4-Pro single-token decoding garbled text, EAGLE deconstruction crash, HiSparse+Compressor V2 accuracy recovery from 0.825 to 0.960 (GSM8K), elimination of DeepSeek V4 20-40 second cold bucket stalls, etc. @lmsysorg

LiteParse v2: PDF parser rewritten in Rust, 100x faster, supports 50+ document types – Released by Jerry Liu (LlamaIndex founder). Installs natively in Python, Node, Rust, and WASM, and runs in browser and edge runtimes. @jerryjliu0

Perplexity open-sources a ColBERT-style model (PPLX 0.6B) – Supports both single-vector (pplx-emb) and multi-vector late-interaction (pplx-late) retrieval, so the retrieval gain from multi-vector interaction can be measured directly. @lateinteraction (Stanford assistant professor)
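
To make the single-vector versus multi-vector distinction concrete, here is a minimal sketch of the two scoring styles. It is not Perplexity's code: the toy 2-d embeddings are made up, and mean pooling to obtain the single vector is an assumption chosen only for illustration. The multi-vector score is the standard ColBERT-style MaxSim (each query token takes its best-matching document token, and the maxima are summed).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def mean_pool(tokens):
    # Assumption for this sketch: the single vector is the mean of token vectors.
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def single_vector_score(query_vec, doc_vec):
    """Cosine similarity between one query vector and one document vector."""
    return dot(normalize(query_vec), normalize(doc_vec))

def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim: sum over query tokens of the best cosine match among doc tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)

# Toy token embeddings (hypothetical values).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

single = single_vector_score(mean_pool(query_tokens), mean_pool(doc_tokens))
multi = late_interaction_score(query_tokens, doc_tokens)
print(f"single-vector: {single:.3f}, late interaction: {multi:.3f}")
```

The two scores live on different scales (cosine is at most 1, MaxSim is at most the number of query tokens), so comparisons between the two modes are made on ranking quality rather than on raw scores.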

Robinhood launches AI agent trading accounts – CEO Vlad Tenev announces that AI agents can connect to accounts for trade execution and portfolio management. @vladtenev

Hermes Agent adds built-in MCP Catalog and integrates Krea 2 image generation API – Krea 2 is a new foundation model supporting style transfer, mood board input, and creative control. @NousResearch

OpenAI announces private MCP servers can connect to ChatGPT/Codex via outbound HTTPS – Enterprises can keep MCP servers on internal networks without opening inbound ports. @OpenAIDevs

⚙️ Technical Practice

MiniMax M2 technical report deep dive – Sebastian Raschka (AI researcher/author) summarizes: at production quality the tradeoffs don't favor hybrid sliding-window attention; linear and sparse attention are fragile with low-precision KV caches and have poor prefix-cache support; 128 experts with top-8 routing outperform 32 experts with top-2; and M2.7 already handles 30-50% of daily RL iterations autonomously. @rasbt
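
For readers unfamiliar with what "top-8 routing" means, the following is a minimal sketch of generic top-k expert routing for a single token. It is not MiniMax's implementation: the function names, the toy experts, and the choice to apply softmax over only the selected logits are all assumptions made for illustration.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their logits.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy setup (hypothetical): 4 experts, each scales its input by a constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 2.0, -1.0]

routes = top_k_route(logits, k=2)
y = moe_forward([1.0, 1.0], experts, logits, k=2)
print(routes, y)
```

Only the k selected experts run for a given token, which is why active parameters per token can be far smaller than total parameters.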

Sakana AI releases DiffusionBlocks: block-wise training of deep networks, memory drops to single-block level – ICLR 2026 paper, treats network forward pass as diffusion denoising process. Matches end-to-end training performance on ViT, DiT, autoregressive Transformer, recurrent deep Transformer. @SakanaAILabs | @hardmaru (Sakana AI research lead)

Qwen3.5 achieves 580 tps agent inference on TokenSpeed engine – Joint effort by Alibaba Qwen team, Lightseek OS Foundation, NVIDIA, Mooncake team. Special thanks to Tri Dao (FlashAttention author / Together AI Chief Scientist) for FlashAttention-4 optimization. @Alibaba_Qwen

Genesis AI open-sources World 1.0 simulation platform: 10x faster startup, 4.6x faster runtime – Includes GPU-accelerated compiler Quadrants, penetration-free multi-physics contact solver, physics AI renderer Nyx. Supports multiple robot types (Unitree, dexterous hands, etc.) with extremely low sim-to-real gap. @gs_ai_ (Genesis Robotics)

Percy Liang (Stanford professor / CRFM director) proposes Self-Verified Distillation – Model self-verifies then trains only on passed responses, no ground truth or external verifier needed. @percyliang

RLM training code and model open-sourced: trains recursive language models based on prime-rl and verifier – Alex Zhang (community developer) trains RLM-Qwen3-30B-A3B-v0.1, shows comprehensive improvements on long-context tasks, trainable on 8×A100 in one day. @lateinteraction

⭐ Featured Content

AI coding sees another mega-funding round: Cognition valued at $26B, Fireworks in talks at $15B ｜ Capital frenzy continues in AI coding and inference infrastructure

Cognition (Devin developer) raised $1B at a $26B valuation, led by existing investors and sovereign wealth funds, one of the largest single funding rounds in AI coding to date. Meanwhile, Fireworks AI is in talks for a new round at a $15B valuation, reflecting strong market recognition of LLM inference optimization. Together, the two stories show how much capital is flowing into both AI coding tools and inference infrastructure, a key signal for practitioners trying to understand market direction.

Sources: Bloomberg (Cognition) ｜ Bloomberg (Fireworks)

NVIDIA releases Polar framework: GRPO reinforcement learning training for Codex, Claude Code, and other agent tools ｜ Token-faithful rollout without modifying harness

NVIDIA open-sources Polar framework. Core innovation: places a proxy gateway at the model API boundary, enabling GRPO training on Codex CLI, Claude Code, Qwen Code, and other tools without modifying agent harness logic — solving the pain point where traditional RL infrastructure requires rewriting harness logic. Supports multiple API formats (Anthropic, OpenAI, Google) and local inference engines like vLLM. For practitioners working on LLM agent training and RLHF/GRPO, this is a directly usable open-source tool.
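
As background on the GRPO step that such a framework feeds, here is a minimal sketch of group-relative advantage computation. It does not use Polar's API, which is not described here. It only illustrates the commonly described GRPO idea of scoring each rollout against the mean and standard deviation of its own group; the function name, the `eps` value, and the example rewards are assumptions.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    rewards: scalar rewards for several rollouts of the SAME prompt.
    Returns one advantage per rollout: (r - group_mean) / (group_std + eps).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical example: four agent rollouts on one coding task,
# reward 1.0 if the task's tests pass and 0.0 otherwise.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = grpo_advantages(group_rewards)
print(advantages)
```

Because the baseline comes from the group itself, no separate value model is needed; a group in which every rollout gets the same reward yields zero advantages and therefore no gradient signal.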

Source: MarkTechPost

ITBench-AA: First enterprise IT agent benchmark released, frontier models score below 50% ｜ IBM and Artificial Analysis jointly launch SRE scenario evaluation

IBM and Artificial Analysis release ITBench-AA, the first agent benchmark for enterprise IT operations (SRE). Frontier models all score below 50%, with Claude Opus 4.7 leading at 47%, followed by GPT-5.5 (46%) and Qwen3.7 Max (42%). Key finding: longer reasoning trajectories don't improve accuracy — models often over-investigate, leading to false positives. Open-source models like GLM-5.1 (40%) and DeepSeek V4 Pro (38%) show competitive performance. The benchmark is open-sourced, with plans to expand to FinOps and CISO tasks. For practitioners focused on enterprise agent deployment, this is an important reference for model selection.

Source: Hugging Face

Hugging Face implements Delta Weight Sync: asynchronous RL training bandwidth reduced by 97% ｜ Transmits sparse differences via Hub Bucket, no shared cluster needed

Hugging Face implements delta weight sync in TRL on top of Hub Bucket, addressing the weight-synchronization bandwidth bottleneck in asynchronous reinforcement learning. Key finding: between adjacent RL optimization steps, ~99% of bf16 weight bits are identical, so only the sparse differences need to be transmitted. Measured on Qwen3-0.6B, the per-step payload drops from 1.2 GB to 20-35 MB. The approach has been run with fully separated components: the trainer, vLLM, and a Wordle environment each live in a different Space and synchronize weights through a single Hub Bucket, with no shared cluster or RDMA needed. For teams doing large-scale RL training, this is a directly reusable engineering pattern.
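
The idea of shipping only what changed can be sketched in a few lines. This is an illustration, not TRL's implementation or wire format: weights are modeled as a flat list of raw 16-bit bf16 patterns, the diff is taken per element, and the 4-byte index plus 2-byte value payload estimate is an assumption.

```python
def encode_delta(old_bits, new_bits):
    """Sparse diff: positions whose 16-bit pattern changed, with the new pattern."""
    if len(old_bits) != len(new_bits):
        raise ValueError("weight tensors must have the same length")
    return [(i, n) for i, (o, n) in enumerate(zip(old_bits, new_bits)) if o != n]

def apply_delta(old_bits, delta):
    """Rebuild the new weights on the receiver from its old copy plus the diff."""
    patched = list(old_bits)
    for index, value in delta:
        patched[index] = value
    return patched

def payload_bytes(delta, index_bytes=4, value_bytes=2):
    """Rough size of the sparse payload (assumed encoding, no compression)."""
    return len(delta) * (index_bytes + value_bytes)

# Toy example: 1000 weights, all 1.0 in bf16 (bit pattern 0x3F80),
# of which an optimizer step changes two.
old = [0x3F80] * 1000
new = list(old)
new[7] = 0x3F81
new[512] = 0x3F7F

delta = encode_delta(old, new)
restored = apply_delta(old, delta)
full_bytes = len(new) * 2
sparse_bytes = payload_bytes(delta)
print(f"full sync: {full_bytes} B, delta sync: {sparse_bytes} B")
```

The receiver must hold exactly the previous version of the weights for the patch to be valid, so a real system also needs some form of version tracking and a fallback to a full sync.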

Source: Hugging Face

OpenAI and Thrive build self-improving tax agent: accuracy rises from 25% to 86% in six weeks ｜ Codex-driven production feedback loop

OpenAI partners with Thrive Holdings to build Tax AI using Codex. The system uses a three-part cycle, "practitioner correction → product tracking → Codex optimization", to convert production feedback into structured signals for autonomous improvement. In a pilot at Crete CPA firm, Tax AI processed 7,000 tax returns, saving roughly one-third of the time, achieving 97% accuracy, and boosting throughput by 50%. The article details the measured improvement from 25% accuracy at deployment to 86% six weeks later. For practitioners building production agents, this is a highly valuable self-improvement loop methodology.

Source: OpenAI

AWS releases AgentCore-powered Field Advisor and NarrateAI: enterprise multi-agent orchestration in practice ｜ Two reusable architectures for sales and BI scenarios

AWS shares two production-grade agent systems built on Bedrock AgentCore: Field Advisor solves orchestration across 20+ domain agents, handling 120K+ prompts post-launch, saving sales reps 2 hours per week with 41% lower latency; NarrateAI uses a two-layer architecture (offline batch narrative generation + online multi-agent Q&A) for executive BI scenarios. Both articles detail AgentCore's key capabilities: isolated execution, unified gateway, persistent memory, identity propagation. For practitioners building enterprise multi-agent systems, these are directly referenceable architecture designs.

Sources: AWS (Field Advisor) ｜ AWS (NarrateAI)

MiniMax M2.5 released: SWE-Bench 80.2%, coding speed up 37%, extremely low cost ｜ Coding and search agent capabilities reach SOTA

MiniMax releases M2.5 model, achieving 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, 76.3% on BrowseComp. Coding speed is 37% faster than M2.1, with extremely low cost ($1/hour). The model reaches SOTA in coding, search, and tool calling, with emphasis on multilingual programming and real-world environment training. For LLM/agent practitioners, this is a noteworthy new model release with specific benchmark and pricing information for comparison with Qwen, DeepSeek, and competitors.

Source: MiniMax

AnythingLLM v1.13.0 introduces Model Router: first consumer-grade hybrid AI routing system ｜ Intelligent switching between local and cloud models, supports scheduled agent tasks

AnythingLLM v1.13.0 releases three core features: Model Router (first consumer-grade hybrid AI routing system, allowing users to define custom rules for intelligent switching between local and cloud models, supporting triggers like keywords, token count, time), Scheduled Jobs (timed automated agent tasks with visual Cron configuration), Automatic Memories (automatic memory extraction and personalization). These features combine local privacy with cloud capabilities, offering a new hybrid deployment paradigm for agent engineering — fully open-source and self-hostable. For practitioners focused on agent deployment and hybrid inference architectures, this is a tool worth trying.
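
A rule-based router of this kind can be sketched as an ordered list of predicates where the first match wins. This is not AnythingLLM's implementation or configuration format: the `Rule` class, the rule names, the `local`/`cloud` targets, and the use of a whitespace word count as a stand-in for token count are all hypothetical.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List

@dataclass
class Rule:
    name: str
    matches: Callable[[str, time], bool]  # (prompt, current time) -> bool
    target: str                           # which model backend to use

def route(prompt: str, now: time, rules: List[Rule], default: str = "local") -> str:
    """Return the target of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(prompt, now):
            return rule.target
    return default

# Hypothetical rules covering the three trigger types: keyword, size, time.
rules = [
    Rule("sensitive-keyword", lambda p, t: "password" in p.lower(), "local"),
    Rule("long-prompt", lambda p, t: len(p.split()) > 200, "cloud"),
    Rule("off-hours", lambda p, t: t >= time(22, 0) or t < time(6, 0), "local"),
]

print(route("What is my password?", time(12, 0), rules))  # keyword rule fires
print(route("word " * 300, time(12, 0), rules))           # size rule fires
```

Rule order encodes priority: placing the privacy rule first means a long prompt that contains a sensitive keyword still stays on the local model.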

Source: NewReleases

🎙️ Podcast Picks

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

📍 Source: Latent Space | ⭐⭐⭐⭐⭐ | 🏷️ LLM, Research, Interview | ⏱️ 1:10:12

Alex Rives introduces ESMFold2, a BERT-like transformer trained on protein sequences. By scaling data and compute, it surpasses AlphaFold3 on hard problems like antibodies. Discussion covers scaling laws in protein space, limitations of MSA inductive bias, effects of inference-time scaling, and the release of a 6.8-billion-protein atlas. Core thesis: general language model approaches can beat specialized models, so the bitter lesson applies to biology too.

💡 Why Listen: Heavyweight guest (ESM team lead) in a deep interview, unveiling ESMFold2's major breakthrough. Shows how LLM methods beat AlphaFold3 in protein folding — essential listening for anyone in AI+science.

📄 Paper Highlights

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax ｜ 🏷️ Architecture, Training, MoE, Agent Framework

229.9B total params with only 9.8B activated per token. Introduces agent-driven data pipelines, Forge RL system, and self-evolving M2.7 checkpoint that autonomously debugs its own training — a complete end-to-end agent-native model family.

MobileMoE: Scaling On-Device Mixture of Experts

Meta AI ｜ 🏷️ Architecture, MoE, Quantization, Inference

First on-device MoE scaling law, identifying a sweet spot with moderate sparsity and shared experts. MobileMoE matches dense models with 2-4x fewer FLOPs and runs efficiently on commodity smartphones — the architecture for edge AI.

AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

AI Agent Technologies (Hong Kong) Limited ｜ 🏷️ Agent Framework, Compression, Inference

Diagnoses "action-grammar destruction" — token-level compressors reliably remove action semantics. AGORA's step-level compressor with structural parsing and a 125M relevance scorer is the only method retaining ≥75% performance across 8 of 9 test cells.

🐙 GitHub Trending

Polar ｜ GRPO training framework for agent tools

NVIDIA's open-source framework places a proxy gateway at the model API boundary, enabling token-faithful GRPO training on Codex CLI, Claude Code, and Qwen Code without modifying harness logic. Supports Anthropic, OpenAI, Google APIs plus local vLLM — a drop-in RL infrastructure for agent teams.

GitHub ｜ ⭐ New ｜ 🗣️ Python ｜ 🏷️ RL, Agent, Training

AnythingLLM v1.13.0 ｜ Consumer-grade hybrid AI routing

First Model Router for intelligent local/cloud model switching with keyword, token, and time triggers. Adds scheduled agent tasks with visual Cron config and automatic memory extraction. Fully open-source and self-hostable — a new paradigm for hybrid agent deployment.

GitHub ｜ ⭐ 40,000+ ｜ 🗣️ JavaScript ｜ 🏷️ Agent, LLM, DevTool