AI Tech Daily - 2026-05-22 | Recsys Frontier

type

Post

status

Published

date

May 22, 2026 05:01

slug

ai-daily-en-2026-05-22

📊 Today's Overview

Today's AI landscape is dominated by Agent infrastructure — from how to provision compute for agents, to building multi-agent systems, to the economic models of an agent-driven web. We cover 19 articles (5 featured), 5 GitHub projects, 4 podcast episodes, and 30 KOL tweets. The big theme: agents are moving from prototypes to production, and the entire stack — from chips to deployment platforms to content economics — is being rethought.

Featured articles: 5 | GitHub projects: 5 | Papers: 0 | KOL tweets: 30

🔥 Trend Insights

Agent Infrastructure Goes Mainstream: The conversation has shifted from "can agents work?" to "how do we run them at scale?" The Latent Space interview with Daytona's CEO reveals that RL/eval workloads now account for 50% of sandbox usage, and that Kubernetes is a poor fit for agent compute. GitHub projects like `multica` and `chrome-devtools-mcp` are filling the gaps in agent management and browser control, while Microsoft's MagenticLite shows that even small models can be effective agents with the right orchestration.

The Cost Reality Check: The era of cheap AI is ending. Microsoft canceled internal Claude Code licenses due to token costs, Uber burned through its 2026 AI budget in four months, and US AI software prices have risen 20-37%. This is driving interest in local models (Forge, a reliability layer for self-hosted LLMs) and efficient architectures (Gated DeltaNet-2, HRM-Text). The market is demanding agents that are not just capable, but cost-effective.

Content Economics in the Agentic Web: Stratechery's interview with Parallel's founder tackles a fundamental question: if agents consume content directly, how do we value and reward creators? This is a strategic shift from human-centric to machine-centric content economics, with implications for everything from SEO to paywalls to micropayments.

🐦 X/Twitter Highlights

AI/Tech News Daily | 2026-05-22

📊 In this issue: 14 tweets (after merging) | 18 authors

📈 Hot Topics and Trends

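Alibaba releases its flagship Qwen3.7-Max model, scoring 56.6 on the AA index, close to the frontier, with support for 35 hours of autonomous agent operation – Scientific reasoning and agent capabilities improve substantially: 92.4 on GPQA Diamond, a 1M-token context, and a coding agent that completes multi-file refactoring and debugging end to end. Available through the AI/ML API @Alibaba_Qwen | @ArtificialAnlys | @rohanpaul_ai | @aimlapi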

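Microsoft cancels internal Claude Code licenses because token-based billing costs too much; Uber burns through its 2026 AI budget in four months – GitHub Copilot is also moving to usage-based billing, US AI software prices have risen 20%-37%, and HedgieMarkets (a market analysis account) says the era of AI subsidies is ending @GaryMarcus | @HedgieMarkets | @BrianRoemmele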

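Georgia Power takes 21 properties to build transmission lines for an AI data center, and residents sue – The Project Sail data center needs 900 megawatts; Coweta County residents and the Stop Project Sail group have filed suit @HedgieMarkets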

Amazon has cut 30,000 employees over the past year because of AI – Jeff Bezos says "AI is not here to replace jobs, but to upgrade them" @DarrigoMelanie

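Pizza Hut franchisee sues over an AI delivery system that allegedly slowed deliveries by 50%, seeking $100 million – The suit alleges that the company's AI delivery system is inefficient @Polymarket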

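Sundar Pichai (Google CEO) discusses AGI, agents, open source, and US-China competition with Matthew Berman at I/O – Topics include whether AI agents will kill the "original internet" and the business reasons Google has not open-sourced its large models @sundarpichai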

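Kling AI (Kuaishou's AI video generator) takes part in the all-AI film RAPHAEL, planned for theatrical release in 2026 – A collaboration with Mateo AI Studio and South Korea's MBC C&I, aiming to show that fully AI-made film production is feasible @Kling_ai

🔧 Tools and Products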

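OpenAI Codex Thursday update: Appshots and remote control of a Mac – Appshots attaches Mac window screenshots and text to a conversation; Codex can securely control a Mac from a phone, even when the Mac is locked and its screen is off @OpenAIDevs | @OpenAIDevs | @OpenAI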

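xAI announces that Grok (xAI's model) is available in OpenCode, and Grok for iOS launches Agent Mode – Agent Mode supports consistent characters across generations, multiple scenes, and generation from different camera angles @xai | @XFreeze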

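vLLM (open-source inference engine from UC Berkeley) introduces elastic expert parallelism, with an API that adjusts the DP/EP topology live without a restart – A single API call changes the data-parallel size dynamically; experts can be reassigned after a failure and service restored @vllm_project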

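Weaviate v1.37.1 (vector database) ships a built-in MCP server – Coding agents can run hybrid search directly over a codebase via `/v1/mcp`, with BM25 anchoring exact tokens and vectors finding semantically related content @weaviate_io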

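Tencent open-sources the Hy-MT2 multilingual translation models in three sizes: 1.8B/7B/30B-A3B – The 1.8B version (440MB) runs on a phone and outperforms Microsoft's API, the 30B version beats models with 10x the parameters, and 33 languages are supported @TencentAI_News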

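SGLang (open-source inference engine from lmsys) publishes a tutorial on deploying PD-disaggregated (prefill/decode) inference on AMD GPU clusters – Uses dstack (ML infrastructure) to set up an autoscaling endpoint from a single config file @lmsysorg | @dstackai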
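The Weaviate item above describes hybrid search as BM25 anchoring exact tokens while vectors find semantically related content. The toy sketch below illustrates that idea only; it is not Weaviate's implementation and does not call its API. The file names, the 2-d "embeddings", and the `alpha` blending weight are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini code base and made-up 2-d embeddings (illustration only).
DOCS = {
    "auth.py": "def verify_token(token): check jwt signature and expiry",
    "login.py": "def login(user, password): create session and issue token",
    "README.md": "how sign in works and how credentials are validated",
}
VECTORS = {"auth.py": (0.9, 0.1), "login.py": (0.7, 0.3), "README.md": (0.8, 0.2)}


def tokenize(text):
    # \w+ keeps identifiers such as verify_token as a single exact token.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a compact BM25."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    avg_len = sum(len(t) for t in tokenized.values()) / len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (len(tokenized) - df + 0.5) / (df + 0.5))
            tf = counts[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores[name] = score
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def normalize(scores):
    """Min-max scale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def hybrid_rank(query, query_vector, alpha=0.5):
    """alpha=0 ranks by keywords only, alpha=1 by vectors only."""
    keyword = normalize(bm25_scores(query, DOCS))
    semantic = normalize({n: cosine(query_vector, v) for n, v in VECTORS.items()})
    combined = {n: (1 - alpha) * keyword[n] + alpha * semantic[n] for n in DOCS}
    return sorted(combined, key=combined.get, reverse=True)


ranking = hybrid_rank("verify_token", (0.8, 0.2))
print(ranking)
```

In this toy data the query vector points at `README.md`, so a pure vector search would rank it first; blending in the keyword score pulls `auth.py`, the only file containing the exact identifier, to the top.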

⚙️ Technical Practice

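Gated DeltaNet-2 released: decoupled erase and write gates, with a 1.3B model outperforming Mamba-3 and KDA – A new linear-attention architecture; on long-context RULER retrieval, S-NIAH-3 rises from 63 to 90 and multi-key needle retrieval from 28 to 38; the release also covers training speed and support for a chunkwise WY algorithm @ahatamiz1 (via @rasbt)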

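RLVR prediction study: less than 20% of training is enough to predict the full training trajectory – Releases 500+ RLVR checkpoints for the community to study training dynamics and extrapolation @weizhepei (Zhepei Wei, first author of the RLVR study)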

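CODA: rewriting the entire Transformer as GEMM + epilogue – Introduced by Tri Dao (FlashAttention author and chief scientist at Together AI): all Transformer operations can be fused into the epilogue of a matrix multiplication, and LLMs themselves can write CODA kernels that come close to compiler-optimal @tri_dao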

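Nous Research (open-source AI research organization) publishes a training study of a 1.7B byte-level LLM, finding that three of seven hypotheses about subword tokenization hold up – Controlled experiments on FineWeb-Edu with the LLaMA-3 architecture indicate that compute efficiency, the structural prior of subword boundaries, and the optimization objective are the real sources of benefit @NousResearch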

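HRM-Text: a 1B-parameter model reaches competitive performance after training on only 40B tokens for about $1,000 – Built on hierarchical recurrent computation, task-completion training, and latent-space reasoning @makingAGI (Guan Wang, first author of HRM-Text) | @zhuci19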

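swyx uses AI agents to refactor a vibecoded app into a production-grade codebase in 16 hours, generating 103 commits – The result is an agent repository with end-to-end tests that is maintainable and parallelizable @swyx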

⭐ Featured Content

1. Giving Agents Computers — Ivan Burazin, Daytona

📍 Source: Latent Space | ⭐⭐⭐⭐⭐ | 🏷️ Agent, Infra, Deployment Services, Insight, Deep Retrospective

📝 Summary:

A deep-dive interview with Daytona CEO Ivan Burazin on what AI agents actually need from compute infrastructure. Key findings: agents need composable, stateful computers — not disposable code execution sandboxes. Daytona built its own scheduler on bare metal, achieving 60ms boot for a single sandbox and 75 seconds for 50,000. Their largest customer runs ~850,000 sandboxes daily. RL and evaluation workloads grew from 0% to 50% of usage in months. The interview also covers why CLI might matter more than MCP, why agents need Windows and macOS environments, why Kubernetes is the wrong tool for this job, and why the future AI cloud might look more like Stripe than AWS.

💡 Why Read:

This is the most practical, data-rich piece on agent infrastructure you'll find today. If you're building or deploying agent systems, the numbers alone — 60ms boot, 50% RL workload growth, 850K daily sandboxes — are worth the read. The anti-Kubernetes argument is a conversation starter. The "AI cloud as Stripe" thesis is genuinely forward-looking. Skip this only if you don't care about how agents actually run in production.
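To put the quoted Daytona figures in perspective, the short calculation below converts them into rates. It is a back-of-the-envelope sketch that assumes the 75 seconds is wall-clock time for the whole 50,000-sandbox batch and that the largest customer's ~850,000 daily sandboxes are spread evenly over 24 hours; neither assumption is stated in the interview summary.

```python
# Figures quoted in the summary above.
SANDBOXES_IN_BATCH = 50_000
BATCH_SECONDS = 75
DAILY_SANDBOXES = 850_000
SECONDS_PER_DAY = 24 * 60 * 60


def batch_start_rate(count: int, seconds: float) -> float:
    """Sandboxes started per second during a bulk launch."""
    return count / seconds


def average_daily_rate(count: int) -> float:
    """Sandboxes per second if the daily total were spread evenly (an assumption)."""
    return count / SECONDS_PER_DAY


batch_rate = batch_start_rate(SANDBOXES_IN_BATCH, BATCH_SECONDS)
daily_rate = average_daily_rate(DAILY_SANDBOXES)

print(f"{batch_rate:.0f} sandboxes/s during the 50,000-sandbox launch")
print(f"{daily_rate:.1f} sandboxes/s averaged over a day")
```

Under these assumptions the bulk launch runs at roughly 667 sandboxes per second, while the largest customer averages about 9.8 per second across the day, so burst capacity sits well above the steady-state load.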

2. How to Build a Multi-Agent Research Assistant in Python

📍 Source: Jason Brownlee | ⭐⭐⭐⭐⭐ | 🏷️ Agent, Multi-Agent, Agentic Workflow, Tutorial, Tool Calling

📝 Summary:

A step-by-step tutorial on building a multi-agent research assistant using the OpenAI Agents SDK. Covers SDK installation, agent definition, tool integration (search, web scraping), and orchestration. The core idea: multiple specialized agents (search, summarization) collaborate under a coordinator. Includes complete Python code examples, instruction design patterns, tool binding, and context passing. A ready-to-use guide for anyone wanting to build multi-agent systems fast.

💡 Why Read:

This is a copy-paste-and-run tutorial. If you've been reading about multi-agent systems but haven't built one, this is your on-ramp. The code is complete, the patterns are reusable, and the research assistant use case is immediately practical. Five minutes of reading gets you a working prototype. That's hard to beat.
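The coordinator pattern this tutorial describes (specialized agents whose outputs are passed along as context) can be sketched in plain Python. The sketch below deliberately does not use the OpenAI Agents SDK or reproduce the tutorial's code: the `Agent` dataclass, `fake_search`, `fake_summarize`, and `coordinator` are hypothetical stand-ins, and the tools return canned strings instead of calling a model or the web.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical stand-in for an SDK agent: a name, instructions, and a callable."""
    name: str
    instructions: str
    run: Callable[[str], str]


def fake_search(query: str) -> str:
    # Stand-in for a real search tool; returns a canned string.
    return f"[3 stub results for '{query}']"


def fake_summarize(text: str) -> str:
    # Stand-in for a model call that condenses the findings.
    return f"Summary of {text}"


search_agent = Agent("search", "Find sources for the question.", fake_search)
summary_agent = Agent("summarizer", "Condense the findings into a brief.", fake_summarize)


def coordinator(question: str, pipeline: List[Agent]) -> Dict[str, str]:
    """Run agents in order, passing each output on as the next agent's context."""
    context = question
    trace: Dict[str, str] = {}
    for agent in pipeline:
        context = agent.run(context)
        trace[agent.name] = context
    return trace


trace = coordinator("state of agent sandboxes", [search_agent, summary_agent])
print(trace["summarizer"])
```

Keeping a per-agent trace, rather than only the final answer, makes it easier to see which specialist produced what when a run goes wrong.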

3. Datasette Agent

📍 Source: simonwillison | ⭐⭐⭐⭐ | 🏷️ Agent, Product, Feature Release, Tutorial, Tool Use

📝 Summary:

Simon Willison launches Datasette Agent — an extensible AI assistant for querying Datasette data through a conversational interface. Supports plugin extensions for charts, image generation, and code execution. Includes a live demo using Gemini 3.1 Flash-Lite and commands for running local models. Also discusses future directions like a personal AI assistant called Claw.

💡 Why Read:

Simon Willison is one of the clearest thinkers in developer tools. This is a practical product launch with a working demo and a plugin architecture. If you use Datasette or care about AI + databases, this shows one clean way to bridge them. The local model support is a nice touch for privacy-conscious teams.

4. MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

📍 Source: microsoft | ⭐⭐⭐⭐ | 🏷️ Agent, Computer Use, Small Models, Agentic Workflow, Microsoft

📝 Summary:

Microsoft Research releases a trio of components for small-model agents: MagenticLite (cross-browser and local file system agent app), MagenticBrain (planning, coding, delegation), and Fara1.5 (computer use model, 9B params, nearly doubling performance on web navigation tasks). The core insight: tool orchestration and action matter more than knowledge. The system is designed through co-design of model, application, and runtime framework.

💡 Why Read:

This is Microsoft's bet on small, efficient agents — a direct counter to the "bigger is better" trend. The 9B model nearly doubling web navigation performance is a strong signal. If you're working on edge deployment, cost-sensitive agent systems, or just want to see where efficient agent design is heading, this is worth your time.

5. An Interview with Parallel Founder Parag Agrawal About Valuing Content on the Agentic Web

📍 Source: Stratechery | ⭐⭐⭐⭐ | 🏷️ Agent, Strategy, Business Model, Insight

📝 Summary:

A strategic interview with Parallel founder Parag Agrawal on how to value and incentivize content creation in an agent-driven web. His core thesis: content value should shift from human consumption to machine (agent) consumption. The conversation covers micropayment models based on agent interactions, Twitter's algorithms, and content distribution. It is a forward-looking piece on the economics of the agentic web rather than a technical deep dive.

💡 Why Read:

This is the kind of piece that makes you rethink your assumptions. If agents are going to consume content directly (reading articles, watching videos, analyzing data), then the entire content economy needs a new model. Agrawal's ideas are speculative but grounded. Stratechery's analysis adds useful context. Read this if you care about where the money flows in an AI-native internet.

🎙️ Podcast Picks

Giving Agents Computers — Ivan Burazin, Daytona

📍 Source: Latent Space | ⭐⭐⭐⭐⭐ | 🏷️ Infra, Agent, LLM | ⏱️ 1:10:27

Daytona CEO Ivan Burazin discusses the new compute requirements for AI agents: from human dev environments to composable, stateful, fast-booting sandboxes. Daytona runs its own scheduler on bare metal — 60ms per sandbox, 850K daily for their largest customer. RL/eval workloads grew from 0% to 50%. Covers why agents need Windows/macOS, why CLI may beat MCP, why Kubernetes is wrong for this, and why the future AI cloud may look like Stripe.

💡 Why Listen: If you're building agent infrastructure, this is the most data-rich conversation you'll hear this week. The 60ms boot time, 50% RL workload growth, and anti-Kubernetes argument are all worth the listen. The "AI cloud as Stripe" thesis is a genuinely new way to think about the market.

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768

📍 Source: TWIML AI | ⭐⭐⭐⭐⭐ | 🏷️ LLM, Agent, Research | ⏱️ 1:06:23

Jure Leskovec introduces AI Virtual Cell multi-scale modeling and relational deep learning. The core: Kumo's Relational Foundation Model (RFM2) treats enterprise databases as graphs, training neural networks directly on raw multi-table data for zero-shot prediction. Discusses deployments at Reddit, DoorDash, explainability via attention mechanisms, and integration with agent systems.

💡 Why Listen: Leskovec is a Stanford professor and Kumo's chief scientist. This is a rare look at how LLM techniques apply to structured enterprise data — a huge but underserved market. If you work with databases, recommendation systems, or enterprise AI, this is directly relevant.
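The phrase "treats enterprise databases as graphs" can be made concrete with a small sketch: every row becomes a node and every foreign-key reference becomes an edge. This is an illustration of that general idea only, not Kumo's implementation; the `users` and `orders` tables and the `build_graph` helper are hypothetical.

```python
from collections import defaultdict

# Two made-up tables linked by a foreign key (orders.user_id -> users.user_id).
users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Lin"}]
orders = [
    {"order_id": 10, "user_id": 1, "total": 30.0},
    {"order_id": 11, "user_id": 1, "total": 12.5},
    {"order_id": 12, "user_id": 2, "total": 99.0},
]


def build_graph(tables, foreign_keys):
    """Turn rows into nodes and foreign-key references into undirected edges.

    tables: {table_name: (primary_key_column, rows)}
    foreign_keys: [(child_table, fk_column, parent_table)]
    """
    nodes = {}
    edges = defaultdict(set)
    for table, (pk, rows) in tables.items():
        for row in rows:
            nodes[(table, row[pk])] = row
    for child, fk, parent in foreign_keys:
        pk, rows = tables[child]
        for row in rows:
            src, dst = (child, row[pk]), (parent, row[fk])
            edges[src].add(dst)
            edges[dst].add(src)
    return nodes, edges


nodes, edges = build_graph(
    {"users": ("user_id", users), "orders": ("order_id", orders)},
    [("orders", "user_id", "users")],
)
print(sorted(edges[("users", 1)]))
```

A model trained on such a graph can aggregate information from a user's neighboring order rows directly, which is what lets it skip hand-built, per-table feature engineering.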

Hermes Agent: Agents that grow with you

📍 Source: Practical AI | ⭐⭐⭐⭐⭐ | 🏷️ Agent, LLM, Open Source | ⏱️ 51:42

Nous Research co-founder and CTO Jeffrey Quesnelle deep-dives into Hermes Agent: self-improving AI agents, recursive learning systems, and how AI tools blur the line between software and autonomous collaborators. Discusses the difference between models and frameworks (harnesses), the evolving role of developers, and what makes humans unique in an AI-accelerated world.

💡 Why Listen: Nous Research is one of the most interesting open-source AI labs. This conversation goes beyond "how to build an agent" into "what happens when agents improve themselves." The model vs. framework distinction is a useful mental model for anyone building agent systems.

The Story Behind Cerebras’ $63 Billion IPO with Founder and CEO Andrew Feldman

📍 Source: No Priors | ⭐⭐⭐⭐⭐ | 🏷️ Infra, Interview, Funding | ⏱️ 30:33

Cerebras founder Andrew Feldman tells the story: betting on wafer-scale AI computing, surviving the hard years before demand caught up, engineering breakthroughs achieving 20x GPU inference speed, and closing a $20B deal with OpenAI in four weeks. Discusses why AI needs "specialist Davids" against tech giants, open source, and post-training workloads.

💡 Why Listen: 30 minutes, packed with story and insight. The $20B OpenAI deal in four weeks is a wild anecdote. The "specialist David vs. Goliath" framing is a useful lens for the AI chip market. If you care about AI infrastructure, this is a must-listen.

🐙 GitHub Trending

ChromeDevTools/chrome-devtools-mcp

⭐ 40,541 | 🗣️ TypeScript | 🏷️ MCP, Agent, DevTool

An official MCP server that lets coding agents (Antigravity, Claude, Cursor) control, debug, and analyze browsers via the Chrome DevTools Protocol. Provides performance tracing, network analysis, screenshots, console message inspection — all through Puppeteer for reliable automation. This is the missing link between agents and browser-level debugging.

💡 Why Star: This is the official bridge between AI coding agents and browser DevTools. If your agents need to debug web apps, analyze performance, or automate browser tasks, this is the standard tool. 40K stars in a short time says everything about demand.

multica-ai/multica

⭐ 30,818 | 🗣️ TypeScript | 🏷️ Agent, DevTool, Framework

An open-source hosted agent platform that turns coding agents into real team members. Supports task assignment, progress tracking, skill reuse, and works with Claude Code, Codex, and more. Uses a Squads routing layer for multi-agent orchestration with full lifecycle management. Solves the fragmentation problem in agent tooling.

💡 Why Star: If you're managing multiple agents across a team, this is the platform you've been waiting for. It turns agents from one-off tools into managed, reusable team assets. The multi-agent orchestration via Squads is a clean design. 30K stars confirms the pain point is real.

antoinezambelli/forge

⭐ 1,522 | 🗣️ Python | 🏷️ LLM, Agent, Framework

A reliability layer for self-hosted LLMs doing tool calling and multi-step agent workflows. Uses guards (parse rescue, retry prompts, step forcing) and context management (VRAM-aware budgets, hierarchical compression) to boost 8B local models to frontier-level performance on complex agent tasks. Supports Ollama, llama.cpp, and can run as a WorkflowRunner, middleware, or proxy server.

💡 Why Star: This directly addresses the #1 problem with local models: reliability. If you're running agents on self-hosted LLMs and hitting failures, Forge's guard system is a practical fix. The VRAM-aware context management is a nice touch for resource-constrained setups.
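Two of the guards named above, parse rescue and retry prompts, can be sketched in a few lines. This is not Forge's code or API: `parse_tool_call`, `call_with_guards`, the `{"tool": ..., "args": ...}` shape, and the stub model are assumptions chosen to show the control flow.

```python
import json
import re


def parse_tool_call(text):
    """Parse rescue: accept clean JSON, or pull a {...} span out of surrounding chatter."""
    candidates = [text]
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            call = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "tool" in call:
            return call
    return None


def call_with_guards(model, prompt, max_attempts=3):
    """Retry prompt: if no tool call can be recovered, ask again with a corrective hint."""
    current = prompt
    for _ in range(max_attempts):
        call = parse_tool_call(model(current))
        if call is not None:
            return call
        current = (
            prompt
            + "\nYour last reply was not a valid tool call. "
            + 'Reply with JSON only, e.g. {"tool": "name", "args": {}}.'
        )
    raise ValueError(f"no valid tool call after {max_attempts} attempts")


# Stub model: first reply has no JSON at all, second wraps JSON in chatter.
replies = iter([
    "I think I should search.",
    'Sure! {"tool": "search", "args": {"q": "daytona"}} hope that helps',
])


def stub_model(prompt):
    return next(replies)


result = call_with_guards(stub_model, "Find Daytona's boot time.")
print(result)
```

The first reply triggers a retry and the second is rescued by extracting the embedded JSON, so a small model's formatting slips do not abort the workflow.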

google/adk-samples

⭐ 9,382 | 🗣️ Python | 🏷️ Agent, Framework, DevTool

Google's official ADK sample repository with multi-language agent examples (Python, TypeScript, Go, Java). Covers real-world scenarios: customer service, data analysis, financial advisor, multi-agent orchestration. Built on the Agent Development Kit, ready to run. The best starting point for learning Google's agent framework.

💡 Why Star: Google's official samples are the fastest way to learn ADK. The multi-language support is rare and valuable for polyglot teams. Some samples are simple, but the breadth of scenarios makes this a useful reference.

microsoft/markitdown

⭐ 124,447 | 🗣️ Python | 🏷️ LLM, DevTool, Data

A lightweight Python tool from Microsoft's AutoGen team that converts PDFs, Office docs, images, audio, and more into Markdown — designed for LLM and text analysis pipelines. Supports OCR, speech transcription, YouTube link parsing. Output preserves headings, lists, tables. Integrates with AutoGen ecosystem. Dual mode: CLI and Python API.

💡 Why Star: 124K stars say it all. This is the standard tool for document preprocessing in LLM pipelines. If you're building RAG systems, agent tools, or any text analysis pipeline, this saves you from writing format-specific parsers. The AutoGen integration is a bonus.