AI Tech Daily - 2026-05-25 | Recsys Frontier

date

May 25, 2026 05:00

📊 Today's Overview

Today's report covers a mix of big-picture strategy and hands-on tools. The standout is Ben Evans' deep dive on AI job exposure, which challenges the popular "exposed or not" charts with historical data and counterintuitive logic. On the ground, we see real cost pain: Microsoft banned Claude Code for internal engineers due to runaway token bills, and Uber burned through its full-year AI budget by April. Featured articles: 2, GitHub projects: 2, Papers: 0, KOL tweets: 23.

🔥 Trend Insights

The Real Cost of AI Adoption: Two major signals emerged today. Microsoft revoked Claude Code access for thousands of engineers after token costs spiraled out of control. Uber's CTO revealed that the company's full-year AI budget was exhausted by April: 84% of engineers use AI, 70% of commits are AI-generated, and heavy users cost $500–$2000 per month. Nvidia VP Bryan Catanzaro also admits that his team's compute costs far exceed its salaries. The message: AI adoption at scale is expensive, and the economics are still being figured out.

AI-First Engineering Goes Mainstream: The "Harness" paradigm — where AI writes 99% of code and humans review — is no longer theoretical. A podcast from CreaoAI details 3-8 daily production deployments with AI-generated code. Meanwhile, a new paper shows DeepMind's AI agent autonomously solved 9 Erdős open problems (including two unsolved for 56 years) for a few hundred dollars each. The shift from "AI assists humans" to "AI leads, humans verify" is accelerating.

The Job Impact Debate Heats Up: David Sacks argues AI has increased GitHub commits 14x year-over-year while software engineering jobs have risen, questioning the "mass unemployment" narrative. Ben Evans' essay uses historical data (accounting automation, internet vs. media) to argue automation can increase jobs via price elasticity (Jevons paradox). But consulting firms like McKinsey face pricing pressure as clients question the value of human advice vs. AI. The debate is shifting from "will jobs disappear?" to "how will work transform?"

🐦 X/Twitter Highlights

AI/Tech Daily | 2026-05-25

📊 In this issue: 16 tweets (after merging) | 16 authors

📈 Hot Topics and Trends

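Microsoft bans Claude Code for internal engineers over cost; Uber's full-year AI budget runs out in April – Microsoft had given thousands of engineers access to Claude Code, then cancelled nearly all licenses after token bills spiraled out of control. Uber's CTO said the full-year budget was used up by April: 84% of engineers use AI, 70% of committed code comes from AI, and heavy users spend $500–$2000 per month. Nvidia VP Bryan Catanzaro also admitted that his team's compute costs far exceed employee salaries @Ric_RTP (independent blogger)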

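David Sacks (former PayPal COO / cloud infrastructure CEO): AI has pushed GitHub commits up 14x year over year while software engineering jobs have risen – AI lowers the cost of coding and spawns more applications and jobs, which calls into question the claim that "AI causes mass unemployment" @DavidSacks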

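AI leads consulting clients to question the value of human advice; McKinsey and others reprice – According to Polymarket, traditional consulting firms face pricing pressure from AI @Polymarket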

Cathie Wood (ARK Invest founder) predicts AI agents will push the GPU:CPU ratio from 4–5:1 down to 1:1 – Citing OpenAI CFO Sarah Friar, she argues that agentic AI is driving CPU demand, which benefits companies such as Intel @MilkRoadAI

🔧 Tools and Products

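Together AI releases a Blackwell-optimized inference stack, ranking first in several Artificial Analysis categories – Includes a new attention kernel and runs faster than other GPU endpoints on models such as Kimi 2.6 and MiniMax @vipulved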

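Tom Dörr (independent developer) publishes a build-an-AI-agent-from-scratch tutorial and a self-hosted orchestration tool – The tutorial starts from first principles; the orchestration tool has no external dependencies @tom_doerr | @tom_doerr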

OpenClaw 2026.5.22 released: model loading latency cut to 5ms, npm dependencies locked – Startup path optimized, Windows install path hardened @openclaw

CodeWhale released: an agent harness for open-source/open-weight models – Formerly deepseek-tui, it aims to become the gold standard for open-source model agents @goodhunt

StepFun launches a meeting-notes assistant built on Step Plan – Paste in messy notes and it extracts to-dos and action items automatically, using the Step 3.5 Flash model @StepFun_ai

Bittensor-based ChatGPT alternative launches in alpha at 1/250 the cost – Supports file Q&A, persistent memory, and uncensored output, running on the chutes.ai subnet @jaltucher

⚙️ Technical Practice

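Percy Liang's team preregistered a loss of 2.252 for a 129B MoE, and the actual training run landed at 2.234 – The 1e23-FLOP run shows that model performance can be predicted ahead of time @percyliang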

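DeepMind AI agent autonomously solves 9 Erdős open problems, plus 44 OEIS conjectures – Includes two problems unsolved for 56 years, at a cost of a few hundred dollars per problem, with automated LLM-Lean formalization and verification end to end DeepMind | @AISafetyMemes | @Cointelegraph | @AcerFur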

SOUL.md file definition: 8 key sections for an AI agent's identity and principles – Includes identity, core truths, worldview, voice, and more; 30–80 lines are enough to change agent behavior @akshay_pachaar

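RACO paper accepted as an ICML 2026 Oral (top 0.7%): conflict-avoiding optimization for multi-objective LLM fine-tuning – Presents a counterintuitive theoretical speedup and a better Pareto front @PeterLauLukCh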

New preprint studies how evolutionary coding agents evolve – Titled "What Do Evolutionary Coding Agents Evolve?", followed by the paper and a blog post @maxzimmerberlin

InsForge Skills+CLI optimizes Claude Code: tokens drop from 10.4M to 3.7M, cost from $9.21 to $2.81 – Local and open source, reaching 0 errors through context engineering @RodmanAi
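
Two of the items above quote before/after numbers, and it helps to see them as percentages. The sketch below only restates the figures from the Percy Liang and InsForge items (loss 2.252 predicted vs. 2.234 actual; 10.4M to 3.7M tokens; $9.21 to $2.81). The variable names and the `relative_change` helper are illustrative and do not come from either source.

```python
# Figures quoted in the digest items above; nothing here is measured independently.
predicted_loss = 2.252  # preregistered loss for the 129B MoE run
actual_loss = 2.234     # loss the training run actually reached

tokens_before, tokens_after = 10.4e6, 3.7e6  # InsForge: reported token usage
cost_before, cost_after = 9.21, 2.81         # InsForge: reported cost in dollars


def relative_change(before: float, after: float) -> float:
    """Signed fractional change going from `before` to `after`."""
    return (after - before) / before


# How far the preregistered prediction was from the realized loss.
loss_gap = predicted_loss - actual_loss
loss_error = loss_gap / actual_loss

# Savings are the negated relative change, since both quantities went down.
token_saving = -relative_change(tokens_before, tokens_after)
cost_saving = -relative_change(cost_before, cost_after)

print(f"loss gap: {loss_gap:.3f} ({loss_error:.2%} of the actual loss)")
print(f"token reduction: {token_saving:.1%}")
print(f"cost reduction: {cost_saving:.1%}")
```

On these numbers the preregistered loss missed by about 0.8%, and the InsForge setup used roughly 64% fewer tokens for roughly 70% less cost.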

⭐ Featured Content

1. Predicting AI job exposure

📍 Source: Ben Evans | ⭐⭐⭐⭐⭐ | 🏷️ Survey, Trend Analysis, Insight, Counterintuitive View

📝 Summary:

This essay challenges the popular quantified analysis of AI job exposure. By revisiting 100 years of automation in accounting and the internet's impact on media/record labels, Ben Evans makes three counterintuitive points. First, automation can increase jobs through price elasticity (Jevons paradox). Second, job content transforms over time — the same job title hides completely different work. Third, business models can be disrupted from below: your job might be safe, but the role your company depends on might not be. The real shock often comes from indirect paths no one predicted.

💡 Why Read:

If you're tired of the same "X% of jobs exposed to AI" charts, this is for you. Evans uses real history — not speculation — to show why those charts are misleading. The accounting example alone is worth the read: automation didn't kill bookkeepers, it created more of them. This is the kind of strategic thinking that's rare in daily AI chatter. Five minutes will change how you think about AI's impact on work.

2. Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments

📍 Source: MarkTechPost | ⭐⭐⭐ | 🏷️ Tutorial, Tool Usage, LLM, Infra

📝 Summary:

A hands-on tutorial for building a full Langfuse observability pipeline. It covers tracing with decorators, manual RAG tracing, prompt management, evaluation scoring, and dataset experiments — all with code examples. Good for developers who need a quick start with Langfuse, but the content is a compilation with limited depth.

💡 Why Read:

If you're using Langfuse (or planning to), this tutorial gets you up and running fast. The code examples are practical and cover the key features. But don't expect deep analysis — it's a straightforward walkthrough, not a strategic guide.
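
For readers new to the idea of "tracing with decorators", here is a minimal, library-free sketch of the pattern. It is not Langfuse's API and is not taken from the tutorial: `traced`, `TRACES`, `retrieve`, and `answer` are hypothetical names, and the records stay in an in-memory list rather than being sent to an observability backend.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory trace store; a real pipeline would ship these
# records to an observability backend instead.
TRACES: list[dict[str, Any]] = []


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the name, arguments, output, and duration of each call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: dict[str, Any] = {
            "name": func.__name__,
            "args": args,
            "kwargs": kwargs,
        }
        try:
            record["output"] = func(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            # Keep the failure in the trace, then let it propagate.
            record["error"] = repr(exc)
            raise
        finally:
            record["seconds"] = time.perf_counter() - start
            TRACES.append(record)

    return wrapper


@traced
def retrieve(query: str) -> list[str]:
    # Stand-in for the retrieval step of a RAG pipeline.
    return [f"document about {query}"]


@traced
def answer(query: str) -> str:
    docs = retrieve(query)
    # Stand-in for an LLM call.
    return f"Answer to {query!r} using {len(docs)} document(s)"


result = answer("tracing")
for trace in TRACES:
    print(trace["name"], f"{trace['seconds']:.6f}s")
```

Because a record is appended when its call finishes, the inner `retrieve` call lands in `TRACES` before the outer `answer` call. A real tracing library would also track parent/child relationships between such spans, which this sketch leaves out.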

🎙️ Podcast Picks

E238 | Organizational Structure for the AI-First Harness Era: From Trusting People to Trusting AI

📍 Source: 硅谷101 | ⭐⭐⭐⭐⭐ | 🏷️ Agent, LLM, Product | ⏱️ 1:05:20

📝 Summary:

This episode dives deep into the Harness Engineering paradigm. The guest from CreaoAI shares how their agent system achieves 99% AI-written code with 3-8 daily production deployments. Key insights: AI-First means letting AI lead productivity, not just using AI tools. Organizational transformation hinges on trusting AI. Product managers may be replaceable. Junior engineers adapt faster, while senior engineers' core value shifts to spotting flaws in AI's planning and judging value. The discussion covers agent system design, feedback loops, and auto-bug-fixing — all from real production experience.

💡 Why Listen:

This is the most concrete AI-First engineering discussion I've heard. The guest doesn't just talk theory — they've built a system where AI writes 99% of code and it works. If you're building agent systems or thinking about how to restructure your team for AI, this is essential listening. The "trust AI" vs. "trust humans" framing is particularly sharp.

🐙 GitHub Trending

Aider-AI/aider

🗣️ Python | 🏷️ LLM, DevTool, Agent

📝 Summary:

Aider is a terminal-based AI pair programming tool. It supports multiple LLMs (GPT-4, Claude, etc.) and can auto-edit code, run commands, and manage git commits. It understands your codebase context to help implement features, fix bugs, or refactor. Key features: automatic git management, multi-file editing, deep terminal integration.

💡 Why Star:

Aider is one of the most mature terminal AI coding assistants out there. It works with all major LLMs, is fully open-source, and can be self-hosted. If you spend time in the terminal and want AI to handle the grunt work, this is worth a try.

onyx-dot-app/onyx

🗣️ Python | 🏷️ LLM, Agent, DevTool

📝 Summary:

Onyx is an open-source AI platform that provides smart chat with all major LLMs. It supports multi-model switching, context management, and a plugin system. Built for developers and enterprises, it can power custom AI assistants or customer service bots. The standout feature is its highly extensible plugin architecture and unified LLM interface abstraction, reducing integration costs.

💡 Why Star:

If you're building a multi-LLM application, Onyx solves the integration headache. The plugin system means you can extend it without forking. It's production-ready and immediately deployable — a solid foundation for any AI chat product.