AI Tech Daily - 2026-05-26

AI hit major milestones today: OpenAI and Google DeepMind both cracked decades-old Erdős math problems — the first time AI has made such a fundamental mathematical breakthrough. On the efficiency front, HRM-Text trained a SOTA 1B model for just $1,500, challenging the scaling law orthodoxy, while De

AI Tech Daily - 2026-05-25

Today's report covers a mix of big-picture strategy and hands-on tools. The standout is Ben Evans' deep dive on AI job exposure, which challenges the popular "exposed or not" charts with historical data and counterintuitive logic. On the ground, we see real cost pain: Microsoft banned Claude Code fo

AI Tech Daily - 2026-05-24

Today's AI landscape is dominated by a single, loud signal: every major model lab is pivoting to become an agent lab. From OpenAI's subtle shift to DeepSeek's new "Harness" team, the race is no longer about the best model — it's about the best agent system. We also see a flurry of open-source releas

AI Weekly 2026-W21

The main narrative thread of 2026-W21: agents have formally shifted from "model capability" to "system infrastructure." Google I/O 2026 was the tipping point: Gemini 3.5 Flash packages "frontier intelligence + action" into an API that runs 4x faster at half the cost, Managed Agents lets developers define agents in YAML and deploy them into a cloud sandbox, and Antigravity pushes agents onto the desktop and into the background. But Google isn't alone: Qwen3.7-Max landed the same week with 35-hour autonomous execution, Daytona's sandbox infrastructure hit 850k runs per day, and IBM and Hugging Face's Open Agent Leaderboard evaluates full agent systems for the first time, not just models. These three signals point to the same conclusion: agents are making the steep infrastructure climb from demo to deployment. The framework layer (Langflow, Multica, 12-Factor Agents) tackles orchestration and observability, the sandbox layer (Daytona, Alibaba Cloud AgentRun, an AWS blog solution) handles security and state management, and the evaluation layer (Open Agent Leaderboard, Cameron Wolfe's guide) answers "how do I know my agent is good?" Meanwhile, NVIDIA, Together AI, Amazon, and other labs released a dense set of training and inference optimization papers (IXT, Dynatrain, CODA, DualKV) that push efficiency boundaries at the system level. The second thread: autonomous scientific discovery moves from academic speculation to verifiable results. For the first time, an OpenAI model autonomously solved a discrete geometry conjecture posed by Erdős in 1946; Sam Altman called it "a big milestone." Meta FAIR's AIRA system had agents autonomously design neural network architectures that outperform Llama 3.2. These events are few but high-signal: not "AI assists scientists," but "AI as discoverer." One low-level warning this week: UIUC and Amazon AGI formally proved (arXiv) the limitations of the RoPE mechanism in long contexts, suggesting the current positional encoding paradigm may need fundamental re

RecSys Weekly 2026-W21

This week's recommendation systems research clusters around three technical fronts: generative recommendation moves from "proving feasibility" to "industrial deployment and optimization," debiasing and calibration shift from single methods to fusion frameworks, and search/retrieval systems make concrete advances in cold start and heterogeneous acceleration. Generative recommendation enters the industrial deep end: four deployment papers from Kuaishou, Tencent, and Meituan cover core pain points, including reasoning enhancement (RPORec), long-term interest modeling (GenLI), and world knowledge integration (LWGR). The common thread: the core question for generative recommendation has shifted from "can it work?" to "how do we stably and controllably replace or augment the traditional pipeline?" Debiasing and calibration move from "correcting the mean" to "governing the distribution." ByteDance's PEARL, Kuaishou's DADF, and Pinterest's PRL-PUTS each deliver production-grade solutions from contrasting perspectives: percentile comparison, residual correction, and utility weight tuning. PEARL's +2.10% watch duration and DADF's +0.347% time spent show that distribution-level bias correction still has substantial headroom. Search and retrieval systems focus on cold start and system efficiency. Taobao's GrowthGR (new item GMV +5.3%) and Airbnb's synthetic data framework (query length KL divergence down to 0.66) demonstrate the engineering potential of LLMs plus counterfactual inference for cold start. Huawei and JD.com's Ascend-RaBitQ pushes NPU acceleration for billion-scale vector search to 4.6x, setting a new hardware-algorithm co-optimization baseline for large-scale retrieval.

AI Tech Daily - 2026-05-23

Today's report covers 8 articles (5 featured), 19 KOL tweets, 2 GitHub projects, and 2 podcast episodes. The big theme: specialization is beating scale — from a 3B model outperforming frontier APIs in OCR to diffusion models offering 6.5x speed gains over autoregressive generation. Meanwhile, AI's h

AI Tech Daily - 2026-05-22

Today's AI landscape is dominated by Agent infrastructure — from how to provision compute for agents, to building multi-agent systems, to the economic models of an agent-driven web. We cover 19 articles (5 featured), 5 GitHub projects, 4 podcast episodes, and 30 KOL tweets. The big theme: agents are

AI Tech Daily - 2026-05-21

Google I/O 2026 dominates today's coverage, with Gemini 3.5 Flash, Omni, and Antigravity 2.0 leading the pack. But the real story is deeper: AI agents are reshaping everything from cloud infrastructure (Railway's "Agent-Native Cloud") to research workflows (Karpathy's autoresearch). On the research

AI Tech Daily - 2026-05-20

Today's AI landscape is dominated by Google's massive I/O 2026 announcements, with the Gemini 3.5 series, Managed Agents, and Gemini Omni marking a clear shift toward agentic AI. The big picture: Google is betting big on agents that can act, not just think. Meanwhile, the open-source ecosystem respo

AI Tech Daily - 2026-05-19

Today's AI landscape is dominated by two big themes: Agent evaluation is getting serious, and Agent infrastructure is going mainstream. We've got 18 articles total, with 5 featured in depth. The standout is the Open Agent Leaderboard from IBM & Hugging Face — a 5-star resource that finally benchmark

AI Tech Daily - 2026-05-18

Today's AI landscape is dominated by the agent economy going mainstream. On-chain data shows Venice AI pulling in $835K in monthly revenue, while the x402 protocol has processed 47 million agent-to-agent transactions. Meanwhile, OpenAI is restructuring around an "agentic future," and Vercel Labs launched a

RecSys Weekly 2026-W20

This week's recommendation systems research breaks down along three technical fronts: generative recommendation architectures moving from tokenizer optimization to inference efficiency; LLM-enhanced recommendation evolving from isolated auxiliary modules to agents with memory and reasoning; and system-level quantization and thread orchestration emerging as the real bottleneck for production deployment. Theme 1, "Decoupling and Acceleration in Generative Recommendation": Alibaba deployed CQ-SID / EG-GRPO on TmallAPP, using category-aware semantic IDs and expert-guided reinforcement learning to achieve +1.15% GMV, with generative retrieval contributing 72.63% of purchases. Tencent and Tsinghua's AsymRec proposed an asymmetric continuous-discrete framework that replaces symmetric quantization with multi-expert projections, for an average improvement of 15.8%. Meituan's DIG embeds the tokenizer into a discriminative ranking model for end-to-end training, improving both retrieval and ranking. Snap's SID-MLP distills the Transformer decoder into an MLP, achieving an 8.74x speedup with no loss in accuracy. The common thread: generative recommendation is transitioning from "can run" to "runs stably and fast," with the core tactic being to decouple input/output representations and replace overly dense structures. Theme 2, "LLM Recommendation Toward Reasoning and Memory": Microsoft Research's PGR introduced look-ahead guided retrieval, using Tree-of-Thought to expand query steps and achieving a nearly 3x recall improvement on MemoryQuest. Meituan's RecRM-Bench provides 1 million structured entries covering four reward dimensions (instruction following, fact consistency, etc.) for agent-based recommendation systems. Meituan's SDAR uses gated auxiliary objectives to stabilize On-Policy Self-Distillation (OPSD), outperforming GRPO by 7–10% on ALFWorld, Search-QA, and WebShop. The difference: PGR focuses on look-ahead reasoning before retrieval, while SDAR focuses on training stability. But the shared
