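Deep networks rely on LayerNorm (RMSNorm), which creates local Scale Invariance and, with it, distinctive Gradient Dynamics. Inside these dynamics our usual machine learning intuitions are overturned: the physical meaning of the weight norm changes from a measure of feature strength to a knob for learning progress. In theory the norm grows steadily, so SGD comes with built-in learning-rate decay, but it brakes too hard and learning stops early, while Weight Decay evolves from a regularization term into a dynamic valve that regulates the effective learning rate. How AdamW became the default: Adam keeps the gradient step size constant, so the effective learning rate brakes gently; Warmup handles the overly small weights (exploding gradients) and the inaccurate second-moment estimates early in training; AdamW fixes the problem with L2 regularization by introducing decoupled Weight Decay, splitting "direction update" and "progress control" into two clean knobs.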
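The claims about scale invariance can be checked numerically. The following is a minimal sketch, not the post's own code: it assumes a toy loss that depends only on the direction w / ||w|| (standing in for a weight followed by a norm layer), and the names `loss_and_grad`, `sgd_step`, and `target` are hypothetical. It checks three things: rescaling the weights leaves the loss unchanged, the gradient shrinks as 1 / ||w|| and is orthogonal to w, and a plain SGD step therefore can only increase the norm.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def loss_and_grad(w, target):
    """Toy scale-invariant loss on the direction y = w / ||w||, with its gradient w.r.t. w."""
    n = norm(w)
    y = [x / n for x in w]
    loss = sum((a - b) ** 2 for a, b in zip(y, target))
    dl_dy = [2.0 * (a - b) for a, b in zip(y, target)]
    # Chain rule through the normalization: grad = (I - y y^T) dl_dy / ||w||
    radial = sum(a * b for a, b in zip(y, dl_dy))
    grad = [(d - radial * a) / n for d, a in zip(dl_dy, y)]
    return loss, grad

def sgd_step(w, grad, lr):
    """Plain SGD update without weight decay."""
    return [x - lr * g for x, g in zip(w, grad)]

w = [3.0, 4.0]
target = [0.0, 1.0]   # hypothetical target direction
lr = 0.5

loss_a, grad_a = loss_and_grad(w, target)
loss_b, grad_b = loss_and_grad([10.0 * x for x in w], target)  # same direction, 10x norm

w_dot_grad = sum(x * g for x, g in zip(w, grad_a))  # zero up to rounding: gradient is tangential
w_next = sgd_step(w, grad_a, lr)

print("loss unchanged under rescaling:", math.isclose(loss_a, loss_b))
print("gradient norm ratio (10x weights):", norm(grad_b) / norm(grad_a))
print("squared norm before and after one SGD step:", norm(w) ** 2, norm(w_next) ** 2)
```

Because the gradient is orthogonal to w, each step adds lr² · ||grad||² to the squared norm, and because the gradient scales as 1 / ||w||, the same nominal learning rate moves the direction less and less as the norm grows. That is the "built-in decay" described above, and shrinking the norm with Weight Decay counteracts it.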
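Ever since fine-ranking switched to deep learning, industry has split research on ranking model structure into two basic parts, sequence processing and feature crossing. Some companies' ranking groups are even divided into two teams that handle behavior sequences and feature crossing separately. From the earliest days, when sequences were handled with DIN for example, the sequence was compressed into one or more vector representations before being crossed with the other features. We can think of this as MLP(concat(DIN, Features)). Most model research today still treats the two parts separately: on the crossing side, replacing the MLP with DCN, adding LHUC, or scaling up to RankMixer or Transformer; on the sequence side, stacking MHA on top of DIN or replacing it outright with a Transformer, which can be written as RankMixer(concat(Transformer, Features)). From MLP(concat(DIN, Features)) to RankMixer(concat(Transformer, Features)) the essence has not changed: sequence processing and feature crossing form an implicit two-stage pipeline, and the sequence is compressed into Vector Space before it ever crosses with the features. What makes LLMs interesting is that the crossing exploited by Next Token Prediction happens in the Token Space of the word sequence. The lesson for recommendation ranking models is that every feature crossing should happen in the Token Space of the user sequence.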
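The difference between the two layouts can be shown with a toy contrast. This is a sketch under stated assumptions, not any production model: DIN-style pooling is simplified here to softmax dot-product attention, the downstream crossing module (MLP, DCN, RankMixer) is left out, and the function names `two_stage_input` and `token_space_input` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, tokens):
    """Softmax dot-product attention of one query over a list of token vectors."""
    weights = softmax([dot(query, t) for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def two_stage_input(behavior_seq, target_item, features):
    """Stage 1 pools the sequence into one vector; the features only see that vector."""
    pooled = attend(target_item, behavior_seq)
    flat = list(pooled)
    for f in features:
        flat.extend(f)
    return flat  # would be fed to the crossing module, e.g. an MLP

def token_space_input(behavior_seq, features):
    """Each feature attends over the uncompressed behavior tokens directly."""
    return [attend(f, behavior_seq) for f in features]

# Hypothetical toy data: three behavior tokens, one target item, two feature embeddings.
behavior_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_item = [1.0, 0.0]
features = [[0.5, 0.5], [2.0, -1.0]]

flat = two_stage_input(behavior_seq, target_item, features)
per_feature = token_space_input(behavior_seq, features)
print(len(flat), len(per_feature))
```

In the first layout every feature interacts with the same pooled summary of the sequence. In the second, each feature produces its own weighting over the individual behavior tokens, which is one simple reading of "crossing in Token Space".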
This week in AI centers on a single core narrative: capability breakthroughs at the massive infrastructure layer are accelerating the shift from lab to production. OpenAI dropped two bombs on the same day — its in-house inference chip Jalapeño and GPT-5.6 Sol — covering the full stack from hardware to model. These aren't isolated launches; they're coordinated moves up and down the stack: the chip optimizes inference cost, the model pushes the capability ceiling, and both share the same infrastructure. The second thread is Agent engineering moving from experiments to production governance. Stripe published a real-world case on financial compliance agents, AWS posted three consecutive blogs on MCP agent layers and data governance, and GitHub shared benchmarking data on Copilot's agentic harness. Meanwhile, Anthropic's Claude Slack Tag positions the LLM as a persistent organizational member — Karpathy called it "the third major LLM UI/UX design paradigm." Agents are no longer one-shot conversations but continuously running roles inside companies. The third thread is post-training evolving from manual exploration to automated, systematic processes. Amazon released A-Evolve, achieving autonomous post-training on a 30B model with no human intervention. OpenAI verified that beneficial-behavior RL generalizes out-of-distribution durably. Qwen's landmark language world model provides a scalable training environment for agent RL. These works collectively signal: RL is no longer just a fine-tuning step after SFT — it's becoming the main engine for expanding model capabilities.
Of the 12 papers this week, industrial deployments dominate — 8 come from first-tier platforms like YouTube, TikTok, Kuaishou, Tencent, and Walmart, all with online A/B experiment metrics. Research clusters around three overlapping directions: generative recommendation with LLM augmentation, GPU acceleration for large-scale retrieval, and industrial system architecture and attribution optimization. Generative recommendation moves from "generating item IDs" to "generating physical items": Kuaishou's RaG unifies generative recommendation with video generation, achieving +1.87% ad revenue on a 400M DAU platform. YouTube's TokenMinds extends Semantic ID from the item side to the user side, producing both discrete user tokens and dense embeddings, covering full user traffic. Both routes point to the same judgment — generative recommendation is moving from offline consistency verification to online revenue realization. User modeling accelerates its shift from dense vectors to discrete semantic IDs: Kuaishou and YouTube published SID-based frameworks almost simultaneously. This isn't just a change in representation form — it means that the underlying token space of recommendation systems is beginning to align with that of the LLM world, substantially lowering the cost of cross-scenario unification (short-form video / long-form video, recommendation / advertising). Industrial attribution and scaling methodology move toward precision: TikTok's Attribution Correction Framework aligns causal experiments with daily production attribution, reducing measured cannibalization by roughly 15 percentage points. Tencent's NOVA uses an agent to automate architecture evolution, achieving +2.02% GMV on L3 tasks online. Kuaishou's UniFormer proposes a model-centric scaling framework that explicitly decomposes the modeling space into feature and task dimensions. Together, these three reveal a pattern: as model architectures converge, engineering automation and measurement accuracy become th
AI infrastructure hit new milestones today: Microsoft's $7.3B Fairwater campus links hundreds of thousands of Blackwell GPUs into a single supercomputer via 800G Ethernet. DeepSeek V4's DSpark framework slashes inference latency by 80% with full-stack open source, while SubQ's dynamic sparse attenti
A massive day in AI: OpenAI previewed GPT-5.6 Sol with a new architecture and 1M context, but the release was held back by the Commerce Department requiring per-customer approval — a regulatory first that could reshape how frontier models ship. Meanwhile, GLM-5.2 became the first open-weight model t
Agent infrastructure funding hit new highs today: Sail raised $80M for long-running agent inference, and PimDeWitte closed $320M at a $2.3B valuation for world model data. SWE-bench Pro replaced the compromised SWE-bench Verified, while OpenAI's economic report revealed Codex consumes 99.8% of its o
AI infrastructure is heating up fast. OpenAI and Broadcom released Jalapeño, their first custom LLM inference chip, claiming 4x throughput and 5x energy efficiency over GPUs. Cursor is training a 1.5 trillion-parameter model from scratch on xAI's Colossus cluster — an app-layer company going full-st