AI infrastructure is heating up fast. OpenAI and Broadcom released Jalapeño, their first custom LLM inference chip, claiming 4x throughput and 5x energy efficiency over GPUs. Cursor is training a 1.5 trillion-parameter model from scratch on xAI's Colossus cluster — an app-layer company going full-st
AI hit multiple milestones today. OpenAI's GPT-5 cracked a three-year immunology mystery, while GPT-5.5-Cyber launched a "Patch the Planet" initiative to fix open-source bugs. Anthropic released Claude Tag, turning the assistant into a persistent Slack team member — Andrej Karpathy called it the thi
AI security took center stage today. OpenAI released GPT-5.5-Cyber with SOTA performance on CyberGym, while the Five Eyes intelligence alliance issued a rare joint warning that AI models could launch devastating cyberattacks within months. Cursor announced a partnership with SpaceX to train new AI m
AI infrastructure security took center stage today as researchers revealed AgentJacking — a novel supply-chain attack that exploits public Sentry keys to hijack coding agents like Claude Code and Cursor. Meanwhile, Tesla quietly filed a "MEGAPOD" trademark hinting at turning its Supercharger network
Google DeepMind dropped a bombshell with a 57-page ASI roadmap, formally defining Superhuman AI as output exceeding tens of thousands of top experts working for a decade. Meta AI released SAGE-OPD, a selective distillation framework that boosts agent task success rates by 13.3% — a practical fix for
The clearest narrative in 2026-W25: the open-source model frontier has shifted from catching up with closed-source models to running alongside them and, in some dimensions, surpassing them. Four models launched this week: GLM-5.2, DeepSeek-V4, Nemotron 3 Ultra, and Ling-2.6. Parameter counts range from 284B to 1.6T, all support 1M-token context windows, and all are open-source. Community benchmarks and independent analysis report that these models now match GPT-5.5 and Opus 4.8 on knowledge work, coding, and scientific reasoning while costing less. The second theme: agent infrastructure is moving from scattered tools to platforms. Amazon Bedrock AgentCore Harness went GA, needing two API calls to deploy a production-grade agent. Cursor launched Origin, a Git replacement designed for agent workloads. Meanwhile, agent evaluation methodology is shifting from aggregate leaderboards to predictive validity: an IBM paper directly challenges whether static leaderboard results transfer to deployment scenarios. The third theme: micro-innovations in inference efficiency are accelerating. Pine AI proposes an editable, composable KV cache paradigm that reduces p90 TTFT by 53–398x. LMSYS used SGLang-JAX to optimize a 1T-parameter MoE model on TPUs, cutting prefill by 53%. Jeff Dean published an account of the TPU's evolution from v2 to Ironwood, citing 30x energy efficiency gains. The combination of hardware and algorithm innovations is making 1M-token inference economically viable. Additionally, regulatory tensions escalated sharply this week: Anthropic restricted use of the Fable model, then the US Commerce Department imposed export license requirements on Fable and Mythos. Andrew Ng argues this will accelerate the AI sovereignty movement. Healthcare also saw multiple product-level advances, from rare disease diagnosis to full-body ultrasound CT.
This week's recommendation systems research clusters around three themes: full-lifecycle co-design for large-scale graph retrieval, Transformer-based sequence modeling deployed across platforms, and a shift from DNN to Transformer-native architectures for multi-task ranking. Meta, Airbnb, Alibaba, Shopee, and NetEase Cloud Music all published online deployment work with specific A/B metrics. Thread 1 (End-to-end design of large-scale graph systems): Meta's RankGraph-2 couples graph construction, representation learning, and online serving into a joint optimization. On a billion-node graph, it reduces compute cost by 83%, achieves 3.8x the recall of GAT+Deep Graph Infomax, and lifts online CTR by 0.96% and CVR by 2.75%. Along the same line, HighLevel's ScoreGate uses a statistical fusion of two scores to adaptively control the number of retrieved chunks in RAG. In production, it cuts tokens by 34.8% while maintaining recall between 97.77% and 99.34%. Thread 2 (Generative recommendation moves from theory to production): Airbnb's JourneyFormer deploys a Transformer-based sequence model in search ranking to handle long, sparse user behavior. Alibaba's OneBar uses an end-to-end generative framework for video e-commerce query recommendation, achieving a 21.67% GMV lift. Both point in the same direction: generative recommendation needs engineering trade-offs under real constraints (cold start, latency, sparse labels) rather than chasing offline metrics alone. Thread 3 (Transformer-native paradigm for multi-task ranking): Shopee's OneRank eliminates the encoder-predictor separation, embedding task-private channels and gradient isolation inside the Transformer. Online CTR is up 1.2% and CVR 0.8%. NetEase Cloud Music's PIANO uses a learnable [CLS] token for list-level multi-objective re-ranking, lifting CTR by 0.62% and CVR by 4.45%. Both demonstrate that internalizing multi-objective reasoning into the Tr
AI hit a major inflection point today. DeepSeek dropped DeepSeek-V4, a 1.6T MoE model that slashes long-context costs by 3.7x and beats GPT-5.4 — all open-source. Meanwhile, Subquadratic claims to have cracked the O(n²) attention bottleneck, and GLM-5.2 is now the first open model that independent d
AI hit multiple inflection points today. Anthropic's Claude Opus 4.7 autonomously controlled a robot 20x faster than humans, while Qualcomm is reportedly acquiring Tenstorrent for $8-10B to challenge NVIDIA's inference dominance with RISC-V. Noam Shazeer — one of the "Attention is All You Need" auth
AI hit multiple inflection points today. Noam Shazeer, co-author of the original Transformer paper, left Google for OpenAI — a decade-long pursuit finally realized. Vercel launched its eve agent framework with a full stack of components, while AWS and Hugging Face both unveiled critical agent infras
Today marks a seismic shift in AI infrastructure and industry structure. SpaceX acquired Cursor for $60B in the largest startup M&A of 2026, signaling AI coding tools have become critical infrastructure. On the model front, Zhipu AI open-sourced GLM-5.2 (744B params, MIT license) topping the Artific