RecSys Weekly 2026-W23
2026-6-6
Word count: 3,796 | Reading time: 10 minutes
date
Jun 6, 2026 07:03
tags
Recommendation Systems
Weekly
Papers
category
Rec Tech Report

Weekly Overview

This week's research in recommendation systems falls along three technical threads.
Thread 1: Generative recommendation moves from functioning to stability — semantic IDs and reasoning become the industrial focus. Pinterest's UniPinRec unifies retrieval and ranking end-to-end (online engagement +1%, latency -11.1%), pushing generative recommendation beyond just retrieval. Kuaishou's OneReason (online deployment) reveals why reasoning mode fails in generative recommendation — missing both perception and cognition factors — and proposes a three-level CoT format plus specialized-unified training. Both point to the same conclusion: the core bottleneck in generative recommendation has shifted from model architecture to data format (semantic IDs) and system coordination.
Thread 2: Cross-domain cold start moves from feature transfer to learning transfer, as LLMs acting as cross-domain bridges begin large-scale deployment. Kuaishou's RGCD-Rep (serving 400M+ users) uses MLLM reasoning distillation to transfer short-video user interest to live streaming, with significant cold-start engagement gains. Meta's Quantizing Intent paper (online AUC +1.522% for cold-start users) quantizes organic feed behavior into semantic IDs for ad ranking, and its results suggest that behavioral richness determines cross-domain transfer quality. Both indicate that the key to cross-domain transfer is not aligning features but building transferable semantic representations.
Thread 3: LLM/Agent-enhanced recommendation moves toward industry differentiation, from general retrieval to deep adaptation in vertical scenarios. Li Auto's HPRO (132-day A/B, sales +9.5%) introduces preference optimization for lead scoring, addressing sparse supervision and funnel hierarchy. Kuaishou's Taiji (CTR +12.4%, revenue +15.2%) proposes Pareto-optimal policy optimization to balance semantic rewards against ID-based recommendation rewards. Syft's DynaTree (survival rate up from 0.32-0.53 to 0.59-0.73) uses offline agent tree-building plus online lightweight subtree selection for time-sensitive news retrieval. These works show that LLM applications in recommendation are moving from general-purpose solutions to scenario-specific customization.

Generative Recommendation and Semantic IDs: Alignment, Encoding, System Coordination

This week's generative recommendation output is dense: industrial deployment papers cover everything from semantic ID generation to end-to-end system deployment, while academic work offers new insights into time-aware diffusion and encoder design.
Unified retrieval and ranking: UniPinRec's one model, two stages
UniPinRec (Pinterest): This is the first work to unify the full retrieval and ranking stack within an industrial system. Previously, retrieval and ranking were trained as separate models: although both used a Transformer architecture, their input formats, training processes, and serving stacks were independent. UniPinRec uses a single shared Transformer to encode user behavior sequences, then branches into two task heads: retrieval (ANN dot product) and ranking (cross-attention). Three key techniques make it work: (1) Masked Action Modeling (MAM) eliminates sequence interleaving, enabling weight sharing without doubling the context length; (2) mixed training samples pair behavior sequences with feed view impression lists, satisfying both objectives; (3) cross-stage KV cache sharing reuses the user history computed during retrieval directly for ranking, reducing total FLOPs compared to serving two separate models. Online results: engagement up about 1%, end-to-end latency down 11.1%, QPS up 63.6%. In contrast, prior work such as GRank only unified candidate generation for generative retrieval, and DualGR focused on long-term and short-term interest modeling; neither achieved full-stack unification.
Why reasoning mode fails: OneReason's perception-cognition framework
OneReason (Kuaishou): Kuaishou previously released OneRec-Think and OpenOneRec, but all experiments showed that reasoning mode (think first, then answer) is no better than non-reasoning mode. OneReason traces the root cause: effective reasoning requires both perception (grounding tokens to semantics) and cognition (reorganizing behavior sequences into coherent interest points). It proposes a three-level cognition-enhanced CoT: item-level, interest-level, and task-level. During pre-training it emphasizes perception learning (item-text alignment), then applies a specialized-unified training recipe in the RL stage. The system is deployed online across multiple businesses (short video, live streaming, ads, e-commerce), and the team open-sourced 8B and 0.8B models. Related work OneRec-Think already explored reasoning in generative recommendation, but OneReason provides a more detailed causal analysis.
SID encoder problem: PrefixMem makes LLMs see semantic IDs like images
PrefixMem (Pinterest): Semantic IDs (SIDs) are discrete hierarchical codes for items. Because of this hierarchy, the meaning of a deeper-level token depends on the prefix that precedes it. Existing systems simply add SID tokens to the vocabulary, forcing the LLM to learn these context dependencies from scratch. PrefixMem draws inspiration from visual encoders in multimodal LLMs. It designs a lightweight SID encoder based on a prefix n-gram memory table, providing structured, prefix-conditioned representations for each SID token. On Pinterest's large-scale data, deepest-level SID accuracy improves by 46%, and full-SID retrieval recall improves by 22%. This suggests that SID, as an independent modality, needs a dedicated encoder, similar to how vision needs ViT and audio needs Whisper encoders. Earlier work like LETTER focused only on quantization quality with RQ-VAE regularization, not on how LLMs use SID structure.
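To make the "prefix n-gram memory table" idea concrete, here is a minimal sketch under stated assumptions. The class name `PrefixNgramMemory`, the hashing scheme, the bucket count, and the sum pooling are all illustrative choices, not details from the paper. The only property it tries to show is that the same code at the same level gets a different vector when its prefix differs.

```python
import random


class PrefixNgramMemory:
    """Toy prefix n-gram memory table for semantic ID (SID) tokens (hypothetical design)."""

    def __init__(self, num_buckets=1024, dim=8, max_ngram=3, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.dim = dim
        self.max_ngram = max_ngram
        # One row per hash bucket; random here, learned in a real system.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_buckets)]

    def _bucket(self, level, ngram):
        # Deterministic polynomial hash over (level, n-gram of codes).
        h = level + 1
        for code in ngram:
            h = (h * 1000003 + code + 1) % (2**61 - 1)
        return h % self.num_buckets

    def encode(self, sid):
        """Return one prefix-conditioned vector per SID level."""
        out = []
        for level in range(len(sid)):
            start = max(0, level + 1 - self.max_ngram)
            vec = [0.0] * self.dim
            # Sum the rows of every n-gram that ends at this level:
            # (c_l), (c_{l-1}, c_l), ... up to max_ngram codes.
            for s in range(start, level + 1):
                row = self.table[self._bucket(level, tuple(sid[s:level + 1]))]
                vec = [v + r for v, r in zip(vec, row)]
            out.append(vec)
        return out


mem = PrefixNgramMemory()
a = mem.encode([3, 7, 5])
b = mem.encode([4, 7, 5])   # same last two codes, different first code
print(a[2] == b[2])         # the level-2 vectors differ because the prefix differs
```

In this sketch the LLM would consume the per-level vectors instead of (or in addition to) plain vocabulary embeddings; how PrefixMem actually injects them is not specified here.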
Time-aware diffusion: TDPM makes diffusion recommendation respect time
TDPM: Diffusion models applied to recommendation (e.g., DiffRec, DreamRec) typically treat all items in a user's history equally. But user preferences are time-dependent: old items and recent ones contribute differently to current decisions. TDPM decouples user preferences into periodic preferences (long-term stable) and point preferences (recent triggering events), then applies time-aware noise scheduling to SID tokens during the diffusion process. On three datasets including Amazon Beauty, HR@20 improves by 29.21% and NDCG@20 by 25.45% on average. This resembles DSIN, which splits interests by session, but TDPM integrates the idea into a diffusion framework.
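Since HR@K and NDCG@K recur throughout this issue, here is a small reference sketch of both metrics for the common next-item setting with exactly one held-out target per user. The function names are illustrative, and the single-target assumption is ours; individual papers may evaluate with different protocols.

```python
import math


def hr_at_k(ranked, target, k):
    """Hit rate: 1 if the held-out target appears in the top-k list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG with a single relevant item: 1 / log2(rank + 1) if it is in the top k."""
    if target not in ranked[:k]:
        return 0.0
    rank = ranked.index(target) + 1   # 1-based position of the target
    return 1.0 / math.log2(rank + 1)


ranked = ["item_a", "item_b", "item_c"]
print(hr_at_k(ranked, "item_b", 2), ndcg_at_k(ranked, "item_b", 2))
```

Reported numbers are these per-user values averaged over all test users, and a figure like "+29.21%" is a relative gain over the baseline's average.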
SID quality diagnosis: DRQ quantization framework and Shopee case study
DRQ (Shopee) — It's hard to diagnose why semantic IDs fail: is it low codebook utilization? Unstable decision boundaries? Or geometric distortion in the embedding space? This paper proposes two diagnostic metrics — expected codeword overlap and effective codebook capacity — and uses them to explain RQ-VAE failure modes. Based on that, it proposes Decoupled Residual Quantization (DRQ), which decouples continuous geometric reconstruction from discrete distribution matching. On Shopee's industrial dataset, DRQ outperforms RQ-VAE across three metrics: symbol robustness, reconstruction fidelity, and behavior-aware soft matching. As a case study, it provides the semantic ID community with a diagnostic tool — more interpretable than GateSID's adaptive gating.
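For readers who want a quick first check on codebook health, the sketch below computes two generic proxies: the fraction of codewords that are ever used, and the perplexity of the usage distribution. These are standard stand-ins and are not the paper's definitions of expected codeword overlap or effective codebook capacity; the function name `codebook_stats` is also our own.

```python
import math
from collections import Counter


def codebook_stats(assignments, codebook_size):
    """Return (utilization, perplexity) for a list of codeword indices."""
    counts = Counter(assignments)
    total = len(assignments)
    # Fraction of the codebook that received at least one item.
    utilization = len(counts) / codebook_size
    # Perplexity = exp(entropy): roughly "how many codewords are effectively in use".
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return utilization, math.exp(entropy)


healthy = [0, 1, 2, 3] * 5    # four codewords used evenly
collapsed = [0] * 20          # everything mapped to a single codeword
print(codebook_stats(healthy, 8))
print(codebook_stats(collapsed, 8))
```

A perplexity far below the codebook size is a hint of collapse, but it says nothing about boundary stability or geometric distortion, which is the gap DRQ's diagnostics aim to cover.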
Query-supervised adversarial quantization: DSIRM's hierarchical prefix matching
DSIRM (Alibaba) — Existing SID generation relies on unsupervised quantization, which cannot ensure that items with similar query intent share the same SID. DSIRM injects query-item interaction supervision into residual quantization. It uses query-bridged contrastive quantization to make quantized SIDs query-aware in their semantic partitioning. It also uses an LLM to explicitly predict item SIDs from query text, handling tail queries and ambiguous intent. On Tmall production data, offline AUC +1.54%, online UCTR +0.13%, UCTCVR +0.25%. This continues the RQ-VAE regularization thread from LETTER but adds query-level supervisory signals.
  • Takeaway: Generative recommendation is entering deep water. UniPinRec's full-stack unification and OneReason's perception-cognition framework provide industrial deployment templates. But the encoder design (PrefixMem) and diagnostic tools (DRQ) show that foundational components still need polishing.
  • What to watch: Can PrefixMem's gains replicate across more LLM families? Will UniPinRec's unified paradigm become the standard architecture for the next generation of recommender systems?

Cross-Domain Recommendation and Cold Start: Semantic Transfer Becomes Core Capability

This week's cross-domain recommendation papers don't emphasize models themselves. Instead, they focus on "how to use semantic signals to bridge domain gaps."
Short video to live streaming cross-domain bridge: RGCD-Rep's reasoning distillation
RGCD-Rep (Kuaishou) — Short video has dense behavioral data; live streaming is a core conversion scenario but data-sparse. RGCD-Rep uses a frozen teacher MLLM to generate structured cross-domain reasoning knowledge (e.g., "user frequently likes pet videos → may be interested in pet products in live streaming"), then distills it into a lightweight student MLLM. It then decomposes item representations into transferable representations and domain residuals — the transferable part is shared across domains, the domain residual captures domain-specific signals. After offline computation, these representations are integrated into downstream retrieval tasks. The system serves 400M+ users in Kuaishou's live streaming recommendation, with significant A/B test improvements on core metrics. Related work SemaCDR also uses LLMs for cross-domain semantic transfer, but RGCD-Rep's decomposition strategy and industrial deployment are new.
Asymmetric graph architecture: Shallow-RHS enables "instant embedding" for cold-start content
Shallow-RHS (Tubi) — Streaming platform Tubi has hard constraints on cold-start content: new content must immediately have independent embeddings for ANN retrieval, and device embeddings also need to be suitable for nearest neighbor search. Shallow-RHS constructs an asymmetric link prediction graph architecture: the left side (device) uses time-sensitive viewing history message passing to capture collaborative signals; the right side (content) is intentionally "shallow" — no ID embeddings, no subgraphs, no neighborhood aggregation — it only encodes intrinsic features (title, description, category, etc.). After training, the content encoder can continuously generate embeddings for new content (implicit graph completion). This pattern extends to device cold start by using demographic features to build group embeddings. In online experiments, content cold-start engagement improves by 3.5% and device cold-start engagement by 4.2%. It inherits the heterogeneous graph approach from Personalized Audiobook Recommendations at Spotify, but the asymmetric design is simpler.
Synthetic data-driven cross-domain transfer: SCALR
SCALR (Amazon) — This paper models cross-domain event transfer as synthetic data generation: given user events in a source domain, generate the likelihood of user interaction with target domain items. This step essentially estimates a conditional probability. Downstream models train on these synthetic events as a model-agnostic cross-domain learning objective. Online A/B tests show statistically significant improvements. This is one of the few works bringing the "synthetic data" idea from the LLM domain into cross-domain recommendation, complementary to Unified Supervision for Walmart's graded label approach for positive/negative samples.
Behavioral semantic ID quantization for ad ranking: RQ-FSQ and Hierarchical Discrete Embedding
Quantizing Intent (Meta) — Core finding: behavioral embeddings from organic feed activity carry much stronger cross-domain transfer signals (AUC +0.213%) than user profile text (+0.036%) or activity fine-tuned LLM embeddings (+0.107%). But behavioral embeddings are high-dimensional and storage-heavy. The authors propose RQ-FSQ (Residual Finite Scalar Quantization) to quantize pre-trained embeddings, matching dense embedding AUC at 30x storage compression. They then introduce a Hierarchical Discrete Embedding module that trains multi-level SIDs end-to-end using a prefix n-gram sparse embedding table. In Meta's ad ranking system, cold-start user AUC improves by +1.522%. Unlike GateSID's semantic-collaborative alignment approach, this work compresses behavioral signals through quantization density and hierarchical prefixes.
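The general mechanics of residual finite scalar quantization can be sketched as follows. This is a toy version under our own assumptions (inputs already scaled to [-1, 1], an odd number of uniform levels per dimension, and each stage shrinking its grid by a factor of `levels - 1`); the paper's actual level counts, scaling, and training procedure are not given here.

```python
def fsq(x, levels):
    """Round each value in [-1, 1] to one of `levels` uniformly spaced grid points."""
    half = (levels - 1) / 2
    codes = [round(max(-1.0, min(1.0, v)) * half) for v in x]  # ints in [-half, half]
    recon = [c / half for c in codes]
    return codes, recon


def rq_fsq(x, levels=5, stages=3):
    """Residual FSQ: each stage quantizes what the previous stages missed."""
    residual = [max(-1.0, min(1.0, v)) for v in x]
    recon = [0.0] * len(x)
    scale = 1.0
    all_codes = []
    for _ in range(stages):
        # Rescale the residual back to [-1, 1] before quantizing it.
        codes, q = fsq([r / scale for r in residual], levels)
        recon = [a + scale * b for a, b in zip(recon, q)]
        residual = [r - scale * b for r, b in zip(residual, q)]
        all_codes.append(codes)
        scale /= (levels - 1)   # the next stage works on a finer grid
    return all_codes, recon


x = [0.37, -0.81, 0.05, 1.0]
codes, recon = rq_fsq(x)
print(codes)
print(max(abs(a - b) for a, b in zip(x, recon)))
```

Each stage stores one small integer per dimension, so the per-stage codes double as a coarse-to-fine hierarchy, which is the property a prefix-based embedding table can exploit.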
  • Takeaway: The decisive factor in cross-domain cold start has shifted from "aligning features" to "building transferable semantic representations" — RGCD-Rep uses MLLM reasoning distillation, Meta uses behavioral quantized SIDs. Both directions emphasize signal quality over quantity.
  • What to watch: Can RQ-FSQ's quantization quality hold in a more general multi-domain setting? Will SCALR's synthetic data generation become a standard component for cross-domain recommendation?

LLM/Agent-Enhanced Retrieval and Ranking: Industry Differentiation Accelerates

This week, two trends emerge in LLM application to recommendation: first, LLMs as enhancers deeply integrated with ID systems (Taiji); second, LLMs adapted to unique vertical needs (HPRO's sales leads, DynaTree's news retrieval).
Pareto-optimal semantic-ID trade-off: Taiji's POPO
Taiji (Kuaishou) — There's an inherent conflict between the semantic space of LLMs and the ID space of recommender systems during RL alignment: semantic rewards (e.g., content understanding) and recommendation preference rewards (e.g., CTR) often pull in different directions. Taiji proposes Pareto Optimal Policy Optimization (POPO), which adaptively adjusts cross-domain reward weights and theoretically guarantees reaching Pareto optimality. In the SFT stage, it uses reverse engineering reasoning and open rejection sampling to generate high-quality, domain-specific CoT data. Deployed on Kuaishou's ad platform (400M+ DAU), results show CTR +12.4%, revenue +15.2%. Compared to OneRec-Think's reasoning framework, Taiji's key innovation in the RL stage is theoretically proving the trade-off boundary between semantics and IDs.
Hierarchical preference ranking optimization: HPRO for sales lead scoring
HPRO (Li Auto) — Car sales lead scoring differs fundamentally from e-commerce recommendation: decision cycles are long (months), the funnel has multiple stages (test drive → order → delivery), and supervision is sparse (only a few "deal" labels). HPRO builds on an LLM discriminative framework. It uses a margin-aware Bradley-Terry formula to convert sparse binary labels into dense, funnel-aware preference pairs, leveraging both pointwise and pairwise supervision. On Li Auto's data, AUC reaches 0.8161, top-leads precision improves 39.7%, and a 132-day online A/B test yields a 9.5% sales lift. This continues the preference optimization thread from DPO, but adapts it to the multi-level funnel structure.
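A rough sketch of how sparse funnel outcomes could become dense preference pairs with a margin-aware Bradley-Terry loss is shown below. The stage names beyond test drive, order, and delivery (here `no_action`), the linear margin schedule, and the `margin_per_stage` value are hypothetical; the paper's exact pair construction and margin definition may differ.

```python
import math
from itertools import combinations

# Deeper funnel stage = stronger positive signal. "no_action" is an assumed baseline stage.
FUNNEL = {"no_action": 0, "test_drive": 1, "order": 2, "delivery": 3}


def funnel_pairs(leads):
    """Turn (lead_id, deepest_stage) records into (winner, loser, stage_gap) pairs."""
    pairs = []
    for (a, stage_a), (b, stage_b) in combinations(leads, 2):
        gap = FUNNEL[stage_a] - FUNNEL[stage_b]
        if gap > 0:
            pairs.append((a, b, gap))
        elif gap < 0:
            pairs.append((b, a, -gap))
        # Leads at the same stage produce no preference pair.
    return pairs


def margin_bt_loss(score_w, score_l, gap, margin_per_stage=0.5):
    """-log sigmoid(s_w - s_l - m), where the margin m grows with the funnel gap."""
    z = score_w - score_l - margin_per_stage * gap
    return math.log1p(math.exp(-z))


leads = [("a", "delivery"), ("b", "test_drive"), ("c", "no_action"), ("d", "test_drive")]
for winner, loser, gap in funnel_pairs(leads):
    print(winner, loser, gap, round(margin_bt_loss(1.0, 0.0, gap), 4))
```

Four leads with a single "deal" label already yield five pairs here, which illustrates how pairwise construction densifies supervision; the margin asks the model to separate a delivered lead from a cold one by more than it separates adjacent stages.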
Agent tree-building + online selection: DynaTree for time-sensitive news retrieval
DynaTree (Syft): The core pain point of existing agentic RAG in news retrieval is that every query triggers iterative semantic expansion and reasoning, which takes too long and cannot keep up with news timeliness. DynaTree decouples this process: offline, multiple agents collaborate to build a reusable retrieval tree (materializing the query theme's semantic space); online, only lightweight subtree selection is needed (using a temporal localization evaluation agent to assess each subtree), without rerunning agent reasoning. In an online A/B test on Syft's production system, survival rate improves from 0.32-0.53 to 0.59-0.73, consistently outperforming existing retrievers. Compared to Search-P1's path-centric reward, DynaTree focuses more on adapting to temporal changes.
Rejection signals as a resource: R3 for agent skill routing
R3 (Tencent) — Agent skill retrieval differs from document retrieval: not only must each query-skill pair be individually relevant, but the selected set of skills must work together to complete a task (skill compatibility). R3 uses the LLM's own rejection signals as "compatibility" supervision (when the LLM refuses to use a set of skills, it means they are not suitable to retrieve together). It builds the R3-Skill bilingual benchmark (10,246 skills, 41,592 queries, 32,828 rejection annotations) and explicitly trains skill compatibility in a two-stage system (R3-Embedding + R3-Reranker). It significantly outperforms traditional retrievers like BM25 and DPR. This differs from Toolformer's tool-use approach, focusing on the retrieval stage rather than the reasoning stage.
User state prefixes: TAP-PER achieves 130x parameter compression
TAP-PER (Microsoft): LLM personalization either retrieves user history to build a prompt (which depends on retrieval quality) or stores an independent adapter per user (so storage grows linearly with the user base). TAP-PER uses two lightweight prefixes to learn user state and query condition representations, replacing explicit prompt construction and independent adapters. It outperforms baselines like RAG and OPPU across all six LaMP tasks. Parameters per user are 130x fewer than OPPU, and total parameters are halved compared to PER-PCS. This inherits the Prefix Tuning idea but applies it to LLM personalization.
Building e-commerce attribute systems from scratch: BEATS deployed at Rakuten Taiwan
BEATS (Rakuten Taiwan) — E-commerce platforms in emerging markets often only have category hierarchies, lacking structured attribute systems (e.g., "material: cotton"). BEATS uses multi-stage LLM generation plus human verification to iteratively build attribute systems from scratch. At Rakuten Taiwan, it covers 9 major categories, 2,694 subcategories, and 67,277 attributes, with 5.4M+ products annotated. The generated attribute labels are used directly in dense retrieval and ranking models, outperforming the original catalog. Similar to EviSnap's facet cards approach, but BEATS focuses more on large-scale production pipelines.
Data-centric numerical reasoning: DCRC for financial QA
DCRC (Tencent): LLMs hallucinate on numerical reasoning in financial QA. DCRC starts from the data: it constructs adversarial data (with controlled noise), trains a data-centric orchestration agent that generates verifiable reasoning programs from queries plus documents, then compiles and executes those programs. On the FinQA benchmark, accuracy improves by 12.4%. It is deployed in Tencent's Yuanbao financial QA system. Unlike CoT, DCRC uses program synthesis to ensure auditability.
Inference-free sparse multimodal retrieval: V-SPLADE
V-SPLADE (NAVER) — Visual document retrieval typically requires VLM encoding for queries (high latency) or OCR + BM25 (low quality). V-SPLADE uses caption-gated token supervision so that visual sparse representations learn to activate retrieval-relevant vocabulary dimensions. During training, captions generated by a VLM serve as lexical cues; during inference, no encoding is needed (pure sparse index). On an 18.7M document corpus, R@5 is twice that of dense retrieval at the same scale. This extends SPLADE to the multimodal domain.
Adaptive retriever combinations: Retriever Portfolios
Retriever Portfolios (Google Research/EPFL) — Facing heterogeneous queries (factual to multi-hop reasoning), no single retriever covers everything. This method uses an expected best-of-k optimization objective to automatically select a small subset of diverse retrievers from a large pool. It outperforms single retrievers and naive multi-retriever setups on multiple QA benchmarks, and supports parallel retrieval to reduce latency. Unlike Adaptive-RAG, it fixes the set of retrievers rather than dynamically adjusting retrieval strategy.
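The best-of-k objective can be illustrated with a simple greedy selector: given a per-query quality score for every retriever, repeatedly add the retriever that most increases the average of the per-query best score. Greedy selection, the `quality` table, and the retriever names are our assumptions for illustration; the paper's actual optimization procedure may differ.

```python
def greedy_portfolio(quality, k):
    """quality[r][q] is a non-negative score of retriever r on query q.

    Greedily picks k retrievers to maximize the mean over queries of the
    best score achieved by any selected retriever.
    """
    n_queries = len(next(iter(quality.values())))
    chosen = []
    best = [0.0] * n_queries   # best score so far for each query

    for _ in range(k):
        def value(r):
            # Portfolio value if retriever r were added to the current selection.
            return sum(max(b, s) for b, s in zip(best, quality[r])) / n_queries

        cand = max((r for r in quality if r not in chosen), key=value)
        chosen.append(cand)
        best = [max(b, s) for b, s in zip(best, quality[cand])]

    return chosen, sum(best) / n_queries


quality = {
    "bm25":     [0.9, 0.8, 0.1, 0.0],   # strong on factual queries
    "dense":    [0.8, 0.9, 0.2, 0.1],   # also strong on factual queries
    "multihop": [0.1, 0.0, 0.9, 0.8],   # strong on multi-hop queries
}
print(greedy_portfolio(quality, 2))
```

Note how the second pick is the retriever that covers different queries rather than the second-best individual retriever, which is the sense in which the objective rewards diversity.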
  • Takeaway: LLM/Agent applications in recommendation are moving from "general retrieval enhancement" to "industry differentiation" — car sales leads require funnel-aware ranking, news retrieval needs time sensitivity, and ad ranking demands Pareto-optimal trade-offs between semantics and IDs.
  • What to watch: Can Taiji's POPO theory generalize to more multi-objective scenarios? Can HPRO's hierarchical preference optimization be applied to e-commerce multi-stage funnels (exposure → click → purchase)?

Sequence Recommendation and Representation Learning: Long Tail, Sparsity, Multi-Behavior

This week's academic work on sequence recommendation tackles three problems: heterogeneity in long-tail signals, multi-rate temporal scaling, and multi-behavior noise.
Long-tail problem in black-box distillation: BAHSD's adaptive hierarchical distillation
BAHSD: Sequence recommenders are often deployed as black-box APIs, and external teams want to replicate their capability through knowledge distillation. But under long-tail distributions, head sequences (dense behavior) suffer from teacher preference solidification, while tail sequences (sparse behavior) yield flat, noisy predictions. BAHSD proposes multi-scale consistency probing to automatically quantify signal reliability. For high-confidence signals, it uses dynamic-temperature KL divergence (mitigating solidification); for low-confidence signals, it uses ranking consistency and InfoNCE contrastive learning (noise-robust). On three datasets, the student improves over the teacher by 4.98% on average, and tail users improve by over 80%. This follows the UnKD distillation paradigm but addresses the signal heterogeneity problem.
Multi-rate temporal aggregation: MARS's density-adaptive dual encoder
MARS — Transformers in sequence recommendation use positional self-attention; state space models use a single implicit decay. Neither explicitly models multi-scale temporal structure. MARS is an encoder-agnostic aggregation operator: it generates K summaries of different time scales from real timestamps and fuses them with context-adaptive gating. Most interesting observation: sparse data favors Transformers (MARS-T), dense data favors Mamba (MARS-M). The model automatically selects based on the average sequence length in the training set. On five benchmarks, HR@10 is best in all cases, with average gain +19.7%. MARS-M on ML-1M uses 42% fewer FLOPs than SIGMA.
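As a rough illustration of multi-rate temporal aggregation, the sketch below builds K summaries from real timestamps using exponential decay with different half-lives, then fuses them with a softmax gate driven by a context vector. The decay kernel, the dot-product gate, and all names are assumptions made for this example, not the operator defined in the paper.

```python
import math


def multi_rate_summaries(embs, timestamps, now, half_lives):
    """One decay-weighted average of item embeddings per half-life (time scale)."""
    summaries = []
    for h in half_lives:
        # An event loses half its weight every h time units.
        w = [0.5 ** ((now - t) / h) for t in timestamps]
        z = sum(w)
        summaries.append([sum(wi * e[d] for wi, e in zip(w, embs)) / z
                          for d in range(len(embs[0]))])
    return summaries


def gated_fusion(summaries, context):
    """Softmax gate over summaries, scored by dot product with a context vector."""
    logits = [sum(c * s for c, s in zip(context, summ)) for summ in summaries]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    gates = [e / z for e in exps]
    fused = [sum(g * s[d] for g, s in zip(gates, summaries))
             for d in range(len(context))]
    return fused, gates


embs = [[1.0, 0.0], [0.0, 1.0]]   # an old item and a recent item
timestamps = [0.0, 100.0]         # e.g. hours
summaries = multi_rate_summaries(embs, timestamps, now=100.0, half_lives=[1.0, 1000.0])
print(summaries)
print(gated_fusion(summaries, context=[0.0, 1.0]))
```

The short half-life summary is dominated by the recent item, while the long one is close to a plain average; the gate then decides per context how much to trust each scale.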
Metric space reasoning: MeRa validates the necessity of spatial constraints
MeRa — In spatial prediction (e.g., next location recommendation), does latent reasoning help? Experiments show that without metric space priors, latent reasoning actually hurts performance. MeRa introduces a lightweight module based on distance-aware attention modulation, explicitly converting coordinate distances into biases. On the GETNext backbone, NDCG@10 gap between reasoning with vs. without metric bias is 4.5%. The paper also proves that reasoning under metric space constraints converges to a unique fixed point. Experiments on three spatial prediction benchmarks (Gowalla, Foursquare, WeChat) achieve best results.
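One simple way to turn coordinate distances into attention biases is to subtract a distance penalty from the attention logits before the softmax. The sketch below does this with a linear Euclidean penalty; the linear form, the `lam` coefficient, and the function name are assumptions for illustration rather than MeRa's actual modulation.

```python
import math


def distance_biased_attention(scores, coords, query_idx, lam=1.0):
    """Softmax over attention scores after subtracting lam * distance to the query location."""
    qx, qy = coords[query_idx]
    biased = [s - lam * math.hypot(x - qx, y - qy)
              for s, (x, y) in zip(scores, coords)]
    m = max(biased)                      # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in biased]
    z = sum(exps)
    return [e / z for e in exps]


coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
# With equal content scores, nearer locations receive more attention.
print(distance_biased_attention([0.0, 0.0, 0.0], coords, query_idx=0))
# With lam=0 the metric prior is switched off and attention is uniform.
print(distance_biased_attention([0.0, 0.0, 0.0], coords, query_idx=0, lam=0.0))
```

Setting `lam` to zero recovers unconstrained attention, which is the "reasoning without metric bias" condition the paper compares against.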
Multi-behavior spectral filtering: SpectraMB's debiasing approach
SpectraMB — Multi-behavior recommendation (e.g., modeling click, favorite, add-to-cart simultaneously) must handle two types of heterogeneity: intra-behavior representation entanglement (shared propagation introduces noise) and inter-behavior reliability differences. SpectraMB performs dynamic spectral filtering in the feature dimension, reparameterizing embeddings into a feature-frequency space. Under target behavior supervision, it learns view-adaptive spectral modulation — no manual frequency thresholding needed. It then uses global context attention to assess consistency of each behavior with the global representation for reliability-aware fusion. On Yelp, Taobao, and Tmall datasets, HR@10 improves up to 12.4%, NDCG@10 up to 11.8%.
Semantic factor learning: SaFeAU's false negative mitigation
SaFeAU — In collaborative filtering, items not interacted with are treated as negative samples, but many are actually latent positives (false negatives). SaFeAU uses Semantic Factor Routing (SFR) to decouple item representations into multiple independent semantic factors, then uses Semantic Factor Matching (SFM) to identify items from non-interacted sets that share semantic factors with positive samples, marking them as potential positives. On four sparse datasets, SaFeAU improves Recall@20 by 5-10% on average, while computational efficiency beats graph methods like LightGCN. This continues the alignment-uniformity framework from DirectAU but adds semantic factors.
  • Takeaway: Sequence recommendation is moving from "unified Transformer/S4" toward "density-adaptive" — MARS selects encoder based on data density, BAHSD adjusts distillation strategy based on signal confidence, SpectraMB does spectral filtering based on reliability.
  • What to watch: Can MARS's density-adaptive design scale to larger industrial sequence lengths (1000+)? Can SaFeAU's semantic factors transfer across domains?

Directions to Watch

1. Semantic ID encoders and diagnostic tools
PrefixMem points out that semantic IDs, as an independent modality, need dedicated encoders — consistent with the logic of visual encoders in multimodal LLMs. DRQ provides diagnostic tools. As generative recommendation rolls out in industry, SID quality control and efficient encoding will become infrastructure-level problems. What to watch: Can PrefixMem maintain gains on larger LLMs (e.g., 7B+)? Can DRQ's diagnostic metrics help auto-tune SID generators?
2. Theorizing the LLM-recommendation trade-off
Taiji's POPO gives the Pareto-optimal solution for semantic-ID reward trade-offs; HPRO's hierarchical preference optimization adapts to multi-level funnels. This signals that "multi-objective alignment" in recommendation is moving from empirical tuning to theoretical analysis. What to watch: Can POPO's Pareto proof extend to three or more objectives? Can HPRO's hierarchical preference pairs nest into chain-based recommendation scenarios (e.g., interest → purchase → repurchase)?
3. Synthetic data industrialization in recommendation
SCALR models cross-domain event transfer as synthetic data generation; BAHSD uses synthetic teacher signals for long-tail distillation. Synthetic data has proven effective in the LLM domain but is still early in recommendation. What to watch: The correlation between synthetic data fidelity and downstream recommendation metrics, and how to use causal methods to ensure synthetic data doesn't introduce bias.

Paper Roundup

Generative Recommendation and Semantic IDs
OneReason — Kuaishou proposes a perception-cognition dual-factor framework with three-level CoT and specialized-unified RL training. Deployed across multiple businesses, open-sourced 8B/0.8B models.
UniPinRec — Pinterest achieves full-stack unified retrieval and ranking with Masked Action Modeling, mixed training, and cross-stage KV cache sharing. Online engagement +1%, latency -11.1%, QPS +63.6%.
DSIRM: Alibaba proposes query-bridged contrastive quantization, LLM-based prediction of item SIDs from query text, and hierarchical prefix matching. Online UCTR +0.13%, UCTCVR +0.25%.
TDPM — Proposes time-aware diffusion with preference decoupling. On three datasets, HR@20 average improvement 29.21%.
DRQ — Shopee proposes decoupled residual quantization and diagnostic framework (expected codeword overlap, effective codebook capacity). Outperforms RQ-VAE on industrial datasets.
PrefixMem — Pinterest proposes a dedicated prefix n-gram encoder for SIDs. Deepest SID accuracy up 46%, recall up 22%.
Cross-Domain Recommendation and Cold Start
RGCD-Rep — Kuaishou proposes MLLM reasoning distillation plus transferable representation decomposition. Deployed in live streaming recommendation, serving 400M+ users.
Shallow-RHS — Tubi proposes asymmetric graph architecture for cold-start content embeddings. Online content cold-start engagement +3.5%, device cold-start +4.2%.
SCALR — Amazon proposes synthetic data-driven cross-domain event transfer. Online A/B test shows statistically significant improvement.
Quantizing Intent — Meta proposes RQ-FSQ quantization plus hierarchical discrete embeddings. 30x compression matches dense AUC, cold-start user AUC +1.522%.
LLM/Agent-Enhanced Retrieval and Ranking
Taiji — Kuaishou proposes Pareto Optimal Policy Optimization (POPO) with reverse engineering CoT and open rejection sampling. CTR +12.4%, revenue +15.2%.
HPRO — Li Auto proposes hierarchical preference ranking optimization with margin-aware Bradley-Terry. AUC 0.8161, sales +9.5% (132-day A/B).
DynaTree — Syft proposes offline agent tree-building plus online subtree selection for news retrieval. Survival rate improves from 0.32-0.53 to 0.59-0.73.
R3 — Tencent proposes Reject-as-Resource Retriever, using LLM rejection signals to train skill compatibility. Builds R3-Skill bilingual benchmark.
TAP-PER — Microsoft proposes user state prefix and query condition prefix for LLM personalization. Parameters per user reduced 130x, outperforms baselines on all LaMP tasks.
BEATS — Rakuten Taiwan uses multi-stage LLM generation plus human verification to build e-commerce attribute system from scratch. Covers 9 categories, 2,694 subcategories, 67,277 attributes.
DCRC — Tencent proposes data-centric reasoning compiler with adversarial data, multi-stage training, and compile-execute. FinQA accuracy +12.4%.
V-SPLADE — NAVER proposes inference-free multimodal sparse retrieval with caption-gated token supervision. R@5 doubles on 18.7M document corpus.
**[Retriever Portfolios](https://