NLP: Summary of LLM-Rubric-RL Related Work

Note: this article was written with AI assistance


Overview

  • In LLM RL research in 2025, using rubrics (grading criteria / scoring guidelines) to build finer-grained, interpretable reward functions has become one of the mainstream trends
  • This line of work mainly addresses the problem that a traditional scalar reward cannot provide fine-grained guidance
  • This article summarizes representative papers in this area; detailed notes on each paper are kept separately
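The contrast between a bare scalar reward and a rubric-based reward can be sketched as follows. This is a minimal illustration, not the method of any paper listed below: the criteria, weights, and grading functions are hypothetical, and in real systems the grader is typically an LLM judge prompted with the criterion text rather than a keyword heuristic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Criterion:
    """One rubric item: a name, a weight, and a grading function.

    The grading function returns a score in [0, 1]. Here simple keyword
    heuristics stand in for an LLM judge (an assumption for illustration).
    """
    name: str
    weight: float
    grade: Callable[[str], float]

def rubric_reward(response: str, rubric: List[Criterion]) -> Tuple[float, Dict[str, float]]:
    """Aggregate per-criterion scores into one reward, keeping the
    interpretable per-criterion breakdown that a bare scalar lacks."""
    breakdown = {c.name: c.grade(response) for c in rubric}
    total_weight = sum(c.weight for c in rubric)
    reward = sum(c.weight * breakdown[c.name] for c in rubric) / total_weight
    return reward, breakdown

# Hypothetical rubric for a short factual answer.
rubric = [
    Criterion("mentions_key_term", 1.0,
              lambda r: 1.0 if "chlorophyll" in r.lower() else 0.0),
    Criterion("mentions_reactant", 0.5,
              lambda r: 1.0 if "CO2" in r else 0.0),
    Criterion("concise", 0.5,
              lambda r: 1.0 if len(r.split()) <= 50 else 0.0),
]

reward, breakdown = rubric_reward("Chlorophyll absorbs light; CO2 is fixed.", rubric)
```

Unlike a single scalar, the `breakdown` dict shows which criterion was missed, which is the interpretability gain these papers pursue.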

RaR


DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research


RubricRL: Simple Generalizable Rewards for Text-to-Text Generation


AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following


(Rubicon) Reinforcement Learning with Rubric Anchors


(RuscaRL) Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning


Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning


RLAC: Reinforcement Learning with Adversarial Critic for Dynamic Rubric Generation


PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling


Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training


Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling


QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA


AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning


(DeepSeek-GRM) Inference-Time Scaling for Generalist Reward Modeling