Jiahong's Personal Blog

Preparation ensures success; lack of preparation invites failure.



RL Tag

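NLP: LLM Alignment Fine-Tuning - Skywork-Reward

NLP: LLM-as-a-judge

NLP: DeepSeek-GRM

NLP: LLM Alignment Fine-Tuning - DPO

NLP: LLM Alignment Fine-Tuning - GRPO

NLP: LLM Alignment Fine-Tuning - OpenRubrics

NLP: LLM Alignment Fine-Tuning - Pass@k-Training

NLP: LLM Alignment Fine-Tuning - RuscaRL

RL: DDPO

RL: Decision-Transformer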

Joe Zhou

Stay Hungry. Stay Foolish.

638 posts
53 tags
© 2026 Joe Zhou