NLP: Notes on the OpenR1 Project

This note mainly covers how to use the Open-R1 library for LLMs.

References:
- Source code: github.com/huggingface/open-r1

Installation issues

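Installing with uv behaved oddly and downloads were unreliable, so conda is used to manage the environment instead.
Running pip install flash-attn --no-build-isolation produced a pip warning (the flash-attn package does not follow the new packaging standard), and the build then hung and left the whole machine unresponsive. Switching to pip install flash-attn --no-build-isolation --use-pep517 fixed this (the install still has to compile for a long time, roughly 5-10 min).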
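The install command above can be wrapped in a small helper so it can be previewed before it runs. This is only a sketch: the run and install_flash_attn helpers and the DRY_RUN switch are hypothetical names introduced here, and MAX_JOBS=4 is an assumed value. MAX_JOBS is the environment variable the flash-attn build reads to cap parallel compile jobs; whether it would also have avoided the freeze described above was not tested in these notes.

```shell
# Print a command, and execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Install flash-attn with the flags that worked in the notes above.
# MAX_JOBS (assumed value: 4) limits the number of parallel compile jobs.
install_flash_attn() {
    run env MAX_JOBS="${MAX_JOBS:-4}" \
        pip install flash-attn --no-build-isolation --use-pep517
}

# Preview:  install_flash_attn
# Execute:  DRY_RUN=0 install_flash_attn
```

Defaulting to a dry run keeps the long compile from starting by accident while the flags are still being adjusted.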

GRPO run issues

Step 1: start the server on one GPU

CUDA_VISIBLE_DEVICES=0 trl vllm-serve --model /home/jiahong/llm/model/DeepSeek-R1-Distill-Qwen-1.5B

Once started, the server opens a port and waits for connections.
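Because the server needs some time before its port is ready, it can help to poll it before moving on to step 2. The helper below is a minimal sketch of a generic retry loop; wait_until is a hypothetical name introduced here, and the probe shown in the comment assumes the server listens on port 8000 and answers on a /health/ path, which should be checked against the server's startup log.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
    max_attempts=$1
    sleep_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The probe's output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$sleep_seconds"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example probe (port and path are assumptions):
#   wait_until 60 2 curl -sf http://127.0.0.1:8000/health/ && echo "server is up"
```

Passing the probe as a command keeps the loop independent of how the server is actually checked, so curl can be swapped for any other check that returns a meaningful exit status.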

Step 2: start training on the remaining GPU (only one GPU is left, so the number of parallel processes is also set to 1)

CUDA_VISIBLE_DEVICES=1 ACCELERATE_LOG_LEVEL=info \
	accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes 1 \
	src/open_r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config_demo_local.yaml \
	--vllm_server_host 127.0.0.1

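Note: the original project does not include the --vllm_server_host 127.0.0.1 option; without it, the "Server is not up yet." error may occur.
For details, see the answers in the linked issue #568, "why why why??? INFO - trl.extras.vllm_client - Server is not up yet. Retrying in 2.0 seconds…?"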
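In step 2, --num_processes is set to 1 because a single GPU is left for training. On a machine with more GPUs, the value could be derived from CUDA_VISIBLE_DEVICES rather than typed by hand. The following is a sketch under that assumption; count_visible_gpus is a hypothetical helper, not part of open-r1 or accelerate.

```shell
# Count the GPU ids in a CUDA_VISIBLE_DEVICES-style string such as "1" or "1,2,3".
count_visible_gpus() {
    devices=$1
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # Split on commas inside a subshell so the caller's IFS is left untouched.
    (
        IFS=,
        set -- $devices
        echo "$#"
    )
}

# Possible use with the training command from step 2 (other arguments unchanged):
#   export CUDA_VISIBLE_DEVICES=1
#   accelerate launch --num_processes "$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")" ...
```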