通义智文开源QwenLong-L1: 迈向长上下文大推理模型的强化学习

在这里插入图片描述

🎉 动态

2025年5月26日: 🔥 我们正式发布🤗QwenLong-L1-32B——首个采用强化学习训练、专攻长文本推理的LRM模型。在七项长文本文档问答基准测试中，QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰LRM，达到与Claude-3.7-Sonnet-Thinking持平的水准，展现了当前最先进长文本推理模型的领先实力。

2025年5月26日: 🔥 我们同步开源🤗DocQA-RL-1.6K专项强化学习数据集，包含1,600道涵盖数学演算、逻辑推理和多跳推理等领域的文档问答题目。

📚 简介

在本研究中，我们提出了QwenLong-L1，这是一种新颖的强化学习（RL）框架，旨在促进LRM从短上下文熟练度向稳健的长上下文泛化过渡。在我们的初步实验中，我们展示了短上下文和长上下文推理RL训练动态之间的差异。

在这里插入图片描述

我们的框架通过强化学习训练中的渐进式上下文扩展，增强了短上下文语言推理模型（LRM）的性能。该框架包含三个核心组件：用于初始化稳健策略的预热监督微调（SFT）阶段；通过课程引导的强化学习阶段实现从短上下文到长上下文的稳定适应；以及难度感知的回溯采样机制，通过动态调整各阶段训练复杂度来激励策略探索。我们整合了包括GRPO和DAPO在内的最新强化学习算法，结合基于规则和基于模型的二元结果奖励混合函数，以平衡精确率与召回率。在策略优化过程中，通过战略性利用群体相对优势，引导LRM学习对实现稳健长上下文锚定和卓越推理能力至关重要的有效推理模式。

在这里插入图片描述

🎯 模型发布

我们发布了🤗 QwenLong-L1-32B，这是首个通过强化学习训练、专为长文本推理设计的长上下文语言推理模型。在七项长文本文档问答基准测试中，QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰语言推理模型，达到与Claude-3.7-Sonnet-Thinking相当的水准，展现出当前最先进语言推理模型中的领先性能。

以下是评估结果。

在这里插入图片描述

🛠️ 要求

# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1

# Install requirements
pip3 install -r requirements.txt

# Install verl
cd verl
pip3 install -e .

# Install vLLM
pip3 install vllm==0.7.3 

# Install flash-attn
pip3 install flash-attn --no-build-isolation

🚀 快速入门

以下是如何使用 🤗 Transformers 运行该模型:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>" 
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

🗂️ 数据集

为了构建一个具有挑战性的可验证长文本推理强化学习数据集，我们开发了🤗DocQA-RL-1.6K，该数据集包含跨三个推理领域的1600个文档问答问题：

(1) 数学推理：我们使用DocMath数据集中的600道问题，这些问题要求对财务报告等长专业文档进行数值推理。对于DocMath数据集，我们从其验证集中每个子集抽取75%条目用于训练，25%用于评估；

(2) 逻辑推理：我们采用DeepSeek-R1合成了600道多选题，这些问题需要对我们精选的法律、金融、保险和生产领域真实文档进行逻辑分析；

(3) 多跳推理：我们从MultiHopRAG选取200个样本，从Musique选取200个样本，重点关注跨文档推理。

请下载以下数据集并放入./datasets/目录用于训练和评估。

强化学习训练数据：🤗DocQA-RL-1.6K

评估数据：🤗docmath、🤗frames、🤗longbench

💻 训练

我们为单阶段强化学习训练提供了基于DAPO的基础演示代码。

首先，我们应该启动一个本地验证器。

export CUDA_VISIBLE_DEVICES=0

vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \
    --host 0.0.0.0 \
    --port 23547

然后，我们开始使用4个节点进行强化学习训练。

export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"

ray_start_retry() {
    while true; do
        ray start --address="${MASTER_IP}:${PORT}"
        if [ $? -eq 0 ]; then
            break
        fi
        echo "Failed to connect to master, retrying in 5 seconds..."
        sleep 5
    done
}

check_ray_status() {
    until ray status >/dev/null 2>&1; do
        echo "Waiting for Ray cluster to be ready..."
        sleep 5
    done
}

if [ "$RANK" == "0" ]; then
    echo "Starting HEAD node..."
    ray start --head --port=${PORT}
    
    check_ray_status
    echo "Ray head node started successfully"

else
    echo "Starting WORKER node..."
    ray_start_retry
    
    check_ray_status
    echo "Successfully joined Ray cluster"
fi

if [ "$RANK" == "0" ]; then
    bash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
else
    sleep 30d
fi

wait

实践演示

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""

# 填充上下文和问题
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.

Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.

Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.

Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.

Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.

The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.

Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow. The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""

question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""

# 构建提示
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

输出：

thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.

Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.

Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.

Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. 

Wait, the question is about the main types and their advantages. Let me list them out:

1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.

I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.

Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).