通义智文开源QwenLong-L1: 迈向长上下文大推理模型的强化学习

news2025/5/31 20:54:41

在这里插入图片描述

🎉 动态

2025年5月26日: 🔥 我们正式发布🤗QwenLong-L1-32B——首个采用强化学习训练、专攻长文本推理的LRM模型。在七项长文本文档问答基准测试中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰LRM,达到与Claude-3.7-Sonnet-Thinking持平的水准,展现了当前最先进长文本推理模型的领先实力。

2025年5月26日: 🔥 我们同步开源🤗DocQA-RL-1.6K专项强化学习数据集,包含1,600道涵盖数学演算、逻辑推理和多跳推理等领域的文档问答题目。

📚 简介

在本研究中,我们提出了QwenLong-L1,这是一种新颖的强化学习(RL)框架,旨在促进LRM从短上下文熟练度向稳健的长上下文泛化过渡。在我们的初步实验中,我们展示了短上下文和长上下文推理RL训练动态之间的差异。

在这里插入图片描述

我们的框架通过强化学习训练中的渐进式上下文扩展,增强了短上下文语言推理模型(LRM)的性能。该框架包含三个核心组件:用于初始化稳健策略的预热监督微调(SFT)阶段;通过课程引导的强化学习阶段实现从短上下文到长上下文的稳定适应;以及难度感知的回溯采样机制,通过动态调整各阶段训练复杂度来激励策略探索。我们整合了包括GRPO和DAPO在内的最新强化学习算法,结合基于规则和基于模型的二元结果奖励混合函数,以平衡精确率与召回率。在策略优化过程中,通过战略性利用群体相对优势,引导LRM学习对实现稳健长上下文锚定和卓越推理能力至关重要的有效推理模式。

在这里插入图片描述

🎯 模型发布

我们发布了🤗 QwenLong-L1-32B,这是首个通过强化学习训练、专为长文本推理设计的长上下文语言推理模型。在七项长文本文档问答基准测试中,QwenLong-L1-32B性能超越OpenAI-o3-mini和Qwen3-235B-A22B等旗舰语言推理模型,达到与Claude-3.7-Sonnet-Thinking相当的水准,展现出当前最先进语言推理模型中的领先性能。

以下是评估结果。

在这里插入图片描述

🛠️ 要求

# Create the conda environment
conda create -n qwenlongl1 python==3.10
conda activate qwenlongl1

# Install requirements
pip3 install -r requirements.txt

# Install verl
cd verl
pip3 install -e .

# Install vLLM
pip3 install vllm==0.7.3 

# Install flash-attn
pip3 install flash-attn --no-build-isolation

🚀 快速入门

以下是如何使用 🤗 Transformers 运行该模型:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""
context = "<YOUR_CONTEXT_HERE>" 
question = "<YOUR_QUESTION_HERE>"
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

🗂️ 数据集

为了构建一个具有挑战性的可验证长文本推理强化学习数据集,我们开发了🤗DocQA-RL-1.6K,该数据集包含跨三个推理领域的1600个文档问答问题:

(1) 数学推理:我们使用DocMath数据集中的600道问题,这些问题要求对财务报告等长专业文档进行数值推理。对于DocMath数据集,我们从其验证集中每个子集抽取75%条目用于训练,25%用于评估;

(2) 逻辑推理:我们采用DeepSeek-R1合成了600道多选题,这些问题需要对我们精选的法律、金融、保险和生产领域真实文档进行逻辑分析;

(3) 多跳推理:我们从MultiHopRAG选取200个样本,从Musique选取200个样本,重点关注跨文档推理。

请下载以下数据集并放入./datasets/目录用于训练和评估。

强化学习训练数据:🤗DocQA-RL-1.6K

评估数据:🤗docmath、🤗frames、🤗longbench

💻 训练

我们为单阶段强化学习训练提供了基于DAPO的基础演示代码。

首先,我们应该启动一个本地验证器。

export CUDA_VISIBLE_DEVICES=0

vllm serve "Qwen/Qwen2.5-1.5B-Instruct" \
    --host 0.0.0.0 \
    --port 23547

然后,我们开始使用4个节点进行强化学习训练。

export PROJ_DIR="<YOUR_PROJ_DIR_HERE>"
export MASTER_IP="<YOUR_MASTER_IP_HERE>" # ray master ip
export NNODES=4 # total GPU nodes
export NODE_RANK=${RANK} # rank of current node
export PORT=6382
export WANDB_API_KEY="<YOUR_WANDB_API_KEY_HERE>"
export WANDB_PROJECT="QwenLong-L1"
export LLM_JUDGE=Y # 'Y': LLM JUDGE, 'N': RULE BASED
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
# verifier
export VERIFIER_PATH="Qwen/Qwen2.5-1.5B-Instruct"
export VERIFIER_HOST="<YOUR_VERIFIER_HOST_HERE>"
export VERIFIER_PORT="23547"

ray_start_retry() {
    while true; do
        ray start --address="${MASTER_IP}:${PORT}"
        if [ $? -eq 0 ]; then
            break
        fi
        echo "Failed to connect to master, retrying in 5 seconds..."
        sleep 5
    done
}

check_ray_status() {
    until ray status >/dev/null 2>&1; do
        echo "Waiting for Ray cluster to be ready..."
        sleep 5
    done
}

if [ "$RANK" == "0" ]; then
    echo "Starting HEAD node..."
    ray start --head --port=${PORT}
    
    check_ray_status
    echo "Ray head node started successfully"

else
    echo "Starting WORKER node..."
    ray_start_retry
    
    check_ray_status
    echo "Successfully joined Ray cluster"
fi

if [ "$RANK" == "0" ]; then
    bash ${PROJ_DIR}/scripts/rl_4nodes_dapo.sh 2>&1 | tee ${PROJ_DIR}/logs/rl_log_$(date +%Y%m%d_%H%M%S).txt &
else
    sleep 30d
fi

wait

实践演示

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Tongyi-Zhiwen/QwenLong-L1-32B"

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)

# prepare the model input
template = """Please read the following text and answer the question below.

<text>
$DOC$
</text>

$Q$

Format your response as follows: "Therefore, the answer is (insert answer here)"."""

# 填充上下文和问题
context = """
Renewable energy sources are crucial for addressing climate change and reducing dependence on fossil fuels. Solar power is one of the most abundant and widely available renewable energy sources. It converts sunlight directly into electricity using photovoltaic (PV) cells or indirectly through concentrated solar power (CSP) systems.

Wind energy is another rapidly growing renewable source. Wind turbines capture the kinetic energy of moving air and convert it into electrical energy. Onshore wind farms are more common, but offshore wind farms are becoming increasingly popular due to stronger and more consistent wind resources.

Hydroelectric power is generated by harnessing the energy of flowing water in rivers or dams. It is one of the oldest and most established renewable energy technologies, providing a reliable and flexible source of electricity.

Biomass energy uses organic materials such as wood, agricultural waste, and dedicated energy crops to produce heat, electricity, or biofuels. It is considered renewable because the carbon dioxide released during combustion is offset by the carbon dioxide absorbed during the growth of the biomass feedstock.

Geothermal energy taps into the Earth's internal heat to generate electricity or provide direct heating and cooling. It is a reliable and consistent energy source, particularly in regions with high geothermal activity.

The transition to renewable energy is driven by several factors, including environmental concerns, energy security, and technological advancements. However, challenges remain, such as the intermittency of solar and wind power, high initial costs, and the need for energy storage solutions.

Despite these challenges, the global renewable energy market is expanding rapidly. Many countries have set ambitious renewable energy targets, and investments in renewable energy technologies continue to grow. The International Renewable Energy Agency (IRENA) projects that renewable energy could account for up to 86% of global electricity generation by 2050 if current trends continue.
"""

question = """
What are the main types of renewable energy sources mentioned in the text, and what are their respective advantages?
"""

# 构建提示
prompt = template.replace('$DOC$', context.strip()).replace('$Q$', question.strip())
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=10000,
    temperature=0.7,
    top_p=0.95
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151649 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151649)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

输出:
thinking content: Okay, let me try to figure this out. The question is asking for the main types of renewable energy sources mentioned in the text and their respective advantages. First, I need to go through the text again to identify each renewable energy source and note down their advantages.

Starting with the first paragraph, it mentions solar power. The text says solar power uses photovoltaic cells or CSP systems. The advantage here is that it's abundant and widely available. Then, wind energy is next. The text talks about onshore and offshore wind farms. The advantage for wind is that offshore farms have stronger and more consistent winds, which I think makes them more reliable.

Hydroelectric power is mentioned next. It's described as one of the oldest and most established, providing reliable and flexible electricity. So the advantage here is reliability and flexibility. Biomass energy uses organic materials and is considered renewable because the CO2 released is offset by the growth of the feedstock. The advantage here is that it's renewable in terms of carbon balance.

Geothermal energy is next, using the Earth's internal heat. The advantage is that it's reliable and consistent, especially in areas with high geothermal activity. 

Wait, the question is about the main types and their advantages. Let me list them out:

1. Solar power: Abundant and widely available.
2. Wind energy: Stronger and more consistent winds offshore.
3. Hydroelectric power: Reliable and flexible.
4. Biomass energy: Carbon neutrality (offset by growth).
5. Geothermal energy: Reliable and consistent.

I think that's all the main types mentioned. The text also mentions challenges like intermittency for solar and wind, but the question is about advantages, so I should focus on the positive aspects each has. I need to make sure I didn't miss any. Let me check the text again.

Yes, the text lists solar, wind, hydroelectric, biomass, and geothermal. Each has their specific advantages as I noted. So the answer should list each type with their respective advantages.
</think>
content: Therefore, the answer is Solar power (abundant and widely available), wind energy (stronger and consistent offshore winds), hydroelectric power (reliable and flexible), biomass energy (carbon neutrality through growth of feedstock), and geothermal energy (reliable and consistent internal heat).

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2392131.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

低代码——表单生成器以form-generator为例

主要执行流程说明&#xff1a; 初始化阶段 &#xff1a; 接收表单配置对象formConf深拷贝配置&#xff0c;初始化表单数据和验证规则处理每个表单组件的默认值和特殊配置&#xff08;如文件上传&#xff09; 渲染阶段 &#xff1a; 通过render函数创建el-form根组件递归渲染表…

linux centos 服务器性能排查 vmstat、top等常用指令

背景:项目上经常出现系统运行缓慢,由于数据库服务器是linux服务器,记录下linux服务器性能排查常用指令 vmstat vmstat介绍 vmstat 命令报告关于内核线程、虚拟内存、磁盘、陷阱和 CPU 活动的统计信息。由 vmstat 命令生成的报告可以用于平衡系统负载活动。系统范围内的这…

LiveGBS国标视频平台收流模式:UDP、TCP被动与TCP主动传输模式之差异剖析

LiveGBS国标视频平台收流模式&#xff1a;UDP、TCP被动与TCP主动传输模式之差异剖析 1、背景2、信令传输3、视频流传输3.1、UDP传输模式3.2、TCP被动传输模式3.3、TCP主动传输模式 4、WEB配置流传输模式4.1、编辑模式4.2、下拉切换模式 5、搭建GB28181视频直播平台 1、背景 在…

Tomcat 使用与配置全解

一、 Tomcat简介 Tomcat服务器是Apache的一个开源免费的Web容器。它实现了JavaEE平台下部分技术规范&#xff0c;属于轻量级应用服务器。 1. Tomcat版本 Tomcat版本 JDK版本 Servlet版本 JSP版本 10.0.X 8 and later 5.0 3.0 9.0.x 8 and later 4.0 2.3 8.0.x 7…

aws instance store 的恢复

1: aws instance store 要在launch instance 才可以创建,而且,通过snapshot 恢复后,instance store 里面的数据会丢失。 下面是创建instance store 的过程,和通过两种方式恢复,发现/etc/fstab 不同的写法,有的不能启动: [root@ip-xx ~]# lsblk NAME MAJ:MIN RM …

EasyRTC音视频实时通话助力微信小程序:打造低延迟、高可靠的VoIP端到端呼叫解决方案

一、方案概述​ 在数字化通信浪潮下&#xff0c;端到端实时音视频能力成为刚需。依托庞大用户生态的微信小程序&#xff0c;是实现此类功能的优质载体。基于WebRTC的EasyRTC音视频SDK&#xff0c;为小程序VoIP呼叫提供轻量化解决方案&#xff0c;通过技术优化实现低延迟通信&a…

STM32 SPI通信(软件)

一、SPI简介 SPI&#xff08;Serial Peripheral Interface&#xff09;是由Motorola公司开发的一种通用数据总线四根通信线&#xff1a;SCK&#xff08;Serial Clock&#xff09;、MOSI&#xff08;Master Output Slave Input&#xff09;、MISO&#xff08;Master Input Slav…

每日刷题c++

快速幂 #include <iostream> using namespace std; #define int long long int power(int a, int b, int p) {int ans 1;while (b){if (b % 2){ans * a;ans % p; // 随时取模}a * a;a % p; // 随时取模b / 2;}return ans; } signed main() {int a, b, p;cin >> a …

ChemDraw 2023|Win英文|化学结构编辑器|安装教程

软件下载 【名称】&#xff1a;ChemDraw 2023 【大小】&#xff1a;1.34G 【语言】&#xff1a;英文界面 【安装环境】&#xff1a;Win10/Win11 【夸克网盘下载链接】&#xff08;务必手机注册&#xff09;&#xff1a; https://pan.quark.cn/s/320bcb67da80 【网站下载…

4.1.1 Spark SQL概述

Spark SQL是Apache Spark的一个模块&#xff0c;专门用于处理结构化数据。它引入了DataFrame这一编程抽象&#xff0c;DataFrame是带有Schema信息的分布式数据集合&#xff0c;类似于关系型数据库中的表。用户可以通过SQL、DataFrames API和Datasets API三种方式操作结构化数据…

redis五种数据结构详解(java实现对应的案例)

一、简述 Redis是一款高性能的键值对存储数据库&#xff0c;它支持五种基本数据类型&#xff0c;分别是字符串(String)、列表(List)、哈希(Hash)、集合(Set)、有序集合(Sorted Set)。 二、五种基本数据类型 2.1 字符串(String) String是Redis最基本的类型&#xff0c;一个key对…

React 生命周期与 Hook:从原理到实战全解析

&#x1f49d;&#x1f49d;&#x1f49d;欢迎莅临我的博客&#xff0c;很高兴能够在这里和您见面&#xff01;希望您在这里可以感受到一份轻松愉快的氛围&#xff0c;不仅可以获得有趣的内容和知识&#xff0c;也可以畅所欲言、分享您的想法和见解。 持续学习&#xff0c;不断…

【机器学习基础】机器学习入门核心算法:逻辑回归(Logistic Regression)

机器学习入门核心算法&#xff1a;逻辑回归&#xff08;Logistic Regression&#xff09; 一、算法逻辑1.1 基本概念1.2 Sigmoid函数1.3 决策边界 二、算法原理与数学推导2.1 概率建模2.2 损失函数推导2.3 梯度下降优化2.4 正则化处理 三、模型评估3.1 常用评估指标3.2 ROC曲线…

智能仓储落地:机器人如何通过自动化减少仓库操作失误?

仓库作业的速度和准确性至关重要&#xff0c;尤其是在当前对无差错、高效作业的要求达到前所未有的环境下。每一个错误&#xff0c;无论是物品放错位置还是库存差异&#xff0c;都会在供应链中产生连锁反应&#xff0c;造成延误、增加成本&#xff0c;并最终影响客户满意度。 …

[低代码表单生成器设计基础]ElementUI中Layout布局属性Form表单属性详解

Layout 布局 ElementUI 的 Layout 布局系统基于 24 栏栅格设计&#xff0c;提供了灵活的响应式布局能力&#xff0c;适用于各种页面结构的构建。(CSDN) &#x1f4d0; 基础布局结构 ElementUI 的布局由 <el-row>&#xff08;行&#xff09;和 <el-col>&#xff0…

从“被动养老”到“主动健康管理”:平台如何重构代际关系?

在老龄化与数字化交织的背景下&#xff0c;代际关系的重构已成为破解养老难题的关键。 传统家庭养老模式中&#xff0c;代际互动多表现为单向的“赡养-被赡养”关系。 而智慧养老平台的介入&#xff0c;通过技术赋能、资源整合与情感连接&#xff0c;正在推动代际关系向“协作…

贪心算法应用:最大匹配问题详解

Java中的贪心算法应用:最大匹配问题详解 贪心算法是一种在每一步选择中都采取当前状态下最优的选择,从而希望导致结果是全局最优的算法策略。在Java中,贪心算法可以应用于多种问题,其中最大匹配问题是一个经典的应用场景。下面我将从基础概念到具体实现,全面详细地讲解贪…

爬虫IP代理效率优化:策略解析与实战案例

目录 一、代理池效率瓶颈的根源分析 二、六大核心优化策略 策略1&#xff1a;智能IP轮换矩阵 策略2&#xff1a;连接复用优化 策略3&#xff1a;动态指纹伪装 策略4&#xff1a;智能重试机制 三、典型场景实战案例 案例1&#xff1a;电商价格监控系统 案例2&#xff1a…

豆瓣电视剧数据工程实践:从爬虫到智能存储的技术演进(含完整代码)

通过网盘分享的文件&#xff1a;资料 链接: https://pan.baidu.com/s/1siOrGmM4n-m3jv95OCea9g?pwd4jir 提取码: 4jir 1. 引言 1.1 选题背景 在影视内容消费升级背景下&#xff0c;豆瓣电视剧榜单作为国内最具影响力的影视评价体系&#xff0c;其数据价值体现在&#xff1a…

基于微信小程序的漫展系统的设计与实现

博主介绍&#xff1a;java高级开发&#xff0c;从事互联网行业六年&#xff0c;熟悉各种主流语言&#xff0c;精通java、python、php、爬虫、web开发&#xff0c;已经做了六年的毕业设计程序开发&#xff0c;开发过上千套毕业设计程序&#xff0c;没有什么华丽的语言&#xff0…