ReAct (ICLR 2023): the first work to combine Thought and Action to improve LLM problem solving

news2025/6/1 4:37:55

This post covers a paper from Princeton University and Google Research, Brain Team on synergizing reasoning and acting in language models.

Paper: https://arxiv.org/abs/2210.03629
Code: https://github.com/ysymyth/ReAct.git
Another implementation (LangChain): https://python.langchain.com/api_reference/langchain/agents/langchain.agents.agent.AgentExecutor.html#

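The authors observe that although LLMs perform well at language understanding and generation, their reasoning abilities and their acting abilities have mostly been studied as separate topics. They propose generating reasoning traces (reason) and task-specific actions (act) in an interleaved manner, so that the two abilities reinforce each other: reasoning helps the model plan and update its actions, and actions let it gather information from external sources. The result is better interpretability, better trustworthiness, and a stronger ability to solve complex tasks.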
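
The interleaving of reasoning traces and actions can be illustrated with a toy loop. This is a minimal sketch rather than the authors' implementation: `scripted_model`, `lookup`, `react_loop`, and `KNOWLEDGE` are hypothetical names, with a scripted function standing in for the LLM and a small dictionary standing in for an external source such as Wikipedia.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge base standing in for an external source.
KNOWLEDGE: Dict[str, str] = {
    "ReAct": "ReAct was published at ICLR 2023.",
}


def lookup(entity: str) -> str:
    """Toy tool: return a stored fact, or a not-found message."""
    return KNOWLEDGE.get(entity, f"Could not find {entity}.")


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action_name, action_argument).

    A real model would condition on the full history; this stub only checks
    whether an observation has been recorded yet.
    """
    if not any(line.startswith("Observation") for line in history):
        return ("I need to look up ReAct.", "lookup", "ReAct")
    return ("The observation contains the venue.", "finish", "ICLR 2023")


def react_loop(question: str,
               model: Callable[[List[str]], Tuple[str, str, str]],
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate thought -> action -> observation until the model finishes."""
    history = [f"Question: {question}"]
    for step in range(1, max_steps + 1):
        thought, action, argument = model(history)
        history.append(f"Thought {step}: {thought}")
        history.append(f"Action {step}: {action}[{argument}]")
        if action == "finish":
            return argument, history
        # The observation is appended so the next thought can use it.
        history.append(f"Observation {step}: {lookup(argument)}")
    return "", history  # no answer within the step budget


answer, trace = react_loop("Where was ReAct published?", scripted_model)
print("\n".join(trace))
print("Answer:", answer)
```

The key point of the sketch is that every observation is fed back into the history, so later thoughts can depend on what earlier actions returned.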

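Data:

HotpotQA: a multi-hop question answering benchmark that requires the model to reason over multiple Wikipedia pages.
FEVER: a fact verification benchmark in which the model must verify a claim against Wikipedia pages.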
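
Methods:

Standard (standard prompting): removes all thoughts, actions, and observations from the ReAct trajectories.
CoT (chain-of-thought prompting): removes actions and observations and keeps the thoughts, serving as a reasoning-only baseline.
CoT-SC (self-consistency): applies the self-consistency method [1], sampling 21 CoT trajectories at inference time with decoding temperature 0.7 and taking the majority answer.
Act (acting-only prompting): removes the thoughts from the ReAct trajectories; it loosely resembles WebGPT.
ReAct: the paper's method, which combines Thought and Action.
ReAct → CoT-SC: when ReAct fails to return an answer within the given number of steps, back off to the CoT-SC result.
CoT-SC → ReAct: when the majority answer among the n CoT-SC samples occurs fewer than n/2 times (i.e., internal knowledge may not support the task confidently), back off to the ReAct result.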
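
The majority vote in CoT-SC and the two back-off heuristics can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the sampled answers are passed in as plain strings, and a missing ReAct answer is represented by `None`.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, int]:
    """Return the most frequent answer and how many samples produced it."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count


def react_then_cot_sc(react_answer: Optional[str], cot_answers: List[str]) -> str:
    """ReAct -> CoT-SC: use the vote only when ReAct returned no answer."""
    if react_answer is not None:
        return react_answer
    return majority_vote(cot_answers)[0]


def cot_sc_then_react(cot_answers: List[str], run_react: Callable[[], str]) -> str:
    """CoT-SC -> ReAct: use ReAct when the top answer has fewer than n/2 votes."""
    answer, count = majority_vote(cot_answers)
    if count < len(cot_answers) / 2:
        return run_react()
    return answer


# 21 samples, as in the CoT-SC setting described above (answers are made up).
confident = ["Paris"] * 15 + ["Lyon"] * 6
uncertain = ["Paris"] * 8 + ["Lyon"] * 7 + ["Nice"] * 6  # top answer: 8 < 10.5

print(cot_sc_then_react(confident, lambda: "answer from ReAct"))
print(cot_sc_then_react(uncertain, lambda: "answer from ReAct"))
print(react_then_cot_sc(None, confident))
```

Passing ReAct as a callable means the more expensive tool-using run happens only when the vote is not confident.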
Finetuning
- Smaller language models are finetuned on 3,000 trajectories with correct answers generated by ReAct.
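
Building such a finetuning set amounts to keeping only the trajectories whose final answer is correct. The sketch below is a guess at what that filtering step could look like, not the authors' pipeline: the record fields (`prediction`, `gold`, `trajectory`), the helper names, and the whitespace- and case-insensitive exact-match criterion are all assumptions.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers (assumed criterion)."""
    return " ".join(text.lower().split())


def select_correct(records: List[Dict[str, str]], limit: int = 3000) -> List[Dict[str, str]]:
    """Keep up to `limit` records whose predicted answer matches the gold answer."""
    kept = [r for r in records if normalize(r["prediction"]) == normalize(r["gold"])]
    return kept[:limit]


# Hypothetical records; real ones would hold full Thought/Action/Observation text.
records = [
    {"trajectory": "...", "prediction": "ICLR 2023", "gold": "iclr  2023"},
    {"trajectory": "...", "prediction": "NeurIPS", "gold": "ICLR 2023"},
]
print(len(select_correct(records)))
```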

The prompt as implemented in LangChain

PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer 
Thought: you should always think about what to do 
Action: the action to take, should be one of [{tool_names}] 
Action Input: the input to the action 
Observation: the result of the action 
... (this Thought/Action/Action Input/Observation can repeat N times) 
Thought: I now know the final answer 
Final Answer: the final answer to the original input question""" 
SUFFIX = """Begin!

Question: {input} 
Thought:{agent_scratchpad}"""
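
To make the role of the three template pieces concrete, the sketch below assembles them into one prompt and parses a completion written in that format. `build_prompt` and `parse_step` are hypothetical helpers written for illustration, not LangChain functions, and the regular expression is a simplification that assumes a single-line action input.

```python
import re
from typing import Dict, Tuple

# Templates copied from the prompt shown above.
PREFIX = "Answer the following questions as best you can. You have access to the following tools:"
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""


def build_prompt(tools: Dict[str, str], question: str, scratchpad: str = "") -> str:
    """Join prefix, tool list, format instructions, and suffix into one prompt."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    instructions = FORMAT_INSTRUCTIONS.format(tool_names=", ".join(tools))
    suffix = SUFFIX.format(input=question, agent_scratchpad=scratchpad)
    return "\n\n".join([PREFIX, tool_lines, instructions, suffix])


def parse_step(completion: str) -> Tuple[str, str]:
    """Return ("final", answer) or (tool_name, tool_input) from a model completion."""
    if "Final Answer:" in completion:
        return "final", completion.split("Final Answer:", 1)[1].strip()
    match = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", completion)
    if match is None:
        raise ValueError(f"Could not parse completion: {completion!r}")
    return match.group(1).strip(), match.group(2).strip()


tools = {"Search": "look things up", "Calculator": "do arithmetic"}
prompt = build_prompt(tools, "What is 10 to the power of 3?")
print(prompt)
print(parse_step(" I should compute this.\nAction: Calculator\nAction Input: 10 ** 3"))
print(parse_step(" I now know the final answer\nFinal Answer: 1000"))
```

Because the prompt ends with `Thought:`, the model continues the text from there; the agent appends each Action, Action Input, and Observation to the scratchpad and calls the model again until a `Final Answer:` line appears.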

Example

from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.tools import BaseTool


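# Search tool
class SearchTool(BaseTool):
    # Type annotations are required on these fields by pydantic-v2-based LangChain releases.
    name: str = "Search"
    description: str = "Use this for questions about the weather or about '鸡你太美'"
    return_direct: bool = True  # return the tool output directly as the final answer

    def _run(self, query: str) -> str:
        print("\nSearchTool query: " + query)
        return "This is a generic placeholder result"

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async is not supported yet")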


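# Calculator tool
class CalculatorTool(BaseTool):
    name: str = "Calculator"
    description: str = "Use this for questions about mathematical calculations"

    def _run(self, query: str) -> str:
        print("\nCalculatorTool query: " + query)
        return "100"  # stub: always returns a fixed value, not a real calculation

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("Async is not supported yet")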


llm = OpenAI(temperature=0.5)
tools = [SearchTool(), CalculatorTool()]
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True)

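# Print each question before its answer (the original printed an empty question label).
questions = [
    "What is the weather this week?",
    "Tell me what '鸡你太美' means",
    "Tell me what 'hello world' means",
    "What is 10 to the power of 3?",
]
for question in questions:
    print("Question: " + question)
    print("Answer: " + agent.run(question))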