[LangChain] The Complete Guide to Output Parsers

2026/5/14 18:06:52

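2026 edition | Covers all built-in parsers, with complete code examples

## 1. What is an output parser?

An output parser is the bridge in LangChain between a free-text LLM and a structured program. LLMs naturally produce natural language, but applications need structured data such as JSON, lists, and dates. The parser converts the raw text into Python objects that code can use directly.

## 2. Parser overview

| Category | Parser | Purpose | Complexity |
|----------|--------|---------|------------|
| Basic text | StrOutputParser | Extract plain text content | ⭐ |
| Lists | CommaSeparatedListOutputParser | Comma-separated list | ⭐ |
| | NumberedListOutputParser | Numbered list | ⭐ |
| | MarkdownListOutputParser | Markdown list | ⭐ |
| Structured | JsonOutputParser | Parse into a JSON dict | ⭐⭐ |
| | PydanticOutputParser | Parse into a Pydantic object (type-safe) | ⭐⭐⭐ |
| Specialized | DatetimeOutputParser | Date and time formats | ⭐⭐ |
| | EnumOutputParser | Constrain output to enum values | ⭐⭐ |
| | XMLOutputParser | XML-formatted output | ⭐⭐ |
| Fault-tolerant | OutputFixingParser | Automatically fix format errors | ⭐⭐⭐ |
| | RetryOutputParser / RetryWithErrorOutputParser | Retry-based repair with context | ⭐⭐⭐⭐ |
| Tool calling | JsonOutputKeyToolsParser | Parse OpenAI tool calls | ⭐⭐⭐ |
| | PydanticToolsParser | Parse tool arguments into Pydantic objects | ⭐⭐⭐ |

## 3. Basic parsers: details and examples

### StrOutputParser: string parser

The most basic parser. It extracts the plain text content from an `AIMessage`.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Build the chain: prompt → model → parser
prompt = ChatPromptTemplate.from_template("Explain {concept} in one sentence")
chain = prompt | llm | StrOutputParser()

result = chain.invoke({"concept": "neural networks"})
print(result)
# Example output: A neural network is a machine learning model that mimics how neurons in the human brain connect...
print(type(result))  # <class 'str'>
```

Characteristics: it strips the `AIMessage` wrapper and returns a string directly, which suits simple question-and-answer scenarios.

### CommaSeparatedListOutputParser: comma-separated list parser

Asks the model to output comma-separated items and converts them into a Python list.

```python
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = CommaSeparatedListOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a classification assistant. {format_instructions}"),
    ("human", "List the main types of {topic}. Do not number them; separate them with commas."),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
result = chain.invoke({"topic": "Python web frameworks"})
print(result)
# Example output: ['Django', 'Flask', 'FastAPI', 'Tornado', 'Bottle']
print(type(result))  # <class 'list'>
```

`get_format_instructions()` generates the prompt text automatically, telling the model that its response should be a comma-separated list.

### NumberedListOutputParser: numbered list parser

Parses numbered lists such as `1. xxx 2. xxx`.

```python
from langchain_core.output_parsers import NumberedListOutputParser

parser = NumberedListOutputParser()
prompt = ChatPromptTemplate.from_template(
    """List 5 advantages of {topic}, using a numbered format.
{format_instructions}"""
)
chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

result = chain.invoke({"topic": "microservice architecture"})
print(result)
# Example output: ['Independent deployment', 'Flexible tech stack', 'Scalability', 'Fault isolation', 'Team autonomy']
```

### MarkdownListOutputParser: Markdown list parser

Parses Markdown unordered lists (`- item` or `* item`).

```python
from langchain_core.output_parsers import MarkdownListOutputParser

parser = MarkdownListOutputParser()
prompt = ChatPromptTemplate.from_template(
    """List the core features of {topic} as a Markdown list.
{format_instructions}"""
)
chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

result = chain.invoke({"topic": "Docker"})
print(result)
# Example output: ['Containerization', 'Lightweight', 'Portable', 'Version control', 'Resource isolation']
```

## 4. Structured parsers: details and examples

### JsonOutputParser: JSON parser

Parses LLM output into a Python dict. It can be paired with a Pydantic model to generate format instructions.

```python
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

# Option 1: no schema, parse straight into a dict
parser = JsonOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the information and return it as JSON. {format_instructions}"),
    ("human", "Tell me about Japan, including its name, population, and continent."),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
result = chain.invoke({})
print(result)
# Example output: {'name': 'Japan', 'population': 125000000, 'continent': 'Asia'}
print(type(result))  # <class 'dict'>


# Option 2: with a Pydantic schema. The schema is only used to generate
# format instructions; the return value is still a dict.
class CountryInfo(BaseModel):
    name: str = Field(description="Country name")
    population: int = Field(description="Population")
    continent: str = Field(description="Continent")


parser_with_schema = JsonOutputParser(pydantic_object=CountryInfo)
# The generated format instructions are more detailed,
# but the result is still a dict, not a CountryInfo object.
```

### PydanticOutputParser: Pydantic parser (strongly recommended)

The most powerful and safest of the parsers. It converts the output directly into a type-safe Pydantic object and automatically validates field types and required fields.

```python
from typing import List, Optional

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


# Define the data structures
class ActionItem(BaseModel):
    task: str = Field(description="Task description")
    assignee: str = Field(description="Person responsible")


class MeetingSummary(BaseModel):
    title: str = Field(description="Meeting title")
    key_decisions: List[str] = Field(description="Key decisions")
    action_items: List[ActionItem] = Field(description="Action items")


parser = PydanticOutputParser(pydantic_object=MeetingSummary)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a meeting-minutes assistant. Extract structured information "
               "from the meeting transcript.\n{format_instructions}"),
    ("human", "{transcript}"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser

meeting_notes = """
March 10 standup. Attendees: Alice, Bob, Carol.
Alice said the data pipeline is finished and awaiting review.
Bob mentioned a problem with API rate limiting in production.
Decision: implement exponential backoff for API calls.
Carol will finish the retry logic code by Friday.
Bob will set up the monitoring dashboard by next Tuesday.
Alice will review Carol's PR.
"""

summary = chain.invoke({"transcript": meeting_notes})
print(type(summary))  # <class '__main__.MeetingSummary'>
print(f"Title: {summary.title}")
print(f"Decisions: {summary.key_decisions}")
for item in summary.action_items:
    print(f"  Task: {item.task} - Assignee: {item.assignee}")
```

Example output:

```
Title: March 10 team standup
Decisions: ['Implement exponential backoff for API calls']
  Task: Finish the retry logic code - Assignee: Carol
  Task: Set up the monitoring dashboard - Assignee: Bob
  Task: Review Carol's PR - Assignee: Alice
```

Advantages of Pydantic:

- Automatic type coercion (for example, the string `"125000000"` → the integer `125000000`)
- Required-field validation (a missing field raises an error)
- Field constraints (for example, `ge=1, le=5` to limit a rating range)

### DatetimeOutputParser: date and time parser

Parses LLM output into a Python `datetime` object. Note that this parser is imported from `langchain.output_parsers` rather than `langchain_core`; depending on the LangChain version you have installed, that module may instead be provided by the `langchain-classic` package as `langchain_classic.output_parsers`.

```python
from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = DatetimeOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract the date and time. {format_instructions}"),
    ("human", "The meeting is set for 3 pm next Wednesday"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
result = chain.invoke({})
# Example output only: the model must know today's date to resolve "next Wednesday"
print(result)        # 2026-05-20 15:00:00
print(type(result))  # <class 'datetime.datetime'>
```

### EnumOutputParser: enum parser

Forces the output to be one of a predefined set of enum values.

```python
from enum import Enum

from langchain.output_parsers import EnumOutputParser


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


parser = EnumOutputParser(enum=Sentiment)
prompt = ChatPromptTemplate.from_template(
    """Analyze the sentiment of the following review. Choose only from positive/negative/neutral.
Review: {review}"""
)
chain = prompt | llm | parser

result = chain.invoke({"review": "The product quality is excellent and shipping was fast"})
print(result)        # Sentiment.POSITIVE
print(type(result))  # <enum 'Sentiment'>
print(result.value)  # positive
```

### XMLOutputParser: XML parser

Parses XML-formatted output.

```python
from langchain_core.output_parsers import XMLOutputParser

parser = XMLOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "Return the result in XML format. {format_instructions}"),
    ("human", "Extract the following information: title 'The Three-Body Problem', "
              "author Liu Cixin, year 2008"),
]).partial(format_instructions=parser.get_format_instructions())

chain = prompt | llm | parser
result = chain.invoke({})
print(result)
# Example output (child elements come back as a list of single-key dicts,
# and the tag names depend on what the model chooses):
# {'book': [{'title': 'The Three-Body Problem'}, {'author': 'Liu Cixin'}, {'year': '2008'}]}
```

## 5. Fault-tolerant parsers (essential in production)

LLMs sometimes emit malformed JSON: a missing comma, stray comments, and so on. Fault-tolerant parsers try to repair these problems automatically.

### OutputFixingParser: auto-fixing parser

When the wrapped parser fails, it calls another LLM to repair the format.

```python
from typing import List

from langchain.output_parsers import OutputFixingParser
from langchain_core.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Recipe(BaseModel):
    name: str = Field(description="Dish name")
    ingredients: List[str] = Field(description="List of ingredients")
    prep_time: int = Field(description="Preparation time in minutes")


base_parser = PydanticOutputParser(pydantic_object=Recipe)

# Wrap the base parser with the fixing parser
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),  # the LLM used for repairs
)

# Simulate malformed output. Note the missing comma before "prep_time".
bad_output = '{"name": "Kung Pao Chicken", "ingredients": ["chicken", "peanuts", "chili"] "prep_time": 30}'

try:
    result = fixing_parser.parse(bad_output)
    print(f"Fix succeeded: {result}")
except Exception as e:
    print(f"Fix failed: {e}")
```

How it works: when the wrapped parser raises, the fixing parser sends the LLM the parser's format instructions, the bad output, and the parse error, and asks it to produce a corrected answer. It does not see the original prompt.

### RetryOutputParser / RetryWithErrorOutputParser: retry parsers

These go further than `OutputFixingParser`: they resend the original prompt along with the bad output (and, for `RetryWithErrorOutputParser`, the error message as well), so the LLM regenerates its answer with the full context.

```python
from langchain.output_parsers import RetryWithErrorOutputParser
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ProductReview(BaseModel):
    product_name: str = Field(description="Product name")
    rating: int = Field(description="Rating from 1 to 5", ge=1, le=5)
    summary: str = Field(description="Short summary")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
main_parser = PydanticOutputParser(pydantic_object=ProductReview)

# Wrap the main parser with the retry parser
retry_parser = RetryWithErrorOutputParser.from_llm(
    parser=main_parser,
    llm=llm,
    max_retries=2,  # retry at most 2 times
)

prompt_template = """Analyze the following product review and extract the information.
{format_instructions}
Review text:
{review_text}
"""
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["review_text"],
    partial_variables={"format_instructions": main_parser.get_format_instructions()},
)

# Use parse_with_prompt, passing in the prompt value as well as the completion
review = "Nice phone, 4 stars. The camera is great. I wish it had more storage."
prompt_value = prompt.format_prompt(review_text=review)
llm_output = llm.invoke(prompt_value)

try:
    result = retry_parser.parse_with_prompt(llm_output.content, prompt_value)
    print(f"Product: {result.product_name}")
    print(f"Rating: {result.rating}")
except Exception as e:
    print(f"Still failing after retries: {e}")
```

RetryWithErrorOutputParser vs OutputFixingParser:

- `OutputFixingParser` gives the LLM the format instructions, the bad output, and the error, but not the original prompt, so it can only patch up what is already there.
- `RetryWithErrorOutputParser` gives the LLM the original prompt + the bad output + the error message, so the answer is regenerated with full context. This generally succeeds more often, especially when the output is incomplete rather than merely malformed.

## 6. Tool-calling parsers

These parse the tool-call results of models that support function calling (such as OpenAI and Claude).

### JsonOutputKeyToolsParser

```python
from langchain_core.output_parsers.openai_tools import JsonOutputKeyToolsParser

# When the model returns tool_calls, extract the arguments of the named tool
parser = JsonOutputKeyToolsParser(key_name="get_weather")
# Typically placed after a model that has tools bound via llm.bind_tools(...)
```

### PydanticToolsParser

```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser

# Parses tool-call arguments directly into Pydantic objects
```

## 7. The modern recommendation: with_structured_output()

The trend as of 2026: for models with native structured-output support (GPT-4o, Claude 3, Gemini), prefer `.with_structured_output()` over a traditional output parser.

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One-sentence summary")
    recommended: bool = Field(description="Whether it is recommended")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Bind structured output directly; no parser needed
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(type(result))        # <class '__main__.MovieReview'>
print(result.title)        # Inception
print(result.rating)       # 8.8 (example value)
print(result.recommended)  # True (example value)
```

Advantages:

- Uses the model's native function calling / JSON mode, which is more reliable
- No need to write elaborate format instructions in the prompt
- Skips the parsing step and returns a Pydantic object directly

When to still use a traditional parser:

- The model does not support structured output (for example, small local models)
- You need complex post-processing (for example, extracting JSON from mixed text)
- You need a fault-tolerant repair mechanism

## 8. Custom parsers

Subclass `BaseOutputParser` to implement your own logic.

```python
import json
import re
from typing import Any

from langchain_core.output_parsers import BaseOutputParser

# Three backticks, built programmatically so that this snippet can itself
# be shown inside a Markdown code fence.
FENCE = "`" * 3


class MarkdownJsonExtractor(BaseOutputParser):
    """Extract JSON from a Markdown code block."""

    def parse(self, text: str) -> Any:
        # Match a fenced block tagged "json"
        match = re.search(FENCE + r"json\s*(.*?)\s*" + FENCE, text, re.DOTALL)
        if not match:
            raise ValueError(f"No JSON code block found: {text[:100]}...")
        json_str = match.group(1)
        try:
            return json.loads(json_str)
        except json.JSONDecodeError as e:
            raise ValueError(f"JSON parsing failed: {e}")

    def get_format_instructions(self) -> str:
        return f"Wrap the JSON output in a {FENCE}json\n...\n{FENCE} code block."


# Usage
custom_parser = MarkdownJsonExtractor()
llm_output = (
    "Analysis complete. Here is the result:\n"
    + FENCE + "json\n"
    + '{"sentiment": "positive", "confidence": 0.95, "keywords": ["quality", "service"]}\n'
    + FENCE + "\n"
    + "Let me know if you have any questions."
)
result = custom_parser.parse(llm_output)
print(result)
# Output: {'sentiment': 'positive', 'confidence': 0.95, 'keywords': ['quality', 'service']}
```

---

## 9. Selection guide

| Scenario | Recommended approach | Reason |
|----------|----------------------|--------|
| Simple text Q&A | StrOutputParser | Most lightweight |
| Comma-separated tags | CommaSeparatedListOutputParser | Done in one line of code |
| Type safety required | PydanticOutputParser | Automatic validation + IDE hints |
| Rapid prototyping | JsonOutputParser | No model definition needed |
| High-reliability production | with_structured_output() + RetryWithErrorOutputParser | Two layers of protection |
| Older or local models | PydanticOutputParser + OutputFixingParser | Fault-tolerant repair |
| Mixed-format extraction | Custom BaseOutputParser | Full control |

---

## 10. Full comparison summary

| Parser | Input | Output | Auto format instructions | Fault tolerance | 2026 recommendation |
|--------|-------|--------|--------------------------|-----------------|---------------------|
| StrOutputParser | AIMessage | str | ❌ | ❌ | ⭐⭐⭐ |
| CommaSeparatedListOutputParser | str | list | ✅ | ❌ | ⭐⭐⭐ |
| JsonOutputParser | str | dict | ✅ | ❌ | ⭐⭐ |
| PydanticOutputParser | str | Pydantic object | ✅ | ❌ | ⭐⭐⭐⭐ |
| DatetimeOutputParser | str | datetime | ✅ | ❌ | ⭐⭐ |
| EnumOutputParser | str | Enum | ✅ | ❌ | ⭐⭐ |
| XMLOutputParser | str | dict | ✅ | ❌ | ⭐ |
| OutputFixingParser | str | Same as wrapped parser | Same as wrapped parser | ✅ | ⭐⭐⭐⭐ |
| RetryWithErrorOutputParser | str + prompt | Same as wrapped parser | Same as wrapped parser | ✅✅ | ⭐⭐⭐⭐⭐ |
| with_structured_output() | - | Pydantic object | Native support | Model-level | ⭐⭐⭐⭐⭐ |

---

**Core principle**: for LangChain development in 2026, **if you can use with_structured_output(), skip the traditional parsers**; when you must use a parser, **always add a fault-tolerant layer in production** (RetryWithErrorOutputParser)[^1^][^2^][^5^].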
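
To make the fault-tolerance principle a little more concrete, here is a minimal, dependency-free sketch of the retry-with-error idea: parse, and on failure send the original prompt, the bad completion, and the error back to the model. It deliberately does not use LangChain. The names `parse_review`, `parse_with_retry`, and `fake_llm` are hypothetical, and `fake_llm` is a stub standing in for a real model call, so treat this as an illustration of the control flow rather than of how the library implements it.

```python
import json
from typing import Callable


def parse_review(text: str) -> dict:
    """Parse and validate a review; raise ValueError on any problem."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"invalid JSON: {e}") from e
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    rating = data.get("rating")
    if isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError("rating must be an integer from 1 to 5")
    return data


def parse_with_retry(
    prompt: str,
    completion: str,
    llm: Callable[[str], str],
    max_retries: int = 2,
) -> dict:
    """Try to parse; on failure, ask the model again with full context."""
    for attempt in range(max_retries + 1):
        try:
            return parse_review(completion)
        except ValueError as error:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The retry message carries prompt + bad output + error message
            completion = llm(
                f"Prompt:\n{prompt}\n\nCompletion:\n{completion}\n\n"
                f"Error:\n{error}\n\nPlease try again and return only valid JSON."
            )
    raise AssertionError("unreachable")


def fake_llm(retry_prompt: str) -> str:
    """Hypothetical stand-in for a model call: always returns a corrected answer."""
    return '{"product_name": "phone", "rating": 4}'


bad = '{"product_name": "phone" "rating": 4}'  # missing comma
result = parse_with_retry("Extract the product name and rating.", bad, fake_llm)
print(result)  # {'product_name': 'phone', 'rating': 4}
```

The loop is bounded by `max_retries`, so a model that never produces valid output ends in an exception instead of an endless (and costly) retry cycle; in a real chain that exception is where a fallback or an alert would go.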

