# Building Online Courses with AI Agents
One line in the input box: teach me how to build production-grade retrieval systems for LLM apps. Ten minutes later the pipeline returns a directory:

```
course/
├── syllabus.md
├── lectures/
│   ├── 01_what_retrieval_actually_does.md
│   ├── 02_chunking_strategies_that_dont_ruin_recall.md
│   ├── 03_embedding_models_that_arent_oai_ada_002.md
│   ├── 04_vector_stores_and_when_they_are_overkill.md
│   ├── 05_hybrid_search_bm25_and_dense_together.md
│   ├── 06_rerankers_and_what_they_fix.md
│   ├── 07_query_rewriting_and_when_to_skip_it.md
│   ├── 08_evaluation_beyond_vibe_check.md
│   ├── 09_grounding_and_citation_discipline.md
│   ├── 10_freshness_and_re_indexing.md
│   ├── 11_latency_and_cost_tuning.md
│   └── 12_failure_modes_and_how_to_ship_with_them.md
├── quizzes/
│   ├── module_1.json
│   ├── module_2.json
│   └── module_3.json
├── assignments/
│   ├── module_2_build_a_reranker.md
│   └── module_3_ship_a_grounded_answer_pipeline.md
└── slides/
    └── course.pdf
```

The syllabus for the first module looks like this:

```
Module 1: Retrieval without the Buzzwords (3 lessons)

Lesson 1: What retrieval actually does
Objectives:
- Explain what the retrieval step is for (not what embeddings are).
- Distinguish retrieval from generation and from ranking.
- Identify where retrieval belongs in the RAG pipeline.
```

This output cost about $0.26 in API spend and 14 minutes of wall-clock time. According to iSpring's 2026 pricing guide, a Level 1 custom e-learning module (narrated slides with static graphics and simple knowledge checks) costs $250 to $500 per finished hour. The pipeline does not ship a finished $500 course by itself. It ships a first draft fast enough that an editor can finish it in an afternoon.

This article covers how to build that pipeline. We will use four agents in LangGraph plus an anti-slop reviewer, put a human approval gate between planning and writing, follow one worked example all the way to a real PDF, and look closely at the exact failure modes this architecture prevents.

## 1. Why course generation is a pipeline, not a prompt

Most developers' first attempt at a course generator is one big prompt: tell a language model to write a complete course on some topic. The syllabus looks fine, and so does the first module. By module three the text starts to drift. By module four the model contradicts module one, the tone wanders, and the learning objectives quietly disappear. No single context window is both long enough to hold an entire course and focused enough to make every lesson land.

Course generation fits multi-agent decomposition almost perfectly. The stages separate cleanly: syllabus, lectures, assessments, and production map directly onto real instructional design phases. The ADDIE framework (Analyze, Design, Develop, Implement, Evaluate) has used these exact seams for decades. Each stage also has a different optimal model: the syllabus wants a reasoning-heavy model, lecture body prose wants a cheap fast one, and slide production wants deterministic code.

The Instructional Agents paper (Yao et al., Arizona State University, EACL 2026) showed that this works. They evaluated a multi-agent framework across five university-level computer science courses with human expert raters, and their full-copilot mode consistently outperformed autonomous generation. We will build a production-oriented adaptation of that architecture.

## 2. The architecture

Our system uses four agents, one reviewer, and one human approval gate:

- The curriculum agent takes one input sentence and outputs a structured syllabus.
- The content agent takes one lesson outline and outputs a full lecture script. It runs in parallel across all lessons.
- The assessment agent takes a module and outputs quizzes and assignments.
- The production agent is deterministic Python. It takes every output and assembles the slides and the PDF.
- The anti-slop reviewer is invoked after every content generation. It returns structured corrections or an approval. We use a relatively cheap model for this.

A supervisor node wires them together. We use explicit routing for the deterministic phase transitions and dynamic parallel fan-out for the writing phase. Every agent reads and writes shared state. The state schema is the skeleton of the pipeline; we use Pydantic models to define the data contracts.

```python
from typing import Annotated, Literal, TypedDict

from pydantic import BaseModel, Field

BloomLevel = Literal[
    "remember", "understand", "apply", "analyze", "evaluate", "create"
]


class LearningObjective(BaseModel):
    id: str = Field(description="short stable id, e.g. lo-2-3")
    text: str = Field(description="measurable, learner-centered verb phrase")
    bloom: BloomLevel
    module_id: str


class Lesson(BaseModel):
    id: str
    module_id: str
    title: str
    objectives: list[str] = Field(description="LearningObjective ids covered")
    prerequisites: list[str] = Field(default_factory=list)
    outline: list[str] = Field(description="hook, concept, mechanism, example, ...")


class Module(BaseModel):
    id: str
    title: str
    summary: str
    lessons: list[str]


class Syllabus(BaseModel):
    topic: str
    audience: str
    prerequisites: list[str]
    modules: list[Module]
    lessons: list[Lesson]
    objectives: list[LearningObjective]


class Lecture(BaseModel):
    lesson_id: str
    title: str
    body_markdown: str
    checklist: list[str]
    rewrite_count: int = 0


class ReviewIssue(BaseModel):
    category: Literal[
        "filler_phrase",
        "buzzword",
        "missing_specificity",
        "uniform_rhythm",
        "vague_conclusion",
        "lexical_repetition",
    ]
    example: str = Field(description="literal quote from the draft")
    fix: str = Field(description="one-sentence correction instruction")


class ReviewVerdict(BaseModel):
    lesson_id: str
    approved: bool
    severity: Literal["low", "medium", "high"] = "low"
    issues: list[ReviewIssue] = Field(default_factory=list)


class Quiz(BaseModel):
    module_id: str
    questions: list[dict]


class Assignment(BaseModel):
    module_id: str
    brief_markdown: str
    rubric: list[str]


def merge_dicts(left: dict, right: dict) -> dict:
    return {**left, **right}


class CourseState(TypedDict, total=False):
    topic: str
    audience: str
    syllabus: Syllabus
    human_feedback: str
    lectures: Annotated[dict[str, Lecture], merge_dicts]
    verdicts: Annotated[dict[str, ReviewVerdict], merge_dicts]
    quizzes: Annotated[dict[str, Quiz], merge_dicts]
    assignments: Annotated[dict[str, Assignment], merge_dicts]
    output_dir: str
    export_bundle_path: str
```

We use a custom dict reducer for `lectures` instead of a simple list append. If a lecture is rejected by the reviewer and rewritten, the new draft overwrites the old one under its lesson id instead of piling up duplicates in state. That keeps the final payload for the downstream production step clean. The syllabus holds a flat list of lessons with explicit module-id back-references; this flat shape makes the later parallel fan-out a one-liner.

## 2.1 The curriculum agent

The curriculum agent is the part most first attempts at a course generator get least wrong, but most shallowly. We need to do it carefully. A naive prompt asks the model to write a twelve-lesson syllabus. Our curriculum agent first derives measurable learning objectives, then groups them into modules, and only then chooses the lessons that teach them. This mechanically enforces the alignment standard from the Quality Matters Rubric.

Every learning objective carries a Bloom's taxonomy level tag: remember, understand, apply, analyze, evaluate, or create. This forces the assessment agent later to produce a genuine mix of questions rather than ten multiple-choice vocabulary checks.

We enforce hard caps in the prompt: at most three modules and fifteen lessons total. Beyond that, the system starts generating filler to occupy the space. After the syllabus is drafted, the graph pauses. A human reviews the syllabus and either approves it or asks for changes.
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.types import Command, interrupt

CURRICULUM_PROMPT = """You are a Curriculum Architect.
Given one sentence of intent, produce a full course syllabus designed
*backwards from learning objectives*.

Hard constraints:
- 3 modules total. 10 to 15 lessons total. Do not exceed.
- Every learning objective is measurable and learner-centered (starts with a Bloom verb).
- Every module has at least one objective at Apply level or higher.
- Every lesson maps to 1 to 3 objectives from its module.
- Prerequisites surfaced explicitly at course level.

Topic: {topic}
Audience: {audience}
Prior reviewer feedback (apply it verbatim, do not argue with it):
{feedback}
"""


def curriculum_node(state: CourseState) -> Command:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.3)
    architect = llm.with_structured_output(Syllabus)
    syllabus: Syllabus = architect.invoke([
        {
            "role": "system",
            "content": CURRICULUM_PROMPT.format(
                topic=state["topic"],
                audience=state.get("audience", "working software engineers"),
                feedback=state.get("human_feedback") or "none — first draft",
            ),
        }
    ])
    decision = interrupt({
        "phase": "syllabus_review",
        "topic": state["topic"],
        "syllabus": syllabus.model_dump(),
        "options": ["approve", "revise"],
    })
    if decision.get("action") == "approve":
        return Command(
            update={"syllabus": syllabus, "human_feedback": ""},
            goto="fan_out",
        )
    return Command(
        update={"human_feedback": decision.get("feedback", "")},
        goto="curriculum",
    )
```

The `interrupt()` call pauses the graph. It returns whatever value the caller passes in on resume, and the payload shape is the contract between the graph and the UI. The revise path loops back into the curriculum node. The next run reads the human feedback from state, so the model sees it: that is directed revision. The full state is persisted by the checkpointer, so the human can spend hours on the syllabus review and the graph resumes perfectly.

## 2.2 The content agent

The curriculum agent produces twelve lesson outlines. The content agent writes all twelve in parallel. Sequential generation takes roughly fifteen minutes of writing; parallel generation takes about ninety seconds. Each lesson is generated in an isolated context focused on its own objectives, which eliminates long-context drift entirely.
```python
from langgraph.types import Send


def fan_out_to_content_agents(state: CourseState) -> list[Send]:
    """Conditional edge: one Send per lesson, dispatched in one superstep."""
    syllabus = state["syllabus"]
    objectives_by_id = {o.id: o for o in syllabus.objectives}
    return [
        Send(
            "content",
            {
                "lesson": lesson.model_dump(),
                "objectives": [
                    objectives_by_id[oid].model_dump()
                    for oid in lesson.objectives
                    if oid in objectives_by_id
                ],
                "prerequisites": lesson.prerequisites,
                "topic": state["topic"],
            },
        )
        for lesson in syllabus.lessons
    ]


def fan_out_node(state: CourseState) -> dict:
    """Identity node — exists only to be the source of the conditional edge."""
    return {}
```

Each `Send` is a dispatch call to a named node with a custom payload. When `fan_out_to_content_agents` returns a list of twelve `Send` objects, LangGraph schedules twelve independent runs of `content` in the same superstep. The content node does not receive the full `CourseState`; it receives only the small dict in its `Send`. That is the mechanism that removes long-context drift: lesson seven does not know what lesson three said and cannot contradict it, because the model never sees it.

Outputs merge back into global state through the `merge_dicts` reducer we defined earlier. Each run returns `{"lectures": {lesson_id: Lecture}}`, and the reducer combines twelve single-key dicts into one `lectures` dict keyed by lesson id. Parallel writes to the same state slot do not conflict, because every run writes a different key.

`fan_out_node` is a shim. Conditional edges in LangGraph must attach to a source node, and we want the graph visualization to read cleanly. Attaching the fan-out directly to `curriculum` would tangle two unrelated branches on one node (the human-approval revision loop and the parallel writing phase), so a no-op node keeps the diagram clear.

## 2.3 The anti-slop reviewer

This is the part every demo skips. If you take one idea from this article, it should be this layer. Slop means generic filler phrases like "in today's evolving landscape" or "it's worth noting". It means buzzwords like "game-changer" or "seamless". It means missing specificity: no numbers, no named tools, no code snippets. It means sentences of uniform length and conclusions as vague as "only time will tell".

Our reviewer is a structured-output agent on Gemini 2.5 Flash with one job: take a generated lecture and return a structured verdict. It does not judge quality in the abstract. It pattern-matches against a concrete checklist.
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.types import Command

MAX_REWRITES = 2

LESSON_PROMPT = """Write one lesson for a course on: {topic}

Lesson title: {title}
Learning objectives (must all be addressed, tagged with Bloom level):
{objectives}
Prerequisites the reader already has: {prereqs}

Skeleton (use these exact H2 headings):
## Hook
## Concept
## Mechanism
## Worked example
## Common failure
## Summary
## Checklist

Hard rules:
- 800 to 1500 words.
- At least two concrete, named artifacts (tools, libraries, papers, or code).
- At least one code snippet OR one concrete numerical example.
- No filler phrases ("in today's evolving landscape", "it's worth noting").
- No vague closers ("the possibilities are endless").
- Reviewer corrections to apply verbatim (if any):
{corrections}
"""


def content_node(state: dict) -> dict:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.6)
    writer = llm.with_structured_output(Lecture)
    corrections = state.get("reviewer_corrections") or "none — first draft"
    rewrite_count = state.get("rewrite_count", 0)
    lecture: Lecture = writer.invoke([
        {
            "role": "system",
            "content": LESSON_PROMPT.format(
                topic=state["topic"],
                title=state["lesson"]["title"],
                objectives="\n".join(
                    f"- [{o['bloom']}] {o['text']}" for o in state["objectives"]
                ),
                prereqs=", ".join(state.get("prerequisites") or []) or "none",
                corrections=corrections,
            ),
        }
    ])
    lecture.lesson_id = state["lesson"]["id"]
    lecture.rewrite_count = rewrite_count
    return {"lectures": {lecture.lesson_id: lecture}}


SLOP_PATTERNS = [
    "in today's",
    "ever-evolving",
    "it's worth noting",
    "in conclusion",
    "game-changer",
    "revolutionary",
    "seamless",
    "cutting-edge",
    "the possibilities are endless",
    "only time will tell",
    "dive deep",
    "unlock the power",
    "at the end of the day",
]

REVIEWER_PROMPT = """You are an anti-slop reviewer. You do not judge quality
in the abstract. You pattern-match against a concrete checklist and return
structured issues.

Reject the lecture if ANY of these are true:
- Contains filler phrases (see regex hits below).
- Uses buzzwords without concrete referents.
- Any H2 section lacks a specific named tool, library, paper, number, or code.
- Ends with a vague conclusion (endless, only time will tell, stay tuned).
- Any phrase repeats 3+ times across the lecture.

Regex pre-check hits: {regex_hits}
Lecture id: {lesson_id}
Title: {title}

Return a ReviewVerdict. Approve only if clean.
"""
```
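The substring pre-check is cheap enough to sanity-check on its own. A minimal standalone sketch (a trimmed copy of the `SLOP_PATTERNS` list above, run against a deliberately sloppy draft):

```python
SLOP_PATTERNS = [
    "in today's", "game-changer", "seamless",
    "the possibilities are endless", "only time will tell",
]


def slop_hits(body_markdown: str) -> list[str]:
    # Literal substring match against known slop phrases: a cheap prior
    # that saves the reviewer model from re-deriving what slop is.
    lowered = body_markdown.lower()
    return [p for p in SLOP_PATTERNS if p in lowered]


draft = (
    "In today's ever-evolving landscape, RAG is a game-changer. "
    "Only time will tell what seamless retrieval unlocks."
)
print(slop_hits(draft))
# → ["in today's", 'game-changer', 'seamless', 'only time will tell']
```

Note that this is a plain substring scan, not word-boundary matching, so it will also fire inside longer phrases; for a pre-filter that feeds an LLM judge, false positives are acceptable.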
```python
def reviewer_node(state: dict) -> Command:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)
    judge = llm.with_structured_output(ReviewVerdict)
    lesson_id = state["lesson"]["id"]
    lecture = state["lectures"][lesson_id]
    lowered = lecture.body_markdown.lower()
    regex_hits = [p for p in SLOP_PATTERNS if p in lowered]
    verdict: ReviewVerdict = judge.invoke([
        {
            "role": "system",
            "content": REVIEWER_PROMPT.format(
                regex_hits=regex_hits or "none",
                lesson_id=lecture.lesson_id,
                title=lecture.title,
            ),
        },
        {"role": "user", "content": lecture.body_markdown},
    ])
    verdict.lesson_id = lecture.lesson_id
    if verdict.approved or lecture.rewrite_count >= MAX_REWRITES:
        return Command(update={"verdicts": {lesson_id: verdict}}, goto="collect")
    corrections = "\n".join(f"- [{i.category}] {i.fix}" for i in verdict.issues)
    return Command(
        update={
            "rewrite_count": lecture.rewrite_count + 1,
            "reviewer_corrections": corrections,
        },
        goto="content",
    )


def collect_node(state: CourseState) -> dict:
    return {}
```

The regex pre-check is deliberately simple: literal substring matches against known slop phrases. It saves the language model from re-deriving what slop is on every call. It is a cheap prior in service of an expensive model.

We cap rewrites hard. The `rewrite_count >= MAX_REWRITES` branch is the fallback: if the model fails twice, we record the verdict and move on, and a human editor can later see which lessons the reviewer gave up on. The loop is local to each fan-out branch. Lesson three can be on its second rewrite while lesson seven is still on its first; they all converge independently within the same superstep. The collect node is an identity node that serves as the join point where the twelve parallel branches reconverge before the assessments run.

## 2.4 The assessment agent

The assessment agent is structurally identical to the content agent. It takes module context and generates quizzes and assignments. The prompt asks the agent to tag every question's Bloom level and to target a specific distribution: 20% remember, 40% apply, 30% analyze, 10% evaluate. In a production system you would add a post-generation validator here that rejects the output if the ratios deviate too far from target or if a question maps to a missing objective id. We also generate one multi-part assignment per module, scoped to objectives at Apply level or higher, with a rubric an instructor can grade against.

```python
def assessment_node(state: CourseState) -> dict:
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro", temperature=0.4)
    quiz_writer = llm.with_structured_output(Quiz)
    brief_writer = llm.with_structured_output(Assignment)
    quizzes: dict[str, Quiz] = {}
    assignments: dict[str, Assignment] = {}
    for module in state["syllabus"].modules:
        module_objectives = [
            o for o in state["syllabus"].objectives if o.module_id == module.id
        ]
        quiz = quiz_writer.invoke(
            f"Write a 10-question quiz for module {module.title}. "
            f"Distribute Bloom levels: 20% remember/understand, 40% apply, "
            f"30% analyze, 10% evaluate/create. Each question maps to exactly "
            f"one objective id from: {[o.id for o in module_objectives]}"
        )
        quizzes[module.id] = quiz
        if any(o.bloom in ("apply", "analyze", "create") for o in module_objectives):
            assignment = brief_writer.invoke(
                f"Write one multi-part assignment for module {module.title} "
                f"scoped to objectives at Apply or higher. Include a rubric."
            )
            assignments[module.id] = assignment
    return {"quizzes": quizzes, "assignments": assignments}
```
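The post-generation validator mentioned above is not part of the pipeline as written. A minimal sketch of what it could look like, assuming each question dict carries `"bloom"` and `"objective_id"` keys (hypothetical field names, since `Quiz.questions` is an untyped `list[dict]`):

```python
def validate_quiz(
    questions: list[dict],
    valid_objective_ids: set[str],
    target: dict[str, float],
    tolerance: float = 0.15,
) -> list[str]:
    """Return human-readable violations; an empty list means the quiz passes."""
    problems: list[str] = []
    # Every question must map to a real objective id from the syllabus.
    for q in questions:
        if q.get("objective_id") not in valid_objective_ids:
            problems.append(
                f"question maps to unknown objective: {q.get('objective_id')}"
            )
    # Compare the observed Bloom-level mix against the prompted target.
    total = len(questions) or 1
    for level, share in target.items():
        observed = sum(q.get("bloom") == level for q in questions) / total
        if abs(observed - share) > tolerance:
            problems.append(f"{level}: got {observed:.0%}, wanted {share:.0%}")
    return problems


questions = [
    {"bloom": "apply", "objective_id": "lo-2-1"},
    {"bloom": "remember", "objective_id": "lo-2-9"},  # id not in the syllabus
]
print(validate_quiz(questions, {"lo-2-1", "lo-2-2"}, {"apply": 0.4, "analyze": 0.3}))
```

Wired into the graph, a failing verdict would route back to `assessment` with the violation list as corrections, mirroring the reviewer loop used for lectures.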
## 2.5 Deterministic production with Marp

This is the only part of the pipeline that strictly needs no LLM. Slide assembly is deterministic. We use the Marp CLI, which takes structured Markdown and produces PDF, PPTX, or HTML decks. The agent generates Marp Markdown deterministically from each lesson's Hook, Summary, and Checklist blocks; no language model writes slide content. LLM-generated slide layout is the single biggest source of broken decks: oversized text, overlapping images, and bullet-point mudslides. Templating plus deterministic assembly removes that failure class entirely.
```python
import re
import subprocess
from pathlib import Path

MARP_HEADER = """---
marp: true
theme: default
paginate: true
---

# {course_title}

#### {audience}

---

"""

SECTION_RE = re.compile(
    r"^## (Hook|Summary|Checklist)\s*\n(.*?)(?=\n## |\Z)",
    re.MULTILINE | re.DOTALL,
)


def lecture_to_slides_md(lecture: Lecture, module_title: str) -> str:
    """Deterministic: no LLM call. Extract fixed sections, template them."""
    sections = {
        m.group(1): m.group(2).strip()
        for m in SECTION_RE.finditer(lecture.body_markdown)
    }
    slides = [
        f"## {module_title}\n### {lecture.title}",
        f"### Why this lesson\n\n{sections.get('Hook', '').strip()}",
        f"### The idea\n\n{sections.get('Summary', '').strip()}",
    ]
    checklist = sections.get("Checklist", "").strip()
    if checklist:
        slides.append(f"### Checklist\n\n{checklist}")
    return "\n\n---\n\n".join(slides) + "\n\n---\n\n"


def build_course_deck(
    syllabus: Syllabus,
    lectures: dict[str, Lecture],
    out_dir: Path,
) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / "course.md"
    pdf_path = out_dir / "course.pdf"
    modules_by_id = {m.id: m for m in syllabus.modules}
    parts = [
        MARP_HEADER.format(course_title=syllabus.topic, audience=syllabus.audience)
    ]
    for lesson in syllabus.lessons:
        lecture = lectures.get(lesson.id)
        if lecture is None:
            continue
        module_title = modules_by_id[lesson.module_id].title
        parts.append(lecture_to_slides_md(lecture, module_title))
    md_path.write_text("\n".join(parts), encoding="utf-8")
    subprocess.run(
        ["marp", str(md_path), "--pdf", "-o", str(pdf_path), "--allow-local-files"],
        check=True,
    )
    return pdf_path


def production_node(state: CourseState) -> dict:
    out_dir = Path(state.get("output_dir") or "./course")
    pdf = build_course_deck(state["syllabus"], state["lectures"], out_dir / "slides")
    return {"export_bundle_path": str(pdf)}
```

The regex relies on the exact H2 headings that the earlier lesson prompt enforces. The prompt guarantees the structure the downstream parser depends on. That is a key design pattern: the output is deterministic enough precisely because its downstream consumer is templated.

## 2.6 Wiring the full graph

We have the nodes, a reducer-driven fan-out, and a human approval gate. We compile them into one graph.

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.types import Command


def build_course_graph(checkpointer=None):
    g = StateGraph(CourseState)
    g.add_node("curriculum", curriculum_node)
    g.add_node("fan_out", fan_out_node)
    g.add_node("content", content_node)
    g.add_node("reviewer", reviewer_node)
    g.add_node("collect", collect_node)
    g.add_node("assessment", assessment_node)
    g.add_node("production", production_node)
    g.add_edge(START, "curriculum")
    g.add_conditional_edges("fan_out", fan_out_to_content_agents, ["content"])
    g.add_edge("content", "reviewer")
    g.add_edge("collect", "assessment")
    g.add_edge("assessment", "production")
    g.add_edge("production", END)
    return g.compile(checkpointer=checkpointer or MemorySaver())


def run_course_pipeline(topic: str, audience: str, thread_id: str) -> str:
    graph = build_course_graph()
    config = {"configurable": {"thread_id": thread_id}}
    result = graph.invoke(
        {"topic": topic, "audience": audience, "output_dir": f"./course/{thread_id}"},
        config=config,
    )
    while "__interrupt__" in result:
        payload = result["__interrupt__"][0].value
        print(f"\nReview syllabus for: {payload['topic']}")
        for mod in payload["syllabus"]["modules"]:
            print(f"  Module: {mod['title']} — {len(mod['lessons'])} lessons")
        choice = input("\n[a]pprove / [r]evise: ").strip().lower()
        if choice.startswith("r"):
            feedback = input("What should change? ")
            result = graph.invoke(
                Command(resume={"action": "revise", "feedback": feedback}),
                config=config,
            )
        else:
            result = graph.invoke(
                Command(resume={"action": "approve"}),
                config=config,
            )
    return result["export_bundle_path"]


if __name__ == "__main__":
    path = run_course_pipeline(
        topic="Teach me how to build production-grade retrieval systems for LLM apps.",
        audience="working AI engineers who have shipped a RAG demo",
        thread_id="course-2026-04-19",
    )
    print(f"\nDone. Course PDF at: {path}")
```

The edge list makes the whole architecture legible. The transitions from curriculum to fan-out and from reviewer to collect are driven by `Command(goto=...)` instructions inside the nodes; that is the modern LangGraph style. The `while "__interrupt__" in result` loop implements the human-approval contract. If the syllabus is revised three times, the loop runs three times, and the thread id pins state across all of it.

## 3. Closing notes

If you want to keep building after the basic pipeline runs: you can feed the lecture scripts into a text-to-speech API and pair it with FFmpeg to produce a narrated MP4 export. You can also add an alignment auditor, a second reviewer that checks whether the assessments actually measure the stated objectives; that is worth doing if the output will sit in front of an accreditation body.

Course creation is not replaced by this pipeline. The agents do not replace the course creator. They replace the first three weeks of the course creator's timeline.

Original article: 用AI智能体制作在线课程, 汇智网
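For the narration export mentioned in the closing notes, the FFmpeg half can stay deterministic like the slide step. A hedged sketch that only builds the command, assuming per-lesson audio already exists from whatever TTS API you choose (file names here are illustrative):

```python
from pathlib import Path


def narration_mux_cmd(slide_png: Path, audio_mp3: Path, out_mp4: Path) -> list[str]:
    # Build (but do not run) an ffmpeg command that loops one slide image
    # for the duration of its narration track.
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(slide_png),  # still image as the video track
        "-i", str(audio_mp3),                # narration as the audio track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-shortest",          # stop when the audio ends
        "-pix_fmt", "yuv420p",               # broad player compatibility
        str(out_mp4),
    ]


cmd = narration_mux_cmd(Path("slide_01.png"), Path("lesson_01.mp3"), Path("lesson_01.mp4"))
# Run with: subprocess.run(cmd, check=True)
```

Like the Marp step, this keeps the LLM out of the rendering path: the model writes the script, deterministic code assembles the media.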