Agent Memory System Design: From Short-Term and Long-Term Memory to Knowledge Graphs

news 2026/4/16 22:26:55

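One-sentence definition: an agent memory system lets an AI manage "what just happened," "what it has learned," and "what it has accumulated over time" in separate layers, much as a person does.

By analogy, human memory has three layers: working memory (the context of the current conversation, lasting minutes), episodic memory (a specific event, such as "last week we discussed plan XX"), and semantic memory (a knowledge graph, such as "this is how TypeScript's type system works"). An AI memory system maps onto these directly:

| Human memory | Agent counterpart | Implementation |
| --- | --- | --- |
| Working memory | Short-term memory (Context Window) | Message list |
| Episodic memory | Long-term memory (vector database) | Semantic retrieval |
| Semantic memory | Knowledge graph (structured knowledge store) | Graph database |

Layer 1: Short-Term Memory (Using the Context Window Correctly)

Short-term memory is the message list inside the Context Window, but filling it with raw messages is not the best approach. Most developers simply append:

```typescript
// ❌ Naive implementation: append blindly until the context window overflows
type Message = { role: "user" | "assistant" | "system"; content: string };

const messages: Message[] = [];

async function chat(userInput: string) {
  messages.push({ role: "user", content: userInput });
  const response = await llm.invoke(messages); // grows every turn and eventually exceeds the context limit
  messages.push({ role: "assistant", content: response.content as string });
  return response.content;
}
```

The problem is obvious: after enough turns (the author's example is about 20), the Context Window fills up. Either the call fails or the model starts forgetting early content.

The better approach is a sliding window plus summary compression, which keeps the recent context without losing the history.

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o-mini" });

// Core: manage the message list dynamically
async function manageHistory(history: Message[], maxTokens = 4000): Promise<Message[]> {
  const currentTokens = estimateTokens(history);
  if (currentTokens < maxTokens) {
    return history; // not full yet, return as-is
  }

  // Keep the 10 most recent messages (5 turns) so the immediate context survives
  const recent = history.slice(-10);
  const older = history.slice(0, -10);
  if (older.length === 0) return recent; // nothing older to summarize

  // Compress older history with the model instead of truncating by rule
  // (the author puts the gain at about 30%)
  const summary = await llm.invoke([
    new SystemMessage(
      "Compress the following conversation history into a summary of no more than 200 words. Keep key decisions, user preferences, and important conclusions."
    ),
    new HumanMessage(older.map((m) => `${m.role}: ${m.content}`).join("\n")),
  ]);

  // Put the summary first, as a system message
  return [
    { role: "system", content: `Conversation history summary: ${summary.content}` },
    ...recent,
  ];
}

// Rough token estimate: about 4 characters per token (reasonable for English, low for Chinese)
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}
```

Key point: longer is not better for short-term memory. A sliding window plus summary compression balances cost against quality.

Layer 2: Long-Term Memory (Semantic Retrieval with a Vector Database)

Long-term memory answers the question "I told you something a month ago; can you recall it now?"

The principle is simple: embed past conversations or important information as vectors, store them, and retrieve by semantic similarity when needed. The hard part is deciding what to store, when to store it, and how much.

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { Document } from "@langchain/core/documents";

// Initialize the vector store (use Pinecone / Weaviate / Chroma in production)
const embeddings = new OpenAIEmbeddings();
const vectorStore = new MemoryVectorStore(embeddings);

// ✅ Selective storage: keep only information that is worth keeping
async function saveToLongTermMemory(
  content: string,
  metadata: {
    type: "preference" | "fact" | "decision" | "task" | "conversation";
    importance: "high" | "medium" | "low";
    userId: string;
  }
) {
  // Skip low-importance content to save storage and reduce retrieval noise
  if (metadata.importance === "low") return;

  await vectorStore.addDocuments([
    new Document({
      pageContent: content,
      metadata: {
        ...metadata,
        timestamp: Date.now(), // important: old memories are discounted at recall time
      },
    }),
  ]);
}

// ✅ Apply time decay at retrieval: more recent memories count as more relevant
async function recallMemory(query: string, userId: string): Promise<string[]> {
  const results = await vectorStore.similaritySearchWithScore(
    query,
    5, // take the 5 most similar entries
    (doc) => doc.metadata.userId === userId // MemoryVectorStore filters with a predicate
  );

  // Time decay: memories older than 30 days have their score multiplied by 0.7
  const now = Date.now();
  const decayedResults = results.map(([doc, score]) => {
    const ageInDays = (now - doc.metadata.timestamp) / (1000 * 60 * 60 * 24);
    const decayFactor = ageInDays > 30 ? 0.7 : 1.0;
    return { doc, score: score * decayFactor };
  });

  // Return only results scoring above 0.7 to avoid noise.
  // Note: with a 0.7 factor and a 0.7 cutoff, memories older than 30 days can
  // never pass; lower the cutoff or soften the factor if they should still surface.
  return decayedResults
    .filter(({ score }) => score > 0.7)
    .map(({ doc }) => doc.pageContent);
}

// Before the agent answers, retrieve relevant memories and inject them into the context
async function agentWithMemory(userInput: string, userId: string) {
  const memories = await recallMemory(userInput, userId);

  const systemPrompt =
    memories.length > 0
      ? `You remember the following about this user:\n${memories.join("\n")}\n\nAnswer based on these memories.`
      : "You are an assistant.";

  return llm.invoke([new SystemMessage(systemPrompt), new HumanMessage(userInput)]);
}
```

Key point: long-term memory does not mean storing every conversation. Selective storage plus time decay is what keeps the signal-to-noise ratio up.

Layer 3: Knowledge Graph (Structured Knowledge Taken to Its Limit)

A knowledge graph addresses relationships: remembering not only facts but how facts connect.

For example, take "the user likes React," "the user is working on an AI project," and "React has an AI SDK." If these three facts are stored in isolation, the agent cannot infer "recommend Vercel AI SDK to the user." If they live in a graph, path reasoning can get there.

In practice most projects do not need a full graph database such as Neo4j. A more common approach is hybrid storage: structured JSON plus semantic vectors.

```typescript
// Hybrid memory structure: structured attributes + semantic description
interface MemoryNode {
  id: string;
  type: "person" | "project" | "preference" | "event";
  attributes: Record<string, unknown>;
  relations: Array<{
    type: string; // "likes", "works_on", "knows_about"
    targetId: string;
  }>;
  embedding?: number[]; // optional semantic vector for fuzzy retrieval
}

class StructuredMemoryStore {
  private nodes = new Map<string, MemoryNode>();

  // Insert or update a node
  upsert(node: MemoryNode) {
    const existing = this.nodes.get(node.id);
    if (existing) {
      // Merge attributes instead of overwriting, so old information is not lost.
      // Relations are deduplicated by type + target; a Set of objects would
      // compare by reference and never remove duplicates.
      const merged = new Map(
        [...existing.relations, ...node.relations].map(
          (r) => [`${r.type}->${r.targetId}`, r] as const
        )
      );
      this.nodes.set(node.id, {
        ...existing,
        attributes: { ...existing.attributes, ...node.attributes },
        relations: [...merged.values()],
      });
    } else {
      this.nodes.set(node.id, node);
    }
  }

  // Graph traversal: collect all related nodes within 2 hops
  getRelated(nodeId: string, depth = 2): MemoryNode[] {
    const visited = new Set<string>();
    const result: MemoryNode[] = [];

    const traverse = (id: string, currentDepth: number) => {
      if (currentDepth < 0 || visited.has(id)) return;
      visited.add(id);
      const node = this.nodes.get(id);
      if (!node) return;
      result.push(node);
      node.relations.forEach(({ targetId }) => {
        traverse(targetId, currentDepth - 1);
      });
    };

    traverse(nodeId, depth);
    return result;
  }

  // Serialize into context the LLM can read
  toContextString(nodeId: string): string {
    const related = this.getRelated(nodeId);
    return related
      .map((n) => `[${n.type}] ${JSON.stringify(n.attributes)}`)
      .join("\n");
  }
}

// Usage example
const memStore = new StructuredMemoryStore();

// Store the user node
memStore.upsert({
  id: "user_james",
  type: "person",
  attributes: { name: "James", role: "frontend_developer" },
  relations: [
    { type: "works_on", targetId: "project_ai_assistant" },
    { type: "prefers", targetId: "tech_react" },
  ],
});

memStore.upsert({
  id: "tech_react",
  type: "preference",
  attributes: { name: "React", category: "frontend_framework" },
  relations: [{ type: "has_ecosystem", targetId: "tech_vercel_ai_sdk" }],
});

// When the agent answers, inject the user's knowledge-graph context
const context = memStore.toContextString("user_james");
// Output:
// [person] {"name":"James","role":"frontend_developer"}
// [preference] {"name":"React","category":"frontend_framework"}
// ...
```

Key point: the value of a knowledge graph lies in relational reasoning. For most projects, hybrid structured JSON plus vectors is the most cost-effective option.

How the Three Layers Work Together: A Complete Memory-Aware Agent

Only when the three layers are combined do you get an agent that really has memory:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";

// Not defined in the original article; assumed here to be a thin wrapper
// around saveToLongTermMemory / recallMemory from Layer 2.
class LongTermMemoryStore {
  save = saveToLongTermMemory;
  recall = recallMemory;
}

class MemoryAwareAgent {
  private llm = new ChatOpenAI({ model: "gpt-4o" });
  private shortTermHistory: Message[] = [];
  private longTermStore: LongTermMemoryStore;
  private knowledgeGraph: StructuredMemoryStore;

  constructor(private userId: string) {
    this.longTermStore = new LongTermMemoryStore();
    this.knowledgeGraph = new StructuredMemoryStore();
  }

  async chat(userInput: string): Promise<string> {
    // Step 1: retrieve long-term memories and graph context in parallel
    const [longTermMemories, graphContext] = await Promise.all([
      this.longTermStore.recall(userInput, this.userId),
      Promise.resolve(this.knowledgeGraph.toContextString(this.userId)),
    ]);

    // Step 2: build the system prompt and inject the memory layers
    const systemPrompt = this.buildSystemPrompt(longTermMemories, graphContext);

    // Step 3: manage short-term memory (sliding window)
    const managedHistory = await manageHistory(this.shortTermHistory);

    // Step 4: call the model
    const messages = [
      new SystemMessage(systemPrompt),
      ...managedHistory,
      new HumanMessage(userInput),
    ];
    const response = await this.llm.invoke(messages);
    const answer = response.content as string;

    // Step 5: update short-term memory
    this.shortTermHistory.push(
      { role: "user", content: userInput },
      { role: "assistant", content: answer }
    );

    // Step 6: decide asynchronously whether to store long-term memory,
    // without delaying the response; log failures instead of leaving them unhandled
    void this.asyncSaveMemory(userInput, answer).catch(console.error);

    return answer;
  }

  private buildSystemPrompt(memories: string[], graphCtx: string): string {
    const parts = ["You are an AI assistant with memory."];
    if (graphCtx) {
      parts.push(`\nWhat you know about the user:\n${graphCtx}`);
    }
    if (memories.length > 0) {
      parts.push(`\nRelevant past memories:\n${memories.join("\n")}`);
    }
    return parts.join("\n");
  }

  // Asynchronous storage: does not block the current response
  private async asyncSaveMemory(input: string, output: string) {
    // Let the model judge whether this turn is worth remembering
    const shouldSave = await this.llm.invoke([
      new SystemMessage(
        "Decide whether the following exchange contains information worth remembering long-term (user preferences / important decisions / facts). Answer only yes or no."
      ),
      new HumanMessage(`User: ${input}\nAssistant: ${output}`),
    ]);

    if ((shouldSave.content as string).toLowerCase().includes("yes")) {
      await this.longTermStore.save(
        `User said: ${input}. Assistant answered: ${output}`,
        { type: "conversation", importance: "high", userId: this.userId }
      );
    }
  }
}
```

Key point: each layer has its own job. Retrieval runs in parallel and storage runs asynchronously, so the agent gets memory without getting slow.

Common Pitfalls

Pitfall 1: Storing every conversation in the vector store

```typescript
// ❌ Stores every turn, most of it useless
async function onMessage(msg: Message) {
  await vectorStore.addDocuments([new Document({ pageContent: msg.content })]);
}
```

The vector store fills up with filler such as "OK," "Got it," and "What's next?", and retrieval returns nothing but noise.

```typescript
// ✅ Let the model decide whether the message is worth storing
const worthSaving = await llm.invoke([
  new SystemMessage("Does this message contain key information worth remembering? yes/no"),
  new HumanMessage(msg.content),
]);
// Compare loosely: the model may answer "Yes" or "yes."
if (String(worthSaving.content).trim().toLowerCase().startsWith("yes")) {
  await vectorStore.addDocuments([/* ... */]);
}
```

Pitfall 2: Retrieval that does not distinguish between users

```typescript
// ❌ Global search: user A's memories leak to user B
const results = await vectorStore.similaritySearch(query, 5);
```

In production, many users share one vector store. Without a filter, memories end up attached to the wrong user.

```typescript
// ✅ Filter by userId at retrieval time.
// MemoryVectorStore takes a predicate; hosted stores such as Pinecone or Chroma
// take a store-specific metadata filter object in the same position.
const results = await vectorStore.similaritySearch(
  query,
  5,
  (doc) => doc.metadata.userId === currentUserId
);
```

Pitfall 3: Truncating short-term memory and losing key context

```typescript
// ❌ Just chop off the front when it gets long
if (messages.length > 20) {
  messages = messages.slice(-20); // may cut away the task background
}
```

Use summary compression instead of hard truncation:

```typescript
// ✅ When too long, summarize first, then append the recent messages
if (messages.length > 20) {
  const summary = await summarizeOlderMessages(messages.slice(0, -20));
  messages = [new SystemMessage(`History summary: ${summary}`), ...messages.slice(-20)];
}
```

Pitfall 4: Injecting so many memories that the prompt is diluted

```typescript
// ❌ Retrieve the top 20 and put all of them into the prompt
const memories = await vectorStore.similaritySearch(query, 20);
const systemPrompt = `You know: ${memories.map((d) => d.pageContent).join("\n")}`;
// 20 memories + the user question + the conversation history: the context blows up
```

```typescript
// ✅ Limit what is injected: only results scoring above 0.8, at most 5
const results = await vectorStore.similaritySearchWithScore(query, 5);
const relevant = results
  .filter(([_, score]) => score > 0.8)
  .map(([doc]) => doc.pageContent);
```

Pitfall 5: Forgetting to give memories an expiry mechanism

Memories should not stay valid forever. A user who said "I don't like Vue" a year ago may not feel that way now.

```typescript
// ✅ Store a TTL with the memory and skip expired memories at retrieval
await vectorStore.addDocuments([
  new Document({
    pageContent: content,
    metadata: {
      timestamp: Date.now(),
      ttlDays: 90, // expires after 90 days
    },
  }),
]);

// Filter at retrieval: only memories from the last 90 days.
// The operator-style filter follows stores such as Pinecone or Chroma;
// MemoryVectorStore expects a predicate function instead.
const cutoff = Date.now() - 90 * 24 * 60 * 60 * 1000;
const results = await vectorStore.similaritySearch(query, 5, {
  timestamp: { $gt: cutoff },
});
```

Selection Reference

| Requirement | Recommended option | Notes |
| --- | --- | --- |
| Single-user, simple chatbot | MemoryVectorStore | LangChain's in-memory store; for development and testing; not persisted |
| Multi-user production | Chroma / Pinecone / Weaviate | Chroma is easy to deploy locally; Pinecone is managed and low-maintenance |
| Relational reasoning required | Neo4j + vector hybrid | High complexity; most agents do not need it |
| Enterprise intranet knowledge base | pgvector | PostgreSQL extension; lowest migration cost for teams already on PG |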
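The retrieval rules discussed above (per-user filtering, a relevance threshold, a cap on how many memories are injected, and expiry) can be combined into one pure function that runs on whatever candidates the vector store returns. The sketch below is illustrative only: `ScoredMemory`, `SelectOptions`, and `selectMemories` are hypothetical names, not LangChain types. It assumes scores are similarities in [0, 1] where higher means more similar. It also swaps the article's step decay (a flat 0.7 factor after 30 days) for an exponential half-life, which is a different technique chosen here so that older memories fade gradually instead of dropping off at a single boundary.

```typescript
// Hypothetical record shape for a retrieved memory (not a LangChain type).
interface ScoredMemory {
  content: string;
  userId: string;
  score: number; // similarity in [0, 1]; higher means more similar (assumed)
  timestamp: number; // ms since epoch when the memory was stored
}

// Hypothetical options; the defaults mirror the numbers used in the article.
interface SelectOptions {
  userId: string;
  now: number; // passed in so the function stays deterministic and testable
  minScore?: number; // relevance cutoff applied after decay
  maxItems?: number; // cap on how many memories reach the prompt
  ttlDays?: number; // memories older than this are treated as expired
  halfLifeDays?: number; // age at which a score is halved
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function selectMemories(candidates: ScoredMemory[], opts: SelectOptions): string[] {
  const {
    userId,
    now,
    minScore = 0.7,
    maxItems = 5,
    ttlDays = 90,
    halfLifeDays = 30,
  } = opts;

  return candidates
    // 1. Never mix memories across users.
    .filter((m) => m.userId === userId)
    // 2. Exponential decay: the score halves every `halfLifeDays`.
    .map((m) => {
      const ageDays = Math.max(0, (now - m.timestamp) / MS_PER_DAY);
      const decayed = m.score * Math.pow(0.5, ageDays / halfLifeDays);
      return { m, ageDays, decayed };
    })
    // 3. Drop expired memories and anything below the relevance cutoff.
    .filter(({ ageDays, decayed }) => ageDays <= ttlDays && decayed >= minScore)
    // 4. Most relevant first, then cap the number injected into the prompt.
    .sort((a, b) => b.decayed - a.decayed)
    .slice(0, maxItems)
    .map(({ m }) => m.content);
}
```

Keeping this step separate from the store call means the same policy applies whether the candidates come from MemoryVectorStore, Chroma, or Pinecone, and the thresholds can be tuned without touching retrieval code.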
