Building a Vector Database from Scratch: Text Search in Practice with Xinference and Milvus

2026/4/2 6:15:26

Introduction

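In the era of AI and big data, vector databases are becoming a powerful tool for handling unstructured data such as text and images. I recently tried building a simple text search system with Xinference and Milvus, covering everything from reading a local text file to interactive querying with highlighted matches. The process was full of challenges and fun. This post shares the steps I took, the technical difficulties I ran into, and how I solved them, in the hope that it saves others with the same interest a few detours.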

Project goals

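My goals were to:

Read multiple text records (for example, name:liubao,age:32) from a local document.txt file.
Use Xinference to generate embedding vectors for the text.
Store the vectors in Milvus to build a vector database.
Implement interactive querying that returns similar text and highlights the matching part.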

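Hardware: an ordinary Windows PC (16 GB of RAM, no GPU), running purely on CPU. 16 GB is barely enough: IntelliJ IDEA alone eats about half of it, which is as annoying as it sounds.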

Technology choices

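Xinference: a lightweight inference framework that supports many embedding models. I chose bge-small-en-v1.5, which suits a CPU-only environment.
Milvus: an open-source vector database, used to store and search the embedding vectors.
Python: the core programming language, together with libraries such as requests, pymilvus, and colorama.
Docker: runs the Milvus service.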

Implementation steps

1. Environment setup

Install Conda and Python: I created a Conda environment named xiangliang with Python 3.10.
```
conda create -n xiangliang python=3.10
conda activate xiangliang
```
Install Xinference:
```
pip install xinference
```
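Note: I originally tried running Xinference in Docker as well, but ran into startup problems and eventually switched to a local Conda deployment.
Install Milvus: deploy the standalone version with Docker.
Download the Milvus standalone Docker Compose file, make sure it is renamed to docker-compose.yml, and then run: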
```
docker compose up -d
# On older Docker versions, use: docker-compose up -d
```
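Download Attu, a visual management tool for the vector database, from the Releases page of the zilliztech/attu repository on GitHub.
The default settings are enough to log in directly.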
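
Before moving on, it can be worth confirming that the services are actually listening. The following is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, not part of any of the tools above, and the ports are the ones the scripts in this post connect to (19530 for Milvus, 9997 for Xinference, which will only respond once it has been started in a later step).

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Ports assumed from the scripts in this post
    for name, port in [("Milvus", 19530), ("Xinference", 9997)]:
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy.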

Install the dependencies:

# --index-url replaces PyPI, so install the CPU build of torch on its own
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install pymilvus requests transformers colorama

2. Data preparation

I created a document.txt file containing 10 test records:

name:liubao,age:32
name:zhangwei,age:25
name:lihua,age:40
name:wangming,age:28
name:chenxi,age:35
name:yangyang,age:22
name:zhaojie,age:45
name:liuyi,age:30
name:sunhao,age:27
name:zhouqi,age:33

These records simulate simple personal information and are used to test search quality.
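
The loader in test.py (next step) collapses whitespace and skips blank lines before embedding. As a rough illustration of what that cleaning does to the records above, here is a small sketch. `clean_lines` mirrors the list comprehension in test.py, while `parse_record` is a hypothetical helper that is not used anywhere in the original scripts.

```python
import re


def clean_lines(raw_lines):
    """Collapse runs of whitespace and drop blank lines."""
    return [re.sub(r"\s+", " ", line).strip() for line in raw_lines if line.strip()]


def parse_record(line):
    """Hypothetical helper: split 'name:liubao,age:32' into a dict of strings."""
    record = {}
    for pair in line.split(","):
        key, _, value = pair.partition(":")
        record[key.strip()] = value.strip()
    return record


sample = ["name:liubao,age:32\n", "\n", "  name:zhangwei,   age:25  \n"]
cleaned = clean_lines(sample)
print(cleaned)
print(parse_record(cleaned[0]))
```

Note that the search system embeds each whole line as text; the parsed dict is only a convenience for inspecting the data.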

3. Initializing the vector database

The script test.py reads the file, generates the vectors, and stores them in Milvus:

import re

import requests
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
)

# Read the file, collapse whitespace, and skip blank lines
with open("document.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
texts = [re.sub(r'\s+', ' ', line).strip() for line in lines if line.strip()]

# Launch the embedding model in Xinference and keep its UID
model_url = "http://localhost:9997/v1/models"
payload = {"model_name": "bge-small-en-v1.5", "model_type": "embedding"}
response = requests.post(model_url, json=payload)
response.raise_for_status()
model_uid = response.json()["model_uid"]
print("Model UID:", model_uid)  # query.py needs this value

# Generate one embedding per line of text
embed_url = "http://localhost:9997/v1/embeddings"
embeddings = [
    requests.post(embed_url, json={"model": model_uid, "input": text}).json()["data"][0]["embedding"]
    for text in texts
]

# Store in Milvus
connections.connect(host='localhost', port='19530')
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    # bge-small-en-v1.5 produces 384-dimensional vectors
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535)
]
# Drop any old collection so each run starts from clean data
if utility.has_collection("text_collection"):
    utility.drop_collection("text_collection")
collection = Collection(name="text_collection", schema=CollectionSchema(fields=fields))
ids = list(range(1, len(texts) + 1))
collection.insert([ids, embeddings, texts])
collection.flush()  # persist the inserts so num_entities is up to date
collection.create_index("embedding", {"metric_type": "COSINE", "index_type": "IVF_FLAT", "params": {"nlist": 1024}})
collection.load()

print("Inserted", collection.num_entities, "entities")

Start Xinference before running the script:

xinference-local

Then run python test.py inside the xiangliang environment. If everything works, the data is written to the vector database.
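
test.py calls the embeddings endpoint once per line, which is fine for 10 records but adds up on larger files. A possible variation is to send all lines in one request. The sketch below assumes the endpoint accepts a list for `input` and returns one item per text with an `index` field, as OpenAI-compatible embedding endpoints generally do; I have not verified this against every Xinference version. `embed_batch` and `parse_embeddings` are hypothetical names, and the sketch uses urllib from the standard library instead of requests.

```python
import json
from urllib import request

# Same endpoint as in test.py
EMBED_URL = "http://localhost:9997/v1/embeddings"


def parse_embeddings(body):
    """Return the embeddings from a response body, ordered by their 'index' field."""
    items = sorted(body["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def embed_batch(texts, model_uid, url=EMBED_URL):
    """Send every text in a single request (assumes list input is supported)."""
    payload = json.dumps({"model": model_uid, "input": texts}).encode("utf-8")
    req = request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=60) as resp:
        return parse_embeddings(json.load(resp))
```

With this in place, the list comprehension in test.py could be replaced by `embeddings = embed_batch(texts, model_uid)`.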

4. Interactive querying and highlighting

The script query.py provides interactive search and highlights the matching results:

import requests
from pymilvus import connections, Collection
import re
from colorama import init, Fore, Style

init()  # initialize colorama

connections.connect(host='localhost', port='19530')
collection = Collection(name="text_collection")
collection.load()

# UID of the launched model (the model_uid value from test.py); replace with your own
model_uid = "bge-small-en-v1.5-a5JDNlUy"
embed_url = "http://localhost:9997/v1/embeddings"

def highlight_match(text, query):
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(f"{Fore.RED}{Style.BRIGHT}\\g<0>{Style.RESET_ALL}", text)

def search_query(query_text):
    payload = {"model": model_uid, "input": query_text}
    query_embedding = requests.post(embed_url, json=payload).json()["data"][0]["embedding"]
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": 10}},
        limit=5,
        output_fields=["text"]
    )
    threshold = 0.7
    found = False
    for result in results[0]:
        similarity = result.distance
        if similarity >= threshold:
            text = result.entity.get("text")
            highlighted_text = highlight_match(text, query_text)
            print(f"Similarity: {similarity:.4f}, Text: {highlighted_text}...")
            found = True
    if not found:
        print(f"No results with similarity above {threshold}")

while True:
    query = input("Enter a query (type 'exit' to quit): ")
    if query.lower() == "exit":
        break
    search_query(query)

Running it shows that, for a given search term, results with higher scores come first and the matched part is highlighted.
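
With the COSINE metric, the `distance` value returned by the search is a cosine similarity, so larger means more similar, which is why query.py keeps results at or above the 0.7 threshold. To make that scoring concrete, here is a minimal pure-Python sketch. `cosine_similarity` and `filter_hits` are hypothetical helpers for illustration, not what Milvus runs internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def filter_hits(hits, threshold=0.7):
    """Keep (score, text) pairs at or above the threshold, best first."""
    kept = [(score, text) for score, text in hits if score >= threshold]
    return sorted(kept, key=lambda hit: hit[0], reverse=True)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(filter_hits([(0.65, "a"), (0.9, "b"), (0.7, "c")]))
```

The 0.7 cutoff is a tuning choice from this experiment; a different model or data set may need a different value.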

5. Technical difficulties and solutions

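Xinference model selection: I first tried all-MiniLM-L6-v2, but Xinference did not support it, so I switched to the built-in bge-small-en-v1.5.
Connection problems: I hit ConnectionRefusedError several times, and fixed it by making sure xinference-local was running and checking the port.
Highlighting: Windows CMD does not support ANSI codes, so I brought in colorama for cross-platform compatibility.
Single-result problem: the initial data set had only one record, so every query returned it. I later added more records and set a similarity threshold.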
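
One limitation of the highlighting worth noting: `highlight_match` in query.py treats the whole query as a single literal substring, so a multi-word query such as "liubao 32" highlights nothing even when both words appear in the result. A possible extension is to highlight each word separately. The sketch below is an assumption-laden variant, not the original implementation: `highlight_terms` is a hypothetical function, and it uses raw ANSI escape codes (the same sequences colorama exposes as Fore.RED, Style.BRIGHT, and Style.RESET_ALL) so that it runs without third-party packages.

```python
import re

RED_BRIGHT = "\033[31m\033[1m"  # same codes as colorama's Fore.RED + Style.BRIGHT
RESET = "\033[0m"               # same code as colorama's Style.RESET_ALL


def highlight_terms(text, query):
    """Wrap every case-insensitive occurrence of each query word in color codes."""
    # Longest terms first so a longer word wins over a shorter prefix of it
    terms = sorted({term for term in query.split() if term}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"{RED_BRIGHT}{m.group(0)}{RESET}", text)


print(highlight_terms("name:liubao,age:32", "LIUBAO 32"))
```

On Windows, calling colorama's `init()` first, as query.py already does, is still what makes these codes render in the console.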