Understanding Elasticsearch scoring and the Explain API

2025/7/17 3:00:21

By Kofi Bartlett, Elastic

A closer look at how Elasticsearch scores documents, and at the Explain API.

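Elasticsearch is a powerful search engine that returns fast, relevant results by computing a score for each document that matches a query. That score is the key factor in deciding the order of the search results. This article takes a closer look at how Elasticsearch computes scores and introduces the Explain API, which helps you understand the scoring process.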

For further reading, see the article “Elasticsearch：Explain API - 如何计算分数” (Elasticsearch: Explain API - how scores are calculated).

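Scoring in Elasticsearch

By default, Elasticsearch scores documents with the BM25 model (Okapi BM25). The model is grounded in probabilistic information retrieval theory and takes three factors into account: term frequency, inverse document frequency, and field length normalization. Briefly: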

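Term frequency (TF): the number of times a term appears in a document. The higher the term frequency, the more strongly the term is associated with that document.
Inverse document frequency (IDF): a measure of how important a term is across the whole document collection. Terms that appear in many documents are considered less important, while terms that appear in few documents are considered more important.
Field length normalization: accounts for the length of the field in which the term appears. Terms in shorter fields get more weight, because a match in a short field is more representative of that field's content.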
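
To make the three factors concrete, here is a minimal Python sketch of a per-term BM25 score. It follows the formulation that the explain output later in this article prints (other Lucene versions arrange the same terms slightly differently). The function names are illustrative, the corpus statistics are made up, and k1 = 1.2 and b = 0.75 are assumed as the default BM25 parameters.

```python
import math


def idf(doc_freq, doc_count):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """Term frequency, saturated by k1 and normalized for field length by b."""
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


def bm25_term_score(freq, doc_freq, doc_count, field_length, avg_field_length):
    """Score contribution of a single query term for a single document."""
    return idf(doc_freq, doc_count) * tf_norm(freq, field_length, avg_field_length)


if __name__ == "__main__":
    # Hypothetical statistics, chosen only to show the direction of each effect.
    cases = {
        "common term": dict(freq=1, doc_freq=900, doc_count=1000, field_length=10, avg_field_length=10),
        "rare term": dict(freq=1, doc_freq=5, doc_count=1000, field_length=10, avg_field_length=10),
        "rare term, short field": dict(freq=1, doc_freq=5, doc_count=1000, field_length=5, avg_field_length=10),
        "rare term, repeated 3x": dict(freq=3, doc_freq=5, doc_count=1000, field_length=10, avg_field_length=10),
    }
    for label, stats in cases.items():
        print(f"{label}: {bm25_term_score(**stats):.4f}")
```

In this sketch the rare term outscores the common one, the shorter field outscores the average-length one, and three occurrences score higher than one but less than three times as high, because the k1 term makes repeated occurrences count progressively less.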

To learn more about TF/IDF, read the article “Elasticsearch：分布式计分 - TF-IDF” (Elasticsearch: distributed scoring - TF-IDF).

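Using the Explain API

The Explain API is an important tool for understanding the scoring process. It returns a detailed explanation of how the score for one specific document was computed. To use it, send a GET request to the following endpoint: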

GET /<index>/_explain/<document_id>

In the request body, supply the query whose score you want explained. For example:

{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
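
As a rough illustration of how this request could be issued from code, the following Python sketch builds it using only the standard library. The base URL http://localhost:9200, the index name example_index, and the document ID 1 are placeholders, and the helper names are hypothetical. It also assumes an unsecured local cluster; a secured one would need credentials and TLS settings on top of this.

```python
import json
import urllib.request


def build_explain_request(base_url, index, doc_id, query):
    """Build (but do not send) a GET /<index>/_explain/<document_id> request."""
    url = f"{base_url.rstrip('/')}/{index}/_explain/{doc_id}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def explain(base_url, index, doc_id, query):
    """Send the request and decode the JSON response (needs a reachable cluster)."""
    request = build_explain_request(base_url, index, doc_id, query)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values: adjust the URL, index name, and document ID for your cluster.
    req = build_explain_request(
        "http://localhost:9200",
        "example_index",
        "1",
        {"match": {"title": "elasticsearch"}},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request is kept separate from sending it so that the URL and body can be inspected without a running cluster. Calling explain(...) with the same arguments would perform the actual call.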

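The Explain API response contains a detailed breakdown of the scoring process, listing each factor (TF, IDF, and field length normalization) and its contribution to the final score. The exact layout depends on the Elasticsearch version. Here is an example response: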

{
  "_index": "example_index",
  "_type": "_doc",
  "_id": "1",
  "matched": true,
  "explanation": {
    "value": 0.2876821,
    "description": "weight(title:elasticsearch in 0) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 0.2876821,
        "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
        "details": [
          {
            "value": 0.2876821,
            "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
            "details": [
              {
                "value": 1,
                "description": "docFreq",
                "details": []
              },
              {
                "value": 1,
                "description": "docCount",
                "details": []
              }
            ]
          },
          {
            "value": 1.0,
            "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
            "details": [
              {
                "value": 1,
                "description": "termFreq=1.0",
                "details": []
              },
              {
                "value": 1.2,
                "description": "parameter k1",
                "details": []
              },
              {
                "value": 0.75,
                "description": "parameter b",
                "details": []
              },
              {
                "value": 1,
                "description": "avgFieldLength",
                "details": []
              },
              {
                "value": 1,
                "description": "fieldLength",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}

In this example, the response shows that the score of 0.2876821 is the product of the IDF value (0.2876821) and the tfNorm value (1.0). With docFreq and docCount both equal to 1, the IDF is log(1 + 0.5 / 1.5) ≈ 0.2877, and because fieldLength equals avgFieldLength, the tfNorm reduces to 2.2 / 2.2 = 1.0. This detailed breakdown helps you understand which factors drive the score, and you can use it to fine-tune search relevance.
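
Because the explanation is a tree of nested "value" / "description" / "details" nodes, it is often easier to read once flattened. The sketch below is one possible way to do that in Python. The function name is illustrative, and the sample tree (a "sum of:" node with two children) is hand-written with made-up values purely to show the output shape.

```python
def format_explanation(node, depth=0):
    """Flatten an explanation tree into indented 'value  description' lines."""
    # Descriptions may contain embedded newlines; collapse all whitespace runs.
    description = " ".join(node["description"].split())
    lines = [f"{'  ' * depth}{node['value']}  {description}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hand-written sample in the same shape as the "explanation" object.
    sample = {
        "value": 3.0,
        "description": "sum of:",
        "details": [
            {
                "value": 1.0,
                "description": "weight(title:elasticsearch in 0) [PerFieldSimilarity], result of:",
                "details": [],
            },
            {
                "value": 2.0,
                "description": "weight(body:elasticsearch in 0) [PerFieldSimilarity], result of:",
                "details": [],
            },
        ],
    }
    print("\n".join(format_explanation(sample)))
```

With a real response, you would pass the parsed response's "explanation" object to format_explanation instead of the sample tree.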