Contents
- Official documentation
- Bucket Selector
- 1. Definition
- 2. How it works
- 3. Use cases and examples
  - Use cases
  - Official example
  - Example 2
- 4. Notes
- 5. Summary
 

Official documentation
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html


Bucket Selector
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-selector-aggregation.html


 
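1. Definition
bucket_selector is a parent pipeline aggregation in Elasticsearch that post-processes the buckets already produced by a multi-bucket aggregation. It filters buckets dynamically based on their aggregated values, which gives finer control over which buckets appear in the response.
2. How it works
- Input: bucket_selector receives the buckets of its parent aggregation, together with the metric values named in buckets_path.
- Script: a script evaluates a condition for each bucket, for example whether an aggregated value is large enough to keep the bucket.
- Output: only the buckets that satisfy the condition are returned; all others are dropped.

Parameters:
- buckets_path: maps script variable names to aggregation paths. For example, it can pass the result of a sub-aggregation (such as a sum or a count) to the script as a variable.
- script: decides whether a bucket is kept. The script language is usually Painless, and the script must return a boolean: true keeps the bucket, false drops it.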
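To make the roles of buckets_path and script concrete, here is a minimal Python sketch of the filtering step. It is only a model of the behaviour described above, not Elasticsearch code: the function name `bucket_selector`, the bucket layout, and the use of a Python callable in place of a Painless script are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Bucket = Dict[str, Any]


def bucket_selector(
    buckets: List[Bucket],
    buckets_path: Dict[str, str],
    script: Callable[[Dict[str, Any]], bool],
) -> List[Bucket]:
    """Keep only the buckets for which `script` returns True (illustrative model)."""
    kept = []
    for bucket in buckets:
        # Resolve each script variable to the value of the named sub-aggregation.
        params = {var: bucket[path]["value"] for var, path in buckets_path.items()}
        if script(params):
            kept.append(bucket)
    return kept


# Two hypothetical buckets, each carrying a metric sub-aggregation named "total".
buckets = [
    {"key": "a", "total": {"value": 5.0}},
    {"key": "b", "total": {"value": 50.0}},
]

# Stand-in for the script "params.t > 10".
kept = bucket_selector(buckets, {"t": "total"}, lambda params: params["t"] > 10)
print([b["key"] for b in kept])  # ['b']
```

The mapping `{"t": "total"}` plays the part of buckets_path: the script never sees the bucket itself, only the values bound to its variables.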
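3. Use cases and examples
Use cases
- When aggregated statistics should decide which results are shown.
- When trimming the final set of buckets returned by a multi-level aggregation.

Official example
Here is a worked example: suppose we want to keep only the months whose total sales exceed 600. The following DSL creates the index, loads sample data, and runs the query: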
PUT /sales
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "price": {
        "type": "double"
      }
    }
  }
}
POST /sales/_bulk
{ "index": { "_id": "1" } }
{ "date": "2023-01-01", "price": 100.0 }
{ "index": { "_id": "2" } }
{ "date": "2023-01-15", "price": 150.0 }
{ "index": { "_id": "3" } }
{ "date": "2023-02-01", "price": 300.0 }
{ "index": { "_id": "4" } }
{ "date": "2023-02-10", "price": 50.0 }
{ "index": { "_id": "5" } }
{ "date": "2023-03-01", "price": 400.0 }
{ "index": { "_id": "6" } }
{ "date": "2023-03-15", "price": 250.0 }
{ "index": { "_id": "7" } }
{ "date": "2023-04-01", "price": 350.0 }
{ "index": { "_id": "8" } }
{ "date": "2023-04-10", "price": 200.0 }
{ "index": { "_id": "9" } }
{ "date": "2023-05-01", "price": 500.0 }
{ "index": { "_id": "10" } }
{ "date": "2023-05-15", "price": 300.0 }
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "total_sales"
            },
            "script": "params.totalSales > 600"
          }
        }
      }
    }
  }
}
 
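This DSL is an Elasticsearch aggregation query that computes the total sales for each month and keeps only the months whose total exceeds 600. Its parts are described below.
size: 0
- Purpose: return no documents in the response, only the aggregation results. This is the usual choice when only the aggregated values matter.

aggs
- Purpose: the container that defines the aggregations, here the statistics over the sales data.

sales_per_month
- Type: date_histogram
- Function: groups the documents into time intervals (here, calendar months).
- Parameters:
  - field: the date field used for grouping, here date.
  - calendar_interval: the interval size, here month.

total_sales
- Type: sum
- Function: sums the sales amounts within each month.
- Parameters:
  - field: the field to sum, here price. Each month's total is stored under this aggregation's name.

sales_bucket_filter
- Type: bucket_selector
- Function: filters the buckets (months) by a condition.
- Parameters:
  - buckets_path: defines how the results of other aggregations are accessed; here the variable totalSales refers to the total_sales aggregation.
  - script: a Painless script that decides whether a bucket is kept. The condition is params.totalSales > 600, so a month's bucket is kept only when its total sales exceed 600.

Summary
Overall, this DSL:
- Groups the sales data by month.
- Computes the total sales for each month.
- Keeps only the months whose total sales exceed 600.

The response therefore contains only the month buckets whose sales exceed 600, ready for further analysis.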
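To check which months the query should return, the same computation can be reproduced offline. The sketch below is plain Python over the ten sample documents from the bulk request. It does not talk to Elasticsearch, and grouping by the `YYYY-MM` prefix of the date string is a simplification of what date_histogram does.

```python
from collections import defaultdict

# (date, price) pairs copied from the bulk request above.
sales = [
    ("2023-01-01", 100.0), ("2023-01-15", 150.0),
    ("2023-02-01", 300.0), ("2023-02-10", 50.0),
    ("2023-03-01", 400.0), ("2023-03-15", 250.0),
    ("2023-04-01", 350.0), ("2023-04-10", 200.0),
    ("2023-05-01", 500.0), ("2023-05-15", 300.0),
]

# Step 1 and 2: group by month and sum the prices (date_histogram + sum).
totals = defaultdict(float)
for date, price in sales:
    totals[date[:7]] += price

# Step 3: keep only months whose total exceeds 600 (bucket_selector).
kept = {month: total for month, total in totals.items() if total > 600}

print(dict(totals))
# {'2023-01': 250.0, '2023-02': 350.0, '2023-03': 650.0, '2023-04': 550.0, '2023-05': 800.0}
print(kept)
# {'2023-03': 650.0, '2023-05': 800.0}
```

With this data, only March (650) and May (800) pass the threshold, so those are the two buckets to expect in the response.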
 
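Example 2
Suppose a log index holds documents describing user activity. We want to group by user and count each user's activities, but return only the users with more than 5 activities.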
 
PUT /user_activity
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "activity": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}
POST /user_activity/_bulk
{ "index": { "_id": "1" } }
{ "user_id": "user1", "activity": "login", "timestamp": "2024-10-01T10:00:00Z" }
{ "index": { "_id": "2" } }
{ "user_id": "user1", "activity": "view_page", "timestamp": "2024-10-01T10:05:00Z" }
{ "index": { "_id": "3" } }
{ "user_id": "user1", "activity": "logout", "timestamp": "2024-10-01T10:10:00Z" }
{ "index": { "_id": "4" } }
{ "user_id": "user1", "activity": "login", "timestamp": "2024-10-01T11:00:00Z" }
{ "index": { "_id": "5" } }
{ "user_id": "user1", "activity": "view_page", "timestamp": "2024-10-01T11:05:00Z" }
{ "index": { "_id": "6" } }
{ "user_id": "user1", "activity": "logout", "timestamp": "2024-10-01T11:10:00Z" }
{ "index": { "_id": "7" } }
{ "user_id": "user2", "activity": "login", "timestamp": "2024-10-01T12:00:00Z" }
{ "index": { "_id": "8" } }
{ "user_id": "user2", "activity": "upload_file", "timestamp": "2024-10-01T12:15:00Z" }
{ "index": { "_id": "9" } }
{ "user_id": "user3", "activity": "logout", "timestamp": "2024-10-01T12:30:00Z" }
{ "index": { "_id": "10" } }
{ "user_id": "user4", "activity": "login", "timestamp": "2024-10-01T13:00:00Z" }
{ "index": { "_id": "11" } }
{ "user_id": "user4", "activity": "logout", "timestamp": "2024-10-01T13:05:00Z" }
{ "index": { "_id": "12" } }
{ "user_id": "user5", "activity": "login", "timestamp": "2024-10-01T14:00:00Z" }
{ "index": { "_id": "13" } }
{ "user_id": "user5", "activity": "view_page", "timestamp": "2024-10-01T14:10:00Z" }
{ "index": { "_id": "14" } }
{ "user_id": "user5", "activity": "logout", "timestamp": "2024-10-01T14:15:00Z" }
{ "index": { "_id": "15" } }
{ "user_id": "user6", "activity": "login", "timestamp": "2024-10-01T15:00:00Z" }
{ "index": { "_id": "16" } }
{ "user_id": "user6", "activity": "view_page", "timestamp": "2024-10-01T15:05:00Z" }
{ "index": { "_id": "17" } }
{ "user_id": "user6", "activity": "logout", "timestamp": "2024-10-01T15:10:00Z" }
{ "index": { "_id": "18" } }
{ "user_id": "user7", "activity": "login", "timestamp": "2024-10-01T16:00:00Z" }
{ "index": { "_id": "19" } }
{ "user_id": "user7", "activity": "upload_file", "timestamp": "2024-10-01T16:05:00Z" }
{ "index": { "_id": "20" } }
{ "user_id": "user7", "activity": "logout", "timestamp": "2024-10-01T16:10:00Z" }
{ "index": { "_id": "21" } }
{ "user_id": "user8", "activity": "login", "timestamp": "2024-10-01T17:00:00Z" }
{ "index": { "_id": "22" } }
{ "user_id": "user8", "activity": "view_page", "timestamp": "2024-10-01T17:05:00Z" }
{ "index": { "_id": "23" } }
{ "user_id": "user8", "activity": "logout", "timestamp": "2024-10-01T17:10:00Z" }
{ "index": { "_id": "24" } }
{ "user_id": "user9", "activity": "login", "timestamp": "2024-10-01T18:00:00Z" }
{ "index": { "_id": "25" } }
{ "user_id": "user9", "activity": "view_page", "timestamp": "2024-10-01T18:05:00Z" }
{ "index": { "_id": "26" } }
{ "user_id": "user9", "activity": "logout", "timestamp": "2024-10-01T18:10:00Z" }
{ "index": { "_id": "27" } }
{ "user_id": "user10", "activity": "login", "timestamp": "2024-10-01T19:00:00Z" }
{ "index": { "_id": "28" } }
{ "user_id": "user10", "activity": "view_page", "timestamp": "2024-10-01T19:05:00Z" }
{ "index": { "_id": "29" } }
{ "user_id": "user10", "activity": "logout", "timestamp": "2024-10-01T19:10:00Z" }
{ "index": { "_id": "30" } }
{ "user_id": "user11", "activity": "login", "timestamp": "2024-10-01T20:00:00Z" }
{ "index": { "_id": "31" } }
{ "user_id": "user11", "activity": "view_page", "timestamp": "2024-10-01T20:05:00Z" }
{ "index": { "_id": "32" } }
{ "user_id": "user11", "activity": "logout", "timestamp": "2024-10-01T20:10:00Z" }
{ "index": { "_id": "33" } }
{ "user_id": "user12", "activity": "login", "timestamp": "2024-10-01T21:00:00Z" }
{ "index": { "_id": "34" } }
{ "user_id": "user12", "activity": "upload_file", "timestamp": "2024-10-01T21:05:00Z" }
{ "index": { "_id": "35" } }
{ "user_id": "user12", "activity": "logout", "timestamp": "2024-10-01T21:10:00Z" }
{ "index": { "_id": "36" } }
{ "user_id": "user13", "activity": "login", "timestamp": "2024-10-01T22:00:00Z" }
{ "index": { "_id": "37" } }
{ "user_id": "user13", "activity": "view_page", "timestamp": "2024-10-01T22:05:00Z" }
{ "index": { "_id": "38" } }
{ "user_id": "user13", "activity": "logout", "timestamp": "2024-10-01T22:10:00Z" }
{ "index": { "_id": "39" } }
{ "user_id": "user14", "activity": "login", "timestamp": "2024-10-01T23:00:00Z" }
{ "index": { "_id": "40" } }
{ "user_id": "user14", "activity": "view_page", "timestamp": "2024-10-01T23:05:00Z" }
{ "index": { "_id": "41" } }
{ "user_id": "user14", "activity": "logout", "timestamp": "2024-10-01T23:10:00Z" }
{ "index": { "_id": "42" } }
{ "user_id": "user15", "activity": "login", "timestamp": "2024-10-02T00:00:00Z" }
{ "index": { "_id": "43" } }
{ "user_id": "user15", "activity": "upload_file", "timestamp": "2024-10-02T00:05:00Z" }
{ "index": { "_id": "44" } }
{ "user_id": "user15", "activity": "logout", "timestamp": "2024-10-02T00:10:00Z" }
{ "index": { "_id": "45" } }
{ "user_id": "user16", "activity": "login", "timestamp": "2024-10-02T01:00:00Z" }
{ "index": { "_id": "46" } }
{ "user_id": "user16", "activity": "view_page", "timestamp": "2024-10-02T01:05:00Z" }
{ "index": { "_id": "47" } }
{ "user_id": "user16", "activity": "logout", "timestamp": "2024-10-02T01:10:00Z" }
{ "index": { "_id": "48" } }
{ "user_id": "user17", "activity": "login", "timestamp": "2024-10-02T02:00:00Z" }
{ "index": { "_id": "49" } }
{ "user_id": "user17", "activity": "upload_file", "timestamp": "2024-10-02T02:05:00Z" }
{ "index": { "_id": "50" } }
{ "user_id": "user17", "activity": "logout", "timestamp": "2024-10-02T02:10:00Z" }
POST /user_activity/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "terms": {
        "field": "user_id"
      },
      "aggs": {
        "activity_count": {
          "value_count": {
            "field": "activity"
          }
        },
        "filtered_users": {
          "bucket_selector": {
            "buckets_path": {
              "count": "activity_count"
            },
            "script": "params.count > 5"
          }
        }
      }
    }
  }
}
 
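- Overall structure
  - Request: POST /user_activity/_search runs a search against the user_activity index.
  - size: set to 0, so no documents are returned, only the aggregation results.
- Aggregations (aggs)
  - users: the name of the aggregation that groups the results.
    - terms: the aggregation type; documents are grouped by the user_id field, and each distinct user_id produces one bucket.
- Nested aggregations
  - activity_count:
    - value_count: counts the values of the activity field within each user bucket. In other words, it counts each user's activities.
  - filtered_users:
    - bucket_selector: a pipeline aggregation that post-processes the generated buckets.
    - buckets_path: defines the inputs passed to bucket_selector. Here the variable count points to the result of the activity_count aggregation.
    - script: the condition used to filter buckets. Here it checks whether the activity count is greater than 5; only user buckets that meet the condition are kept.

Summary
This DSL:
- Aggregates the user activity data by user_id.
- Counts the activities of each user.
- Keeps only the users with more than 5 activities.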
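The expected result for the sample data can again be worked out offline. The sketch below is plain Python, not an Elasticsearch call. The per-user document counts are tallied from the bulk request above. It also assumes that the selector only sees the buckets the terms aggregation returns (10 by default, since the query sets no size), which is modelled here with `most_common(10)`.

```python
from collections import Counter

# Documents per user, tallied from the 50 documents in the bulk request above.
docs_per_user = {"user1": 6, "user2": 2, "user3": 1, "user4": 2}
docs_per_user.update({f"user{n}": 3 for n in range(5, 18)})  # user5..user17

# Rebuild the user_id column of the index: one entry per document.
user_ids = [user for user, n in docs_per_user.items() for _ in range(n)]

# terms + value_count: one bucket per user with its activity count.
counts = Counter(user_ids)

# Assumption: terms returns its top 10 buckets by count (default size).
top_buckets = counts.most_common(10)

# bucket_selector: keep buckets where count > 5.
kept = [(user, count) for user, count in top_buckets if count > 5]

print(len(user_ids))  # 50
print(kept)           # [('user1', 6)]
```

Only user1, with 6 activities, passes the filter; every other user in the sample data has 3 or fewer.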

 
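4. Notes
- Like all pipeline aggregations, bucket_selector runs after the other aggregations have finished, so filtering buckets with it does not reduce the time spent computing them.
- In more complex setups (for example, when a cardinality aggregation is involved), make sure bucket_selector is placed under the correct parent aggregation; otherwise the request fails with an incompatibility error.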
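The first note can be illustrated with a small model. The sketch below is plain Python with hypothetical names (`run_aggregation`, `metric_evaluations`). It only shows the order of operations described above: every bucket's metric is computed before any bucket is dropped. It is not a measurement of Elasticsearch itself.

```python
from typing import Dict, List, Tuple


def run_aggregation(
    groups: Dict[str, List[float]], threshold: float
) -> Tuple[List[dict], int]:
    """Compute a sum per bucket, then filter; return kept buckets and work done."""
    metric_evaluations = 0
    buckets = []
    for key, values in groups.items():
        # The metric is computed for every bucket, whether or not it survives.
        buckets.append({"key": key, "total": sum(values)})
        metric_evaluations += 1

    # The selector step runs only after all metrics exist.
    kept = [b for b in buckets if b["total"] > threshold]
    return kept, metric_evaluations


groups = {"a": [1, 2], "b": [10, 20], "c": [3]}
kept, evaluations = run_aggregation(groups, threshold=5)
print(len(kept), evaluations)  # 1 3
```

One bucket is returned, but three metric computations were performed. If the goal is to do less work, narrowing the documents with a query before aggregating is the usual place to look.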
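5. Summary
bucket_selector is a useful Elasticsearch tool for post-processing aggregation buckets: it selects the buckets of interest based on their computed aggregation values. This makes it handy in data analysis and visualization, where it helps control exactly which buckets a query returns.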


















