Contents
Deprecation of Type
Why
Mapping
Query the mapping of an index
Create an index with a mapping
Add a field to a mapping
Data migration
1. Create a new index with the correct mapping
2. Reindex the data into that index
Analysis (tokenization)
POST _analyze
Custom dictionary
IK analyzer
circuit_breaking_exception
Deprecation of Type
ES 6.x: deprecation of Type began; new indices are limited to a single type
ES 7.x: Type is de-emphasized but still supported (with deprecation warnings)
ES 8.x: Type is removed entirely
After the deprecation, each index contains only one document type.
If you need to distinguish different kinds of documents, there are two approaches:
- create separate indices
- add a custom field to the documents
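The custom-field approach can be sketched as follows (the index name content and field name doc_type are illustrative, not from the original):

```
# Tag each document with its kind via a custom field
PUT /content/_doc/1
{
  "doc_type": "article",
  "title": "hello"
}

# Filter on that field at query time
GET /content/_search
{
  "query": {
    "term": { "doc_type.keyword": "article" }
  }
}
```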
Why
Elasticsearch's underlying storage (Lucene) is organized by index, not by Type.
Within one index, documents of different Types could have fields with the same name but different data types; such field-type conflicts lead to inconsistent data and query errors.
GET /bank/_search
{
"query": {
"match": {
"address": "mill lane"
}
},
"_source": ["account_number","address"]
}
As the query shows, searches run against an index; no type is specified. If different types in the same index had defined conflicting address fields, the query would be ambiguous.
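A quick sketch of such a conflict (the index name demo is made up): once a field has been dynamically mapped as long, a document whose value cannot be parsed as long is rejected:

```
PUT /demo/_doc/1
{ "address": 42 }            # "address" is dynamically mapped as long

PUT /demo/_doc/2
{ "address": "mill lane" }   # rejected: mapper_parsing_exception, "address" is already a long
```

Under the old multi-type model, these two documents could have lived in different types of the same index, yet Lucene would still have seen a single address field with incompatible definitions.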
Mapping
Mapping defines how documents and fields are stored and indexed for search.
Mapping is Elasticsearch's mechanism for defining document structure and field types. It is similar to a table schema in a relational database: it describes which fields a document contains, each field's data type (text, numeric, date, and so on), and other field attributes (whether the field is analyzed, whether it is indexed, etc.).
Mapping is one of Elasticsearch's core concepts; it determines how data is stored, indexed, and queried.
Query the mapping of an index
_mapping
GET /bank/_mapping
{
"bank" : {
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
- A text field can carry sub-fields; here each text field has a keyword sub-field of type keyword, which stores the exact (unanalyzed) value. ignore_above: 256 means strings longer than 256 characters are not indexed into the keyword sub-field.
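The practical difference shows up at query time: match against the text field goes through the analyzer, while term against the .keyword sub-field compares the exact stored string. A sketch against the bank index above (the address values are illustrative):

```
# Analyzed full-text match: matches any address containing the token "mill"
GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}

# Exact, case-sensitive match on the keyword sub-field:
# only documents whose entire address equals "198 Mill Lane"
GET /bank/_search
{
  "query": { "term": { "address.keyword": "198 Mill Lane" } }
}
```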
Create an index with a mapping
PUT /{indexName}
PUT /my_index
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"city": {
"type": "keyword"
}
}
}
}
Add a field to a mapping
- PUT /{indexName}/_mapping with a mapping properties request body
PUT /my_index/_mapping
{
"properties": {
"state": {
"type": "keyword",
"index": false
}
}
}
- "index": false makes the field non-searchable: it is still stored in _source but not indexed. The default is true.
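Searching on the non-indexed field then fails; a sketch (the value "CA" is illustrative):

```
GET /my_index/_search
{
  "query": { "term": { "state": "CA" } }
}
# fails with an error like:
# "Cannot search on field [state] since it is not indexed."
```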
Data migration
Elasticsearch does not allow changing the mapping of an existing field (only adding new fields, as above). To change an existing field's mapping, you must migrate the data:
1. Create a new index with the correct mapping
PUT /my_bank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "keyword"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
2. Reindex the data into that index
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "my_bank"
}
}
- The type parameter in the _reindex source was deprecated in ES 7.x and removed in 8.0
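On ES 7.x and later, the same migration is written without the type parameter:

```
POST _reindex
{
  "source": {
    "index": "bank"
  },
  "dest": {
    "index": "my_bank"
  }
}
```

For large indices, POST _reindex?wait_for_completion=false returns a task id so progress can be checked via the Tasks API.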
Analysis (tokenization)
Analysis splits text into individual terms (tokens).
POST _analyze
The standard analyzer
POST _analyze
{
"analyzer": "standard",
"text": ["it's test data","hello world"]
}
Response
{
"tokens" : [
{
"token" : "it's",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "test",
"start_offset" : 5,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "data",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "hello",
"start_offset" : 15,
"end_offset" : 20,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "world",
"start_offset" : 21,
"end_offset" : 26,
"type" : "<ALPHANUM>",
"position" : 4
}
]
}
Custom dictionary
Under nginx's html directory, create es/term.text and add the custom terms, one per line.
Configure the IK remote dictionary in /elasticsearch/config/analysis-ik/IKAnalyzer.cfg.xml.
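The remote dictionary is registered via the remote_ext_dict entry; a sketch of IKAnalyzer.cfg.xml (the http://nginx/es/term.text URL is an assumption based on the nginx setup above — substitute your own host):

```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionary files (relative to this config) -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword files -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary; URL assumed from the nginx setup above -->
    <entry key="remote_ext_dict">http://nginx/es/term.text</entry>
</properties>
```

Restart Elasticsearch (docker restart elasticsearch) after editing the config so IK reloads the dictionary.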
Test
POST _analyze
{
"analyzer": "ik_smart",
"text": "尚硅谷项目谷粒商城"
}
尚硅谷 and 谷粒商城 are terms from the term.text custom dictionary
Response
{
"tokens" : [
{
"token" : "尚硅谷",
"start_offset" : 0,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "项目",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "谷粒商城",
"start_offset" : 5,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 2
}
]
}
IK analyzer
A Chinese-language analyzer
GitHub
https://github.com/infinilabs/analysis-ik
Install
bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/7.4.2
Run the install command inside the Elasticsearch docker container.
Uninstall
elasticsearch-plugin remove analysis-ik
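After installing (and restarting the node), the plugin can be verified:

```
# inside the container
elasticsearch-plugin list      # should print: analysis-ik

# or via the REST API
GET /_cat/plugins?v
```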
Test
POST _analyze
{
"analyzer": "ik_smart",
"text": "我要成为java高手"
}
Response
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "要",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "成为",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "java",
"start_offset" : 4,
"end_offset" : 8,
"type" : "ENGLISH",
"position" : 3
},
{
"token" : "高手",
"start_offset" : 8,
"end_offset" : 10,
"type" : "CN_WORD",
"position" : 4
}
]
}
circuit_breaking_exception
The request tripped Elasticsearch's circuit breaker: serving it would push memory use over the configured limit.
{
"error": {
"root_cause": [
{
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<http_request>] would be [124604192/118.8mb], which is larger than the limit of [123273216/117.5mb], real usage: [124604192/118.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=1788/1.7kb, in_flight_requests=0/0b, accounting=225547/220.2kb]",
"bytes_wanted": 124604192,
"bytes_limit": 123273216,
"durability": "PERMANENT"
}
],
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<http_request>] would be [124604192/118.8mb], which is larger than the limit of [123273216/117.5mb], real usage: [124604192/118.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=1788/1.7kb, in_flight_requests=0/0b, accounting=225547/220.2kb]",
"bytes_wanted": 124604192,
"bytes_limit": 123273216,
"durability": "PERMANENT"
},
"status": 429
}
Check the Elasticsearch logs
docker logs elasticsearch
Check Elasticsearch's memory usage
GET /_cat/nodes?v&h=name,heap.percent,ram.percent
- If heap.percent or ram.percent is close to 100%, the node is short on memory.
Increase the Elasticsearch heap: remove and recreate the container with larger -Xms and -Xmx values (e.g. -Xmx256m):
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx256m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
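If recreating the container is not convenient, the parent breaker limit can also be raised at runtime (the 98% value below is illustrative; a larger heap is the real fix, since the breaker exists to prevent OutOfMemoryError):

```
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.total.limit": "98%"
  }
}
```

Transient settings are lost when the node restarts.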