Elasticsearch (ES) Notes

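Date Histogram time zone:

I ran into a bug where statistics grouped by time did not match the expected values. I suspected the time zone, since date_histogram buckets dates in UTC by default; setting the time zone explicitly fixed it.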

In the query:

{
  "query": { "match_all": {} },
  "aggs": {
    "dateAgg": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",          // one of 1s/1m/1h/1d/1w/1M/1q/1y
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00"      // time zone used for bucketing
      }
    }
  }
}

In Java:

// DateTimeZone here is org.joda.time.DateTimeZone, as used by older Elasticsearch Java clients
AggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("dateAgg")
        .field("@timestamp")
        .dateHistogramInterval(DateHistogramInterval.DAY)
        .format("yyyy-MM-dd")
        .timeZone(DateTimeZone.forID("Asia/Shanghai"));
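
To see why the time zone changes the result, here is a minimal sketch that uses only java.time, not the Elasticsearch client. The class name, method name, and event time are hypothetical. It shows that one instant lands in different daily buckets depending on whether days are cut at midnight UTC or at midnight +08:00.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class DayBucketDemo {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Returns the "yyyy-MM-dd" key of the daily bucket the instant falls into
    // when days start at midnight in the given zone.
    static String dayBucket(Instant timestamp, ZoneId zone) {
        return DAY.format(timestamp.atZone(zone).toLocalDate());
    }

    public static void main(String[] args) {
        // Hypothetical event at 02:00 on 2024-01-01 Beijing time,
        // which is stored as 18:00 UTC on 2023-12-31.
        Instant event = Instant.parse("2023-12-31T18:00:00Z");

        System.out.println(dayBucket(event, ZoneOffset.UTC));       // prints 2023-12-31
        System.out.println(dayBucket(event, ZoneId.of("+08:00")));  // prints 2024-01-01
    }
}
```

Events between 00:00 and 08:00 local time are the ones that move to the previous day when the buckets are cut in UTC, which would explain daily totals that look shifted.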

Reference: official documentation


Date Histogram with a distinct count per bucket

Query:

{
  "aggs": {
    "dateAgg": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00"
      },
      "aggs": {
        "distinctUserIdAgg": {
          "cardinality": {
            "field": "user_id",             // field to count distinct values of
            "precision_threshold": 1000     // precision, range 0-40000
          }
        }
      }
    }
  }
}

In Java:

CardinalityAggregationBuilder userIdAggregationBuilder = AggregationBuilders.cardinality("userIdAgg")
        .field("user_id")
        .precisionThreshold(40000);
// Attach the cardinality aggregation as a sub-aggregation of the date histogram
aggregationBuilder.subAggregation(userIdAggregationBuilder);
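
Conceptually, the nested aggregation returns the number of distinct user_id values per day. The sketch below reproduces that result in plain Java with an exact HashSet per day. This is not how Elasticsearch computes it: the cardinality aggregation is approximate (it is based on HyperLogLog++), and precision_threshold trades memory for accuracy. The Event class and the sample data are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DistinctPerDayDemo {
    // Hypothetical event: which user did something, and when.
    static class Event {
        final String userId;
        final Instant timestamp;

        Event(String userId, Instant timestamp) {
            this.userId = userId;
            this.timestamp = timestamp;
        }
    }

    // Groups events into daily buckets in the given zone and counts distinct users per bucket.
    static Map<String, Integer> distinctUsersPerDay(List<Event> events, ZoneId zone) {
        Map<String, Set<String>> usersByDay = new TreeMap<>();
        for (Event e : events) {
            // LocalDate.toString() yields the ISO form yyyy-MM-dd
            String day = e.timestamp.atZone(zone).toLocalDate().toString();
            usersByDay.computeIfAbsent(day, k -> new HashSet<>()).add(e.userId);
        }
        Map<String, Integer> counts = new TreeMap<>();
        usersByDay.forEach((day, users) -> counts.put(day, users.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("u1", Instant.parse("2024-01-01T01:00:00Z")),
                new Event("u1", Instant.parse("2024-01-01T05:00:00Z")),
                new Event("u2", Instant.parse("2024-01-01T10:00:00Z")),
                new Event("u1", Instant.parse("2024-01-01T17:00:00Z"))); // 01:00 on Jan 2 in +08:00

        // prints {2024-01-01=2, 2024-01-02=1}
        System.out.println(distinctUsersPerDay(events, ZoneId.of("+08:00")));
    }
}
```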

Reference: official documentation


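Histogram by score range

For example, to find which score range contains the most students in a class:

{
  "query": { "match_all": {} },
  "aggs": {
    "scoreAgg": {
      "histogram": {
        "field": "score",        // field used to build the ranges
        "interval": 10,          // width of each range
        "order": {
          "_count": "desc"       // largest buckets first
        }
      }
    }
  }
}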
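
The bucketing that the histogram aggregation performs can be sketched in plain Java, with no Elasticsearch involved. Each score is rounded down to a multiple of the interval to get its bucket key, the buckets are counted, and the result is sorted by count in descending order. The class name and the sample scores are hypothetical, and the sketch assumes an offset of 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ScoreHistogramDemo {
    // Returns (bucketKey, count) pairs sorted by count, largest first.
    // Ties keep ascending key order because the sort is stable.
    static List<Map.Entry<Integer, Long>> histogram(int[] scores, int interval) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int score : scores) {
            // Round down to the nearest multiple of the interval, e.g. 78 -> 70
            int key = Math.floorDiv(score, interval) * interval;
            counts.merge(key, 1L, Long::sum);
        }
        List<Map.Entry<Integer, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort(Map.Entry.<Integer, Long>comparingByValue().reversed());
        return buckets;
    }

    public static void main(String[] args) {
        int[] scores = {55, 62, 68, 71, 75, 78, 79, 83, 91};

        // prints [70=4, 60=2, 50=1, 80=1, 90=1]
        System.out.println(histogram(scores, 10));
    }
}
```

The first entry is the range with the most students, here 70 to 79.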

Reference: official documentation


Histogram by score range with a distinct count per bucket

{
  "query": { "match_all": {} },
  "aggs": {
    "scoreAgg": {
      "histogram": {
        "field": "score",
        "interval": 10,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "userIdAgg": {
          "cardinality": {
            "field": "user_id",
            "precision_threshold": 1000
          }
        }
      }
    }
  }
}

Fielddata is disabled on text fields by default. Set fielddata=true on xxx in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

The official documentation explains this error as follows: "Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default."

Solution:

curl -XPUT 'http://{ip}:{port}/{index_name}/_mapping/{type_name}' -H 'Content-Type: application/json' -d '
{
  "properties": {
    "{field_name}": {
      "type": "text",
      "fielddata": true
    }
  }
}'

  • type_name: the name of the mapping type in ES
  • field_name: the name of the field to run the top (terms) aggregation on

Reference: official documentation

(When reposting articles from this site, please credit the author and source: 逐鹿IT 猛猛如玉
Website: https://amonxu.com WeChat official account: itcraft)