Elasticsearch Tutorial, Part 7: Aggregations in Elasticsearch


Published: 2021-11-16

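Elasticsearch is arguably the leading choice for enterprise search applications. Compared with a traditional relational database, it is better suited to storing large volumes of data and running full-text search and data analysis (aggregations) over them. This article shows how to implement facets with the aggregation feature of AWS OpenSearch, a close sibling (fork) of Elasticsearch.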


Configuring the index fields

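The first step is to make sure the relevant fields have a type that supports aggregation. For example, a field of type text is analyzed (tokenized) for full-text search, but by default it cannot be aggregated. The index mapping can be set as follows: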

PUT testfacet/_mapping
{
  "properties" : {
    "score" : {
      "type" : "integer"
    },
    "name" : {
      "type" : "keyword"
    },
    "subject" : {
      "type" : "keyword"
    }
  }
}

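Field types in a mapping can be simple types or complex types such as object. Note that Elasticsearch has no dedicated array type: any field can hold multiple values of its mapped type.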
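As a quick sanity check before running aggregations, a mapping can be scanned for fields whose type is aggregatable out of the box. The following is a minimal standalone Python sketch. The helper name `aggregatable_fields`, the type set it accepts, and the `comment` field are all assumptions made for this illustration and are not part of any Elasticsearch or OpenSearch client.

```python
# Hypothetical helper: lists fields whose mapped type can be aggregated by default.
# The type set below is a small illustrative assumption, not an exhaustive list.
AGGREGATABLE_TYPES = {"keyword", "integer", "long", "float", "double", "date", "boolean"}


def aggregatable_fields(mapping):
    """Return the sorted names of fields whose type is in AGGREGATABLE_TYPES."""
    properties = mapping.get("properties", {})
    return sorted(
        name
        for name, spec in properties.items()
        if spec.get("type") in AGGREGATABLE_TYPES
    )


# The mapping from this tutorial, plus a hypothetical text field for contrast.
mapping = {
    "properties": {
        "score": {"type": "integer"},
        "name": {"type": "keyword"},
        "subject": {"type": "keyword"},
        "comment": {"type": "text"},  # hypothetical field, not in the original mapping
    }
}

print(aggregatable_fields(mapping))  # ['name', 'score', 'subject']
```

The text field is filtered out, which mirrors the rule described above: it is fine for full-text search but not a good target for a terms aggregation.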

Aggregation queries

Aggregating on a single field

With the mapping above in place, an aggregation query can be run:

GET testfacet/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "subject",
        "size": 10
      }
    }
  }
}

By default, a terms aggregation returns the top 10 terms; change size to control how many terms are returned.
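To make the role of size concrete, here is a small Python sketch that builds the same request body programmatically. The function name `build_terms_facet_request` and its `include_hits` flag are hypothetical names chosen for this example. The sketch only constructs the JSON body and does not contact a cluster.

```python
import json


def build_terms_facet_request(field, size=10, include_hits=True):
    """Build a search body with one terms aggregation named 'categories'."""
    body = {
        "query": {"match_all": {}},
        "aggs": {
            "categories": {
                # "size" here limits the number of term buckets returned.
                "terms": {"field": field, "size": size}
            }
        },
    }
    if not include_hits:
        # A top-level "size" of 0 requests no document hits, only aggregations.
        body["size"] = 0
    return body


body = build_terms_facet_request("subject", size=5, include_hits=False)
print(json.dumps(body, indent=2))
```

Note the two different size settings: the one inside terms controls the number of buckets, while the optional top-level one controls the number of returned documents.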

The response to the GET testfacet/_search request above contains both the matching documents and the aggregation results:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "FYANK30BOfUMQjxHKrbP",
        "_score" : 1.0,
        "_source" : {
          "name" : "Jack",
          "subject" : "Math",
          "score" : 19
        }
      },
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "FoANK30BOfUMQjxHVbaB",
        "_score" : 1.0,
        "_source" : {
          "name" : "Paul",
          "subject" : "Math",
          "score" : 19
        }
      },
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "F4ANK30BOfUMQjxHjrbr",
        "_score" : 1.0,
        "_source" : {
          "name" : "Lucy",
          "subject" : "Physics",
          "score" : 20
        }
      }
    ]
  },
  "aggregations" : {
    "categories" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Math",
          "doc_count" : 2
        },
        {
          "key" : "Physics",
          "doc_count" : 1
        }
      ]
    }
  }
}
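The aggregations section of such a response can be reduced to a simple term-to-count mapping. The sketch below is a minimal Python illustration using the categories buckets shown above. The helper name `summarize_terms_agg` is an assumption for this example. It also reports whether some terms were left out, based on sum_other_doc_count, which is the total document count of the buckets that were not returned.

```python
def summarize_terms_agg(response, agg_name):
    """Return (counts, truncated) for one terms aggregation in a search response."""
    agg = response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # A non-zero sum_other_doc_count means some documents fell into
    # buckets that were cut off by the "size" limit.
    truncated = agg.get("sum_other_doc_count", 0) > 0
    return counts, truncated


# Only the aggregations part of the response shown above.
response = {
    "aggregations": {
        "categories": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "Math", "doc_count": 2},
                {"key": "Physics", "doc_count": 1},
            ],
        }
    }
}

counts, truncated = summarize_terms_agg(response, "categories")
print(counts)     # {'Math': 2, 'Physics': 1}
print(truncated)  # False
```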

Aggregating on multiple fields

GET testfacet/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "subject_facet": {
      "terms": {
        "field": "subject",
        "size": 10
      }
    },
    "name_facet": {
      "terms": {
        "field": "name",
        "size": 10
      }
    },
    "score_ranges": {
      "range": {
        "field": "score",
        "ranges": [
          { "to": 80 },
          { "from": 80, "to": 90 },
          { "from": 90 }
        ]
      }
    }
  }
}

The aggregations part of the response is the useful piece here: it can serve as the data source for rendering facets in the front end:

"aggregations" : {
    "subject_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Math",
          "doc_count" : 3
        },
        {
          "key" : "English",
          "doc_count" : 2
        },
        {
          "key" : "Physics",
          "doc_count" : 1
        },
        {
          "key" : "computing",
          "doc_count" : 1
        }
      ]
    },
    "score_ranges" : {
      "buckets" : [
        {
          "key" : "*-80.0",
          "to" : 80.0,
          "doc_count" : 3
        },
        {
          "key" : "80.0-90.0",
          "from" : 80.0,
          "to" : 90.0,
          "doc_count" : 2
        },
        {
          "key" : "90.0-*",
          "from" : 90.0,
          "doc_count" : 2
        }
      ]
    },
    "name_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Amy",
          "doc_count" : 3
        },
        {
          "key" : "Jack",
          "doc_count" : 1
        },
        {
          "key" : "Jim",
          "doc_count" : 1
        },
        {
          "key" : "Lucas",
          "doc_count" : 1
        },
        {
          "key" : "Lucy",
          "doc_count" : 1
        }
      ]
    }
}
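One possible way to hand these results to a front end is to flatten every aggregation into a list of label and count pairs. The Python sketch below assumes a hypothetical helper named `to_facets` and an output shape chosen purely for illustration. It uses a subset of the buckets shown above and treats terms and range aggregations the same way, since both expose key and doc_count on each bucket.

```python
def to_facets(aggregations):
    """Map each aggregation name to a list of {'label', 'count'} entries."""
    facets = {}
    for name, agg in aggregations.items():
        facets[name] = [
            # Bucket order is preserved exactly as returned in the response.
            {"label": str(bucket["key"]), "count": bucket["doc_count"]}
            for bucket in agg.get("buckets", [])
        ]
    return facets


# A subset of the aggregations shown above.
aggregations = {
    "subject_facet": {
        "buckets": [
            {"key": "Math", "doc_count": 3},
            {"key": "English", "doc_count": 2},
        ]
    },
    "score_ranges": {
        "buckets": [
            {"key": "*-80.0", "to": 80.0, "doc_count": 3},
            {"key": "80.0-90.0", "from": 80.0, "to": 90.0, "doc_count": 2},
            {"key": "90.0-*", "from": 90.0, "doc_count": 2},
        ]
    },
}

facets = to_facets(aggregations)
for name, entries in facets.items():
    print(name, entries)
```

A front end could then render each entry as a clickable facet value with its count next to it.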