Elasticsearch Tutorial, Part 7: Aggregations in Elasticsearch


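Elasticsearch is arguably the best choice for enterprise search applications. Compared with traditional relational databases, it is better suited to storing very large volumes of data and running full-text search and data analysis (aggregations) over them. This post shows how to implement facets with aggregations in AWS OpenSearch, Elasticsearch's close sibling.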

Configuring the index fields

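The first thing to get right is that the relevant fields have types that support aggregation. For example, a field of type text is analyzed (tokenized) for full-text search, but by default it cannot be used in a terms aggregation; use keyword for fields you want to facet on. The index mapping can be set like this: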

json
PUT testfacet/_mapping
{
  "properties" : {
    "score" : {
      "type" : "integer"
    },
    "name" : {
      "type" : "keyword"
    },
    "subject" : {
      "type" : "keyword"
    }
  }
}

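Field types in a mapping can be simple (such as keyword or integer) or complex (such as object). Arrays need no dedicated type: any field can hold multiple values of the same type.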
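To check a mapping against the text-versus-keyword rule described above, a small script can scan its `properties` before any aggregation is run. The following is a minimal sketch: the helper name `non_aggregatable_fields` and the extra `comment` field are hypothetical, and the rule it applies (flag `text` fields that do not enable `fielddata`) is a simplification rather than an exhaustive check.

```python
def non_aggregatable_fields(properties):
    """Return names of fields a terms aggregation would reject by default.

    Simplifying assumption: only `text` fields without `"fielddata": true`
    are flagged; every other type is treated as aggregatable.
    """
    flagged = []
    for name, spec in properties.items():
        if spec.get("type") == "text" and not spec.get("fielddata", False):
            flagged.append(name)
    return sorted(flagged)


# The mapping from this post, plus a hypothetical `comment` text field.
properties = {
    "score": {"type": "integer"},
    "name": {"type": "keyword"},
    "subject": {"type": "keyword"},
    "comment": {"type": "text"},
}

print(non_aggregatable_fields(properties))
```

With the mapping used in this post, nothing is flagged apart from the hypothetical `comment` field.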

Aggregation queries

Aggregating on a single field

With the mapping above in place, you can run an aggregation query:

json
GET testfacet/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "subject",
        "size": 10
      }
    }
  }
}

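By default, a terms aggregation returns the top 10 terms (the 10 buckets with the highest document counts); change size to control how many are returned.

The response contains not only the matching documents but also the aggregation results: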

json
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "FYANK30BOfUMQjxHKrbP",
        "_score" : 1.0,
        "_source" : {
          "name" : "Jack",
          "subject" : "Math",
          "score" : 19
        }
      },
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "FoANK30BOfUMQjxHVbaB",
        "_score" : 1.0,
        "_source" : {
          "name" : "Paul",
          "subject" : "Math",
          "score" : 19
        }
      },
      {
        "_index" : "testfacet",
        "_type" : "_doc",
        "_id" : "F4ANK30BOfUMQjxHjrbr",
        "_score" : 1.0,
        "_source" : {
          "name" : "Lucy",
          "subject" : "Physics",
          "score" : 20
        }
      }
    ]
  },
  "aggregations" : {
    "categories" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Math",
          "doc_count" : 2
        },
        {
          "key" : "Physics",
          "doc_count" : 1
        }
      ]
    }
  }
}
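When only the facet counts are needed, the document hits in the response above are extra payload. The sketch below builds the same single-field request as a Python dictionary and optionally sets the top-level `size` to 0, which asks for aggregation results without hits. The function name `terms_facet_query` and its parameters are hypothetical; sending the body to the cluster is left to whichever client you use.

```python
import json


def terms_facet_query(field, size=10, include_hits=True):
    """Build a search body with one terms aggregation on `field`."""
    body = {
        "query": {"match_all": {}},
        "aggs": {
            "categories": {
                # This `size` limits the number of buckets (terms) returned.
                "terms": {"field": field, "size": size}
            }
        },
    }
    if not include_hits:
        # The top-level `size` limits the number of document hits;
        # 0 returns the aggregation results only.
        body["size"] = 0
    return body


print(json.dumps(terms_facet_query("subject", size=5, include_hits=False), indent=2))
```

Note that the two `size` settings are unrelated: one caps buckets inside the aggregation, the other caps hits for the whole search.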

Aggregating on multiple fields

json
GET testfacet/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "subject_facet": {
      "terms": {
        "field": "subject",
        "size": 10
      }
    },
    "name_facet": {
      "terms": {
        "field": "name",
        "size": 10
      }
    },
    "score_ranges": {
      "range": {
        "field": "score",
        "ranges": [
          { "to": 80 },
          { "from": 80, "to": 90 },
          { "from": 90 }
        ]
      }
    }
  }
}
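The `score_ranges` aggregation in the request above treats `from` as inclusive and `to` as exclusive, so a score of exactly 80 lands in the middle bucket and a score of 90 in the last one. The sketch below reproduces that bucketing rule in plain Python so the boundaries are easy to see. The function names and the sample scores are made up for illustration; this is not how the engine implements it.

```python
def bucket_key(r):
    """Format a range the way the response labels it, e.g. '80.0-90.0'."""
    lo = "*" if "from" not in r else str(float(r["from"]))
    hi = "*" if "to" not in r else str(float(r["to"]))
    return f"{lo}-{hi}"


def count_ranges(values, ranges):
    """Count values per range: `from` is inclusive, `to` is exclusive."""
    counts = {bucket_key(r): 0 for r in ranges}
    for v in values:
        for r in ranges:
            above = "from" not in r or v >= r["from"]
            below = "to" not in r or v < r["to"]
            if above and below:
                counts[bucket_key(r)] += 1
    return counts


ranges = [{"to": 80}, {"from": 80, "to": 90}, {"from": 90}]
scores = [19, 79, 80, 89, 90, 100]  # illustrative values, not the post's data

print(count_ranges(scores, ranges))
```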

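In the response to the multi-field query, the aggregations section is the useful part: it can serve as the data source for rendering facets in the front end: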

json
"aggregations" : {
    "subject_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Math",
          "doc_count" : 3
        },
        {
          "key" : "English",
          "doc_count" : 2
        },
        {
          "key" : "Physics",
          "doc_count" : 1
        },
        {
          "key" : "computing",
          "doc_count" : 1
        }
      ]
    },
    "score_ranges" : {
      "buckets" : [
        {
          "key" : "*-80.0",
          "to" : 80.0,
          "doc_count" : 3
        },
        {
          "key" : "80.0-90.0",
          "from" : 80.0,
          "to" : 90.0,
          "doc_count" : 2
        },
        {
          "key" : "90.0-*",
          "from" : 90.0,
          "doc_count" : 2
        }
      ]
    },
    "name_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Amy",
          "doc_count" : 3
        },
        {
          "key" : "Jack",
          "doc_count" : 1
        },
        {
          "key" : "Jim",
          "doc_count" : 1
        },
        {
          "key" : "Lucas",
          "doc_count" : 1
        },
        {
          "key" : "Lucy",
          "doc_count" : 1
        }
      ]
    }
}
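As one way to feed the aggregations above to a front end, the sketch below flattens each aggregation's buckets into simple (label, count) pairs. The function name `facets_from_aggregations` is hypothetical, and the input is an abbreviated copy of the response shown above; terms and range aggregations both expose `key` and `doc_count`, so the same loop handles either.

```python
def facets_from_aggregations(aggregations):
    """Map each aggregation name to a list of (label, count) pairs."""
    facets = {}
    for name, agg in aggregations.items():
        facets[name] = [
            (bucket["key"], bucket["doc_count"])
            for bucket in agg.get("buckets", [])
        ]
    return facets


# Abbreviated from the response above.
aggregations = {
    "subject_facet": {
        "buckets": [
            {"key": "Math", "doc_count": 3},
            {"key": "English", "doc_count": 2},
        ]
    },
    "score_ranges": {
        "buckets": [
            {"key": "*-80.0", "to": 80.0, "doc_count": 3},
            {"key": "80.0-90.0", "from": 80.0, "to": 90.0, "doc_count": 2},
        ]
    },
}

for facet, items in facets_from_aggregations(aggregations).items():
    print(facet, items)
```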

How nested facets work

The Elastic documentation explains this clearly: https://www.elastic.co/guide/en/app-search/current/hierarchical-facets-guide.html

In short:

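  • First, return the first-level facet list for the user's search keywords.
  • When the user selects one or more first-level facet values, rebuild the search with those values applied as a filter.
  • Show the second-level facets, then repeat the process to narrow the search further.
  • The same approach extends to a third level, a fourth level, and so on.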
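The drill-down steps above can be sketched as a function that rebuilds the search body on each selection. This is a minimal sketch under stated assumptions: the function name `drill_down_query` is hypothetical, and treating `subject` as the first level and `name` as the second is only for illustration with this post's index. Each selected facet becomes a `terms` filter, so values within one facet are OR-ed while different facets are AND-ed.

```python
import json


def drill_down_query(base_query, selections, next_facet_field, size=10):
    """Filter on the chosen facet values and aggregate the next level.

    base_query: the query built from the user's search keywords.
    selections: dict mapping a field name to the values the user picked.
    """
    filters = [
        {"terms": {field: values}}
        for field, values in selections.items()
        if values  # skip facets with nothing selected
    ]
    return {
        "query": {"bool": {"must": [base_query], "filter": filters}},
        "aggs": {
            f"{next_facet_field}_facet": {
                "terms": {"field": next_facet_field, "size": size}
            }
        },
    }


# The user picked two first-level values; ask for the second-level facet.
body = drill_down_query(
    {"match_all": {}},
    {"subject": ["Math", "Physics"]},
    "name",
)
print(json.dumps(body, indent=2))
```

Putting the selections in `filter` rather than `must` keeps them out of relevance scoring, which suits facet selections because they only narrow the result set.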

Author: 逻思
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-ND 4.0. Please credit 逻思 when reposting.