Elasticsearch: Chinese (IK) tokenizer + pinyin filter does not generate the joined full pinyin for some words

Date: 2024-04-06 11:11:06 | Views: 109 | ID: 43209


ES version: 7.6.2. Language: PHP.

This is the custom analyzer definition:

                    "analysis"=>[
                        "analyzer"=>[
                          "my_analyzer"=>[
                             "type"=>"custom",
                            "tokenizer"=>"ik_max_word",
                            "filter"=>"my_filter"
                          ]
                        ],
                        "filter"=>[
                          "my_filter"=>[
                            "type"=>"pinyin",
                            "first_letter" => "prefix",
                            "padding_char" => "",
                            "keep_separate_first_letter"=>false,
                            "keep_full_pinyin"=>true,
                            "keep_original"=>true,
                            "limit_first_letter_length"=>16,
                            "lowercase"=>true,
                            'keep_first_letter'=>true,
                            "remove_duplicated_term"=>true,
                            "keep_joined_full_pinyin"=>true  #这里设置为true
                          ]
                        ]
                    ],
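
For anyone trying to reproduce this, token listings like the ones in this post are what the `_analyze` API returns. A minimal sketch of building that request in PHP follows. The index name `my_index` and the helper `buildAnalyzeParams` are assumptions for illustration and do not come from the original post. The commented-out call assumes the official elasticsearch-php 7.x client.

```php
<?php
// Hypothetical helper: builds the parameter array for an _analyze request.
function buildAnalyzeParams(string $index, string $analyzer, string $text): array
{
    return [
        'index' => $index,            // assumed index that carries the custom analyzer
        'body'  => [
            'analyzer' => $analyzer,  // "my_analyzer" from the settings in the post
            'text'     => $text,
        ],
    ];
}

$params = buildAnalyzeParams('my_index', 'my_analyzer', '微信');
echo json_encode($params, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;

// With the elasticsearch-php client installed and a node running, the call would be:
// $client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost:9200'])->build();
// $tokens = $client->indices()->analyze($params);
```

Running the same request for each problem word (微信, 钉钉, 微博) makes it easy to compare the `type` field of the returned tokens.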

For the search term "腾讯", it returns:

{
    "tokens": [
        {
            "token": "teng",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "xun",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "腾讯",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "tengxun",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "tx",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

Searching for "微信" returns:

{
    "tokens": [
        {
            "token": "wei",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "微",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "w",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "xin",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "信",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "x",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        }
    ]
}

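The joined full pinyin for 微信 is simply not generated, so typing "weixin" finds nothing, while adding a space in the middle ("wei xin") works. This affects not only 微信 but also words such as 钉钉 and 微博.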
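
One plausible reading of the two token listings: 腾讯 reaches the pinyin filter as a single `CN_WORD` token, whereas 微信 arrives as two separate `CN_CHAR` tokens, and a token filter only ever sees one token at a time. The toy sketch below illustrates that effect. It is not the plugin's code. The `PINYIN` table and both function names are made up for this example.

```php
<?php
// Hypothetical mini pinyin table, just enough for the two words in this post.
const PINYIN = ['腾' => 'teng', '讯' => 'xun', '微' => 'wei', '信' => 'xin'];

// Toy stand-in for a per-token pinyin filter: it emits each syllable, the
// original token, the joined pinyin, and the first letters, then de-duplicates.
function pinyinForToken(string $token): array
{
    $syllables = [];
    foreach (preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $syllables[] = PINYIN[$char] ?? $char;
    }
    $firstLetters = '';
    foreach ($syllables as $syllable) {
        $firstLetters .= mb_substr($syllable, 0, 1);
    }
    $out = array_merge($syllables, [$token, implode('', $syllables), $firstLetters]);
    return array_values(array_unique($out));
}

// Applies the toy filter to each tokenizer output token independently.
function analyzeTokens(array $tokens): array
{
    $out = [];
    foreach ($tokens as $token) {
        $out = array_merge($out, pinyinForToken($token));
    }
    return $out;
}

// The tokenizer emitted one word token: the joined pinyin spans both characters.
echo implode(', ', analyzeTokens(['腾讯'])), PHP_EOL;
// The tokenizer emitted two single-character tokens: no joined "weixin" can appear.
echo implode(', ', analyzeTokens(['微', '信'])), PHP_EOL;
```

If this reading is right, the missing full pinyin comes from how `ik_max_word` splits the word, not from the `keep_joined_full_pinyin` setting.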

I tried modifying the source code, but the change did not take effect.

I don't know what is causing some words not to get their joined full pinyin.
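
One direction that might be worth testing, offered as an unverified sketch and not as a confirmed fix: make sure the whole word reaches the pinyin filter as a single token. That could mean adding the missing words to IK's custom dictionary, or, for short fields such as names, defining a second analyzer on top of the built-in `keyword` tokenizer. The analyzer name `my_pinyin_whole` below is hypothetical. The fragment would be merged into the existing `analysis` settings.

```php
<?php
// Sketch only: "my_pinyin_whole" is a made-up name; "my_filter" is the
// pinyin filter already defined in the settings from the post.
$extraAnalysis = [
    "analyzer" => [
        "my_pinyin_whole" => [
            "type"      => "custom",
            "tokenizer" => "keyword",      // built-in: emits the whole input as one token
            "filter"    => ["my_filter"],
        ],
    ],
];

echo json_encode($extraAnalysis, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;
```

The trade-off is that a `keyword`-based analyzer treats the entire field value as one term, so it only suits short values and would normally sit alongside the IK-based analyzer instead of replacing it.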
