Elasticsearch and Spanish Accents


I am trying to use Elasticsearch to index data about research papers, but I am fighting with accents. For instance, if I use:

GET /_analyze?tokenizer=standard&filter=asciifolding&text="boletínes de investigaciónes"

I get:

    {
       "tokens": [
          {
             "token": "bolet",
             "start_offset": 1,
             "end_offset": 6,
             "type": "<ALPHANUM>",
             "position": 1
          },
          {
             "token": "nes",
             "start_offset": 7,
             "end_offset": 10,
             "type": "<ALPHANUM>",
             "position": 2
          },
          {
             "token": "de",
             "start_offset": 11,
             "end_offset": 13,
             "type": "<ALPHANUM>",
             "position": 3
          },
          {
             "token": "investigaci",
             "start_offset": 14,
             "end_offset": 25,
             "type": "<ALPHANUM>",
             "position": 4
          },
          {
             "token": "nes",
             "start_offset": 26,
             "end_offset": 29,
             "type": "<ALPHANUM>",
             "position": 5
          }
       ]
    }

and it should be:

    {
       "tokens": [
          {
             "token": "boletines",
             "start_offset": 1,
             "end_offset": 6,
             "type": "<ALPHANUM>",
             "position": 1
          },
          {
             "token": "de",
             "start_offset": 11,
             "end_offset": 13,
             "type": "<ALPHANUM>",
             "position": 3
          },
          {
             "token": "investigacion",
             "start_offset": 14,
             "end_offset": 25,
             "type": "<ALPHANUM>",
             "position": 4
          }
       ]
    }

What should I do?

To prevent the extra tokens from being formed, you need to use an alternative tokenizer, e.g. try the whitespace tokenizer.
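As a rough illustration of that suggestion, the snippet below builds a request body for the `_analyze` endpoint that swaps in the whitespace tokenizer while keeping the `asciifolding` filter. It assumes an Elasticsearch version whose `_analyze` API accepts a JSON body (older releases take these as query-string parameters, as in the question). The helper name `build_analyze_request` is made up for this sketch. Putting the text in a UTF-8 JSON body also avoids having to URL-encode the accented characters.

```python
import json


def build_analyze_request(text: str) -> dict:
    """Body for POST /_analyze: whitespace tokenizer plus asciifolding.

    Hypothetical helper for illustration; only the keys "tokenizer",
    "filter" and "text" are part of the Elasticsearch request format.
    """
    return {
        "tokenizer": "whitespace",    # split on whitespace only
        "filter": ["asciifolding"],   # fold accented letters to ASCII
        "text": text,
    }


body = build_analyze_request("boletínes de investigaciónes")

# ensure_ascii=False keeps the accents as real characters; the bytes
# sent to the server are then explicitly UTF-8 encoded.
payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
print(payload.decode("utf-8"))
```

The resulting bytes can be sent with any HTTP client as the body of `POST /_analyze` with a `Content-Type: application/json` header.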

Alternatively, use a language analyzer and specify the language.
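A minimal sketch of the language-analyzer route, assuming the built-in `spanish` analyzer and a recent Elasticsearch mapping format (typeless mappings with `text` fields; older versions use mapping types and `string` fields instead). The field name `title` and both helper functions are placeholders for illustration, not something from the question.

```python
import json


def spanish_index_settings(field: str = "title") -> dict:
    """Index-creation body that analyzes one field with the built-in
    `spanish` analyzer. The field name is a hypothetical placeholder."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "spanish"}
            }
        }
    }


def spanish_analyze_request(text: str) -> dict:
    """Body for POST /_analyze to preview the analyzer's tokens."""
    return {"analyzer": "spanish", "text": text}


settings = spanish_index_settings()
preview = spanish_analyze_request("boletínes de investigaciónes")

print(json.dumps(settings, ensure_ascii=False, indent=2))
print(json.dumps(preview, ensure_ascii=False))
```

The first body would be sent when creating the index and the second to `_analyze`, to check what tokens come out. Whether the built-in analyzer folds accents exactly as wanted is worth verifying this way; if it does not, a custom analyzer that adds `asciifolding` to its filter chain is one option.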

