Elasticsearch and Spanish Accents
I'm trying to use Elasticsearch to index research paper data, and I'm fighting with accents. For instance, if I use:
GET /_analyze?tokenizer=standard&filter=asciifolding&text="boletínes de investigaciónes"
I get:
{ "tokens": [ { "token": "bolet", "start_offset": 1, "end_offset": 6, "type": "<alphanum>", "position": 1 }, { "token": "nes", "start_offset": 7, "end_offset": 10, "type": "<alphanum>", "position": 2 }, { "token": "de", "start_offset": 11, "end_offset": 13, "type": "<alphanum>", "position": 3 }, { "token": "investigaci", "start_offset": 14, "end_offset": 25, "type": "<alphanum>", "position": 4 }, { "token": "nes", "start_offset": 26, "end_offset": 29, "type": "<alphanum>", "position": 5 } ] }
but what I would like to get is:
{ "tokens": [ { "token": "boletines", "start_offset": 1, "end_offset": 6, "type": "<alphanum>", "position": 1 }, { "token": "de", "start_offset": 11, "end_offset": 13, "type": "<alphanum>", "position": 3 }, { "token": "investigacion", "start_offset": 14, "end_offset": 25, "type": "<alphanum>", "position": 4 } ] }
What should I do?
To prevent the tokens from being split up like this, you need to use an alternative tokenizer; try the whitespace tokenizer, for example.
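As a quick check, you could re-run your _analyze call with the whitespace tokenizer in place of the standard one. This is only a sketch in the same query-string style as your original request; on newer Elasticsearch versions the parameters go in a JSON body instead:

GET /_analyze?tokenizer=whitespace&filter=asciifolding&text="boletínes de investigaciónes"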
Alternatively, use a language analyzer and specify the language (here, Spanish); see the sketch below.
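A minimal sketch of that approach, assuming a recent Elasticsearch version with typeless mappings (the index name papers and field name title are just placeholders), would be to map the field to the built-in spanish analyzer when creating the index:

PUT /papers
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "spanish" }
    }
  }
}

The spanish analyzer applies lowercasing, Spanish stop words, and light stemming, and the stemming step also normalizes accented vowels, so accented and unaccented forms such as "investigaciónes" and "investigaciones" should reduce to the same token.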