I was looking for a way to generate words and special characters as separate tokens from a URL.
e.g. given the URL https://www.google.com/
I want Elasticsearch to produce the tokens https, www, google, com, :, /, /, ., ., /
CodePudding user response:
You can define a custom analyzer with the letter
tokenizer, as shown below:
PUT index3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_email": {
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
Test API:
POST index3/_analyze
{
  "text": [
    "https://www.google.com/"
  ],
  "analyzer": "my_email"
}
Output:
{
  "tokens" : [
    {
      "token" : "https",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "www",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "google",
      "start_offset" : 12,
      "end_offset" : 18,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "com",
      "start_offset" : 19,
      "end_offset" : 22,
      "type" : "word",
      "position" : 3
    }
  ]
}
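Note that the letter tokenizer only emits runs of letters, so the punctuation from the question (:, /, .) will not appear as tokens. If you also need the separators as tokens, one option is the pattern tokenizer with "group": 0, which emits every regex match as its own token. A sketch (the index name index4, the tokenizer name url_parts, and the regex are illustrative, not from the original answer):

PUT index4
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "url_parts": {
          "type": "pattern",
          "pattern": "[a-zA-Z0-9]+|[:/.]",
          "group": 0
        }
      },
      "analyzer": {
        "url_tokens": {
          "tokenizer": "url_parts",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

Analyzing https://www.google.com/ with url_tokens should then yield https, :, /, /, www, ., google, ., com, / in order, since each regex match (a letter/digit run or a single :, /, or . character) becomes a token.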