I'm new to Elasticsearch and would like to know whether there are any good practices for my use case.
I have heterogeneous data sent from an API that I save into a database (as JSON) and then index in Elasticsearch for search purposes. The data is sent in the format below. Because it's heterogeneous, users can send any type of data: some metadata is multi-valued, some is single-valued, and the key names in the JSON may vary:
{
"indices":{
"MultipleIndices":[
{
"index":"editors",
"values":[
"The Editing House",
"Volcan Editing"
]
},
{
"index":"colors",
"values":[
"Red",
"Blue"
]
}
],
"SimpleIndices":[
{
"index":"AuthorName",
"value": "George R. R. Martin"
},
{
"index":"NumberOfPages",
"value":"2898"
},
{
"index":"BookType",
"value":"Fantasy"
}
]
}
}
Once we receive this JSON, it's reformatted in the code and stored as JSON in a database in this format:
{
"indices":{
"editors":[
"The Editing House",
"Volcan Editing"
],
"colors":[
"Red",
"Blue"
],
"AuthorName" : "George R. R. Martin",
"NumberOfPages" : "2898",
"BookType" : "Fantasy"
}
}
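The reshaping from the incoming payload to the stored format could look roughly like the sketch below. The function name `flatten_indices` is hypothetical, and the sketch assumes every entry carries an "index" key plus either "values" or "value", as in the sample above.

```python
from typing import Any


def flatten_indices(payload: dict[str, Any]) -> dict[str, Any]:
    """Reshape the API payload into the flat form stored in the database."""
    source = payload.get("indices", {})
    flat: dict[str, Any] = {}
    # Multi-valued entries carry a "values" list.
    for entry in source.get("MultipleIndices", []):
        flat[entry["index"]] = list(entry["values"])
    # Single-valued entries carry a single "value".
    for entry in source.get("SimpleIndices", []):
        flat[entry["index"]] = entry["value"]
    return {"indices": flat}


payload = {
    "indices": {
        "MultipleIndices": [
            {"index": "editors", "values": ["The Editing House", "Volcan Editing"]},
            {"index": "colors", "values": ["Red", "Blue"]},
        ],
        "SimpleIndices": [
            {"index": "AuthorName", "value": "George R. R. Martin"},
            {"index": "NumberOfPages", "value": "2898"},
            {"index": "BookType", "value": "Fantasy"},
        ],
    }
}

stored = flatten_indices(payload)
```

Note that a key appearing in both lists would be overwritten by the single-valued entry; whether that can happen depends on the API.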
I then want to save this data into Elasticsearch. What is the best way to map it? Should I store it as JSON in a single field? Will search be efficient if I do it this way?
CodePudding user response:
You should map each field individually. Take a look at the field types documentation to see which type best fits your schema. I also suggest studying text analysis, because it controls how text is structured to optimize search.
My suggested mapping:
PUT indices
{
"mappings": {
"properties": {
"editors": {
"type": "keyword"
},
"colors":{
"type": "keyword"
},
"author_name":{
"type": "text"
},
"number_pages":{
"type": "integer"
},
"book_type":{
"type": "keyword"
}
}
}
}
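The mapping above uses top-level snake_case field names, while the stored JSON nests the original keys under "indices". Assuming that renaming is intended, a document would need to be adapted before indexing, roughly as in this sketch. The `FIELD_NAMES` table and `to_search_document` function are hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical rename table: stored key -> field name used in the mapping.
FIELD_NAMES = {
    "AuthorName": "author_name",
    "NumberOfPages": "number_pages",
    "BookType": "book_type",
}


def to_search_document(stored: dict[str, Any]) -> dict[str, Any]:
    """Build a top-level document whose keys match the suggested mapping."""
    doc: dict[str, Any] = {}
    for key, value in stored["indices"].items():
        # Keys without an entry in the table (editors, colors) keep their name.
        doc[FIELD_NAMES.get(key, key)] = value
    # The page count is stored as a string; send it as a number.
    if "number_pages" in doc:
        doc["number_pages"] = int(doc["number_pages"])
    return doc


stored = {
    "indices": {
        "editors": ["The Editing House", "Volcan Editing"],
        "colors": ["Red", "Blue"],
        "AuthorName": "George R. R. Martin",
        "NumberOfPages": "2898",
        "BookType": "Fantasy",
    }
}

doc = to_search_document(stored)
```

No array type is needed for `editors` and `colors`: any Elasticsearch field can hold one or several values of its mapped type.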
CodePudding user response:
I think in your case you don't have much choice apart from dynamic mapping, which Elasticsearch generates for you as soon as the first document is indexed into a given index.
However, you can improve on this by using dynamic templates to optimize your mapping; there are good examples of that in the official documentation I linked.
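As a minimal sketch of a dynamic template for this data, the index body below maps every string found under `indices.*` as `keyword`. The template name `indices_strings_as_keywords` is arbitrary, and mapping everything as `keyword` is an assumption that suits exact-match filtering rather than full-text search.

```python
import json

# Hypothetical index body: any new string field under "indices." becomes a keyword.
dynamic_mapping = {
    "mappings": {
        "dynamic_templates": [
            {
                "indices_strings_as_keywords": {
                    "path_match": "indices.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ]
    }
}

# Print the JSON body to send when creating the index (PUT <your-index-name>).
body = json.dumps(dynamic_mapping, indent=2)
print(body)
```

With this template, a value such as "2898" is also mapped as `keyword` because it arrives as a JSON string; range queries on it would need an explicit numeric mapping for that field.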