First of all, sorry if this is a stupid question, but I'm very new to Elasticsearch. Here's what I need to do: I have an array of keywords that I need to search for in every document of an index. Here's the mapping:
{
"resumes": {
"mappings": {
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timestamp": {
"type": "date"
}
}
}
}
}
Given this mapping, I need to search every document in the resumes index for all the words in the keyword array. For each document, the result should be a vector with one entry per keyword: 1 if the word was found in the document, 0 if it was not.
E.g.:
keywords = ["javascript", "html", "python"]
doc1 = "Hello there, I've only programmed in python."
doc2 = "Hello there, I've only programmed in python and javascript."
doc3 = "Hello there, I've only programmed in python and javascript. Im now learning html"
Search results would be something like:
{
"doc1": [0, 0, 1], // because it contains the word python
"doc2": [1, 0, 1], // because it contains both python and javascript
"doc3": [1, 1, 1] // because it contains all words in the keyword vector
}
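To pin down the expected output, here is a minimal plain-Python sketch of the same computation. The helper name keyword_vector and the whole-word, case-insensitive matching rule are assumptions made for illustration; the question itself does not specify how words should be matched.

```python
import re

def keyword_vector(text, keywords):
    """Return one 0/1 entry per keyword: 1 if it occurs as a whole word in text."""
    # Lowercase and split on non-word characters (an assumed matching rule).
    tokens = set(re.findall(r"\w+", text.lower()))
    return [1 if kw.lower() in tokens else 0 for kw in keywords]

keywords = ["javascript", "html", "python"]
docs = {
    "doc1": "Hello there, I've only programmed in python.",
    "doc2": "Hello there, I've only programmed in python and javascript.",
    "doc3": "Hello there, I've only programmed in python and javascript. Im now learning html",
}

result = {name: keyword_vector(text, keywords) for name, text in docs.items()}
print(result)
```

This is only a reference for what the vectors should look like; it requires pulling every document into Python, which is the cost the question is trying to avoid.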
Is it even possible to do this with Elasticsearch alone? I'm coding all this in Python, but I suspect that building these vectors in Python itself would be far less efficient than having Elasticsearch do it.
I haven't tried much yet, since I don't know Elasticsearch's capabilities very well. I've searched a lot, but I don't even know where to start.
CodePudding user response:
Using scripts in Elasticsearch is generally discouraged because they perform poorly. I managed to do what you want, but be warned about the performance cost.
The script field "vector_field" holds the vector for each document.
POST idx_teste/_doc
{
"description": "Hello there, I've only programmed in python."
}
POST idx_teste/_doc
{
"description": "Hello there, I've only programmed in python and javascript."
}
POST idx_teste/_doc
{
"description": "Hello there, I've only programmed in python and javascript. Im now learning html"
}
GET idx_teste/_search
{
"_source": "*",
"query": {
"terms": {
"description": [
"javascript","html","python"
]
}
},
"script_fields": {
"vector_field": {
"script": {
"source": """
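// Read the raw string once. description.keyword only holds values up to
// ignore_above (256 characters with the default dynamic mapping).
String text = doc['description.keyword'].value;
def vector = new ArrayList();
for (int i = 0; i < params.keywords.size(); i++) {
  // contains() is a case-sensitive substring check.
  if (text.contains(params.keywords[i])) {
    vector.add(1);
  } else {
    vector.add(0);
  }
}
return vector;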
""",
"params": {
"keywords" :[
"javascript","html","python"
]
}
}
}
}
}
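Since the question mentions Python, here is a rough sketch of building the same request body as a dict, so the keyword list is written only once. The function name build_search_body is hypothetical, the script field is named vector_field to match the response, and how the body gets sent (HTTP client or Elasticsearch client) is left out on purpose.

```python
import json

# Painless source mirroring the request above; the field name is hard-coded.
PAINLESS_SOURCE = """
String text = doc['description.keyword'].value;
def vector = new ArrayList();
for (int i = 0; i < params.keywords.size(); i++) {
  if (text.contains(params.keywords[i])) {
    vector.add(1);
  } else {
    vector.add(0);
  }
}
return vector;
"""

def build_search_body(keywords):
    """Build the _search body: a terms query plus a script field (hypothetical helper)."""
    keywords = list(keywords)
    return {
        "_source": "*",
        # Only documents containing at least one keyword are returned.
        "query": {"terms": {"description": keywords}},
        "script_fields": {
            "vector_field": {
                "script": {
                    "source": PAINLESS_SOURCE,
                    "params": {"keywords": keywords},
                }
            }
        },
    }

body = build_search_body(["javascript", "html", "python"])
print(json.dumps(body, indent=2))
```

Note that the terms query filters out documents that match none of the keywords, so an all-zero vector would never appear in the results with this query.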
Response:
"hits": [
{
"_index": "idx_teste",
"_id": "oIyiQ4QBgXg8h_rc0Ny3",
"_score": 1,
"_source": {
"description": "Hello there, I've only programmed in python."
},
"fields": {
"vector_field": [
0,
0,
1
]
}
},
{
"_index": "idx_teste",
"_id": "oYypQ4QBgXg8h_rcH9wU",
"_score": 1,
"_source": {
"description": "Hello there, I've only programmed in python and javascript."
},
"fields": {
"vector_field": [
1,
0,
1
]
}
},
{
"_index": "idx_teste",
"_id": "ooypQ4QBgXg8h_rcJ9y2",
"_score": 1,
"_source": {
"description": "Hello there, I've only programmed in python and javascript. Im now learning html"
},
"fields": {
"vector_field": [
1,
1,
1
]
}
}
]
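To get from this response to the dictionary shape asked for in the question, something like the following sketch could work. It assumes the full search response is already available as a Python dict with the usual hits.hits nesting; vectors_by_id is a hypothetical helper name, and the sample below is trimmed from the response above.

```python
def vectors_by_id(response, field="vector_field"):
    """Map each hit's _id to the list stored under its script field."""
    hits = response.get("hits", {}).get("hits", [])
    # Script field values are returned under "fields" as a list.
    return {hit["_id"]: hit.get("fields", {}).get(field, []) for hit in hits}

# Trimmed sample using the IDs and vectors from the response above.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "oIyiQ4QBgXg8h_rc0Ny3", "fields": {"vector_field": [0, 0, 1]}},
            {"_id": "oYypQ4QBgXg8h_rcH9wU", "fields": {"vector_field": [1, 0, 1]}},
            {"_id": "ooypQ4QBgXg8h_rcJ9y2", "fields": {"vector_field": [1, 1, 1]}},
        ]
    }
}

vectors = vectors_by_id(sample_response)
print(vectors)
```

The keys here are document IDs rather than names like doc1, since the IDs are what the response carries.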