I'm developing a search engine for my project, using Elasticsearch with Node.js for the server.
Every night a parser scrapes data from some websites and inserts it into the database. For now, it duplicates data that I already have.
Can I make a field unique within the index when inserting a document, for example title : {unique : true}, so that a document whose title already exists is not inserted?
Here is my code:
async function insertManual(manual) {
  return new Promise(async (resolve, reject) => {
    const result = await client.index({
      index : 'completeindexthree',
      body : {
        brand : manual.brand,
        category : manual.category,
        url : manual.url,
        title : manual.title, // example {unique : true}
        parsingData : new Date().toString()
      }
    })
    await client.indices.refresh({index: 'completeindexthree'})
    resolve(result);
  })
}
My second question: how can I delete all the duplicates (by title) that are already in that index, from Node.js rather than from Logstash?
CodePudding user response:
TL;DR
Yes, it is possible, though not by using a unique keyword.
According to the documentation, if you set an _id and a document with this id already exists, it will be replaced/overwritten:
If the target is an index and the document already exists, the request updates the document and increments its version.
Furthermore, you will find this section:
Using _create guarantees that the document is only indexed if it does not already exist.
To fix
You should set an _id per document and use the create API instead of index.
Your code may look like the following:
async function insertManual(manual) {
  // client.create fails with a 409 conflict if a document with this _id
  // already exists, so the error propagates to the caller instead of being
  // swallowed inside a Promise wrapper.
  const result = await client.create({
    index : 'completeindexthree',
    id: manual.id, // <- Here is your unique id.
    body : {
      brand : manual.brand,
      category : manual.category,
      url : manual.url,
      title : manual.title,
      parsingData : new Date().toString()
    }
  })
  await client.indices.refresh({index: 'completeindexthree'})
  return result;
}
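Since the question asks for uniqueness by title, one option is to derive the _id from the title itself, so that re-inserting the same title collides with the existing document. The following is only a sketch: titleToId and insertManualOnce are hypothetical helper names, the trim/lowercase normalization is an assumption about what counts as "the same title", and the check on err.statusCode (or err.meta.statusCode) assumes the official JavaScript client's error shape.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a stable _id from the title, so the same
// title always maps to the same document id.
function titleToId(title) {
  // Assumption: titles differing only in case or surrounding whitespace
  // should be treated as duplicates.
  const normalized = title.trim().toLowerCase();
  return crypto.createHash('sha1').update(normalized).digest('hex');
}

// Returns true if the document was created, false if a document with the
// same title-derived id already existed.
async function insertManualOnce(client, manual) {
  try {
    await client.create({
      index: 'completeindexthree',
      id: titleToId(manual.title),
      body: {
        brand: manual.brand,
        category: manual.category,
        url: manual.url,
        title: manual.title,
        parsingData: new Date().toString()
      }
    });
    return true;
  } catch (err) {
    // Assumption: the client exposes the HTTP status on the error object.
    const status = err.statusCode || (err.meta && err.meta.statusCode);
    if (status === 409) return false; // already indexed, skip it
    throw err; // anything else is a real failure
  }
}
```

With this, the nightly parser could call insertManualOnce(client, manual) for every scraped item and simply ignore the ones that return false.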
CodePudding user response:
If you don't provide an id, Elasticsearch generates a unique one; if you do, it uses the id you gave.
The payload should look like this:
{
  id: "your_unique_id",
  body: { foo: "bar" }
}
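For duplicates that are already in the index, one possible approach from Node.js is to read the documents, group them by title, and delete every document after the first in each group. This is a minimal sketch under assumptions: findDuplicateIds and deleteDuplicates are hypothetical helpers, the hits are assumed to have been collected beforehand (for example with the search or scroll API), and titles are compared by exact string equality.

```javascript
// Given search hits shaped like { _id, _source: { title } }, return the ids
// of every hit whose title was already seen earlier in the list.
function findDuplicateIds(hits) {
  const seen = new Set();
  const duplicates = [];
  for (const hit of hits) {
    const title = hit._source.title;
    if (seen.has(title)) {
      duplicates.push(hit._id); // a later copy of a known title
    } else {
      seen.add(title); // first occurrence is kept
    }
  }
  return duplicates;
}

// Deletes the duplicate documents one by one and returns how many were removed.
// Assumption: `hits` covers all documents of the index.
async function deleteDuplicates(client, hits) {
  const ids = findDuplicateIds(hits);
  for (const id of ids) {
    await client.delete({ index: 'completeindexthree', id });
  }
  return ids.length;
}
```

The first document of each title is kept; if a different one should survive (for instance the most recently parsed), the hits can be sorted accordingly before calling deleteDuplicates.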