I have a database with millions of documents. Each document contains a name, but there are two kinds of name:
Documents with a given name (denomination key)
OR
Documents with a person name (firstName and lastName keys; no key in the data concatenates the two)
I would like to create an API that matches a search query against both the given name AND the person name.
For that, I have to search on the denomination key AND on the concatenated firstName lastName value.
That's why I first concatenate the firstName and lastName keys into an identite key.
Then I would like to run an aggregation that matches my query against those two keys:
aggregate([
  // Build the combined name on the fly; this field is not stored in the documents
  { $addFields: { identite: { $concat: ["$lastName", " ", "$firstName"] } } },
  {
    $match: {
      $and: [
        {
          $or: [
            { denomination: MySearchQuery.toUpperCase() },
            { identite: MySearchQuery.toUpperCase() }
          ]
        }
        /* Here, I'll be able to add more conditions */
      ]
    }
  }
])
So my question is how to manage indexes in that case. Do I have to index my concatenated key (identite, which does not exist in my data) and denomination? Or do I have to index firstName, lastName and denomination?
And if you have a better way to do this search, I'll take it too.
Thanks in advance.
CodePudding user response:
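In your case, an index can only speed up a $match stage that comes first in the .aggregate() pipeline. Your $match runs after $addFields and filters on a computed field, so it cannot use one. You'll need to modify the query so that $match comes first: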
.aggregate([
  {
    $match: {
      firstName: "John",
      lastName: "Doe",
    }
  },
  ...
])
Then create a compound index on your collection, such as { firstName: 1, lastName: 1 }.
You might want to check this article, and especially the Pipeline Operators and Indexes part of the MongoDB docs.
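As a rough sketch of the advice above, the snippet below defines the index specification and builds a pipeline whose first stage is the $match, so the compound index can apply. The helper name buildPersonPipeline and the people collection in the comments are assumptions made for illustration; createIndex and aggregate are the usual driver methods.

```javascript
// Index specification suggested above: equality lookups on both name parts.
const personNameIndex = { firstName: 1, lastName: 1 };

// Hypothetical helper: returns a pipeline whose FIRST stage is $match,
// which is the position where MongoDB can use the index.
function buildPersonPipeline(firstName, lastName, extraStages = []) {
  return [
    { $match: { firstName, lastName } },
    ...extraStages, // e.g. $addFields, $project, ...
  ];
}

// Usage with a driver collection (assumed to be called `people`):
//   await people.createIndex(personNameIndex);
//   const docs = await people
//     .aggregate(buildPersonPipeline("John", "Doe"))
//     .toArray();
```

Any computed fields can still be added in later stages; they just cannot be the fields the first $match filters on.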
UPDATED
According to your problem:
The problem is that I won't be able to separate firstName and lastName. It will be like a search bar where you put firstName and lastName in the same query. That's why I have to concatenate them.
There are various ways to achieve this. The best available one is a text index queried with $text search. I use it daily on a production database with 100M documents.
Mongoose Text Index example:
YourSchema.index(
  {
    firstName: 'text',
    lastName: 'text',
  },
  {
    weights: {
      firstName: 2,
      lastName: 1,
    },
    name: 'SearchQuery',
  }
)
This is an example of a text index for search in multiple fields.
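To query such an index, a $text filter is used, usually sorted by relevance. The following is only a minimal sketch: buildTextSearch is a hypothetical helper, and the model call in the trailing comment assumes a Mongoose model named YourModel.

```javascript
// Hypothetical helper: builds the pieces of a $text query sorted by relevance.
function buildTextSearch(query, { exactPhrase = false } = {}) {
  // Drop double quotes from user input so they cannot alter the search syntax.
  const cleaned = query.replace(/"/g, "").trim();
  // Wrapping the terms in double quotes asks MongoDB to match them as a phrase.
  const search = exactPhrase ? `"${cleaned}"` : cleaned;
  return {
    filter: { $text: { $search: search } },
    // textScore is the relevance score derived from the index weights.
    projection: { score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
  };
}

// Usage with a Mongoose model (assumed name):
//   const { filter, projection, sort } = buildTextSearch("John Doe");
//   const docs = await YourModel.find(filter, projection).sort(sort).limit(20);
```

Without quotes, the search terms are OR-ed, so "John Doe" also returns documents that contain only John or only Doe; sorting by textScore pushes the documents matching both terms to the top.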
But if you want to match a combination such as Joe Doe across firstName and lastName, you could also use virtual fields (I'm not 100% sure) or add a separate field for the text index, like:
Collection Schema
{
  firstName: string,
  lastName: string,
  // combinedName: string
}
and then add a text index on the combinedName field (a collection can have only one text index, so it would take the place of the one above).
UPDATED v2
Unfortunately, you can't use Mongoose virtuals in your case, because virtuals are not stored in MongoDB and so cannot be indexed or queried. The best option is to add another field, firstLastName, holding the combined name, and to create a text index with proper weights on the denomination and firstLastName fields, as I described above. Text search also supports language-specific and case-insensitive matching.
Another relevant, but non-MongoDB, option is to store part of the collection in Elasticsearch.
This is the only way to avoid performance issues for user-search queries (where response time matters and should be no more than 2 s) without rebuilding the whole schema.
If you are using Mongoose, you could also do this with the schema default property, where you predefine the firstLastName value from this.firstName and this.lastName, so you won't need to set it manually every time.
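A minimal sketch of that default property follows. The definition is a plain object that would be passed to new mongoose.Schema(...); the name personSchemaDefinition is an assumption, and the field names come from the discussion above.

```javascript
// Plain schema definition object; pass it to `new mongoose.Schema(...)`.
const personSchemaDefinition = {
  firstName: String,
  lastName: String,
  firstLastName: {
    type: String,
    // A regular function (not an arrow function) so that `this` is the document.
    default: function () {
      return `${this.firstName} ${this.lastName}`;
    },
  },
};

// Usage (assuming mongoose is installed and connected):
//   const YourSchema = new mongoose.Schema(personSchemaDefinition);
//   const YourModel = mongoose.model("Person", YourSchema);
```

Keep in mind that a default is only applied when the field is missing at document creation, so if firstName or lastName changes later, firstLastName has to be updated separately.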
But of course, the first time, you'll need to update the whole collection via a cursor:
await YourModel
  .find()
  // .sort(...) could go here (not sure by which property)
  .cursor()
  .eachAsync(async (doc) => {
    // Build the combined name from the two existing fields
    doc.firstLastName = `${doc.firstName} ${doc.lastName}`;
    await doc.save();
  });
or use updateMany (but updateMany is a long-running query and is not as easy to control as a cursor).
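For reference, the updateMany route could look roughly like the sketch below, which uses an aggregation-pipeline update (available from MongoDB 4.2) so that the concatenation happens on the server. The helper buildBackfillUpdate is hypothetical, and the usage comment goes through the underlying driver collection of an assumed Mongoose model.

```javascript
// Hypothetical helper: arguments for an updateMany that fills firstLastName
// server-side with an aggregation-pipeline update (MongoDB 4.2+).
function buildBackfillUpdate() {
  return {
    // Only touch person documents that have both name parts and
    // do not have the combined field yet ($concat yields null otherwise).
    filter: {
      firstName: { $exists: true },
      lastName: { $exists: true },
      firstLastName: { $exists: false },
    },
    update: [
      { $set: { firstLastName: { $concat: ["$firstName", " ", "$lastName"] } } },
    ],
  };
}

// Usage through the underlying driver collection of a Mongoose model:
//   const { filter, update } = buildBackfillUpdate();
//   await YourModel.collection.updateMany(filter, update);
```

Because the filter skips documents that already have firstLastName, the command can be re-run safely if it is interrupted.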