MongoDB: does $concat need a new index?

I have a database with millions of documents. Each document contains a name, but there are two kinds of name:

Documents with a given name (denomination key)

Documents with a person name (firstName and lastName keys; there is no stored key that concatenates the two)

I would like to create an API that searches a query against both the given name AND the person name.

For that, I have to match the query against the denomination key AND against the concatenated firstName and lastName.

That's why I first concatenate the lastName and firstName keys into an identite key.

Then I would like to run an aggregation that matches my query against those two keys:

aggregate([
  // Build the searchable name on the fly; this field is not stored
  { $addFields: { identite: { $concat: ["$lastName", " ", "$firstName"] } } },
  {
    $match: {
      $and: [
        {
          $or: [
            { denomination: MySearchQuery.toUpperCase() },
            { identite: MySearchQuery.toUpperCase() }
          ]
        }
        /* Here, I'll be able to add more conditions */
      ]
    }
  }
])

So my question is how to manage indexes in that case. Do I have to index my concatenated key (identite, which does not exist in my stored documents) and denomination? Or do I have to index firstName, lastName and denomination?

If you have a better way to do this search, I'll take it too.

Thanks in advance.

CodePudding user response：

In your case, an index can only speed up a $match stage that comes first in the aggregation pipeline. A $match on a field computed by $addFields cannot use one. So you'll need to modify the query:

.aggregate([
{
  $match: {
    firstName: "John",
    lastName: "Doe",
  }
},
...
])

and create a compound index on your collection, such as { firstName: 1, lastName: 1 }.

You may want to read the MongoDB documentation on aggregation pipelines, especially the "Pipeline Operators and Indexes" section.

UPDATED

Regarding the problem you described:

The problem is I'll not be able to separate firstName and lastName. It will be like a search bar where you put firstName and lastName in the same query. That's why I have to concat them

There are various ways to achieve this. The best available one is a text index queried with $text. I use it daily on a production database with 100M documents.

Mongoose Text Index example:

YourSchema.index(
  {
    firstName: 'text',
    lastName: 'text',
  },
  {
    weights: {
      firstName: 2,
      lastName: 1,
    },
    name: 'SearchQuery',
  }
)

This is an example of a text index that searches multiple fields. But if you want to match a combination such as Joe Doe across firstName and lastName, you could also use virtual fields (I'm not 100% sure) or add a separate field for the text index, like:

Collection Schema

{
  firstName: string,
  lastName: string,
  //   combinedName: string
}

and then create the text index on the combinedName field.
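A minimal sketch of the separate-field idea, in plain JavaScript. The helper name buildCombinedName and the index name NameSearch are hypothetical; combinedName is the field from the schema above. Because MongoDB allows only one text index per collection, the sketch assumes the index covers denomination and combinedName together, replacing the earlier firstName/lastName one.

```javascript
// Hypothetical helper: builds the value stored in combinedName.
// Missing or blank name parts are skipped so the result has no stray spaces.
function buildCombinedName(doc) {
  return [doc.firstName, doc.lastName]
    .filter((part) => typeof part === 'string' && part.trim() !== '')
    .map((part) => part.trim())
    .join(' ');
}

// Index definition in the (fields, options) shape that Schema.index() takes.
// Assumption: one text index covering both searchable fields.
const nameSearchIndex = {
  fields: { denomination: 'text', combinedName: 'text' },
  options: { name: 'NameSearch' },
};

console.log(buildCombinedName({ firstName: ' John ', lastName: 'Doe' }));
```

With Mongoose this could be registered as YourSchema.index(nameSearchIndex.fields, nameSearchIndex.options).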

UPDATED v2

Unfortunately, you can't use Mongoose virtuals in your case. The best approach is to store another field, firstLastName, holding the combined name, and to create a text index with suitable weights on the denomination and firstLastName fields, as I described above. Text indexes support language settings and both case-sensitive and case-insensitive search.
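Once such a text index exists, the search-bar query could look roughly like the sketch below. The function name buildNameSearchPipeline and the default limit are hypothetical. The $text match has to be the first stage of the pipeline, and wrapping the input in double quotes asks for the exact phrase rather than any of its words.

```javascript
// Hypothetical builder for a search-bar query against a text index.
function buildNameSearchPipeline(userInput, limit = 20) {
  // Drop double quotes from the input so it cannot break out of the phrase,
  // then collapse repeated whitespace.
  const phrase = userInput.replace(/"/g, ' ').replace(/\s+/g, ' ').trim();
  return [
    // $text is only allowed in the first $match stage of a pipeline.
    { $match: { $text: { $search: `"${phrase}"` } } },
    // Most relevant documents first.
    { $sort: { score: { $meta: 'textScore' } } },
    { $limit: limit },
  ];
}

const pipeline = buildNameSearchPipeline('John  Doe');
console.log(JSON.stringify(pipeline[0]));
```

It would be run as YourModel.aggregate(buildNameSearchPipeline(MySearchQuery)), with any further conditions added as later stages.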

Another relevant, but non-MongoDB, option is to store part of the collection in Elasticsearch.

This is the only way to avoid performance issues for user-facing search queries (where response time is sensitive and should be no more than 2 s) without rebuilding the whole schema.

If you are using Mongoose, you could also define firstLastName with a default that computes its value from this.firstName and this.lastName, so you won't need to set it manually every time.

But of course, the first time, you'll need to update the whole collection via a cursor:

await YourModel
  .find()
  // .sort(...) by some property (not sure which)
  .cursor()
  .eachAsync(async (doc) => {
    // Store the combined name on each existing document
    doc.firstLastName = `${doc.firstName} ${doc.lastName}`;
    await doc.save();
  });

or use updateMany (but updateMany is one long-running query and is harder to control than a cursor).
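For comparison, an updateMany backfill could use an update pipeline (available since MongoDB 4.2), so the concatenation runs on the server. This is only a sketch: buildBackfillUpdate is a hypothetical name, and the filter assumes firstLastName is absent on the documents that still need it.

```javascript
// Hypothetical builder for the two arguments of updateMany().
function buildBackfillUpdate() {
  return {
    // Only touch documents that have both name parts and no combined field yet.
    filter: {
      firstName: { $type: 'string' },
      lastName: { $type: 'string' },
      firstLastName: { $exists: false },
    },
    // Passing an array as the update makes it an update pipeline,
    // which lets $concat read each document's own fields.
    update: [
      { $set: { firstLastName: { $concat: ['$firstName', ' ', '$lastName'] } } },
    ],
  };
}

const { filter, update } = buildBackfillUpdate();
console.log(JSON.stringify(update));
```

With the native driver these would be passed as collection.updateMany(filter, update); from Mongoose, YourModel.collection gives access to that driver collection.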