Home > Mobile >  Index multiple arrays in MongoDB
Index multiple arrays in MongoDB

Time:12-01

I have documents that look like this:

    ...
    propA:[0, 2, 6],      //Array of unique numbers of length max. ~50
    propB:[2, 14, 24, 39],//Array of unique numbers of length max. ~50
    propC:[1, 14, 29],    //Array of unique numbers of length max. ~50
    ...

And I want to be able to query them in a such a way that :

  • I can match all documents with certain properties
  • I can match all documents without certain properties

For example, at the moment I use the query:

mongo.db.collection('things').find({
    propA:{$all:[...], $nin:[...]},
    propB:{$all:[...], $nin:[...]},
    propC:{$all:[...], $nin:[...]},
})

But it's really slow and scales really badly because mongo has to scan every documents.

The solutions to make mongo queries faster are:

  • Make an index
  • Change document schema
  • Both

The problem is that I cannot .createIndex({propA:1, propB:1, propC: 1}) because mongo does not accept to index multiple array fields (for good reasons).

But when I use another schema like:

    ...
    props:["a0", "a2", "a6", "b2", "b14", "b24", "b39", "c1", "c14", "c29"]
    ...

And create an index on {props: 1}, querying the same way as above executes like this:

  • first stage: IXSCAN, incredibly fast, but it only selects the first property (ex: "a0")
  • second stage: FETCH, incredibly slow, has to scan all documents that passed the IXSCAN

So my question is :

How to query documents with arrays the most efficiently ?

CodePudding user response:

I'm the OP, I think I found the "optimal" way to do it : regex

For example, instead of storing

    propA:[0, 2, 6],
    propB:[2, 14, 24, 39],
    propC:[1, 14, 29], 

You store

props: "a0a2a6b2b14b24b39c1c14c29" //a string, really

You create the index .createIndex({"props": "text"})

And for example to search for documents with a1, b5, b7 and without a3, b4 and c8 you query:

mongo.db.collection('things').find({
    props: {
        $regex: /.*a1.*^a3.*^b4.*b5.*b7.*c8/
    }
})

Thinking about it I have no idea how it's possible to query a collection with millions of documents in few ms, but the nice thing is that it is.

There is only one "small" problem:

  • Query for properties higher than all documents.

Strangely, if the maximum propA of the collection is 19 and you query for 20 mongo does scan all documents instead of using the index, I guess an easy fix is to add "x999" between the series but still, it's pretty ugly.

For example, you store :

props: "a0a2a6a999b2b14b24b39b999c1c14c29c999" //another string

I am still looking forward to better alternative, indeed, you cannot make a compound index like .createIndex({"location": "2dsphere", "props": "text"}).

  • Related