Query MongoDB to fetch a nested array with skip/limit by nested object key range in Django using pym-CodePudding

I am learning Django with pymongo.

I have a MongoDB collection where I am storing some words and their year-wise occurrences in some books.

The documents are stored in MongoDB in the following format:

{
   "_id":{
      "$oid":"625c51eec27c99b793074501"
   },
   "word":"entropy",
   "occurrence":13,
   "year":{
      "1942":[
         {
            "book":{
               "$oid":"625c51eec27c99b7930744f9"
            },
            "number":8,
            "sentence":[
               1,
               288,
               322,
               1237,
               2570,
               2585,
               2617,
               2634
            ]
         }
      ],
      "1947":[
         {
            "book":{
               "$oid":"625c5280c27c99b793077042"
            },
            "number":5,
            "sentence":[
               377,
               2108,
               2771,
               3467,
               3502
            ]
         }
      ]
   }
}

Now I want to get the list of the sentences with skip and limit (and the respective book id) queried by _id and for specific year range.

For example,

I want to fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.
The array will be queried by the _id and year range.
A skip and limit will be applied on the sentence list

Is this a possible task using Django and pymongo? If yes, what is the fastest method?

So far I have done this:

search= {'$and': [{"_id": word_id_obj, "year.1942": {"$exists": 1}}]}
datalist= []
word_docs= wordcollec.find(search, {'year': 1, '_id': 0}).skip(1).limit(5)
sentlist['recordsFiltered'] = wordcollec.count_documents(search)

for b in word_docs:
    year_data= b['year'][1942]
    for by in year_data:
        i= i 1
        this_word= {'serial': i, 'year': cyear, 'book': str(by['book'])}
        datalist.append(this_word)

But obviously, it is not giving the desired result as the skip and limit are being applied to the root document object. Also the the year has a fixed value and no range.

It seems using '$slice' is an option. But I cannot figure it out.

Thanks for reading this far. And a lot more if you can throw some light.

CodePudding user response：

Here's one way to:

... fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.

db.collection.aggregate([
  { "$set": { "designWorkAround": { "$objectToArray": "$year" } } },
  { "$set": {
      "designWorkAround": {
        "$map": {
          "input": "$designWorkAround",
          "as": "yearArray",
          "in": {
            "year": "$$yearArray.k",
            "books": {
              "$map": {
                "input": "$$yearArray.v",
                "as": "bookArray",
                "in": {
                  "bookId": "$$bookArray.book",
                  "number": "$$bookArray.number",
                  "sentence": "$$bookArray.sentence"
                }
              }
            }
          }
        }
      }
    }
  },
  { "$unwind": "$designWorkAround" },
  { "$unwind": "$designWorkAround.books" },
  { "$project": {
      "_id": 0,
      "year": "$designWorkAround.year",
      "book": "$designWorkAround.books.bookId",
      "sentence": "$designWorkAround.books.sentence"
    }
  }
])

Try it on mongoplayground.net.

I don't know all the data generation nor queries that you might want, but I would probably redesign the collection and have something like one document per book with all the relevant fields in the document. This would make querying, indexing, etc., simpler and more efficient.