Home > OS >  How to sort documents in MongoDB - Python
How to sort documents in MongoDB - Python

Time:03-01

I have a bunch of documents inside my MongoDB collection. I did write an aggregation code to find and filter some documents. All is well but the result is shown randomly.

The result is, you see that entities and chat sources are shown in random order.

The desired result is to show the entities and chat sources in ascending order like the following:

Entity 1 Chat Source 1
Entity 1 Chat Source 2
Entity 1 Chat Source 3

Entity 2 Chat Source 1
Entity 2 Chat Source 2
Entity 2 Chat Source 3

Entity 3 Chat Source 1
Entity 3 Chat Source 2
Entity 3 Chat Source 3

I tried two ways to sort.

The first way, inside the aggregation code:

if total_messages > 0:
    agg_result = mycol.aggregate([
        {
            "$match": {
                "creation_date_time": {"$gte": start, "$lt": end}
            }
        },
        {
            "$group": {
                "_id": {"Entity": "$entity", "Chat Source": "$chat_source"},
                "Count": {"$sum": 1},
                "Latest Message Date Time": {"$last": "$date_time"}
            }
        },
        {
            "$sort": {"entity": -1, "chat_source": -1}
        }
    ])

This changed nothing and gave the same previous output.

The second way is using pymongo sorting:

for response in agg_result:
    doc = mycol.find().sort("entity", pymongo.ASCENDING)
    print(doc)

And this gave the following result:

<pymongo.cursor.Cursor object at 0x0000027A59AF9850>
<pymongo.cursor.Cursor object at 0x0000027A59AF9700>
<pymongo.cursor.Cursor object at 0x0000027A59AF9850>
<pymongo.cursor.Cursor object at 0x0000027A59AF9700>
<pymongo.cursor.Cursor object at 0x0000027A59AF9850>
<pymongo.cursor.Cursor object at 0x0000027A59AF9700>
<pymongo.cursor.Cursor object at 0x0000027A59AF9850>
<pymongo.cursor.Cursor object at 0x0000027A59AF9700>
<pymongo.cursor.Cursor object at 0x0000027A59AF9850>

How can I solve this issue?

CodePudding user response:

Your sort must be on a field passed from the previous stage of the pipeline, so in your case change the sort to (something like):

{ 
    "$sort": {"_id": -1}
}

In the second example you have misunderstood how the .sort() operator works, hence the unexpected results.

CodePudding user response:

I suspect that instead of

        {
            "$group": {
                "_id": {"Entity": "$entity", "Chat Source": "$chat_source"},
                "Count": {"$sum": 1},
                "Latest Message Date Time": {"$last": "$date_time"}
            }
        },
        {
            "$sort": {"entity": -1, "chat_source": -1}
        }

what you mean to do is sort on the grouped Entity Chat Source combination. After the $group stage, there will be no fields entity and chat_source upon which to sort. Try:

        {
            "$sort": {"_id.Entity":1, "_id.Chat Source":1}
        }

Also note that your pipeline assumes that after the first $match, material is flowing into the $group stage in ascending datetime order hence the use of $last -- but perhaps you mean $creation_date_time...? Unless there are truly two fields date_time and creation_date_time.

    "Latest Message Date Time": {"$last": "$creation_date_time"}
  • Related