Home > database >  Find dictionary keys in mongodb with dot infront of their key values
Find dictionary keys in mongodb with dot infront of their key values

Time:11-18

I have some data in a mongodb database collection that looks like this

{
    "_id": {
        "$oid": "63737b4b654d9b6a0c3a2006"
    },
    "tag": {
        "tagName": 0.10534846782684326
    }
}

and I want to check if a dictionary with a specific tagName exists. To do so, we can apply this

mycollection.find({f"tag.{'tagName'}": {"$exists": True}})

However, some tagNames have a dot . in front, e.g.,

{
    "_id": {
        "$oid": "63737b4b654d9b6a0c3a2006"
    },
    "tag": {
        ".tagName": 0.10534846782684326
    }
}

So when I run the query

mycollection.find({f"tag.{'.tagName'}": {"$exists": True}})

returns that the dictionary whose key name is .tagName is not found. This is because of the double dot in f"tag.{'.tagName'}". Can we write the query in such a way in order to avoid this situation?

Mongodb version:

db version v4.4.13
Build Info: {
    "version": "4.4.13",
    "gitVersion": "df25c71b8674a78e17468f48bcda5285decb9246",
    "openSSLVersion": "OpenSSL 1.1.1f  31 Mar 2020",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "ubuntu2004",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

CodePudding user response:

The first syntax looks a little odd to me. I don't think it should have the curly brackets. You can see in this playground example that it doesn't find the first document. So you may be looking to remove the curly brackets from the query in both situations, and here is an example where doing so correctly returns the first document.

Now regarding the . character in the name, one approach would be to use the $getField operator. That operator helps retrieve field names that are otherwise ambiguous or contain special characters. An example (that would only retrieve the second document) might look like this:

db.collection.find({
  $expr: {
    $ifNull: [
      {
        $getField: {
          field: ".tagName",
          input: "$tag"
        }
      },
      false
    ]
  }
})

Playground example here

You may combine the two conditions with a $or to return both documents, playground example here.

I would recommend updating your data to remove the extra . character. Its presence is going to make working with the data more difficult and probably cause some performance issues since many of the operations won't be able to effectively use indexes.


Version 4.4 and earlier

As noted in the comments, the $getField operator is new in version 5.0. To accomplish something similar prior to that you could use the $objectToArray operator.

Effectively what you will do here is convert $tag to an array of k, v pairs where k contains the field name. You can then filter directly against that name (k) looking for the value(s) of interest.

The verbose, but arguably more readable, approach to doing so looks like this:

db.collection.aggregate([
  {
    "$addFields": {
      "tagNames": {
        "$objectToArray": "$tag"
      }
    }
  },
  {
    $match: {
      "tagNames.k": {
        $in: [
          "tagName",
          ".tagName"
        ]
      }
    }
  },
  {
    $project: {
      tagNames: 0
    }
  }
])

You could probably collapse it down and do it directly in find() (via $expr usage), as demonstrated here. But doing so requires a little more knowledge about your schema and the structure of the tag field. Overall though, working with field names that contain dots is even more difficult prior to 5.0, which further strengthens the suggestion to correct the underlying data.

  • Related