example:
_id = 001
field 'location' = PARIS FRANCE
field 'country' = FRANCE
_id = 002
field 'location' = TORONTO
field 'country' = CANADA
desired result:
ability to recognize that for _id 001, "france" is also in the value for location field;
whereas, _id 002 does not have a value from country that also is in location
Instead of relying on pandas, would like to see if there are more efficient options using pymongo, for example?
CodePudding user response:
This is sensitive to case, and possible abbreviations, etc., but here's one way to identify if one string is contained within the other.
Given an example collection like this:
[
{
"_id": "001",
"location": "PARIS FRANCE",
"country": "FRANCE"
},
{
"_id": "002",
"location": "TORONTO",
"country": "CANADA"
}
]
This will set "isIn"
if "country"
is contained within "location"
or vice-versa.
db.collection.aggregate([
{
"$set": {
"isIn": {
"$gte": [
{
"$sum": [
{ // returns pos or -1 if not found
"$indexOfCP": ["$location", "$country"]
},
{"$indexOfCP": ["$country", "$location"]}
]
},
-1
]
}
}
}
])
Example output:
[
{
"_id": "001",
"country": "FRANCE",
"isIn": true,
"location": "PARIS FRANCE"
},
{
"_id": "002",
"country": "CANADA",
"isIn": false,
"location": "TORONTO"
}
]
Try it on mongoplayground.net.