I'm creating a MongoDB client for a Go Application, using the MongoDB Go Driver. In particular, I have two databases with one collection each. These collection can be modified asynchronously by different clients, so i need to periodically synchronize them, keeping the most recently edited document, among those with the same id
field
The two databases are stored on different hosts, so i need to export the collection from one host using mongoexport
and import into the other host using mongoimport
.
I already tried using mongoimport --collection=myColl --mode=merge
, but this doesn't fit my goal because simply overrides the conflicting documents from myColl
with the imported ones.
My idea is to import the json into a temp collection, but i don't know how to compare the timestamps during the aggregation/merge process.
My collections are structured Like this, any idea?
Collection 1
{"_id":"K1","value":"VAL1","timest":{"$date":"2021-09-26T09:05:09.942Z"}}
{"_id":"K2","value":"VAL2","timest":{"$date":"2021-09-26T09:05:10.234Z"}}
Collection 2
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}
{"_id":"K3","value":"VAL4","timest":{"$date":"2021-09-26T09:15:10.234Z"}}
Desired Behaviour
Conflict
{"_id":"K2","value":"VAL2","timest":{"$date":"2021-09-26T09:05:10.234Z"}}
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}[LATEST]
Output
{"_id":"K1","value":"VAL1","timest":{"$date":"2021-09-26T09:05:09.942Z"}}
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}
{"_id":"K3","value":"VAL4","timest":{"$date":"2021-09-26T09:15:10.234Z"}}
CodePudding user response:
You can use $merge
The bellow merges testdb1.coll
to testdb2.coll
based on same _id
And keeps the document with the latest date. If _id
is not found, then document is inserted.
Data in
testdb1.coll
[{"_id" "K2","value" "VAL3","timest" (date "2021-09-26T09:15:09.942Z")}
{"_id" "K3","value" "VAL4","timest" (date "2021-09-26T09:15:10.234Z")}]
testdb2.coll
[{"_id" "K1","value" "VAL1","timest" (date "2021-09-26T09:05:09.942Z")}
{"_id" "K2","value" "VAL2","timest" (date "2021-09-26T09:05:10.234Z")}]
Results
testdb2.coll (after the merge)
{"_id": "K1", "value": "VAL1", "timest": {"$toDate": "2021-09-26T09:05:09.942Z"}}
{"_id": "K2", "value": "VAL3", "timest": {"$toDate": "2021-09-26T09:15:09.942Z"}}
{"_id": "K3", "value": "VAL4", "timest": {"$toDate": "2021-09-26T09:15:10.234Z"}}
Query
(instead of $let
you could use $$new
)
client.db("testdb1").collection("coll").aggregate(
[
{
"$merge": {
"into": {
"db": "testdb2",
"coll": "coll"
},
"on": [
"_id"
],
"let": {
"p_ROOT": "$$ROOT"
},
"whenMatched": [
{
"$replaceRoot": {
"newRoot": {
"$cond": [
{
"$gt": [
"$$p_ROOT.timest",
"$timest"
]
},
"$$p_ROOT",
"$$ROOT"
]
}
}
}
],
"whenNotMatched": "insert"
}
}
])
CodePudding user response:
You can do following in an aggregation pipeline:
- use
$unionWith
to combine the 2 collections $sort
to order them bytimest
- use
$first
to get the latest document - use
$replaceRoot
to get the final form your want
Here is the Mongo playground for your reference.