I am trying to use the MongoDB Aggregation pipeline to pull items from one collection, check a string (a url), extract a substring from that, then check in a second collection for a matching document to said substring off one of the second collection document fields.
collection1 document:
{
_id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
url: 'https://example.com/WNE8UH'
}
edit: The productId is not always located at the end of the url string. Sometimes it's right in the middle, which forces me to extract it before any comparison step.
collection2 document:
{
_id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
productId: 'WNE8UH'
}
See the url
in collection1 contains WNE8UH
which is productId
in collection2?
How do you use aggregation to return the substring of url
, then use $lookup
to locate the document in collection2 productId
field?
Here is my current code:
db.collection1.aggregation([
{ $match: { url: RegExp( 'example.com', 'i' ) } },
{ $lookup: {
from: 'collection2',
let: {
productId: {
$regexFind: {
input: "$url",
regex: '(?<=com\/)\w{6}'
}
}
},
pipeline: [
{
$match: { productId: '$$productId'
}
}
],
as: 'matching_doc'
}}
])
Result
{
_id: ObjectId('xxxxxxxxxxxxxxxxxxx'),
url: 'https://example.com/WNE8UH',
matching_doc: []
}
Getting empty array. I need to get the matching doc.
What am I doing wrong?
CodePudding user response:
Query
- lookup if
url
contains theproductId
in the end of the URL
*if this regex is not ok for your case you can use any other
*if you want to use an index to match the url
you should create the index and also use regex with ^
like ^https://example.com/
(i mean before the lookup like in your query)
coll1.aggregate(
[{"$lookup":
{"from": "coll2",
"let": {"url": "$url"},
"pipeline":
[{"$match":
{"$expr":
{"$regexMatch":
{"input": "$$url",
"regex": {"$concat": ["$productId", {"$literal": "$"}]}}}}}],
"as": "matching_doc"}}])
Edit
In case that the productId
is not in the end but it is always after the domain name here example.com you can use something like
coll1.aggregate(
[{"$lookup":
{"from":"coll2",
"pipeline":
[{"$match":
{"$expr":
{"$regexMatch":
{"input":"$$url",
"regex":
{"$concat":["^https://example.com/", "$productId", "*"]}}}}}],
"as":"matching_doc",
"let":{"url":"$url"}}}])
If it can be anywhere inside the URL you can use something like the bellow
coll1.aggregate(
[{"$lookup":
{"from": "coll2",
"pipeline":
[{"$match":
{"$expr":
{"$regexMatch": {"input": "$$url", "regex": "$productId"}}}}],
"as": "matching_doc",
"let": {"url": "$url"}}}])