I am querying MongoDB through the PySpark connector, but the query does not return anything: I get an empty DataFrame.
I have tried the following versions of the query:
agg_query = [{
    '$match': {
        'tweetInfo.created_at': {
            '$gte': "2021-10-10T00:00:00.0000Z"
        }
    }
}]
agg_query = [{
    '$match': {
        'tweetInfo.created_at': {
            '$lte': datetime.datetime.now().isoformat()
        }
    }
}]
When I tried them in Studio 3T, the following queries worked for me:
agg_query = [{
    '$match': {
        'tweetInfo.created_at': {
            '$gte': "2021-10-10T00:00:00.0000Z"
        }
    }
}]
agg_query = [{
    '$match': {
        'tweetInfo.created_at': {
            '$lte': datetime.datetime.now().isoformat()
        }
    }
}]
Thanks in anticipation.
CodePudding user response:
I recently had to query on a date field from PySpark, and as far as I understand, the PySpark MongoDB connector cannot infer that the provided value is a date, so the value reaches MongoDB as a plain string.
I solved my issue by explicitly wrapping the value in the $date operator, which tells the connector that I want documents greater or less than a certain date:
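# start and end are datetime objects (assumed to be in UTC, since the
# format string appends a literal "Z").
agg_query = [{
    '$match': {
        'tweetInfo.created_at': {
            '$gte': {
                "$date": start.strftime("%Y-%m-%dT%H:%M:%SZ")
            },
            '$lt': {
                "$date": end.strftime("%Y-%m-%dT%H:%M:%SZ")
            }
        }
    }
}]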
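
For completeness, here is a minimal sketch of how such a pipeline could be built and serialized before handing it to the connector. The helper name build_date_range_pipeline is hypothetical, and the sketch assumes timezone-aware datetime objects so that the trailing "Z" is accurate. The Spark call in the trailing comment is an assumption about your setup: as far as I know, the 2.x/3.x connector uses format "mongo" with the "pipeline" option, while the 10.x connector uses format "mongodb" with "aggregation.pipeline", so check which one applies to you.

```python
import json
from datetime import datetime, timezone


def build_date_range_pipeline(field, start, end):
    """Return a JSON pipeline string matching start <= field < end.

    Hypothetical helper. `start` and `end` are assumed to be
    timezone-aware datetimes; they are converted to UTC first.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    stage = {
        "$match": {
            field: {
                # $date marks the value as a date rather than a string.
                "$gte": {"$date": start.astimezone(timezone.utc).strftime(fmt)},
                "$lt": {"$date": end.astimezone(timezone.utc).strftime(fmt)},
            }
        }
    }
    # The connector option takes the pipeline as a JSON string.
    return json.dumps([stage])


pipeline = build_date_range_pipeline(
    "tweetInfo.created_at",
    datetime(2021, 10, 10, tzinfo=timezone.utc),
    datetime.now(timezone.utc),
)

# Assumed usage with an existing SparkSession named `spark`
# (option and format names depend on the connector version):
# df = spark.read.format("mongo").option("pipeline", pipeline).load()
```

Building the string with json.dumps avoids quoting mistakes that are easy to make when the pipeline is written by hand as a string.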