How to query a date field in pyspark mongo connector?

Time:10-26

I am querying MongoDB through the PySpark connector, but my query does not return anything: I get an empty DataFrame.

I have tried the following versions of the query:

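    import datetime

    # Attempt 1: ISO-8601 string literal
    agg_query = [{
        '$match': {
            'tweetInfo.created_at': {
                '$gte': "2021-10-10T00:00:00.0000Z"
            }
        }
    }]
    # Attempt 2: current time as an ISO-8601 string
    agg_query = [{
        '$match': {
            'tweetInfo.created_at': {
                '$lte': datetime.datetime.now().isoformat()
            }
        }
    }]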

When I tried it in Studio 3T, the following query worked for me:

    agg_query = [{
        '$match': {
            'tweetInfo.created_at': {
                '$gte': "2021-10-10T00:00:00.0000Z"
            }
        }
    }]
    agg_query = [{
        '$match': {
            'tweetInfo.created_at': {
                '$lte': datetime.datetime.now().isoformat()
            }
        }
    }]

Thanks in anticipation.

CodePudding user response:

I recently had to query a date field from PySpark, and as far as I understand, the PySpark Mongo connector cannot infer that the provided value is a date. The value is sent as a plain string, so it does not match a field stored as a date.

I solved my issue by explicitly wrapping the value in the $date operator, which tells the PySpark Mongo connector that I want documents whose field is greater or less than a certain date.

    # start and end are datetime.datetime objects, assumed to be in UTC
    agg_query = [{
        '$match': {
            'tweetInfo.created_at': {
                '$gte': {
                    "$date": start.strftime("%Y-%m-%dT%H:%M:%SZ")
                },
                '$lt': {
                    "$date": end.strftime("%Y-%m-%dT%H:%M:%SZ")
                }
            }
        }
    }]
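
For completeness, here is a minimal, self-contained sketch of the same idea. The helper name `build_date_match_pipeline` and the example dates are my own placeholders, not part of any library. It assumes the datetimes are already in UTC, since the `Z` suffix is appended as a literal. The commented-out read at the end assumes the 3.x connector, where the pipeline is passed as a JSON string through the `pipeline` option; other connector versions may use a different format name and option key, so check the docs for the version you run.

```python
import datetime
import json


def build_date_match_pipeline(field, start, end):
    """Return a JSON pipeline string matching start <= field < end.

    Hypothetical helper: wraps both bounds in {"$date": ...} so the
    values are read as dates rather than as plain strings.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumes start/end are UTC datetimes
    pipeline = [{
        "$match": {
            field: {
                "$gte": {"$date": start.strftime(fmt)},
                "$lt": {"$date": end.strftime(fmt)},
            }
        }
    }]
    return json.dumps(pipeline)


# Example bounds (placeholders)
start = datetime.datetime(2021, 10, 10)
end = datetime.datetime(2021, 10, 26)

pipeline_json = build_date_match_pipeline("tweetInfo.created_at", start, end)
print(pipeline_json)

# Assumed usage with connector 3.x and an existing SparkSession `spark`:
# df = spark.read.format("mongo").option("pipeline", pipeline_json).load()
```

Building the pipeline as a Python list and serializing it with `json.dumps` avoids quoting mistakes that are easy to make when writing the JSON string by hand.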