Home > front end >  PyMongo advanced query by string "date" field
PyMongo advanced query by string "date" field

Time:09-14

I have a python script that is reaching out to an Alteryx MongoDB and pulling in the Queue table. Works well but it's a fairly dumb collection.find() that returns everything. The size isn't insurmountable now but will continue to grow. I only truly need the current day's results. Unfortunately, Alteryx MongoDB instance uses strings for all date fields. I've worked out a script using Studio 3T that pulls the data but I am stuck on how to convert this to a python query so I can automate this in Lambda.

Python Function:

def authenticate():  
    client = MongoClient('mongodb://user:*******')
    db = client.AlteryxService
    collection = db.AS_Queue
    data = pd.DataFrame(list(collection.find()))
   
    #Trim columns not needed 
    data = data.drop(["IsAnonymous", "RestrictOutput","AS_Schedules__ID","__ServiceData", "__Version","__UpdateCheck"], axis=1 )
    #Swap Object to string type
    data = data.astype('string')

    #Change time to D/T
    data["CompletionDateTime"] = pd.to_datetime(data["CompletionDateTime"])
    data["CreationDateTime"] = pd.to_datetime(data["CreationDateTime"])

    #Grab only todays data
    today = datetime.datetime.now().strftime('%Y/%m/%d')
    data = data[data['CreationDateTime'] >= today]
    #normalize to postgres output
    data.columns = map(str.lower, data.columns)

    return data

This part of the function can easily be trimmed down if I only grab the current day.

In the Studio3T script, I am using this:

today = new Date();
  today.setHours(0, 0, 0, 0);
startOfDay = today.toISOString()
 .slice(0, -5)
 .replace(/T/g, match => match === 'T' ? ' ' : '');
end = new Date();
end.setHours(23, 59, 59, 999);
endOfDay = end.toISOString()
 .slice(0, -5)
 .replace(/T/g, match => match === 'T' ? ' ' : '');
db.getCollection("AS_Queue").find( 
   { 
     "CompletionDateTime": {
       $gt: startOfDay,
       $lt: endOfDay
}},{"IsAnonymous":0,"RestrictOutput":0,"AS_Schedules__ID":0,"__ServiceData":0,"__Version":0,"__UpdateCheck":0})

This returns both the columns I need and the correct date. How do I convert this into the python query? When I just trim the size by adding the columns:0 it returns zero results.

CodePudding user response:

To construct the eqivalant filter querying a date string in YYYY-MM-DD format, use:

from datetime import date, timedelta
start_of_day = date.today().strftime("%Y-%m-%d")
end_of_day = (date.today()   timedelta(days=1)).strftime("%Y-%m-%d")
doc_filter = {
    "CompletionDateTime": {
        '$gte': start_of_day,
        '$lt': end_of_day
    }
} 
  • Related