I have a python script that is reaching out to an Alteryx MongoDB and pulling in the Queue table. Works well but it's a fairly dumb collection.find()
that returns everything.
The size isn't insurmountable now but will continue to grow. I only truly need the current day's results. Unfortunately, Alteryx MongoDB instance uses strings for all date fields.
I've worked out a script using Studio 3T that pulls the data but I am stuck on how to convert this to a python query so I can automate this in Lambda.
Python Function:
def authenticate():
client = MongoClient('mongodb://user:*******')
db = client.AlteryxService
collection = db.AS_Queue
data = pd.DataFrame(list(collection.find()))
#Trim columns not needed
data = data.drop(["IsAnonymous", "RestrictOutput","AS_Schedules__ID","__ServiceData", "__Version","__UpdateCheck"], axis=1 )
#Swap Object to string type
data = data.astype('string')
#Change time to D/T
data["CompletionDateTime"] = pd.to_datetime(data["CompletionDateTime"])
data["CreationDateTime"] = pd.to_datetime(data["CreationDateTime"])
#Grab only todays data
today = datetime.datetime.now().strftime('%Y/%m/%d')
data = data[data['CreationDateTime'] >= today]
#normalize to postgres output
data.columns = map(str.lower, data.columns)
return data
This part of the function can easily be trimmed down if I only grab the current day.
In the Studio3T script, I am using this:
today = new Date();
today.setHours(0, 0, 0, 0);
startOfDay = today.toISOString()
.slice(0, -5)
.replace(/T/g, match => match === 'T' ? ' ' : '');
end = new Date();
end.setHours(23, 59, 59, 999);
endOfDay = end.toISOString()
.slice(0, -5)
.replace(/T/g, match => match === 'T' ? ' ' : '');
db.getCollection("AS_Queue").find(
{
"CompletionDateTime": {
$gt: startOfDay,
$lt: endOfDay
}},{"IsAnonymous":0,"RestrictOutput":0,"AS_Schedules__ID":0,"__ServiceData":0,"__Version":0,"__UpdateCheck":0})
This returns both the columns I need and the correct date. How do I convert this into the python query? When I just trim the size by adding the columns:0 it returns zero results.
CodePudding user response:
To construct the eqivalant filter querying a date string in YYYY-MM-DD format, use:
from datetime import date, timedelta
start_of_day = date.today().strftime("%Y-%m-%d")
end_of_day = (date.today() timedelta(days=1)).strftime("%Y-%m-%d")
doc_filter = {
"CompletionDateTime": {
'$gte': start_of_day,
'$lt': end_of_day
}
}