I have a few nested for loops that work and append to a list of scores, but they are currently very slow to run. Are there any ways to easily optimize this and make it run quicker?
```python
scores = []
for day in range(0,len(date)):
    x = []
    for entry in range(0,len(df_new)):
        if df_new['timestamp(America/New_York)'].dt.strftime('%Y-%m-%d').iloc[entry] == date[day]:
            for times in range(0,len(time)):
                if df_new['timestamp(America/New_York)'].dt.strftime('%H:%M:%S').iloc[entry] == time[times]:
                    x.append(df_new['score'].iloc[entry])
    scores.append(x)
```
Here is a picture of the data frame as well: [image of the data frame not available]
Answer:
You are currently calling `df_new['timestamp(America/New_York)'].dt.strftime('%Y-%m-%d')` a lot of times inside your nested loops, and every call converts the entire column to strings just so you can read a single element with `.iloc[entry]`. The same happens with the `'%H:%M:%S'` conversion.
What you could do is store the result of `df_new['timestamp(America/New_York)'].dt.strftime('%Y-%m-%d')` in a variable before the loops and then use that variable instead, since it already contains the data you need.
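Something like this (inside your `for day` loop), doing the same for the time strings:
```python
date_strs = df_new['timestamp(America/New_York)'].dt.strftime('%Y-%m-%d')  # computed once
time_strs = df_new['timestamp(America/New_York)'].dt.strftime('%H:%M:%S')  # computed once
for entry in range(len(df_new)):
    if date_strs.iloc[entry] == date[day]:
        for times in range(len(time)):
            if time_strs.iloc[entry] == time[times]:
                x.append(df_new['score'].iloc[entry])
```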
I don't think it will improve things TOO much, but it should help a bit at least!
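Pushing the same idea a bit further, here is a minimal, self-contained sketch that converts the date strings, time strings and scores to plain Python lists once, then builds one list of scores per date with `zip`, so the inner loop no longer makes per-element `.iloc` calls. The sample data is hypothetical and only stands in for the question's `df_new`, `date` and `time`. One assumption to note: `time` is turned into a set for fast lookup, so each row is counted at most once, whereas the original inner loop would append a score twice if `time` contained duplicates.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df_new, date and time
df_new = pd.DataFrame({
    "timestamp(America/New_York)": pd.to_datetime([
        "2021-09-16 11:45:00", "2021-09-16 11:48:00",
        "2021-09-30 11:45:00", "2021-09-30 12:45:00",
    ]),
    "score": [88.6, 92.3, 44.5, 55.4],
})
date = ["2021-09-16", "2021-09-30"]
time = ["11:45:00", "11:48:00"]

# Convert each column once, outside any loop, into plain Python lists
ts_col = df_new["timestamp(America/New_York)"]
date_strs = ts_col.dt.strftime("%Y-%m-%d").tolist()
time_strs = ts_col.dt.strftime("%H:%M:%S").tolist()
score_vals = df_new["score"].tolist()
wanted_times = set(time)  # O(1) membership test instead of looping over time

scores = []
for day in date:
    # Collect the scores whose date matches this day and whose time is wanted
    scores.append([s for d, t, s in zip(date_strs, time_strs, score_vals)
                   if d == day and t in wanted_times])

print(scores)  # [[88.6, 92.3], [44.5]]
```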
Answer:
There is no need for loops: you can create a boolean mask for each condition and then index your original dataframe with the two masks combined using `&`. An example is shown below.
```python
>>> print(df)
                   ts  score
0 2021-09-16 11:45:00   88.6
1 2021-09-16 11:48:00   92.3
2 2021-09-30 11:45:00   44.5
3 2021-09-30 12:45:00   55.4
>>> print(dates)
['2021-09-16']
>>> print(times)
['11:45:00', '11:48:00']
>>> mask1 = df["ts"].dt.strftime('%Y-%m-%d').isin(dates)
>>> mask2 = df["ts"].dt.strftime('%H:%M:%S').isin(times)
>>> df.loc[mask1 & mask2]
                   ts  score
0 2021-09-16 11:45:00   88.6
1 2021-09-16 11:48:00   92.3
```
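The masked result is a single filtered frame, whereas the question's loop builds one list of scores per date. A minimal sketch of recovering that list-of-lists shape is to group the matched rows by their date string and look each requested date up, returning an empty list for dates with no matches (like the loop's `x = []`). It reuses the example frame; the extra dates in `dates` are hypothetical, added only to show a date with a partial match and one with none.

```python
import pandas as pd

# Same example frame as above; the extra dates are added for illustration
df = pd.DataFrame({
    "ts": pd.to_datetime(["2021-09-16 11:45:00", "2021-09-16 11:48:00",
                          "2021-09-30 11:45:00", "2021-09-30 12:45:00"]),
    "score": [88.6, 92.3, 44.5, 55.4],
})
dates = ["2021-09-16", "2021-09-30", "2021-10-01"]
times = ["11:45:00", "11:48:00"]

# Build both masks once, vectorized over the whole column
day_str = df["ts"].dt.strftime("%Y-%m-%d")
time_str = df["ts"].dt.strftime("%H:%M:%S")
matched = df.loc[day_str.isin(dates) & time_str.isin(times)]

# One list of scores per matched date (row order is kept within each group)
by_day = matched.groupby(day_str.loc[matched.index])["score"].apply(list)

# Same list-of-lists shape as the loop; dates with no matches get []
scores = [by_day.get(d, []) for d in dates]
print(scores)
```

Grouping only the already-filtered rows keeps the work proportional to the number of matches, and the final list comprehension is the only Python-level loop, running once per requested date rather than once per row.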