I am trying to filter multiple dataframes at once by a specific date range (for this example January 2 - January 4). I know you can filter a dataframe by date using the following code: df = (df['Date'] > 'start-date') & (df['Date'] < 'end-date')
; however, when I created a list of dataframes and tried to loop over them, I am returned the original dataframe with the original date range.
Any suggestions? I have provide some example code below:
d1 = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'A': [1, 2, 3, 4, 5],
'B': [6, 7, 8, 9, 10]
}
df1 = pd.DataFrame(data=d1)
d2 = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
'C': [11, 12, 13, 14, 15],
'D': [16, 17, 18, 19, 20]
}
df2 = pd.DataFrame(data=d2)
df_list = [df1, df2]
for df in df_list:
df = (df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')
df1
**Output:**
Date A B
0 2021-01-01 1 6
1 2021-01-02 2 7
2 2021-01-03 3 8
3 2021-01-04 4 9
4 2021-01-05 5 10
I have tried various ways to filter, such as .loc, writing functions, and creating a mask, but still can't get it to work. Another thing to note is that I am doing more formatting as part of this loop, and all the other formats are applied except this one. Any help is greatly appreciated! Thanks!
CodePudding user response:
The issue here is that you're simply reassigning the variable df
in your for loop without actually writing the result back into df_list
.
This solves your issue:
df_list = [df1, df2]
output_list = []
for df in df_list:
df_filter = (df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')
output_list.append(df.loc[df_filter])
output_list
now contains the filtered dataframes.
CodePudding user response:
As @anky mentioned, you need to specify your 'Date' columns as type of datetime. In addition, in your for loop you get boolean. You need to use it for selection.
[...]
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
df_list_filtered = []
for df in df_list:
df = df[(df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')]
df_list_filtered.append(df)
[...]
print(df_list_filtered[0])
Date A B
1 2021-01-02 2 7
2 2021-01-03 3 8
3 2021-01-04 4 9