Home > other >  How to filter multiple dataframes at once by the same date range?
How to filter multiple dataframes at once by the same date range?

Time:09-30

I am trying to filter multiple dataframes at once by a specific date range (for this example January 2 - January 4). I know you can filter a dataframe by date using the following code: df = (df['Date'] > 'start-date') & (df['Date'] < 'end-date'); however, when I created a list of dataframes and tried to loop over them, I am returned the original dataframe with the original date range.

Any suggestions? I have provide some example code below:

d1 = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
   'A': [1, 2, 3, 4, 5],
   'B': [6, 7, 8, 9, 10]
 }
df1 = pd.DataFrame(data=d1)

d2 = {'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
       'C': [11, 12, 13, 14, 15],
       'D': [16, 17, 18, 19, 20]
     }
df2 = pd.DataFrame(data=d2)

df_list = [df1, df2]

for df in df_list:
    df = (df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')
df1

**Output:**
 Date  A   B
0  2021-01-01  1   6
1  2021-01-02  2   7
2  2021-01-03  3   8
3  2021-01-04  4   9
4  2021-01-05  5  10

I have tried various ways to filter, such as .loc, writing functions, and creating a mask, but still can't get it to work. Another thing to note is that I am doing more formatting as part of this loop, and all the other formats are applied except this one. Any help is greatly appreciated! Thanks!

CodePudding user response:

The issue here is that you're simply reassigning the variable df in your for loop without actually writing the result back into df_list.

This solves your issue:

df_list = [df1, df2]
output_list = []

for df in df_list:
    df_filter = (df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')
    output_list.append(df.loc[df_filter])

output_list now contains the filtered dataframes.

CodePudding user response:

As @anky mentioned, you need to specify your 'Date' columns as type of datetime. In addition, in your for loop you get boolean. You need to use it for selection.

[...]
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

df_list_filtered = []
for df in df_list:
    df = df[(df['Date'] > '2021-01-01') & (df['Date'] < '2021-01-05')]
    df_list_filtered.append(df)

[...]
 
print(df_list_filtered[0])
        Date  A  B
1 2021-01-02  2  7
2 2021-01-03  3  8
3 2021-01-04  4  9
  • Related