Home > OS >  Define start and end date of several DataFrames with pandas
Define start and end date of several DataFrames with pandas

Time:12-13

I have many DataFrames which have a different period lengths. I am trying to create a for loop to define for all those DataFrames a specific start and end day.

Here is a simple example:

df1:
    Dates       ID1 ID2
0   2021-01-01  0   1
1   2021-01-02  0   0
2   2021-01-03  1   0
3   2021-01-04  2   2
4   2021-01-05  1   4
5   2021-01-06  -1  -2

df2:
    Dates       ID1 ID2
0   2021-01-01  0   1
1   2021-01-02  1   2
2   2021-01-03  -1  3
3   2021-01-04  1   -1
4   2021-01-05  4   2

I want to define a specific start and end day as:

start = pd.to_datetime('2021-01-02')
end = pd.to_datetime('2021-01-04')

So far, I only figured out how to define the period for one DataFrame:

df1.loc[(df1['Dates'] >= start) & (df1['Dates'] <= end)]

Is there an easy method to loop over all DataFrames at the same time to define the start and end dates?

For reproducibility:

import pandas as pd
df1 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
    'ID1':[0,0,1,2,1,-1], 
    'ID2':[1,0,0,2,4,-2]})
df1['Dates'] = pd.to_datetime(df1['Dates'])

df2 = pd.DataFrame({
    'Dates':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05'],
    'ID1':[0,1,-1,1,4], 
    'ID2':[1,2,3,-1,2]})
df2['Dates'] = pd.to_datetime(df2['Dates'])

CodePudding user response:

You can store your dataframes in a list, and then apply your loc formula on all the dataframes in the list using list comprehension, and return back a new list of the resulting filtered dataframes:

# Create a list with your dataframes
dfs = [df1 , df2]

# Thresholds
start = pd.to_datetime('2021-01-02')
end = pd.to_datetime('2021-01-04')

# Filter all of them and store back
filtered_dfs = [df.loc[(df['Dates'] >= start) & (df['Dates'] <= end)] for df in dfs]

Result:

>>> print(filtered_dfs)

[       Dates  ID1  ID2
 1 2021-01-02    0    0
 2 2021-01-03    1    0
 3 2021-01-04    2    2,
        Dates  ID1  ID2
 1 2021-01-02    1    2
 2 2021-01-03   -1    3
 3 2021-01-04    1   -1]


>>> print(dfs)

[       Dates  ID1  ID2
 0 2021-01-01    0    1
 1 2021-01-02    0    0
 2 2021-01-03    1    0
 3 2021-01-04    2    2
 4 2021-01-05    1    4
 5 2021-01-06   -1   -2,
        Dates  ID1  ID2
 0 2021-01-01    0    1
 1 2021-01-02    1    2
 2 2021-01-03   -1    3
 3 2021-01-04    1   -1
 4 2021-01-05    4    2]
  • Related