How to loop over unique dates in a pandas dataframe producing new dataframes in each iteration?

Time:09-08

I have a dataframe like the one below and need to (1) create a new dataframe for each unique date and (2) create a new global variable whose value is the date of that new dataframe. This needs to happen in a loop.

Using the dataframe below, I need to iterate through 3 new dataframes, one for each date value (202107, 202108, and 202109). The loop runs inside an existing function, which then uses each iteration's new dataframe and its global variable in further calculations. For example, the first iteration would yield a new dataframe consisting of the first two rows of the dataframe below and the value "202107" for the new global variable. What is the most straightforward way of doing this?

Date Col1 Col2
202107 1.23 6.72
202107 1.56 2.54
202108 1.78 7.54
202108 1.53 7.43
202108 1.58 2.54
202109 1.09 2.43
202109 1.07 5.32

CodePudding user response:

Loop over the results of .groupby:

for _, new_df in df.groupby("Date"):
    print(new_df)
    print("-" * 80)

Prints:

     Date  Col1  Col2
0  202107  1.23  6.72
1  202107  1.56  2.54
--------------------------------------------------------------------------------
     Date  Col1  Col2
2  202108  1.78  7.54
3  202108  1.53  7.43
4  202108  1.58  2.54
--------------------------------------------------------------------------------
     Date  Col1  Col2
5  202109  1.09  2.43
6  202109  1.07  5.32
--------------------------------------------------------------------------------

Then you can store each new_df in a list or a dictionary and use it afterwards.
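As a minimal sketch of that last step, the loop can keep the group key instead of discarding it with `_`, so each date travels alongside its dataframe. The function name `frames_by_date` and the variable name `current_date` are made up for illustration; the sample data is copied from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": [202107, 202107, 202108, 202108, 202108, 202109, 202109],
    "Col1": [1.23, 1.56, 1.78, 1.53, 1.58, 1.09, 1.07],
    "Col2": [6.72, 2.54, 7.54, 7.43, 2.54, 2.43, 5.32],
})


def frames_by_date(frame):
    """Return {date as string: sub-dataframe}, built from the groupby keys."""
    # groupby yields (key, group) pairs; keys come out sorted by default.
    return {str(date): group for date, group in frame.groupby("Date")}


frames = frames_by_date(df)

for current_date, new_df in frames.items():
    # current_date stands in for the "global variable" from the question;
    # inside a function, a plain loop variable is usually enough.
    print(current_date, len(new_df))
```

Because the date is already available as the loop variable, a true global is probably not needed unless code outside the function has to read it.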

CodePudding user response:

You can extract the unique date values with the .unique() method, then store the new dataframes in a dict keyed by date so they are easy to access:

unique_dates = init_df.Date.unique()

df_by_date = {
    str(date): init_df[init_df['Date'] == date] for date in unique_dates
}

You can then use the dict like this:

for date in unique_dates:
    print(date, ': \n', df_by_date[str(date)])

Output:

202107 : 
      Date  Col1  Col2
0  202107  1.23  6.72
1  202107  1.56  2.54
202108 : 
      Date  Col1  Col2
2  202108  1.78  7.54
3  202108  1.53  7.43
4  202108  1.58  2.54
202109 : 
      Date  Col1  Col2
5  202109  1.09  2.43
6  202109  1.07  5.32
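If the per-date dataframes are going to be modified in the further calculations, it may be safer to store copies. The variant below is only a sketch: the `.copy()` call and the `Total` column are additions for illustration and are not part of the answer above. Depending on the pandas version, writing to a subset produced by boolean indexing can raise a SettingWithCopyWarning, and an explicit copy avoids that.

```python
import pandas as pd

# Sample data from the question.
init_df = pd.DataFrame({
    "Date": [202107, 202107, 202108, 202108, 202108, 202109, 202109],
    "Col1": [1.23, 1.56, 1.78, 1.53, 1.58, 1.09, 1.07],
    "Col2": [6.72, 2.54, 7.54, 7.43, 2.54, 2.43, 5.32],
})

# Same dict as in the answer, but each value is an independent copy.
df_by_date = {
    str(date): init_df[init_df["Date"] == date].copy()
    for date in init_df["Date"].unique()
}

# Hypothetical calculation: add a column to one subset only.
july = df_by_date["202107"]
july["Total"] = july["Col1"] + july["Col2"]

# init_df is unchanged because the subsets are copies.
print(init_df.columns.tolist())
```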