How to split DF by dates based on another DF

Time:06-08

I have two DataFrames (df1, df2), both with a DatetimeIndex:

print(type(df1.index))  =>  pandas.core.indexes.datetimes.DatetimeIndex
print(type(df2.index))  =>  pandas.core.indexes.datetimes.DatetimeIndex

They look like:

df1:

Sample Date  Value_df1     
1992-01-02   430.0
1992-01-03   436.0
1992-01-04   439.0
1992-01-05   432.0
1992-01-06   427.0
1992-01-07   427.0
1992-01-08   425.0
1992-01-09   430.0
1992-01-10   441.0
1992-01-11   443.0
1992-01-12   441.0
1992-01-13   453.0
1992-01-14   469.0
1992-01-15   474.0
1992-01-16   471.0
1992-01-17   474.0
1992-01-18   474.0
1992-01-19   469.0
1992-01-20   464.0
1992-01-21   464.0

df2:

Sample Date  Value_df2     
1992-01-02   30.0
1992-01-10   42.0
1992-01-15   50.0
1992-01-20   44.0
1992-01-21   55.0

I want to split df1 based on the dates in df2. The date index of df2 will be the limits of each new df. Something like this:

Sample Date  NEW DF1    
1992-01-02   430.0
1992-01-03   436.0
1992-01-04   439.0
1992-01-05   432.0
1992-01-06   427.0
1992-01-07   427.0
1992-01-08   425.0
1992-01-09   430.0
1992-01-10   441.0

Sample Date  NEW DF2  
1992-01-10   441.0
1992-01-11   443.0
1992-01-12   441.0
1992-01-13   453.0
1992-01-14   469.0
1992-01-15   474.0

Sample Date  NEW DF3  
1992-01-15   474.0
1992-01-16   471.0
1992-01-17   474.0
1992-01-18   474.0
1992-01-19   469.0
1992-01-20   464.0

Sample Date  NEW DF4  
1992-01-20   464.0
1992-01-21   464.0

I have tried split and groupby, but I am still struggling.

Any ideas would be appreciated.

NEW INFORMATION:

I am trying to estimate the sediment load in a river, and I am using different methods. One of them uses 'Non-periodic data' and was defined by Littlewood in 1998. I do not want to go into much detail. To put it as simply as I can:

I was thinking of doing an iteration, generating these NEWDFs and overwriting them on each pass. For each NEWDF I need to calculate different values, such as the number of days within it and the average and standard deviation of its values, among others. Then I will create a variable result to store the final result needed. Something like: result = result. This way I will add the calculation from each NEWDF and each iteration to the variable result.

I did this to get an array with just the dates of df2: limit_dates = df2.index.values and then:

it  = iter(limit_dates)
result = 0
for x in it:
    x
    y=next(it)
    NEWDF = df1.loc[x:y]
    #some calculations...
    result  = result

However, iterating over limit_dates this way did not give me all the ranges I need.

Thanks in advance for your help! :)

CodePudding user response:

This saves each resulting DataFrame to df_list:

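# Assumes 'Sample Date' is a regular column. If it is the index (as in the
# question), call df1 = df1.reset_index() and df2 = df2.reset_index() first.
df2_date_list = df2['Sample Date'].tolist()
df_list = []
for idx, val in enumerate(df2_date_list):
    if val != df2_date_list[-1]:
        df_list.append(df1[(df1['Sample Date'] >= val) & (df1['Sample Date'] <= df2_date_list[idx + 1])])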

If you want to save each one to CSV, use the code below:

for idx, val in enumerate(df_list):
    val.to_csv(f'df{idx}.csv')
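
If 'Sample Date' is the index rather than a column, as the question states, the same split can probably be done without resetting the index by pairing each limit date with the next one. This is only a sketch: the sample frames are rebuilt from the values in the question, and df_list is the same name used above. Pairing with zip also sidesteps the skipped ranges you get when next() is called inside a for loop over the same iterator, since that consumes two dates per pass.

```python
import pandas as pd

# Rebuild the question's sample data (values copied from the post).
dates = pd.date_range("1992-01-02", "1992-01-21", freq="D")
values = [430, 436, 439, 432, 427, 427, 425, 430, 441, 443,
          441, 453, 469, 474, 471, 474, 474, 469, 464, 464]
df1 = pd.DataFrame({"Value_df1": values}, index=dates, dtype=float)
df1.index.name = "Sample Date"

df2 = pd.DataFrame(
    {"Value_df2": [30.0, 42.0, 50.0, 44.0, 55.0]},
    index=pd.to_datetime(["1992-01-02", "1992-01-10", "1992-01-15",
                          "1992-01-20", "1992-01-21"]),
)
df2.index.name = "Sample Date"

# Pair every limit date with the following one: (d0, d1), (d1, d2), ...
limits = df2.index
df_list = [df1.loc[start:end] for start, end in zip(limits[:-1], limits[1:])]

for piece in df_list:
    print(piece, end="\n\n")
```

Label slicing with .loc on a sorted DatetimeIndex includes both end points, so each boundary date shows up in two consecutive pieces, as in the expected output of the question.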

CodePudding user response:

Here's a possible solution using df.loc[row_index, column_index]:

df2 = df2.reset_index()

for i in range(len(df2)-1):
    new_df = df1.loc[df2.loc[i, 'Sample Date'] : df2.loc[i + 1, 'Sample Date']]
    print(new_df)

Output:

           Value_df1
Sample Date          
1992-01-02     430.0
1992-01-03     436.0
1992-01-04     439.0
1992-01-05     432.0
1992-01-06     427.0
1992-01-07     427.0
1992-01-08     425.0
1992-01-09     430.0
1992-01-10     441.0
           Value_df1
Sample Date          
1992-01-10     441.0
1992-01-11     443.0
1992-01-12     441.0
1992-01-13     453.0
1992-01-14     469.0
1992-01-15     474.0
           Value_df1
Sample Date          
1992-01-15     474.0
1992-01-16     471.0
1992-01-17     474.0
1992-01-18     474.0
1992-01-19     469.0
1992-01-20     464.0
           Value_df1
Sample Date          
1992-01-20     464.0
1992-01-21     464.0
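
For the per-piece calculations mentioned in the question (number of days, average, standard deviation), one possible sketch is to collect one row of statistics per slice instead of overwriting NEWDF each time. The helper name segment_stats and the output column names are made up for illustration, and the final accumulation into result is left out because the post does not say how it is computed.

```python
import pandas as pd


def segment_stats(df, limits, column):
    """Summarise df[column] between each pair of consecutive limit dates.

    Hypothetical helper: both end dates are included in every slice,
    matching the expected output shown in the question.
    """
    rows = []
    for start, end in zip(limits[:-1], limits[1:]):
        seg = df.loc[start:end, column]
        rows.append({
            "start": start,
            "end": end,
            "n_days": len(seg),   # rows in the slice (days, for daily data)
            "mean": seg.mean(),
            "std": seg.std(),     # sample standard deviation (ddof=1)
        })
    return pd.DataFrame(rows)


# Sample data rebuilt from the question.
dates = pd.date_range("1992-01-02", "1992-01-21", freq="D")
values = [430, 436, 439, 432, 427, 427, 425, 430, 441, 443,
          441, 453, 469, 474, 471, 474, 474, 469, 464, 464]
df1 = pd.DataFrame({"Value_df1": values}, index=dates, dtype=float)
limits = pd.to_datetime(["1992-01-02", "1992-01-10", "1992-01-15",
                         "1992-01-20", "1992-01-21"])

stats = segment_stats(df1, limits, "Value_df1")
print(stats)
```

Whatever needs to go into result can then be computed from the stats table in one step, for example by summing a column, rather than being updated inside the loop.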