I have two DataFrames (df1 and df2), both indexed by a DatetimeIndex:
print(type(df1.index)) => pandas.core.indexes.datetimes.DatetimeIndex
print(type(df2.index)) => pandas.core.indexes.datetimes.DatetimeIndex
They look like:
df1:
Sample Date Value_df1
1992-01-02 430.0
1992-01-03 436.0
1992-01-04 439.0
1992-01-05 432.0
1992-01-06 427.0
1992-01-07 427.0
1992-01-08 425.0
1992-01-09 430.0
1992-01-10 441.0
1992-01-11 443.0
1992-01-12 441.0
1992-01-13 453.0
1992-01-14 469.0
1992-01-15 474.0
1992-01-16 471.0
1992-01-17 474.0
1992-01-18 474.0
1992-01-19 469.0
1992-01-20 464.0
1992-01-21 464.0
df2:
Sample Date Value_df2
1992-01-02 30.0
1992-01-10 42.0
1992-01-15 50.0
1992-01-20 44.0
1992-01-21 55.0
I want to split df1 based on the dates in df2. The dates in df2's index should be the limits of each new DataFrame, and each limit date belongs to both neighbouring pieces. Something like this:
Sample Date NEW DF1
1992-01-02 430.0
1992-01-03 436.0
1992-01-04 439.0
1992-01-05 432.0
1992-01-06 427.0
1992-01-07 427.0
1992-01-08 425.0
1992-01-09 430.0
1992-01-10 441.0
Sample Date NEW DF2
1992-01-10 441.0
1992-01-11 443.0
1992-01-12 441.0
1992-01-13 453.0
1992-01-14 469.0
1992-01-15 474.0
Sample Date NEW DF3
1992-01-15 474.0
1992-01-16 471.0
1992-01-17 474.0
1992-01-18 474.0
1992-01-19 469.0
1992-01-20 464.0
Sample Date NEW DF4
1992-01-20 464.0
1992-01-21 464.0
I have tried split and groupby, but I am still struggling.
Any ideas would be appreciated.
NEW INFORMATION:
I am trying to estimate the sediment load in a river, and I am using several methods. One of them uses 'non-periodic data' and was defined by Littlewood in 1998. I do not want to go into too much detail. To put it as simply as I can:
I was thinking of doing an iteration that generates each NEWDF in turn, overwriting the previous one. For each NEWDF I need to calculate several values, such as the number of days it covers, the mean of the values and their standard deviation, among others. Then I will use a variable result to store the final result, something like result = result + (the value calculated for that NEWDF). This way, each iteration adds the calculation from its NEWDF to result.
I did this to get an array of just the dates in df2:
limit_dates = df2.index.values
and then:
it = iter(limit_dates)
result = 0
for x in it:
    y = next(it)
    NEWDF = df1.loc[x:y]
    # some calculations...
    result = result  # + the value calculated for this NEWDF
However, looping over limit_dates this way did not give me all the ranges I need.
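One likely cause: the for loop and next(it) share the same iterator, so each pass consumes two dates. The ranges come out as (1992-01-02, 1992-01-10) and (1992-01-15, 1992-01-20), while (1992-01-10, 1992-01-15) and (1992-01-20, 1992-01-21) are skipped; with an odd number of dates, the final next(it) also raises StopIteration. The sketch below (function names are hypothetical) reproduces this and then pairs each date with the next one using zip instead.

```python
import pandas as pd

# numpy datetime64 array, like df2.index.values in the question
limit_dates = pd.DatetimeIndex(
    ["1992-01-02", "1992-01-10", "1992-01-15", "1992-01-20", "1992-01-21"]
).values

def pairs_with_next(dates):
    """Mimic the question's loop: `for x in it` plus `next(it)` in the body."""
    it = iter(dates)
    pairs = []
    for x in it:
        try:
            y = next(it)          # consumes a SECOND date from the same iterator
        except StopIteration:     # odd number of dates: nothing left to pair with x
            break
        pairs.append((x, y))
    return pairs

def consecutive_pairs(dates):
    """Pair every date with the one right after it: (d0, d1), (d1, d2), ..."""
    return list(zip(dates[:-1], dates[1:]))

print(len(pairs_with_next(limit_dates)))    # → 2
print(len(consecutive_pairs(limit_dates)))  # → 4, one pair per NEWDF
```

Inside the real loop, for x, y in zip(limit_dates[:-1], limit_dates[1:]) then replaces the iter/next pair. On Python 3.10+, itertools.pairwise(limit_dates) yields the same consecutive pairs.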
Thanks in advance for your help! :)
Answer:
This saves each DataFrame to df_list:
df2_date_list = df2.index.tolist()  # 'Sample Date' is the index, not a column
df_list = []
for idx, val in enumerate(df2_date_list):
    if val != df2_date_list[-1]:
        # rows from this limit date up to the next one, both included
        df_list.append(df1[(df1.index >= val) & (df1.index <= df2_date_list[idx + 1])])
If you want to save each one to CSV, just use the code below:
for idx, val in enumerate(df_list):
    val.to_csv(f'df{idx}.csv')
Answer:
Here's a possible solution using df.loc[row_index, column_index]:
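df2 = df2.reset_index()  # turn the 'Sample Date' index into a regular column
for i in range(len(df2) - 1):
    # slice df1 from this limit date to the next one (both endpoints included)
    new_df = df1.loc[df2.loc[i, 'Sample Date'] : df2.loc[i + 1, 'Sample Date']]
    print(new_df)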
Output:
Value_df1
Sample Date
1992-01-02 430.0
1992-01-03 436.0
1992-01-04 439.0
1992-01-05 432.0
1992-01-06 427.0
1992-01-07 427.0
1992-01-08 425.0
1992-01-09 430.0
1992-01-10 441.0
Value_df1
Sample Date
1992-01-10 441.0
1992-01-11 443.0
1992-01-12 441.0
1992-01-13 453.0
1992-01-14 469.0
1992-01-15 474.0
Value_df1
Sample Date
1992-01-15 474.0
1992-01-16 471.0
1992-01-17 474.0
1992-01-18 474.0
1992-01-19 469.0
1992-01-20 464.0
Value_df1
Sample Date
1992-01-20 464.0
1992-01-21 464.0
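A more compact variant of the same idea, as a minimal sketch: it rebuilds df1 and df2 from the sample data in the question, assumes both indexes are sorted, and pairs consecutive limit dates with zip instead of calling reset_index. split_by_limits is a hypothetical helper name. Because label-based .loc slicing includes both endpoints, each inner limit date lands in two neighbouring pieces, which matches the expected output.

```python
import pandas as pd

# Rebuild the sample data from the question (index named "Sample Date").
values = [430.0, 436.0, 439.0, 432.0, 427.0, 427.0, 425.0, 430.0, 441.0, 443.0,
          441.0, 453.0, 469.0, 474.0, 471.0, 474.0, 474.0, 469.0, 464.0, 464.0]
df1 = pd.DataFrame(
    {"Value_df1": values},
    index=pd.date_range("1992-01-02", periods=len(values), freq="D", name="Sample Date"),
)
df2 = pd.DataFrame(
    {"Value_df2": [30.0, 42.0, 50.0, 44.0, 55.0]},
    index=pd.DatetimeIndex(
        ["1992-01-02", "1992-01-10", "1992-01-15", "1992-01-20", "1992-01-21"],
        name="Sample Date",
    ),
)

def split_by_limits(df, limits):
    """Return one DataFrame per consecutive pair of limit dates.

    Label-based .loc slicing on a sorted DatetimeIndex includes BOTH
    endpoints, so each inner limit date appears in two adjacent pieces.
    """
    limits = list(limits)
    return [df.loc[start:end] for start, end in zip(limits[:-1], limits[1:])]

pieces = split_by_limits(df1, df2.index)
for piece in pieces:
    print(piece, end="\n\n")
```

Returning a list keeps every piece available afterwards, so the same list can feed both the statistics and the to_csv loop from the first answer.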