I would like to retrieve each row of dataframe 2 that are included in the timestamps of dataframe 1. The timestamps of dataframe 1 go in pairs (11 being the beginning and 12 the end).
The goal is to create a dictionary of dataframes to plot each curve.
df1
Timestamp Num
2021-01-01 08:00:00 11
2021-01-01 09:00:00 12
2021-01-01 10:00:00 11
2021-01-01 11:00:00 12
2021-01-01 12:00:00 11
2021-01-01 13:00:00 12
And
df2 Value
2021-01-01 07:30:00 66
2021-01-01 08:30:00 67
2021-01-01 08:45:00 67
2021-01-01 09:15:00 64
2021-01-01 10:30:00 65
2021-01-01 10:30:00 61
2021-01-01 10:45:00 68
2021-01-01 11:15:00 60
2021-01-01 12:30:00 66
2021-01-01 12:30:00 67
2021-01-01 12:45:00 67
I thought of making a mask, but it works only for one couple of timestamp of dataframe 1. I have the idea, but I can't write it in python:
start = df1.iloc[::2, :]["Timestamp"] #to have each 11
end = df2.iloc[1::2, :]["Timestamp"] ##to have each 12
for each (start, end) in df2 :
create a df
dict = dict.append(df)
The final result must be:
Dict:
final_df1 Value
2021-01-01 08:30:00 67
2021-01-01 08:45:00 67
final_df2 Value
2021-01-01 10:30:00 65
2021-01-01 10:30:00 61
2021-01-01 10:45:00 68
final_df3 Value
2021-01-01 12:30:00 66
2021-01-01 12:30:00 67
2021-01-01 12:45:00 67
I tried this :
df_12 = df_1.iloc[1::2, :]["Timestamp"]
df_11 = df_1.iloc[::2, :]["Timestamp"]
df_12 = pd.DataFrame(df_12)
df_11 = pd.DataFrame(df_11)
for row in df_11.iterrows():
for row in df_12.iterrows():
mask_Filtered = (df_2['Timestamp']>= df_11) & (df_2['Timestamp'<=df_12) <-----
df_2 = df_2.loc[mask_Filtered]
dict.append(df_2)
MemoryError: Unable to allocate 46.0 GiB for an array with shape (1090629, 5657) and data type float64
CodePudding user response:
Is it what you are looking for?
intervals = list(zip(df1[::2]['Timestamp'], df1[1::2]['Timestamp']))
bins = pd.IntervalIndex.from_tuples(intervals)
groups = df2.groupby(pd.cut(df2['Timestamp'], bins=bins))
dfs = {}
for idx, (_, df) in enumerate(groups, 1):
dfs[f"final_df{idx}"] = df
# or process individually here to avoid MemoryError
Output:
>>> dfs
{'final_df1':
Timestamp Num
1 2021-01-01 08:30:00 67
2 2021-01-01 08:45:00 67,
'final_df2':
Timestamp Num
4 2021-01-01 10:30:00 65
5 2021-01-01 10:30:00 61
6 2021-01-01 10:45:00 68,
'final_df3':
Timestamp Num
8 2021-01-01 12:30:00 66
9 2021-01-01 12:30:00 67
10 2021-01-01 12:45:00 67}