I have the following two dataframes:
df1=
date col1
0 2023-01-01 16:00:00 100
1 2023-01-01 16:15:00 120
2 2023-01-01 16:30:00 140
3 2023-01-01 16:45:00 160
4 2023-01-01 17:00:00 200
5 2023-01-01 17:15:00 430
6 2023-01-01 17:30:00 890
df2 =
date col2 col3
0 2023-01-01 16:00:00 100 200
1 2023-01-01 16:15:00 120 400
2 2023-01-01 17:00:00 200 500
and in df2
I have some missing timestamps compared to df1
. I am able to find those timestamps using the following code:
df1[~df1['date'].isin(df2['date'])]
I want to populate those missing timestamps in df2
and fill in the values of the columns with the average value of the two previous rows.
So the new df2
should look like this:
df2 =
date col2 col3
0 2023-01-01 16:00:00 100 200
1 2023-01-01 16:15:00 120 400
2 2023-01-01 16:30:00 110 300
3 2023-01-01 16:45:00 115 350
4 2023-01-01 17:00:00 200 500
5 2023-01-01 17:15:00 257.5 425
6 2023-01-01 17:30:00 228.75 462.5
CodePudding user response:
Not ideal solution via iteration:
df1 = [
['2023-01-01 16:00:00', 100],
['2023-01-01 16:15:00', 120],
['2023-01-01 16:30:00', 140],
['2023-01-01 16:45:00', 160],
['2023-01-01 17:00:00', 200],
['2023-01-01 17:15:00', 430],
['2023-01-01 17:30:00', 890],
]
df2 = [
['2023-01-01 16:00:00', 100, 200],
['2023-01-01 16:15:00', 120, 400],
['2023-01-01 17:00:00', 200, 500],
]
df1= pd.DataFrame(df1, columns = ['date', 'col1'])
df2= pd.DataFrame(df2, columns = ['date', 'col2', 'col3'])
missing = df1[~df1['date'].isin(df2['date'])]
missing = missing.drop(['col1'], axis=1)
merged = pd.concat([df2, missing])
merged.sort_values('date', inplace=True, ignore_index=True)
for index, row in merged.iterrows():
if np.isnan(row['col2']):
merged['col2'].at[index] = merged['col2'].iloc[[index-1, index-2]].mean()
if np.isnan(row['col3']):
merged['col3'].at[index] = merged['col3'].iloc[[index-1, index-2]].mean()
print(merged)
Output:
date | col2 | col3 |
---|---|---|
2023-01-01 16:00:00 | 100.00 | 200.0 |
2023-01-01 16:15:00 | 120.00 | 400.0 |
2023-01-01 16:30:00 | 110.00 | 300.0 |
2023-01-01 16:45:00 | 115.00 | 350.0 |
2023-01-01 17:00:00 | 200.00 | 500.0 |
2023-01-01 17:15:00 | 157.50 | 425.0 |
2023-01-01 17:30:00 | 178.75 | 462.5 |