I have two DataFrames like that below:
df_1 = pd.DataFrame({
'wag': [111, 111, 222],
'date': ['2019-08-30', '2019-09-20', '2019-08-10']
})
and
df_2 = pd.DataFrame({
'wag': [111, 111, 111, 222, 222, 333, 333, 333],
'msr_date': ['2019-08-30', '2019-08-30', '2019-08-30',
'2019-08-30', '2019-08-30', '2019-08-30',
'2019-08-30', '2019-08-30'],
'measurements': [1, 2, 3, 4, 5, 6, 7, 8]
})
df_1
and df_2
are below:
wag | date |
---|---|
111 | 2019-08-30 |
111 | 2019-09-20 |
222 | 2019-08-10 |
wag | msr_date | measurements |
---|---|---|
111 | 2019-03-29 | 1 |
111 | 2019-03-19 | 2 |
111 | 2019-03-10 | 3 |
222 | 2019-09-30 | 4 |
222 | 2019-09-20 | 5 |
333 | 2019-08-30 | 6 |
333 | 2019-08-30 | 7 |
333 | 2019-09-20 | 8 |
So, I have two DataFrames df_1
and df_2
and I want to iterate through them at the same time and filter df_2
. I need to iterate based on wag
in both tables.
If wag
in df_1
is in df_2
then I have to compare date
and mrs_date
and
- if
mrs_date <= date
then I append all rows of thatwag
- else I check if I have one more condition for that row in
df_1
for thatwag
and again repeat the comparison of dates. - When conditions / rows are finished for that wags we go to the next value of
wag
.
For example, here I first check the first row of df_1
and if mrs_date <= date
in df_2
then I append all rows to df_new
. If not then I go to the next date
for that wag
. If we have one if not then check another wag
. So finally, for this example I will get df_new
as below:
wag | msr_date | measurements |
---|---|---|
111 | 2019-03-29 | 1 |
111 | 2019-03-19 | 2 |
111 | 2019-03-10 | 3 |
In reality, there are so many values in df_1
and in df_2
so for_loop
should be quite complicated and I couldn't find the way how to do it. If it is not clear how I described the task please ask.
CodePudding user response:
Merge your 2 dataframes on wag
column then keep only rows that match your condition msr_date <= date
. Finally, drop the column date
from df_1
and remove duplicates rows.
>>> df_2.merge(df_1, on='wag', how='left') \
.query('msr_date <= date') \
.drop(columns='date').drop_duplicates()
wag msr_date measurements
0 111 2019-03-29 1
2 111 2019-03-19 2
4 111 2019-03-10 3
CodePudding user response:
Using a for loop:
new_df = pd.DataFrame(data=None, columns=df_2.columns)
for idx, row in df_1.iterrows():
for idx2, row2 in df_2.iterrows():
if row['wag'] == row2['wag'] and row2['msr_date'] <= row['date']:
new_df = new_df.append(row2).drop_duplicates()