The two data frames left_a
and right_a
both have the columns start_date
. How to get the desired output below?
df_prep = pd.DataFrame({
'offr_id': ['1', '1', '2', '2'],
'mbr_id' : ['10','11','12','13'],
'start_date': ['', '','2021-09-01', '2021-09-01']
})
df_map_grocery = pd.DataFrame({
'offr_id': ['1', '2'],
'x2': ['a', 'b'],
'start_date': ['2021-09-06', '2021-09-06']
})
left_a = df_prep.set_index('offr_id')
right_a = df_map_grocery.set_index('offr_id')
df_prep = left_a.reindex(columns=left_a.columns.union(right_a.columns))
df_prep.update(right_a)
df_prep.reset_index(inplace=True)
df_prep
Current output:
offr_id mbr_id start_date x2
0 1 10 2021-09-06 a
1 1 11 2021-09-06 a
2 2 12 2021-09-06 b
3 2 13 2021-09-06 b
Desired output:
offr_id mbr_id start_date x2
0 1 10 a
1 1 11 a
2 2 12 2021-09-01 b
3 2 13 2021-09-01 b
CodePudding user response:
If I understand well, you want to join both dataframes use offr_id
− which you do using indexes − and you also want to ignore the start_date
from df_map_grocery
, only using the one from df_prep
.
If that is correct, you can simply do:
>>> df_prep.merge(df_map_grocery.drop(columns=['start_date']), on=['offr_id'])
offr_id mbr_id start_date x2
0 1 10 a
1 1 11 a
2 2 12 2021-09-01 b
3 2 13 2021-09-01 b
Alternately, using the indexes:
>>> left_a.join(right_a.drop(columns=['start_date'])).reset_index()
offr_id mbr_id start_date x2
0 1 10 a
1 1 11 a
2 2 12 2021-09-01 b
3 2 13 2021-09-01 b
CodePudding user response:
I think that there is no need for temporary dataframes left_a and right_a. Check if this code provides the desired output - at least for your short example dataframes it does.
df_prep = df_prep.merge(df_map_grocery, left_on = 'offr_id', right_on = 'offr_id')
df_prep.drop('start_date_y', inplace = True, axis = 1)
print(df_prep)
offr_id mbr_id start_date_x x2
0 1 10 a
1 1 11 a
2 2 12 2021-09-01 b
3 2 13 2021-09-01 b