Home > Software design >  How to choose column 'start_date' from a data frame when a second data frame also has the
How to choose column 'start_date' from a data frame when a second data frame also has the

Time:09-29

The two data frames left_a and right_a both have the columns start_date. How to get the desired output below?

df_prep = pd.DataFrame({
    'offr_id': ['1', '1', '2', '2'],  
    'mbr_id' : ['10','11','12','13'],                
    'start_date': ['', '','2021-09-01', '2021-09-01']
})

df_map_grocery = pd.DataFrame({
    'offr_id': ['1', '2'],
    'x2': ['a', 'b'],
    'start_date': ['2021-09-06', '2021-09-06']
})

left_a = df_prep.set_index('offr_id')
right_a = df_map_grocery.set_index('offr_id')
df_prep = left_a.reindex(columns=left_a.columns.union(right_a.columns))
df_prep.update(right_a)
df_prep.reset_index(inplace=True) 
df_prep

Current output:

  offr_id mbr_id  start_date x2
0       1     10  2021-09-06  a
1       1     11  2021-09-06  a
2       2     12  2021-09-06  b
3       2     13  2021-09-06  b

Desired output:

  offr_id mbr_id  start_date x2
0       1     10              a
1       1     11              a
2       2     12  2021-09-01  b
3       2     13  2021-09-01  b

CodePudding user response:

If I understand well, you want to join both dataframes use offr_id − which you do using indexes − and you also want to ignore the start_date from df_map_grocery, only using the one from df_prep.

If that is correct, you can simply do:

>>> df_prep.merge(df_map_grocery.drop(columns=['start_date']), on=['offr_id'])
  offr_id mbr_id  start_date x2
0       1     10              a
1       1     11              a
2       2     12  2021-09-01  b
3       2     13  2021-09-01  b

Alternately, using the indexes:

>>> left_a.join(right_a.drop(columns=['start_date'])).reset_index()
  offr_id mbr_id  start_date x2
0       1     10              a
1       1     11              a
2       2     12  2021-09-01  b
3       2     13  2021-09-01  b

CodePudding user response:

I think that there is no need for temporary dataframes left_a and right_a. Check if this code provides the desired output - at least for your short example dataframes it does.

 df_prep = df_prep.merge(df_map_grocery, left_on = 'offr_id', right_on = 'offr_id')

 df_prep.drop('start_date_y', inplace = True, axis = 1)

 print(df_prep)

 offr_id  mbr_id start_date_x x2
0       1     10               a
1       1     11               a
2       2     12   2021-09-01  b
3       2     13   2021-09-01  b
  • Related