I have two data frames with several thousand lines, which are like the following two data frames:
I want the value of the goal column in the first data frame to be moved to the second data frame wherever the names of the campaigns are the same in the first and second data frames. I mean something like the following data frame:
CodePudding user response:
You have to do left join with df_1 in the left side and then use the existing goal column in df_1 to fill the nulls produced by join.
df_1 = pd.DataFrame()
df_2 = pd.DataFrame()
df_1['campaign'] = ['a', 'b', 'c', 'd']
df_1['goal'] =['order', 'order', 'off', 'order']
df_2['campaign'] = ['a', 'b', 'c']
df_2['goal'] = ['Subscription', 'order', 'Subscription']
# left join
df = df_1.merge(df_2.rename(columns={'goal': 'new_goal'}), on=['campaign'], how='left')
# replace nulls
df['new_goal'].fillna(df['goal'], inplace=True)
df
--- ---------- ------- --------------
| | campaign | goal | new_goal |
--- ---------- ------- --------------
| 0 | a | order | Subscription |
| 1 | b | order | order |
| 2 | c | off | Subscription |
| 3 | d | order | order |
--- ---------- ------- --------------
You can select the columns you need and rename them as per your need
df_final = df[['campaign', 'new_goal']].rename(columns={'new_goal': 'goal'})
CodePudding user response:
This will overwrite values in df1
import pandas as pd
df1 = pd.DataFrame({'campaign':['a','b','c','d'],'goal':['order','order','off','order',]})
df2 = pd.DataFrame({'campaign':['a','b','c','d'],'goal':['Subscription','order','Subscription','order',]})
df2.merge(df1, how= 'left')
>> campaign goal
0 a Subscription
1 b order
2 c Subscription
3 d order
CodePudding user response:
You can use pandas.DataFrame.merge for that:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
df1 = pd.DataFrame({'campaign': ['a','b','c','d'], 'goal': ['order','order','off','order']})
df2 = pd.DataFrame({'campaign': ['a','b','c'], 'goal': ['subscription','order','subscription']})
df_out=pd.merge(df2,df1,on='campaign',how='left',suffixes=('_df2','_df1'))
Result:
campaign goal_df2 goal_df1
0 a subscription order
1 b order order
2 c subscription off