I wanted to left join df2 on df1 and then keep the row that matches by group and if there is no matching group then I would like to keep the first row of the group in order to achieve df3 (the desired result). I was hoping you guys could help me with finding the optimal solution.
Here is my code to create the two dataframes and the required result.
import pandas as pd
import numpy as np
market = ['SP', 'SP', 'SP']
underlying = ['TSLA', 'GOOG', 'MSFT']
# DF1
df = pd.DataFrame(list(zip(market, underlying)),
columns=['market', 'underlying'])
market2 = ['SP', 'SP', 'SP', 'SP', 'SP']
underlying2 = [None, 'TSLA', 'GBX', 'GBM', 'GBS']
client2 = [17, 12, 100, 21, 10]
# DF2
df2 = pd.DataFrame(list(zip(market2, underlying2, client2)),
columns=['market', 'underlying', 'client'])
market3 = ['SP', 'SP', 'SP']
underlying3 = ['TSLA', 'GOOG', 'MSFT']
client3 = [12, 17, 17]
# Desired
df3 = pd.DataFrame(list(zip(market3, underlying3, client3)),
columns =['market', 'underlying', 'client'])
# This works but feels sub optimal
df3 = pd.merge(df,
df2,
how='left',
on=['market', 'underlying'])
df3 = pd.merge(df3,
df2,
how='left',
on=['market'])
df3 = df3.drop_duplicates(['market', 'underlying_x'])
df3['client'] = df3['client_x'].combine_first(df3['client_y'])
df3 = df3.drop(labels=['underlying_y', 'client_x', 'client_y'], axis=1)
df3 = df3.rename(columns={'underlying_x': 'underlying'})
Hope you guys could help, thankyou so much!
CodePudding user response:
Store the first value (a groupby might not be necessary if every single one in market is 'SP'), merge and fill with the first value:
fill_value = df2.groupby('market').client.first()
(df
.merge(
df2,
on = ['market', 'underlying'],
how = 'left')
.set_index('market')
.fillna({'client':fill_value}, downcast='infer')
)
underlying client
market
SP TSLA 12
SP GOOG 17
SP MSFT 17