Home > Enterprise >  Keep the row for which the values of two columns match by group otherwise keep the first row by grou
Keep the row for which the values of two columns match by group otherwise keep the first row by grou

Time:11-19

I wanted to left join df2 on df1 and then keep the row that matches by group and if there is no matching group then I would like to keep the first row of the group in order to achieve df3 (the desired result). I was hoping you guys could help me with finding the optimal solution.

Here is my code to create the two dataframes and the required result.

import pandas as pd
import numpy as np

market = ['SP', 'SP', 'SP']
underlying = ['TSLA', 'GOOG', 'MSFT']
  
# DF1
df = pd.DataFrame(list(zip(market, underlying)),
               columns=['market', 'underlying'])

market2 = ['SP', 'SP', 'SP', 'SP', 'SP']
underlying2 = [None, 'TSLA', 'GBX', 'GBM', 'GBS']
client2 = [17, 12, 100, 21, 10]

# DF2
df2 = pd.DataFrame(list(zip(market2, underlying2, client2)),
               columns=['market', 'underlying', 'client'])

market3 = ['SP', 'SP', 'SP']
underlying3 = ['TSLA', 'GOOG', 'MSFT']
client3 = [12, 17, 17]

# Desired
df3 = pd.DataFrame(list(zip(market3, underlying3, client3)),
               columns =['market', 'underlying', 'client'])

# This works but feels sub optimal
df3 = pd.merge(df, 
               df2,
               how='left',
               on=['market', 'underlying'])
df3 = pd.merge(df3, 
               df2,
               how='left',
               on=['market'])
df3 = df3.drop_duplicates(['market', 'underlying_x'])
df3['client'] = df3['client_x'].combine_first(df3['client_y'])
df3 = df3.drop(labels=['underlying_y', 'client_x', 'client_y'], axis=1)
df3 = df3.rename(columns={'underlying_x': 'underlying'})

Hope you guys could help, thankyou so much!

CodePudding user response:

Store the first value (a groupby might not be necessary if every single one in market is 'SP'), merge and fill with the first value:

fill_value = df2.groupby('market').client.first()
(df
.merge(
    df2, 
    on = ['market', 'underlying'], 
    how = 'left')
.set_index('market')
.fillna({'client':fill_value}, downcast='infer')
)

       underlying  client
market                   
SP           TSLA      12
SP           GOOG      17
SP           MSFT      17
  • Related