Home > Software engineering >  how can merge with key and list of keys in pandas dataframe
how can merge with key and list of keys in pandas dataframe

Time:12-15

There is a key in df1 and df2 has a column having list of keys

df1 = pd.DataFrame({"key": ["D0", "D1", "D3", "D5", "D7"]})
df2 = pd.DataFrame({"keys" : [['D0','D1','D2'], ['D1','D2','D5'], ['D6','D7','D8'], ['D9','D10','D11', 'D12']],
                    "value" : [10, 20, 30, 40]})

df1:

    key
0   D0
1   D1
2   D3
3   D5
4   D7

df2:

    keys                value
0   [D0, D1, D2]        10
1   [D1, D2, D5]        20
2   [D6, D7, D8]        30
3   [D9, D10, D11, D12] 40

I'd like to merge df1 and df2 using 'key' column and 'keys' column and keep 'first' in case of duplicated values(ex 'D1')

df3:

    key value
0   D0  10.0
1   D1  10.0
2   D3  NaN
3   D5  20.0
4   D7  30.0

CodePudding user response:

Use DataFrame.join with DataFrame.explode and GroupBy.first:

df = df1.join(df2.explode('keys').groupby('keys').first(), on='key')
print (df)
  key  value
0  D0   10.0
1  D1   10.0
2  D3    NaN
3  D5   20.0
4  D7   30.0

CodePudding user response:

Use explode and merge:

df3 = df1.merge(df2.explode('keys'), left_on='key', right_on='keys', how='left') \
         .drop_duplicates('key').drop(columns='keys')
print(df3)

# Output:
  key  value
0  D0   10.0
1  D1   10.0
3  D3    NaN
4  D5   20.0
5  D7   30.0

Another way, use explode and map:

df3 = df1.assign(value=df1['key'].map(df2.explode('keys').drop_duplicates('keys') \
                                         .set_index('keys')['value']))
print(df3)

# Output:
  key  value
0  D0   10.0
1  D1   10.0
2  D3    NaN
3  D5   20.0
4  D7   30.0
  • Related