df1 has a 'key' column, and df2 has a 'keys' column holding lists of keys:
df1 = pd.DataFrame({"key": ["D0", "D1", "D3", "D5", "D7"]})
df2 = pd.DataFrame({"keys": [['D0','D1','D2'], ['D1','D2','D5'], ['D6','D7','D8'], ['D9','D10','D11','D12']],
                    "value": [10, 20, 30, 40]})
df1:
key
0 D0
1 D1
2 D3
3 D5
4 D7
df2:
keys value
0 [D0, D1, D2] 10
1 [D1, D2, D5] 20
2 [D6, D7, D8] 30
3 [D9, D10, D11, D12] 40
I'd like to merge df1 and df2 on the 'key' and 'keys' columns, keeping the first value when a key is duplicated (e.g. 'D1'):
df3:
key value
0 D0 10.0
1 D1 10.0
2 D3 NaN
3 D5 20.0
4 D7 30.0
CodePudding user response:
Use DataFrame.join with DataFrame.explode and GroupBy.first:
df = df1.join(df2.explode('keys').groupby('keys').first(), on='key')
print(df)
key value
0 D0 10.0
1 D1 10.0
2 D3 NaN
3 D5 20.0
4 D7 30.0
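To see why this works, here is a minimal sketch (re-creating df2 from the question) of the intermediate table that join looks up in. explode turns each list element into its own row, and groupby('keys').first() keeps the first value for a duplicated key:

```python
import pandas as pd

df2 = pd.DataFrame({"keys": [['D0','D1','D2'], ['D1','D2','D5'],
                             ['D6','D7','D8'], ['D9','D10','D11','D12']],
                    "value": [10, 20, 30, 40]})

# One row per individual key, with 'value' repeated for each element
exploded = df2.explode('keys')

# 'D1' occurs in two lists; first() keeps the earlier value (10, not 20)
lookup = exploded.groupby('keys').first()
print(lookup.loc['D1', 'value'])  # 10
```

df1.join(lookup, on='key') then attaches 'value' to each key, leaving NaN for keys such as 'D3' that are absent from the lookup.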
CodePudding user response:
Use explode and merge:
df3 = df1.merge(df2.explode('keys'), left_on='key', right_on='keys', how='left') \
.drop_duplicates('key').drop(columns='keys')
print(df3)
# Output:
key value
0 D0 10.0
1 D1 10.0
3 D3 NaN
4 D5 20.0
5 D7 30.0
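The non-consecutive index in the output comes from the duplicated 'D1' match. A small sketch (re-creating both frames from the question) shows the merge producing six rows for five keys before drop_duplicates trims it back:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["D0", "D1", "D3", "D5", "D7"]})
df2 = pd.DataFrame({"keys": [['D0','D1','D2'], ['D1','D2','D5'],
                             ['D6','D7','D8'], ['D9','D10','D11','D12']],
                    "value": [10, 20, 30, 40]})

# 'D1' appears in two lists, so the left merge yields two rows for it
merged = df1.merge(df2.explode('keys'), left_on='key', right_on='keys', how='left')
print(len(merged))  # 6

# drop_duplicates('key') keeps only the first match per key
df3 = merged.drop_duplicates('key').drop(columns='keys')
```

Chain .reset_index(drop=True) onto df3 if a clean 0..4 index is wanted.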
Another way: use explode and map:
df3 = df1.assign(value=df1['key'].map(df2.explode('keys').drop_duplicates('keys') \
.set_index('keys')['value']))
print(df3)
# Output:
key value
0 D0 10.0
1 D1 10.0
2 D3 NaN
3 D5 20.0
4 D7 30.0
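The Series passed to map here can be inspected on its own; a minimal sketch re-creating df2 from the question. drop_duplicates keeps the first occurrence of each key, and map returns NaN for keys missing from the Series index:

```python
import pandas as pd

df2 = pd.DataFrame({"keys": [['D0','D1','D2'], ['D1','D2','D5'],
                             ['D6','D7','D8'], ['D9','D10','D11','D12']],
                    "value": [10, 20, 30, 40]})

# Lookup Series: index = individual key, value = first value seen for it
s = df2.explode('keys').drop_duplicates('keys').set_index('keys')['value']
print(s.get('D1'))  # 10 -- first occurrence wins over 20
print(s.get('D3'))  # None -- 'D3' is absent, so map yields NaN for it
```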