I want to create column which take reference from df1 and map to df2. df1:
Id Value
[1,3,4] Furn
[1,3,4] Furn
[1,7,9] SYS
[1,2,3] YSU
[1,3,4] Furn
[1,7,9] SYS
df2:
Id Val2
[1,3,5,4] Kcv
[1,3,9,4] Kcm
[1,0,9,7] jks
[5,7,9,1] kja
[1,2,9,7] kla
[3,2,0,8] ola
[1,2,4,5] opa
df_output:
Id Val2 df1_reference
[1,3,5,4] Kcv [1,3,4] -- Every element present in df1.Id
[1,3,9,4] Kcm [1,3,4] -- Every element present in df1.Id
[1,0,9,7] jks [1,7,9] -- Every element present in df1.Id
[5,7,9,1] kja [1,7,9] -- Every element present in df1.Id
[1,2,9,7] kla [1,7,9] -- Every element present in df1.Id
[3,2,0,8] ola NaN
[1,2,4,5] opa NaN
I need to create "df1_reference" columns based on the element present in df.Id, if df.Id list element present in df2.Id then map with df.Id.
CodePudding user response:
You need to use a loop here. You can take advantage of the set
/frozenset
methods:
ids = set(df1['Id'].apply(frozenset)) # use tuple instead of frozenset
# if order matters
df2['df1_reference'] = [next((x for x in ids if x.issubset(s)), np.nan)
for s in df2['Id']]
Or if you want to keep the original objects of df1['Id']
:
ids = {frozenset(x): x for x in df1['Id']} # use tuple instead if order matters
df2['df1_reference'] = [next((v for k,v in ids.items() if k.issubset(s)), np.nan)
for s in df2['Id']]
output:
Id Val2 df1_reference
0 [1, 3, 5, 4] Kcv [1, 3, 4]
1 [1, 3, 9, 4] Kcm [1, 3, 4]
2 [1, 0, 9, 7] jks [1, 7, 9]
3 [5, 7, 9, 1] kja [1, 7, 9]
4 [1, 2, 9, 7] kla [1, 7, 9]
5 [3, 2, 0, 8] ola NaN
6 [1, 2, 4, 5] opa NaN