So, I have the following dataframe:
    A             B             C
0  a1  {x1, x2, x3}  {c1, c3, c5}
1  a2          {y1}  {c1, c2, c3}
2  a3      {z1, z2}      {c2, c4}
Now, for all rows where the set in column C contains both c1 and c3, I want to replace the set in B with its union with the set W = {w1, w2}. In this case the result should be:
    A                     B             C
0  a1  {x1, x2, x3, w1, w2}  {c1, c3, c5}
1  a2          {y1, w1, w2}  {c1, c2, c3}
2  a3              {z1, z2}      {c2, c4}
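This is what I'm doing now:
uppersets = df.B.apply(lambda s: s.issuperset({'c1', 'c3'}))
list_B = df[uppersets].B.to_list()
list_B = [item.union(W) for item in list_B]
# assign back only to the matching rows, aligned on their index
df.loc[uppersets, 'B'] = pd.Series(list_B, index=df[uppersets].index)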
Is there a more efficient way to do this? I could also move away from sets, but I don't want the values in column B to contain duplicates.
Cheers in advance!
PS: Here is code to instantiate the DF, followed by my current approach applied to it:
import pandas as pd

df = pd.DataFrame({'A' : [1, 2, 3],
                   'B' : [{1, 2, 3}, {1}, {1,2}],
                   'C' : [{1,3,5}, {1,2,3}, {2,4}] })
# row labels whose C set contains both 1 and 3
ind_s = [j for j in range(3) if df.loc[j,'C'].issuperset({1, 3})]
list_B = df.loc[ind_s].B.to_list()
list_B = [item.union({10,20}) for item in list_B]
df.loc[ind_s,'B'] = pd.Series(data = list_B, index=ind_s)
CodePudding user response:
IIUC, you could do:
m = df['C'] >= {'c1', 'c3'}  # >= is the superset test; > would skip rows where C is exactly {'c1', 'c3'}
df.loc[m, 'B'] = [e|W for e in df.loc[m, 'B']]
or, with apply:
m = df['C'] >= {'c1', 'c3'}
df.loc[m, 'B'] = df.loc[m, 'B'].apply(W.union)
output:
    A                     B             C
0  a1  {x2, w2, w1, x3, x1}  {c5, c3, c1}
1  a2          {w2, y1, w1}  {c2, c3, c1}
2  a3              {z1, z2}      {c2, c4}
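As a side note on the mask: for Python sets, `>` means strict superset while `>=` means superset (the same test as `issuperset`). A minimal sketch in plain Python, where the names `required`, `exact`, `larger` and `missing` are made up for illustration:

```python
required = {"c1", "c3"}

exact = {"c1", "c3"}          # exactly the required elements
larger = {"c1", "c3", "c5"}   # required elements plus one more
missing = {"c2", "c4"}        # neither required element present

# Strict superset: the set must contain `required` AND at least one extra element.
strict = [s > required for s in (exact, larger, missing)]

# Superset (equivalent to s.issuperset(required)): equality also counts.
non_strict = [s >= required for s in (exact, larger, missing)]

print(strict)      # [False, True, False]
print(non_strict)  # [True, True, False]
```

With the sample data every matching row has a third element in C, so both operators select the same rows; they only differ when C equals the required set exactly.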
reproducible input:
import pandas as pd

W = {'w1', 'w2'}  # the set to merge into B
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'B': [{'x1', 'x2', 'x3'}, {'y1'}, {'z1', 'z2'}],
                   'C': [{'c1', 'c3', 'c5'}, {'c1', 'c2', 'c3'}, {'c2', 'c4'}]})
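If you prefer to avoid the mask and the `.loc` assignment, a single pass over both columns might also do. This is only a sketch of an alternative, not the approach above; `required` is a name introduced here, and since the column holds Python objects either way, it is not expected to be faster in any fundamental sense:

```python
import pandas as pd

W = {"w1", "w2"}
required = {"c1", "c3"}  # hypothetical name for the elements C must contain

df = pd.DataFrame({
    "A": ["a1", "a2", "a3"],
    "B": [{"x1", "x2", "x3"}, {"y1"}, {"z1", "z2"}],
    "C": [{"c1", "c3", "c5"}, {"c1", "c2", "c3"}, {"c2", "c4"}],
})

# Rebuild column B row by row: union with W where C is a superset of
# `required`, otherwise keep the original set untouched.
df["B"] = [b | W if c >= required else b for b, c in zip(df["B"], df["C"])]

print(df)
```

Because `b | W` builds a new set, the original sets in the matching rows are not mutated, and the set type keeps B free of duplicates.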