How to unionize certain sets in pandas dataframe with one set quickly

Time:03-08

So, I have the following dataframe

    A  B            C
0  a1 {x1, x2, x3} {c1, c3, c5}
1  a2 {y1}         {c1, c2, c3}
2  a3 {z1, z2}     {c2, c4}

Now, for all rows where the set in column C contains both c1 and c3, I want to replace the set in B with its union with the set W = {w1, w2}. In this case the desired result is:

    A  B                      C
0  a1 {x1, x2, x3, w1, w2}   {c1, c3, c5}
1  a2 {y1, w1, w2}           {c1, c2, c3}
2  a3 {z1, z2}               {c2, c4}

I'm now doing this.

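# the condition is on column C, and the elements are strings
uppersets = df.C.apply(lambda s: s.issuperset({'c1', 'c3'}))
list_B    = df[uppersets].B.to_list()
list_B    = [item.union(W) for item in list_B]
# write back only to the matching rows, keeping their original index
df.loc[uppersets, 'B'] = pd.Series(list_B, index=df.index[uppersets])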

But is there a more efficient way to do this? I could also step away from using sets, but I don't want the sets in column B to contain duplicates.

Cheers in advance!

ps. Here is code to instantiate the DF:

df = pd.DataFrame({'A' : [1, 2, 3],
                  'B' : [{1, 2, 3}, {1}, {1,2}],
                  'C' : [{1,3,5}, {1,2,3}, {2,4}] })

ind_s  = [j for j in range(3) if df.loc[j,'C'].issuperset({1, 3})] 
list_B = df.loc[ind_s].B.to_list()
list_B = [item.union({10,20}) for item in list_B]
df.loc[ind_s,'B'] = pd.Series(data = list_B, index=ind_s)

Answer:

IIUC, you could do:

# >= tests "is a superset of"; > would skip rows where C is exactly {'c1', 'c3'}
m = df['C'] >= {'c1', 'c3'}
df.loc[m, 'B'] = [e | W for e in df.loc[m, 'B']]

or, with apply:

m = df['C'] >= {'c1', 'c3'}
df.loc[m, 'B'] = df.loc[m, 'B'].apply(W.union)

output:

    A                     B             C
0  a1  {x2, w2, w1, x3, x1}  {c5, c3, c1}
1  a2          {w2, y1, w1}  {c2, c3, c1}
2  a3              {z1, z2}      {c2, c4}
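
For reference, here is a minimal self-contained sketch of the same mask-then-assign idea. It builds the boolean mask with `Series.map` and `set.issubset` rather than a comparison operator on the column; the name `required` is only an illustrative choice and does not come from the question. Note that for Python sets `>=` means "superset or equal" while `>` means "strict superset", so the two differ only for rows where C is exactly the required set.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3'],
    'B': [{'x1', 'x2', 'x3'}, {'y1'}, {'z1', 'z2'}],
    'C': [{'c1', 'c3', 'c5'}, {'c1', 'c2', 'c3'}, {'c2', 'c4'}],
})
W = {'w1', 'w2'}
required = {'c1', 'c3'}  # illustrative name for the elements C must contain

# True for rows whose set in C contains every element of `required`
m = df['C'].map(required.issubset)

# Build new sets for the matching rows only; assignment aligns on the index,
# so the non-matching rows are left untouched
df.loc[m, 'B'] = df.loc[m, 'B'].map(W.union)
```

Because `W.union` returns a new set each time, the original set objects in B are not modified; only the column entries are replaced.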

reproducible input:

import pandas as pd

df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'B': [{'x1', 'x2', 'x3'}, {'y1'}, {'z1', 'z2'}],
                   'C': [{'c1', 'c3', 'c5'}, {'c1', 'c2', 'c3'}, {'c2', 'c4'}]}
                 )
W = {'w1', 'w2'}
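
If building new sets is the concern, one possible variation (a sketch, not part of the answer above) is to update the existing sets in place instead of assigning back to the column. This relies on the object column holding references to the same Python set objects, so it assumes no set is shared between rows or with another DataFrame that should stay unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3'],
    'B': [{'x1', 'x2', 'x3'}, {'y1'}, {'z1', 'z2'}],
    'C': [{'c1', 'c3', 'c5'}, {'c1', 'c2', 'c3'}, {'c2', 'c4'}],
})
W = {'w1', 'w2'}

# Mask of rows whose set in C contains both 'c1' and 'c3'
m = df['C'].map({'c1', 'c3'}.issubset)

# Iterating the selection yields the very set objects stored in df,
# so set.update mutates them without any assignment back to the column
for s in df.loc[m, 'B']:
    s.update(W)
```

The trade-off is that the mutation is visible through every reference to those set objects, including copies of the DataFrame, since pandas does not recursively copy Python objects stored in object columns.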