Compute the sum of values in one column, if row in another column contains item in list-CodePudding

In python I have a dataframe that looks like:

Column1               Column2
[a,b,c,d]             4    
[a,f,g]               3
[b,c]                 6
[a,c,d]               5

I would like to compute a third column, that adds the value in Column2 for each time one of the items is present in Column1 (for example in the first row it would be a=4 3 5, b=4 6, c=5 6 5, d=4 5, so in total 4 3 5 4 6 5 6 5 4 5=47):

Column1               Column2     Column3
[a,b,c,d]             4           47
[a,f,g]               3           21
[b,c]                 6           26
[a,c,d]               5           37

I've tried my best with query and indexing but with no success, thank you in advance!

CodePudding user response：

Try with explode, then create the mapping dict and groupby back

s = df.explode('Column1')
d = s.groupby('Column1')['Column2'].sum()
s['new'] = s['Column1'].map(d)
out = s.groupby(level=0).agg({'Column1':list,'Column2':'first','new':'sum'})
out
        Column1  Column2  new
0  [a, b, c, d]        4   46
1     [a, f, g]        3   18
2        [b, c]        6   25
3     [a, c, d]        5   36

Notice :

c = 4 6 5

CodePudding user response：

df = pd.DataFrame({'Column1': [['a', 'b', 'c', 'd'], ['a', 'f', 'g'], ['b', 'c'], ['a', 'c', 'd']],
                   'Column2': [4, 3, 6, 5]})
df1 = df.explode('Column1')
df['Column3'] = df1.groupby(level=0).apply(
    lambda d: d.Column1.apply(lambda x: df1.loc[df1.Column1 == x, 'Column2'].sum()).sum())
print(df)

        Column1  Column2  Column3
0  [a, b, c, d]        4       46
1     [a, f, g]        3       18
2        [b, c]        6       25
3     [a, c, d]        5       36

CodePudding user response：

Let's start from the easier to comprehend version, step by step.

Explode Column1:

wrk = df.explode(column='Column1')

The result is:

  Column1  Column2
0       a        4
0       b        4
0       c        4
0       d        4
1       a        3
1       f        3
1       g        3
2       b        6
2       c        6
3       a        5
3       c        5
3       d        5

Compute weights for each element from lists in Column1:

weight = wrk.groupby('Column1').sum().rename(columns={'Column2': 'Weight'})

The result is:

         Weight
Column1        
a            12
b            10
c            15
d             9
f             3
g             3

Note some differences to your counting, e.g. weight for c is 4 6 5 = 15.

Join Column1 from wrk with weight:

 wrk2 = wrk[['Column1']].join(weight, on='Column1')

The result is:

  Column1  Weight
0       a      12
0       b      10
0       c      15
0       d       9
1       a      12
1       f       3
1       g       3
2       b      10
2       c      15
3       a      12
3       c      15
3       d       9

And the final step is to compute the new column:

df['Column3'] = wrk2.groupby(level=0).Weight.sum()

The result is:

        Column1  Column2  Column3
0  [a, b, c, d]        4       46
1     [a, f, g]        3       18
2        [b, c]        6       25
3     [a, c, d]        5       36

But if you want more concise code, you can "compress" the above solution to:

wrk = df.explode(column='Column1')
df['Column3'] = wrk[['Column1']].join(wrk.groupby('Column1').sum().rename(
    columns={'Column2': 'Weight'}), on='Column1').groupby(level=0).Weight.sum()