Home > Software engineering >  Compute the sum of values in one column, if row in another column contains item in list
Compute the sum of values in one column, if row in another column contains item in list

Time:11-28

In python I have a dataframe that looks like:

Column1               Column2
[a,b,c,d]             4    
[a,f,g]               3
[b,c]                 6
[a,c,d]               5

I would like to compute a third column, that adds the value in Column2 for each time one of the items is present in Column1 (for example in the first row it would be a=4 3 5, b=4 6, c=5 6 5, d=4 5, so in total 4 3 5 4 6 5 6 5 4 5=47):

Column1               Column2     Column3
[a,b,c,d]             4           47
[a,f,g]               3           21
[b,c]                 6           26
[a,c,d]               5           37

I've tried my best with query and indexing but with no success, thank you in advance!

CodePudding user response:

Try with explode, then create the mapping dict and groupby back

s = df.explode('Column1')
d = s.groupby('Column1')['Column2'].sum()
s['new'] = s['Column1'].map(d)
out = s.groupby(level=0).agg({'Column1':list,'Column2':'first','new':'sum'})
out
        Column1  Column2  new
0  [a, b, c, d]        4   46
1     [a, f, g]        3   18
2        [b, c]        6   25
3     [a, c, d]        5   36

Notice :

c = 4 6 5

CodePudding user response:

df = pd.DataFrame({'Column1': [['a', 'b', 'c', 'd'], ['a', 'f', 'g'], ['b', 'c'], ['a', 'c', 'd']],
                   'Column2': [4, 3, 6, 5]})
df1 = df.explode('Column1')
df['Column3'] = df1.groupby(level=0).apply(
    lambda d: d.Column1.apply(lambda x: df1.loc[df1.Column1 == x, 'Column2'].sum()).sum())
print(df)
        Column1  Column2  Column3
0  [a, b, c, d]        4       46
1     [a, f, g]        3       18
2        [b, c]        6       25
3     [a, c, d]        5       36

CodePudding user response:

Let's start from the easier to comprehend version, step by step.

  1. Explode Column1:

    wrk = df.explode(column='Column1')
    

    The result is:

      Column1  Column2
    0       a        4
    0       b        4
    0       c        4
    0       d        4
    1       a        3
    1       f        3
    1       g        3
    2       b        6
    2       c        6
    3       a        5
    3       c        5
    3       d        5
    
  2. Compute weights for each element from lists in Column1:

    weight = wrk.groupby('Column1').sum().rename(columns={'Column2': 'Weight'})
    

    The result is:

             Weight
    Column1        
    a            12
    b            10
    c            15
    d             9
    f             3
    g             3
    

    Note some differences to your counting, e.g. weight for c is 4 6 5 = 15.

  3. Join Column1 from wrk with weight:

     wrk2 = wrk[['Column1']].join(weight, on='Column1')
    

    The result is:

      Column1  Weight
    0       a      12
    0       b      10
    0       c      15
    0       d       9
    1       a      12
    1       f       3
    1       g       3
    2       b      10
    2       c      15
    3       a      12
    3       c      15
    3       d       9
    
  4. And the final step is to compute the new column:

    df['Column3'] = wrk2.groupby(level=0).Weight.sum()
    

    The result is:

            Column1  Column2  Column3
    0  [a, b, c, d]        4       46
    1     [a, f, g]        3       18
    2        [b, c]        6       25
    3     [a, c, d]        5       36
    

But if you want more concise code, you can "compress" the above solution to:

wrk = df.explode(column='Column1')
df['Column3'] = wrk[['Column1']].join(wrk.groupby('Column1').sum().rename(
    columns={'Column2': 'Weight'}), on='Column1').groupby(level=0).Weight.sum()
  • Related