In python I have a dataframe that looks like:
Column1 Column2
[a,b,c,d] 4
[a,f,g] 3
[b,c] 6
[a,c,d] 5
I would like to compute a third column, that adds the value in Column2 for each time one of the items is present in Column1 (for example in the first row it would be a=4 3 5, b=4 6, c=5 6 5, d=4 5, so in total 4 3 5 4 6 5 6 5 4 5=47):
Column1 Column2 Column3
[a,b,c,d] 4 47
[a,f,g] 3 21
[b,c] 6 26
[a,c,d] 5 37
I've tried my best with query and indexing but with no success, thank you in advance!
CodePudding user response:
Try with explode
, then create the mapping dict and groupby
back
s = df.explode('Column1')
d = s.groupby('Column1')['Column2'].sum()
s['new'] = s['Column1'].map(d)
out = s.groupby(level=0).agg({'Column1':list,'Column2':'first','new':'sum'})
out
Column1 Column2 new
0 [a, b, c, d] 4 46
1 [a, f, g] 3 18
2 [b, c] 6 25
3 [a, c, d] 5 36
Notice :
c = 4 6 5
CodePudding user response:
df = pd.DataFrame({'Column1': [['a', 'b', 'c', 'd'], ['a', 'f', 'g'], ['b', 'c'], ['a', 'c', 'd']],
'Column2': [4, 3, 6, 5]})
df1 = df.explode('Column1')
df['Column3'] = df1.groupby(level=0).apply(
lambda d: d.Column1.apply(lambda x: df1.loc[df1.Column1 == x, 'Column2'].sum()).sum())
print(df)
Column1 Column2 Column3
0 [a, b, c, d] 4 46
1 [a, f, g] 3 18
2 [b, c] 6 25
3 [a, c, d] 5 36
CodePudding user response:
Let's start from the easier to comprehend version, step by step.
Explode Column1:
wrk = df.explode(column='Column1')
The result is:
Column1 Column2 0 a 4 0 b 4 0 c 4 0 d 4 1 a 3 1 f 3 1 g 3 2 b 6 2 c 6 3 a 5 3 c 5 3 d 5
Compute weights for each element from lists in Column1:
weight = wrk.groupby('Column1').sum().rename(columns={'Column2': 'Weight'})
The result is:
Weight Column1 a 12 b 10 c 15 d 9 f 3 g 3
Note some differences to your counting, e.g. weight for c is 4 6 5 = 15.
Join Column1 from wrk with weight:
wrk2 = wrk[['Column1']].join(weight, on='Column1')
The result is:
Column1 Weight 0 a 12 0 b 10 0 c 15 0 d 9 1 a 12 1 f 3 1 g 3 2 b 10 2 c 15 3 a 12 3 c 15 3 d 9
And the final step is to compute the new column:
df['Column3'] = wrk2.groupby(level=0).Weight.sum()
The result is:
Column1 Column2 Column3 0 [a, b, c, d] 4 46 1 [a, f, g] 3 18 2 [b, c] 6 25 3 [a, c, d] 5 36
But if you want more concise code, you can "compress" the above solution to:
wrk = df.explode(column='Column1')
df['Column3'] = wrk[['Column1']].join(wrk.groupby('Column1').sum().rename(
columns={'Column2': 'Weight'}), on='Column1').groupby(level=0).Weight.sum()