I have a dataframe that loads a CSV. The csv is like this:
PROFIT STRING
16 A_B_C_D
3 A_D_C
-4 A_D_C
20 A_X_C
10 A_F_S
PROFIT is a float, string is a list of characters. The underscore "_" seperates them, so that A_B_C_D would be A,B,C and D individually.
I'm trying to see the profit distribution by character.
eg:
A: Total profit = 16 3-4 20 10 = 45
Mean = xxx
Median = yyy
B:
Total profit = 16 3 = 19
Mean = zzzz etc...
Can this be done using pandas, and if so how?
CodePudding user response:
Split
and explode
by column STRING
, then do groupby agg
on column PROFIT
df.assign(STRING=df['STRING'].str.split('_'))\
.explode('STRING').groupby('STRING')['PROFIT'].agg(['sum', 'mean', 'median'])
sum mean median
STRING
A 45 9.00 10.0
B 16 16.00 16.0
C 35 8.75 9.5
D 15 5.00 3.0
F 10 10.00 10.0
S 10 10.00 10.0
X 20 20.00 20.0