Home > other >  How to see distribution of values in a dataframe
How to see distribution of values in a dataframe

Time:05-17

I have a dataframe that loads a CSV. The csv is like this:

PROFIT  STRING
16  A_B_C_D
3   A_D_C
-4  A_D_C
20  A_X_C
10  A_F_S

PROFIT is a float, string is a list of characters. The underscore "_" seperates them, so that A_B_C_D would be A,B,C and D individually.

I'm trying to see the profit distribution by character.

eg:

A: Total profit = 16 3-4 20 10 = 45

Mean = xxx

Median = yyy

B:

Total profit = 16 3 = 19

Mean = zzzz etc...

Can this be done using pandas, and if so how?

CodePudding user response:

Split and explode by column STRING, then do groupby agg on column PROFIT

df.assign(STRING=df['STRING'].str.split('_'))\
  .explode('STRING').groupby('STRING')['PROFIT'].agg(['sum', 'mean', 'median'])

        sum   mean  median
STRING                    
A        45   9.00    10.0
B        16  16.00    16.0
C        35   8.75     9.5
D        15   5.00     3.0
F        10  10.00    10.0
S        10  10.00    10.0
X        20  20.00    20.0
  • Related