Home > Mobile >  Convert value counts of multiple columns to pandas dataframe
Convert value counts of multiple columns to pandas dataframe

Time:08-30

I have a dataset in this form:

Name  Batch  DXYR  Emp  Lateral  GDX  MMT  CN
Joe    2       0          2       2    2    0  
Alan   0       1          1       2    0    0
Josh   1       1          2       1    1    2 
Max    0       1          0       0    0    2

These columns can have only three distinct values ie. 0, 1 and 2.. So, I need percent of value counts for each column in pandas dataframe..

I have simply make a loop like:

for i in df.columns:
  (df[i].value_counts()/df[i].count())*100

I am getting the output like:

0    90.608831
1     0.391169
2    9.6787899
Name: Batch, dtype: float64

0    95.545455
1     2.235422
2    2.6243553
Name: MX, dtype: float64

and so on...

These outputs are correct but I need it in pandas dataframe like this:

                 Batch  DXYR  Emp  Lateral  GDX     MMT    CN
Count_0_percent  98.32  52.5   22   54.5     44.2   53.4  76.01  
Count_1_percent  0.44   34.5   43   43.5     44.5   46.5  22.44
Count_2_percent  1.3    64.3   44   2.87     12.6   1.88  2.567

Can someone please suggest me how to get it

CodePudding user response:

You can melt the data, then use pd.crosstab:

melt = df.melt('Name')
pd.crosstab(melt['value'], melt['variable'], normalize='columns')

Or a bit faster (yet more verbose) with melt and groupby().value_counts():

(df.melt('Name')
   .groupby('variable')['value'].value_counts(normalize=True)
   .unstack('variable', fill_value=0)
)

Output:

variable  Batch   CN  DXYR  Emp Lateral   GDX   MMT
value                                              
0          0.50  0.5  0.25         0.25  0.25  0.50
1          0.25  0.0  0.75         0.25  0.25  0.25
2          0.25  0.5  0.00         0.50  0.50  0.25

Update: apply also works:

df.drop(columns=['Name']).apply(pd.Series.value_counts, normalize=True)
  • Related