break values in pandas column into bins

Time:11-09

I have the following DataFrame:

columns1 parametr_1 parametr_2 parametr_3
val_1 1 2 1
val_2 1 2 5
val_3 7 1 7
val_4 4 2 11

I want to break each column into bins, so it would look something like this:

column bin count
parametr_1 (0-1) 2
parametr_1 (1-inf) 2
parametr_2 (0-2) 4
parametr_2 (2-inf) 0
parametr_3 (0-5) 2
parametr_3 (5-inf) 2

Ideally, the repeated parametr cells would also be merged, so that there is only a single parametr_1, parametr_2 and parametr_3 cell each.

Is there a specific library for that?

CodePudding user response:

First, specify the bin edge for each parameter column in a dictionary and call pd.cut on each column. Then count the values with Series.value_counts, reshape with DataFrame.melt, remove the rows with missing counts, and convert the counts to integers:

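import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'columns1': ['val_1', 'val_2', 'val_3', 'val_4'],
                   'parametr_1': [1, 1, 7, 4],
                   'parametr_2': [2, 2, 1, 2],
                   'parametr_3': [1, 5, 7, 11]})

# upper edge of the first bin for each column
d = {'parametr_1':1,'parametr_2':2,'parametr_3':5}

# replace each value by its interval: (0, v] or (v, inf]
for k, v in d.items():
    df[k] = pd.cut(df[k], bins=[0, v, np.inf])

# count each interval per column, then reshape to long format
df = (df.set_index('columns1')
       .apply(pd.Series.value_counts)
       .melt(ignore_index=False, value_name='count', var_name='column')
       .dropna(subset=['count'])
       .astype({'count':int})
       .rename_axis('bin')
       .reset_index()[['column','bin','count']])
print(df)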
       column         bin  count
0  parametr_1  (0.0, 1.0]      2
1  parametr_1  (1.0, inf]      2
2  parametr_2  (0.0, 2.0]      4
3  parametr_2  (2.0, inf]      0
4  parametr_3  (0.0, 5.0]      2
5  parametr_3  (5.0, inf]      2
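
For the "merged cells" part of the question, one possible variant is sketched below. It is not part of the answer above: it counts each column separately and then moves column and bin into a MultiIndex, which pandas prints with each parameter name shown only once. The names bin_counts, tidy and grouped are made up for this illustration, and the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "columns1": ["val_1", "val_2", "val_3", "val_4"],
    "parametr_1": [1, 1, 7, 4],
    "parametr_2": [2, 2, 1, 2],
    "parametr_3": [1, 5, 7, 11],
})

# Upper edge of the first bin for each column (same values as in the answer).
thresholds = {"parametr_1": 1, "parametr_2": 2, "parametr_3": 5}


def bin_counts(frame, thresholds):
    """Return one row per (column, bin) with bins (0, t] and (t, inf]."""
    rows = []
    for col, t in thresholds.items():
        binned = pd.cut(frame[col], bins=[0, t, np.inf])
        # value_counts on a categorical also reports empty bins as 0;
        # sort_index puts the bins back into their natural order.
        counts = binned.value_counts().sort_index()
        for interval, n in counts.items():
            rows.append({"column": col, "bin": str(interval), "count": int(n)})
    return pd.DataFrame(rows)


tidy = bin_counts(df, thresholds)

# With (column, bin) as a MultiIndex, repeated column labels are
# displayed only once when the frame is printed.
grouped = tidy.set_index(["column", "bin"])
print(grouped)
```

Because pd.cut closes intervals on the right by default, the value 2 falls into (0, 2], which is why parametr_2 has a count of 4 in the first bin and 0 in the second. If the table is written with DataFrame.to_excel, MultiIndex rows are written as merged cells by default (merge_cells=True).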