I have the following DataFrame:
columns1 | parametr_1 | parametr_2 | parametr_3 |
---|---|---|---|
val_1 | 1 | 2 | 1 |
val_2 | 1 | 2 | 5 |
val_3 | 7 | 1 | 7 |
val_4 | 4 | 2 | 11 |
I want to split each column into bins, so that the result looks something like this:
column | bin | count |
---|---|---|
parametr_1 | (0-1) | 2 |
parametr_1 | (1-inf) | 2 |
parametr_2 | (0-2) | 3 |
parametr_2 | (2-inf) | 0 |
parametr_3 | (0-5) | 2 |
parametr_3 | (5-inf) | 2 |
Ideally each parametr cell would also be merged, so that I only have a single parametr_1, parametr_2 and parametr_3 cell in the second column.
Is there a specific library for that?
CodePudding user response:
First specify the bin edge for each parameter column in a dictionary and call cut on each column, then count the values with Series.value_counts, reshape with DataFrame.melt, remove the rows with missing values, and convert the counts to integers:
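import numpy as np
import pandas as pd

# df is the DataFrame from the question
# upper edge of the first bin for each column
d = {'parametr_1': 1, 'parametr_2': 2, 'parametr_3': 5}
for k, v in d.items():
    # two right-closed bins per column: (0, v] and (v, inf]
    df[k] = pd.cut(df[k], bins=[0, v, np.inf])

df = (df.set_index('columns1')
        # top-level pd.value_counts is deprecated in recent pandas versions
        .apply(pd.Series.value_counts)
        .melt(ignore_index=False, value_name='count', var_name='column')
        .dropna(subset=['count'])
        .astype({'count': int})
        .rename_axis('bin')
        .reset_index()[['column', 'bin', 'count']])
print(df)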
       column         bin  count
0  parametr_1  (0.0, 1.0]      2
1  parametr_1  (1.0, inf]      2
2  parametr_2  (0.0, 2.0]      4
3  parametr_2  (2.0, inf]      0
4  parametr_3  (0.0, 5.0]      2
5  parametr_3  (5.0, inf]      2
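
For the "merged cells" part of the question, one possible approach is to move column and bin into a MultiIndex, since pandas prints each outer label only once by default. The sketch below is self-contained and rebuilds the sample data from the question. The names edges, parts, long_df and merged are illustrative, and it counts each column separately instead of using apply and melt, so treat it as a variant of the approach above rather than the same code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "columns1": ["val_1", "val_2", "val_3", "val_4"],
    "parametr_1": [1, 1, 7, 4],
    "parametr_2": [2, 2, 1, 2],
    "parametr_3": [1, 5, 7, 11],
})

# Upper edge of the first bin per column (same values as in the answer).
edges = {"parametr_1": 1, "parametr_2": 2, "parametr_3": 5}

parts = []
for col, edge in edges.items():
    # pd.cut builds right-closed intervals: (0, edge] and (edge, inf].
    binned = pd.cut(df[col], bins=[0, edge, np.inf])
    # sort=False keeps the bins in category order; empty bins are kept as 0.
    counts = binned.value_counts(sort=False)
    parts.append(pd.DataFrame({
        "column": col,
        "bin": counts.index.astype(str),
        "count": counts.to_numpy(),
    }))

long_df = pd.concat(parts, ignore_index=True)

# With a MultiIndex, each outer label is displayed once per group.
merged = long_df.set_index(["column", "bin"])
print(merged)
```

Because df is not overwritten here, the original values stay available. If truly merged cells are needed in a spreadsheet, writing merged with to_excel is worth trying, since it has a merge_cells option for MultiIndex rows.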