if I have a data frame like this, I've tried binning and value count, cannot figure out how to put them together or there is some other way?
value
-4
-4
-3
-2
0
44
59
199
I need an output like this
value value_cont
-4 2
-3 1
-2 1
0 1
1-200 3
CodePudding user response:
You have several options.
1- replace the values higher or equal to 1 with '1-200':
(df
.assign(value=df['value'].mask(df['value'].ge(1), '1-200'))
.groupby('value')['value'].agg(count='count')
.reset_index()
)
output:
value count
0 -4 2
1 -3 1
2 -2 1
3 0 1
4 1-200 3
2- use pandas.cut
with explicit bins:
bins = [-4,-3,-2,-1,0]
(pd
.cut(df['value'], [float('-inf')] bins [float('inf')], labels=bins ['1-200'])
.value_counts(sort=False).rename_axis('value')
.reset_index(name='count')
)
output:
value count
0 -4 2
1 -3 1
2 -2 1
3 -1 0
4 0 1
5 1-200 3
CodePudding user response:
You can try select the bin rows then drop them from original dataframe and add the bin value.
out = (df['value'].value_counts()
.to_frame('value_count')
.rename_axis('value').reset_index())
df_ = out[out['value'].between(1, 200)]
out = out[~out.index.isin(df_.index)]
out.loc[len(out)] = ['1-200', df_['value_count'].sum()]
print(out)
value value_count
0 -4 2
1 -3 1
2 -2 1
3 0 1
4 1-200 3