Home > OS >  How to group by number of bins a ordered dataframe?
How to group by number of bins a ordered dataframe?

Time:02-20

I have a dataframe like that:

year count_yes count_no
1900 5 7
1903 5 3
1915 14 6
1919 6 14

I want to have two bins, independently of the value itself.

How can I group those categories and sum its values?

Expected result:

year count_yes count_no
1900 10 10
1910 20 20

Logic: Grouped the first two rows (1900 and 1903) and the two last rows (1915 and 1919) and summed the values of each category

I want to create a stacked percentage column graphic, so 1900 would be 50/50% and 1910 would be also 50/50%.

I've already created the function to build this graphic, I just need to adjust the dataframe size into bins to create a better distribution and visualization

CodePudding user response:

This is a way to do what you need, if you are ok using the decades as index:

df['year'] = (df.year//10)*10
df_group   = df.groupby('year').sum()

Output>>>

df_group

       count_yes    count_no
year        
1900          10          10
1910          20          20

CodePudding user response:

You can bin the years with pandas.cut and aggregate with groupby sum:

bins = list(range(1900, df['year'].max() 10, 10))
group = pd.cut(df['year'], bins=bins, labels=bins[:-1], right=False)

df.drop('year', axis=1).groupby(group).sum().reset_index()

If you only want to specify the number of bins, compute group with:

group = pd.cut(df['year'], bins=2, right=False)

output:

   year  count_yes  count_no
0  1900         10        10
1  1910         20        20
  • Related