pandas: group years by decade

Time:04-16

I have data in a CSV file. Here is my code.

import pandas as pd

data = pd.read_csv('cast.csv')  # read_csv already returns a DataFrame
print(data)

The result looks like this.

                          title  year                        name     type  \
0                Closet Monster  2015                    Buffy #1    actor   
1               Suuri illusioni  1985                      Homo $    actor   
2           Battle of the Sexes  2017                     $hutter    actor   
3          Secret in Their Eyes  2015                     $hutter    actor   
4                    Steve Jobs  2015                     $hutter    actor   
...                         ...   ...                         ...      ...   
74996  Mia fora kai ena... moro  2011     Penelope Anastasopoulou  actress   
74997         The Magician King  2004       Tiannah Anastassiades  actress   
74998        Festival of Lights  2010             Zoe Anastassiou  actress   
74999                Toxic Tutu  2016             Zoe Anastassiou  actress   
75000           Fugitive Pieces  2007  Anastassia Anastassopoulou  actress   

                     character     n  
0                      Buffy 4  31.0  
1                       Guests  22.0  
2              Bobby Riggs Fan  10.0  
3              2002 Dodger Fan   NaN  
4      1988 Opera House Patron   NaN  
...                        ...   ...  
74996       Popi voulkanizater  11.0  
74997  Unicycle Race Attendant   NaN  
74998       Guidance Counselor  20.0  
74999        Demon of Toxicity   NaN  
75000             Laundry Girl  25.0  

[75001 rows x 6 columns]

I want to group the data by year and type, then get the number of rows for each type in each year. Here is my code.

grouped = data.groupby(['year', 'type']).size()
print(grouped)

The result looks like this.

year  type   
1912  actor       1
      actress     2
1913  actor       9
      actress     1
1914  actor      38
                 ..
2019  actress     3
2020  actor       3
      actress     1
2023  actor       1
      actress     2
Length: 220, dtype: int64

The problem: how do I get these counts per decade instead, from 1910 through 2020 in steps of 10? The year index should then be 1910, 1920, 1930, 1940, and so on up to 2020.

CodePudding user response:

I see two simple options.

1- round the years down to the start of their decade:

# floor to the decade (1987 -> 1980); round(-1) would round to the nearest 10 instead
group = data['year'] // 10 * 10
grouped = data.groupby([group, 'type']).size()
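
To see option 1 on something small, here is a minimal sketch. The DataFrame is a made-up stand-in for cast.csv (the rows are invented for illustration), and the name `decade` is my own choice, not something from the question.

```python
import pandas as pd

# Hypothetical sample standing in for cast.csv (values invented for illustration).
data = pd.DataFrame({
    "year": [1912, 1915, 1919, 1926, 2020, 2023],
    "type": ["actor", "actress", "actor", "actor", "actress", "actor"],
})

# Floor each year to the start of its decade: 1919 -> 1910, 2023 -> 2020.
decade = (data["year"] // 10 * 10).rename("decade")

# Count rows per (decade, type) pair.
grouped = data.groupby([decade, "type"]).size()
print(grouped)

# For contrast: round(-1) rounds to the nearest ten, so 1919 lands in 1920.
rounded = data["year"].round(-1)
print(rounded.tolist())
```

Renaming the key to `decade` only changes the label of the first index level; the grouping itself works without it.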

2- use pandas.cut:

years = list(range(1910,2031,10))
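# right=False makes the bins [1910, 1920), [1920, 1930), ... so 1920 counts as the 1920s
group = pd.cut(data['year'], bins=years, labels=years[:-1], right=False)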
grouped = data.groupby([group, 'type']).size()
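
A small sketch of option 2, again on an invented stand-in for cast.csv. It assumes you want each decade to include its first year and exclude the next decade's first year, which is why `right=False` is passed; by default `pd.cut` closes bins on the right, so 1920 would be labelled 1910 and 1910 itself would fall outside every bin. The `unstack` step is an optional extra, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample standing in for cast.csv (values invented for illustration).
data = pd.DataFrame({
    "year": [1910, 1919, 1920, 2020, 2023],
    "type": ["actor", "actress", "actor", "actress", "actor"],
})

years = list(range(1910, 2031, 10))  # bin edges: 1910, 1920, ..., 2030

# right=False gives half-open bins [start, start + 10), labelled by their start year.
decade = pd.cut(data["year"], bins=years, labels=years[:-1], right=False)

# observed=True counts only the decade/type pairs that occur in the data.
counts = data.groupby([decade, "type"], observed=True).size()

# Optional: one row per decade, one column per type, 0 where a pair is missing.
table = counts.unstack(fill_value=0)
print(table)
```

Compared with floor division, `pd.cut` lets you pick uneven or custom bin edges, at the cost of a categorical key instead of plain integers.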