So I have data in CSV. Here is my code.
data = pd.read_csv('cast.csv')
data = pd.DataFrame(data)
print(data)
The result looks like this.
title year name type \
0 Closet Monster 2015 Buffy #1 actor
1 Suuri illusioni 1985 Homo $ actor
2 Battle of the Sexes 2017 $hutter actor
3 Secret in Their Eyes 2015 $hutter actor
4 Steve Jobs 2015 $hutter actor
... ... ... ... ...
74996 Mia fora kai ena... moro 2011 Penelope Anastasopoulou actress
74997 The Magician King 2004 Tiannah Anastassiades actress
74998 Festival of Lights 2010 Zoe Anastassiou actress
74999 Toxic Tutu 2016 Zoe Anastassiou actress
75000 Fugitive Pieces 2007 Anastassia Anastassopoulou actress
character n
0 Buffy 4 31.0
1 Guests 22.0
2 Bobby Riggs Fan 10.0
3 2002 Dodger Fan NaN
4 1988 Opera House Patron NaN
... ... ...
74996 Popi voulkanizater 11.0
74997 Unicycle Race Attendant NaN
74998 Guidance Counselor 20.0
74999 Demon of Toxicity NaN
75000 Laundry Girl 25.0
[75001 rows x 6 columns]
I want to group the data by year and type. Then I want to know the size of the each type on specific year. So here is my code.
grouped = data.groupby(['year', 'type']).size()
print(grouped)
The result look like this.
year type
1912 actor 1
actress 2
1913 actor 9
actress 1
1914 actor 38
..
2019 actress 3
2020 actor 3
actress 1
2023 actor 1
actress 2
Length: 220, dtype: int64
The problem is, how if I want to get the size data from 1910 until 2020 and the increase year is 10 (Per decade). So the year index will 1910, 1920, 1930, 1940, and so on until 2020.
CodePudding user response:
I see two simple options.
1- round the years to the lower 10:
group = df['year']//10*10 # or df['year'].round(-1)
grouped = data.groupby([group, 'type']).size()
2- use pandas.cut
:
years = list(range(1910,2031,10))
group = pd.cut(s, bins=years, labels=years[:-1])
grouped = data.groupby([group, 'type']).size()