Let's say I have a very simple dataframe with one column, year.
There would be 14 distinct years, from 2010 to 2023.
I need to bin/bucket these years into three categories, 'old', 'medium', and 'new', where 'new' is the 3 most recent years (2023, 2022, 2021), 'medium' is 2015-2020, and 'old' is 2010-2014.
How would I do this?
Answer:
You're looking for pandas.cut. Assuming df is your dataframe, you can use:
import pandas as pd
# Intervals: [2010, 2014] -> old, (2014, 2020] -> medium, (2020, 2023] -> new
bins = [2010, 2014, 2020, 2023]
labels = ["old", "medium", "new"]
df["cat"] = pd.cut(df["year"], bins=bins, labels=labels, include_lowest=True, right=True)
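The include_lowest=True argument matters here because, with the default right=True, the first bin is open on the left. A minimal sketch, assuming a plain Series of integer years, that compares the result with and without it:

```python
import pandas as pd

years = pd.Series(range(2010, 2024))
bins = [2010, 2014, 2020, 2023]
labels = ["old", "medium", "new"]

# Default right=True makes the first interval (2010, 2014], which excludes 2010
without_lowest = pd.cut(years, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 2010 lands in "old"
with_lowest = pd.cut(years, bins=bins, labels=labels, include_lowest=True)

print(without_lowest.iloc[0])  # nan
print(with_lowest.iloc[0])     # old
```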
Here is a complete example showing the output:
import pandas as pd

(
    pd.DataFrame({"year": range(2010, 2024)})  # one row per year, 2010..2023
    .assign(cat=lambda df_: pd.cut(df_["year"],
                                   bins=[2010, 2014, 2020, 2023],
                                   labels=["old", "medium", "new"],
                                   include_lowest=True, right=True))
)
Output:
    year     cat
0   2010     old
1   2011     old
2   2012     old
3   2013     old
4   2014     old
5   2015  medium
6   2016  medium
7   2017  medium
8   2018  medium
9   2019  medium
10  2020  medium
11  2021     new
12  2022     new
13  2023     new
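An equivalent way to write the same bins, sketched here as an alternative rather than the answer's own approach, is to use half-open intervals with right=False. Each edge is then the first year of its bin, and include_lowest is no longer needed:

```python
import pandas as pd

df = pd.DataFrame({"year": range(2010, 2024)})

# right=False gives half-open intervals [2010, 2015), [2015, 2021), [2021, 2024),
# so every year from 2010 through 2023 falls into exactly one bin
df["cat"] = pd.cut(df["year"], bins=[2010, 2015, 2021, 2024],
                   labels=["old", "medium", "new"], right=False)

print(df["cat"].value_counts(sort=False))
```

Which form to use is mostly a matter of taste; with right=False the bin edges read directly as "starts at year X".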
Answer:
Alternatively, you can create a dictionary like the following and use each year as a key to look up its bin.
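# Integer keys match an integer "year" column; use string keys if the column holds strings
bins = {2023: 'new',
        2022: 'new',
        2021: 'new',
        2020: 'medium',
        2019: 'medium',
        2018: 'medium',
        2017: 'medium',
        2016: 'medium',
        2015: 'medium',
        2014: 'old',
        2013: 'old',
        2012: 'old',
        2011: 'old',
        2010: 'old'}

# Look up each year's bin; years missing from the dict become NaN
df["cat"] = df["year"].map(bins)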
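Typing all 14 entries by hand is error-prone if the ranges change. A minimal sketch, assuming integer years, that builds the same dictionary from one range per label and applies it with Series.map; the names ranges and the sample years are illustrative only:

```python
import pandas as pd

# Hypothetical helper mapping: one range of years per label (end is exclusive)
ranges = {"old": range(2010, 2015),
          "medium": range(2015, 2021),
          "new": range(2021, 2024)}

# Expand to a year -> label dictionary, equivalent to the hand-written one
bins = {year: label for label, years in ranges.items() for year in years}

df = pd.DataFrame({"year": [2010, 2016, 2022, 2024]})
df["cat"] = df["year"].map(bins)  # 2024 is not a key, so it becomes NaN
print(df)
```

Unlike pd.cut, a dictionary lookup has no notion of intervals, so any year outside the listed ranges silently becomes NaN.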