Bin data by year using pandas cut() function

Time:03-11

R has a cut() function that can bin dates by year. I use it to add the "closeyr" column shown in the table below.

df$closeyr <- cut(df$closedate, breaks="year")

Bank   closedate   closeyr
Bank1  2008-07-25  2008-01-01
Bank2  2008-10-20  2008-01-01
Bank3  2010-12-10  2010-01-01
Bank4  2005-10-01  2005-01-01
Bank5  2007-08-04  2007-01-01
Bank6  2005-06-10  2005-01-01

I am trying to translate this into Python using the pandas cut() function, but I am not sure how to bin data by year. If a row's "closedate" falls in 2008, then its "closeyr" value should be "2008-01-01".

df['closeyr'] = pd.cut(df['closedate'], bins = )

How could I make a new column that organizes my "closedate" values by year to replicate the table above?

CodePudding user response:

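import pandas as pd

# Parse closedate as datetimes, then format each date as Jan 1 of its year (closeyr holds strings).
df['closedate'] = pd.to_datetime(df['closedate'])
df['closeyr'] = df['closedate'].dt.strftime('%Y-01-01')
df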
Out[37]: 
    Bank  closedate     closeyr
0  Bank1 2008-07-25  2008-01-01
1  Bank2 2008-10-20  2008-01-01
2  Bank3 2010-12-10  2010-01-01
3  Bank4 2005-10-01  2005-01-01
4  Bank5 2007-08-04  2007-01-01
5  Bank6 2005-06-10  2005-01-01
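
Note that strftime() returns strings, so "closeyr" above is a text column. If a real datetime column is preferred, one possible variation (a sketch, not part of the original answer) is to round-trip through a yearly period. The sample frame below is rebuilt from the question's table so the snippet runs on its own:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Bank": ["Bank1", "Bank2", "Bank3", "Bank4", "Bank5", "Bank6"],
    "closedate": ["2008-07-25", "2008-10-20", "2010-12-10",
                  "2005-10-01", "2007-08-04", "2005-06-10"],
})
df["closedate"] = pd.to_datetime(df["closedate"])

# Convert each date to a yearly period, then back to a timestamp.
# to_timestamp() defaults to the start of the period, i.e. January 1.
df["closeyr"] = df["closedate"].dt.to_period("Y").dt.to_timestamp()
print(df)
```

With this version "closeyr" keeps a datetime dtype, so it can be sorted, grouped, or compared like any other date column.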

CodePudding user response:

Try this:

df['closeyr'] = pd.to_datetime(df['closedate']).dt.year.apply(lambda x: pd.Timestamp(x, 1, 1))
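
If you specifically want to use pd.cut() as in the question, it should also work once it is given explicit year boundaries as bins. The following is only a sketch under that assumption: the names first_year, last_year and edges are made up for illustration, and the sample frame is rebuilt from the question's table:

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Bank": ["Bank1", "Bank2", "Bank3", "Bank4", "Bank5", "Bank6"],
    "closedate": ["2008-07-25", "2008-10-20", "2010-12-10",
                  "2005-10-01", "2007-08-04", "2005-06-10"],
})
df["closedate"] = pd.to_datetime(df["closedate"])

# Build one bin edge per January 1, from the earliest year
# through the year after the latest one (hypothetical helper names).
first_year = int(df["closedate"].dt.year.min())
last_year = int(df["closedate"].dt.year.max())
edges = pd.to_datetime([f"{year}-01-01" for year in range(first_year, last_year + 2)])

# right=False makes each bin [Jan 1, next Jan 1), so a date that falls
# exactly on January 1 belongs to its own year. Each bin is labelled
# with its left edge, which mirrors R's cut(..., breaks="year").
df["closeyr"] = pd.cut(
    df["closedate"],
    bins=edges,
    right=False,
    labels=edges[:-1].strftime("%Y-%m-%d"),
)
print(df)
```

The result is a categorical column, which is close to the factor that R's cut() returns. Years with no rows (2006 and 2009 here) still appear as unused categories.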