R has a function called cut() that can bin dates by year. I use it to add the column "closeyr", as in the table below.
df$closeyr <- cut(df$closedate,breaks="year")
Bank | closedate | closeyr |
---|---|---|
Bank1 | 2008-07-25 | 2008-01-01 |
Bank2 | 2008-10-20 | 2008-01-01 |
Bank3 | 2010-12-10 | 2010-01-01 |
Bank4 | 2005-10-01 | 2005-01-01 |
Bank5 | 2007-08-04 | 2007-01-01 |
Bank6 | 2005-06-10 | 2005-01-01 |
I am trying to translate this into Python using the pandas cut() function, but I am not sure how to bin data by year. If a row's "closedate" falls in 2008, then its value in the "closeyr" column should be "2008-01-01".
df['closeyr'] = pd.cut(df['closedate'], bins = )
How could I make a new column that organizes my "closedate" values by year to replicate the table above?
CodePudding user response:
df.closedate = pd.to_datetime(df.closedate)
df['closeyr'] = df.closedate.dt.strftime('%Y-01-01')
df
Out[37]:
Bank closedate closeyr
0 Bank1 2008-07-25 2008-01-01
1 Bank2 2008-10-20 2008-01-01
2 Bank3 2010-12-10 2010-01-01
3 Bank4 2005-10-01 2005-01-01
4 Bank5 2007-08-04 2007-01-01
5 Bank6 2005-06-10 2005-01-01
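Note that strftime returns strings, so "closeyr" above is an object column rather than a datetime one. If you want real timestamps, one option is to round-trip through a yearly period. This is only a sketch: the sample frame is rebuilt from the table in the question, and I am assuming you want the start of each year.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Bank": ["Bank1", "Bank2", "Bank3", "Bank4", "Bank5", "Bank6"],
    "closedate": ["2008-07-25", "2008-10-20", "2010-12-10",
                  "2005-10-01", "2007-08-04", "2005-06-10"],
})
df["closedate"] = pd.to_datetime(df["closedate"])

# Convert each date to its yearly period, then back to a timestamp;
# to_timestamp() defaults to the start of the period (January 1st)
df["closeyr"] = df["closedate"].dt.to_period("Y").dt.to_timestamp()
```

The result stays a datetime64 column, so later .dt accessors and date comparisons keep working on it.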
CodePudding user response:
Try this:
df['closeyr'] = pd.to_datetime(df['closedate']).dt.year.apply(lambda x: pd.Timestamp(x, 1, 1))
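If you specifically want pd.cut, as in the question, it does not take breaks="year" the way R does. As far as I know you have to build the year edges yourself. A rough sketch, assuming the sample data from the question; the names edges and labels are just mine for illustration:

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Bank": ["Bank1", "Bank2", "Bank3", "Bank4", "Bank5", "Bank6"],
    "closedate": ["2008-07-25", "2008-10-20", "2010-12-10",
                  "2005-10-01", "2007-08-04", "2005-06-10"],
})
dates = pd.to_datetime(df["closedate"])

# Year-start edges from the earliest year through the year after the latest
edges = pd.date_range(
    start=f"{dates.dt.year.min()}-01-01",
    end=f"{dates.dt.year.max() + 1}-01-01",
    freq="YS",
)

# Label each bin with its left edge, formatted like R's cut() output
labels = list(edges[:-1].strftime("%Y-%m-%d"))

# right=False makes the bins left-closed, so January 1st belongs to its own year
df["closeyr"] = pd.cut(dates, bins=edges, right=False, labels=labels)
```

This gives a categorical column, which is the closest match to the factor that R's cut() returns. Years with no rows (2006 and 2009 here) still show up as empty categories.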