I need to create a new column in my dataset that contains only the year. The dpro
column contains extra text after the date, for example: 1913/12/30 : classé MH. I've tried other arguments, but something is still missing, and I am new to Python. Thanks.
Code:
monuments["year_protec"] = pd.to_datetime(monuments["dpro"], format="%Y", errors="coerce")
monuments.head()
Answer:
One option is to clean up the string first, then convert it to datetime,
and finally take the year part.
import pandas as pd
import re
s = ["1913/12/30 : classé MH", "1913/12/30 : classé MH","1913/12/30 : classé MH"]
df = pd.DataFrame({"date" : s})
# df
#                      date
# 0  1913/12/30 : classé MH
# 1  1913/12/30 : classé MH
# 2  1913/12/30 : classé MH
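# Strip everything from the first ":" onward so only the date remains.
# (A pattern like [^...] is a negated character class, not a date matcher.)
drop = re.compile(r"\s*:.*")
df["clean_date"] = df["date"].str.replace(drop, "", regex=True)
df["year"] = pd.to_datetime(df["clean_date"], format="%Y/%m/%d").dt.year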
# df
#                      date  clean_date  year
# 0  1913/12/30 : classé MH  1913/12/30  1913
# 1  1913/12/30 : classé MH  1913/12/30  1913
# 2  1913/12/30 : classé MH  1913/12/30  1913
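As for why the attempt in the question fails: with format="%Y", pd.to_datetime expects the whole string to be just a year, so a value like 1913/12/30 : classé MH does not parse, and errors="coerce" turns it into NaT. If only the year is needed, a possible shortcut is to pull the first four digits out directly and skip the datetime step. The sketch below assumes every usable value starts with a four-digit year; the sample rows (including the year-only and undated ones) are hypothetical stand-ins for the real monuments data, and only the column names dpro and year_protec come from the question.

```python
import pandas as pd

# Hypothetical sample standing in for the real `monuments` data;
# only the column name "dpro" is taken from the question.
monuments = pd.DataFrame(
    {"dpro": ["1913/12/30 : classé MH", "1862 : classé MH", None, "no date"]}
)

# Capture four digits at the start of the text; rows without them become NaN
year_text = monuments["dpro"].str.extract(r"^\s*(\d{4})", expand=False)

# The nullable Int64 dtype keeps missing years as <NA> instead of forcing floats
monuments["year_protec"] = pd.to_numeric(year_text, errors="coerce").astype("Int64")

print(monuments)
```

Rows with no leading year end up as missing values rather than raising an error, which is the same behaviour errors="coerce" was meant to give in the original attempt.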