Create a new column only with the year from a string variable

Time:04-10

I need to create a new column in the dataset that contains only the year. The dpro column contains extra text, for example: 1913/12/30 : classé MH. I've tried other arguments, but something is missing, and I'm new to Python. Thanks.

Code:

monuments["year_protec"] = pd.to_datetime(monuments["dpro"], format="%Y", errors="coerce")
monuments.head()

CodePudding user response:

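Your attempt fails because format="%Y" expects the whole string to be just a four-digit year, so errors="coerce" turns every row into NaT. You could clean up the string first, convert it to datetime, and then take the year part.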

import pandas as pd
import re

s = ["1913/12/30 : classé MH", "1913/12/30 : classé MH","1913/12/30 : classé MH"]
df = pd.DataFrame({"date" : s})

#df 

    date
0   1913/12/30 : classé MH
1   1913/12/30 : classé MH
2   1913/12/30 : classé MH
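# Extract the leading YYYY/MM/DD date; a pattern wrapped in [^...] is a character class, not a date pattern.
df["clean_date"] = df["date"].str.extract(r"(\d{4}/\d{2}/\d{2})", expand=False)
# errors="coerce" turns rows without a valid date into NaT instead of raising.
df["year"] = pd.to_datetime(df["clean_date"], format="%Y/%m/%d", errors="coerce").dt.year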
# df
    date                    clean_date  year
0   1913/12/30 : classé MH  1913/12/30  1913
1   1913/12/30 : classé MH  1913/12/30  1913
2   1913/12/30 : classé MH  1913/12/30  1913
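
If you only need the year, you may be able to skip the datetime conversion and pull the four digits out directly. This is just a sketch: the small monuments frame below is made-up sample data standing in for your dataset, and it assumes the first run of four digits in dpro is the year you want.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's `monuments` DataFrame.
monuments = pd.DataFrame(
    {"dpro": ["1913/12/30 : classé MH", "1862 : classé MH", None, "no date"]}
)

# Capture the first run of four digits; rows with no match become missing.
year = monuments["dpro"].str.extract(r"(\d{4})", expand=False)

# A nullable integer dtype keeps missing years as <NA> instead of forcing floats.
monuments["year_protec"] = pd.to_numeric(year, errors="coerce").astype("Int64")

print(monuments)
```

This also tolerates rows that have only a year or no date at all, which a strict "%Y/%m/%d" format would not.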