Updating the Tag column based on a condition on the Date column

Time:01-24

Existing Dataframe :

id             Date                   Tag
01      sep 2021 - 2023               ok
02      feb 2019 - aug 2020           ok
03      07/2022 - 11/2022             ok
04      2013-2017                     ok

Condition that needs to be applied : find the max year in Date; if that max year is <= 2020, Tag needs to be updated to 'Old', otherwise no change

Expected Dataframe :

id             Date                   Tag
01      sep 2021 - 2023               ok
02      feb 2019 - aug 2020           Old
03      07/2022 - 11/2022             ok
04      2013-2017                     Old

I tried approaching it with df['Date'].dt.year.max(), but how do I apply a conditional statement to it?

CodePudding user response:

You can use pandas.Series.str.findall with numpy.fromiter :

import numpy as np
import pandas as pd

# list every 4-digit year found in each row
ser = df["Date"].str.findall(r"(\d{4})")

# get the maximum year of each row, as an integer
arr = np.fromiter((max(map(int, x)) for x in ser), dtype=int)

# set Tag to "Old" where the max year is <= 2020; the other rows keep their Tag
df.loc[arr <= 2020, "Tag"] = "Old"

NB : If the maximum year is always at the right side of the column Date, you can simply compare the last four characters (as strings) :

df.loc[df["Date"].str.strip().str[-4:].le("2020"), "Tag"] = "Old"

Output :

print(df)

   id                 Date  Tag
0   1      sep 2021 - 2023   ok
1   2  feb 2019 - aug 2020  Old
2   3    07/2022 - 11/2022   ok
3   4            2013-2017  Old
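
If some rows might not contain a year at all, max() on an empty list would raise an error. Here is a minimal, self-contained sketch of one way around that, using pandas.Series.str.extractall instead of findall. The helper name tag_old and its cutoff parameter are made up for illustration, and the sample frame is just the data from the question :

```python
import pandas as pd


def tag_old(frame, cutoff=2020):
    """Return a copy of frame with Tag set to 'Old' where the latest year is <= cutoff.

    Hypothetical helper, not part of pandas.
    """
    out = frame.copy()
    # one row per 4-digit match, indexed by (original row label, match number)
    years = out["Date"].str.extractall(r"(\d{4})")[0].astype(int)
    # latest year per original row; rows without any year are simply absent
    latest = years.groupby(level=0).max()
    # update only the rows whose latest year is at or before the cutoff
    out.loc[latest[latest <= cutoff].index, "Tag"] = "Old"
    return out


# sample data copied from the question
df = pd.DataFrame({
    "id": ["01", "02", "03", "04"],
    "Date": ["sep 2021 - 2023", "feb 2019 - aug 2020", "07/2022 - 11/2022", "2013-2017"],
    "Tag": ["ok", "ok", "ok", "ok"],
})

result = tag_old(df)
print(result)
```

Because the update goes through the original row labels, this should also work when df does not have a default 0..n-1 index, and rows with no 4-digit year keep whatever Tag they already had.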