Existing DataFrame:
id Date Tag
01 sep 2021 - 2023 ok
02 feb 2019 - aug 2020 ok
03 07/2022 - 11/2022 ok
04 2013-2017 ok
Condition to apply: find the maximum year in Date. If that year is <= 2020, Tag needs to be updated to 'Old'; otherwise Tag stays unchanged.
Expected DataFrame:
id Date Tag
01 sep 2021 - 2023 ok
02 feb 2019 - aug 2020 Old
03 07/2022 - 11/2022 ok
04 2013-2017 Old
I tried approaching it with df['Date'].dt.year.max(), but how do I apply a conditional statement to the result?
CodePudding user response:
You can use pandas.Series.str.findall with numpy.fromiter:
import numpy as np
import pandas as pd
# collect every four-digit year found in each Date string
ser = df["Date"].str.findall(r"(\d{4})")
# take the maximum year of each row as an integer
arr = np.fromiter((max(map(int, x)) for x in ser), dtype=int)
# rows whose max year is <= 2020 become "Old", all other rows become "Ok"
df["Tag"] = pd.Series(arr, index=df.index).le(2020).map({False: "Ok", True: "Old"})
NB: If the maximum year is always at the right-hand end of the Date column, you can simply use:
df["Tag"] = df["Date"].str.strip().str[-4:].le("2020").map({False:"Ok", True: "Old"})
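The shortcut compares strings, which is safe for four-digit years because they sort the same way lexicographically and numerically. It does rely on the last four characters being the largest year, though. A minimal sketch of the difference, where the second row is a made-up counter-example that is not in the question's data:

```python
import pandas as pd

# First row comes from the question; the second row is a hypothetical
# counter-example in which the largest year is NOT at the right-hand end.
dates = pd.Series(["feb 2019 - aug 2020", "2023 - sep 2019"])

# Shortcut: look only at the last four characters of each string.
last_four = dates.str.strip().str[-4:]
shortcut_old = last_four.le("2020")

# Full scan: take the maximum over every four-digit year in the string.
max_year = dates.str.findall(r"\d{4}").apply(lambda ys: max(map(int, ys)))
full_old = max_year.le(2020)

print(pd.DataFrame({"Date": dates, "shortcut": shortcut_old, "full": full_old}))
```

The two approaches agree on the first row and disagree on the second, so the shortcut is only appropriate when the ordering of years inside Date is guaranteed.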
Output :
print(df)
id Date Tag
0 1 sep 2021 - 2023 Ok
1 2 feb 2019 - aug 2020 Old
2 3 07/2022 - 11/2022 Ok
3 4 2013-2017 Old
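Note that the mapping above rewrites every Tag (rows that are not old become "Ok"), whereas the question asks for no change to those rows. If the existing tags should be kept, one possible variant is to build a boolean mask and assign through df.loc. This is a sketch that assumes the sample data from the question and uses str.extractall instead of findall; rows without any four-digit year are left untouched:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04"],
    "Date": ["sep 2021 - 2023", "feb 2019 - aug 2020",
             "07/2022 - 11/2022", "2013-2017"],
    "Tag": ["ok", "ok", "ok", "ok"],
})

# One row per matched year; level 0 of the result's index is the
# original row label, so groupby(level=0) gives a per-row maximum.
years = df["Date"].str.extractall(r"(\d{4})")[0].astype(int)
max_year = years.groupby(level=0).max()

# Rows with no year become NaN after reindex, and NaN <= 2020 is False.
is_old = max_year.reindex(df.index).le(2020)

# Overwrite only the matching rows; every other Tag keeps its value.
df.loc[is_old, "Tag"] = "Old"
print(df)
```

Because only the masked rows are assigned, any other Tag values already present in the frame survive unchanged.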