I have two columns, start_date and end_date (both strings).
Some of the rows in end_date are formatted like 9999-12-12. I want to replace the 9999 with the YYYY from start_date.
I tried extracting the first 4 characters of the start_date column (which gave me a pd.Series) and using that for the replacement, but I cannot find a way to do it.
How it is:
start_date | end_date |
---|---|
2020-12-25 | 2020-12-28 |
2021-02-02 | 9999-02-09 |
2019-02-13 | 9999-02-15 |
How it should be:
start_date | end_date |
---|---|
2020-12-25 | 2020-12-28 |
2021-02-02 | 2021-02-09 |
2019-02-13 | 2019-02-15 |
CodePudding user response:
Assuming your data frame is called df:
df['end_date'] = df.apply(lambda row: row['end_date'].replace("9999", row['start_date'][0:4]), axis=1)
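For reference, here is a self-contained sketch of the same row-wise idea, assuming the sample data from the question. The helper name fix_end_year is hypothetical (not part of pandas); it only rewrites values that start with the 9999 placeholder, so a 9999 appearing elsewhere in the string would be left alone.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "start_date": ["2020-12-25", "2021-02-02", "2019-02-13"],
    "end_date": ["2020-12-28", "9999-02-09", "9999-02-15"],
})

def fix_end_year(row):
    # Hypothetical helper: swap the placeholder year for the start year
    if row["end_date"].startswith("9999"):
        return row["start_date"][:4] + row["end_date"][4:]
    return row["end_date"]

# axis=1 passes each row to the helper
df["end_date"] = df.apply(fix_end_year, axis=1)
print(df)
```

Row-wise apply is easy to read but can be slow on large frames; a vectorized approach is usually preferable there.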
CodePudding user response:
Or use np.where (this needs import numpy as np):
df['end_date'] = np.where(df.end_date.str[:4] == '9999', df.start_date.str[:4] + df.end_date.str[4:], df.end_date)
df
start_date end_date
0 2020-12-25 2020-12-28
1 2021-02-02 2021-02-09
2 2019-02-13 2019-02-15
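Another option, sketched here under the assumption that both columns hold YYYY-MM-DD strings as in the question, is a boolean mask with .loc, which avoids NumPy and only touches the affected rows. The variable name mask is illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "start_date": ["2020-12-25", "2021-02-02", "2019-02-13"],
    "end_date": ["2020-12-28", "9999-02-09", "9999-02-15"],
})

# True for rows whose end_date starts with the placeholder year
mask = df["end_date"].str.startswith("9999")

# Year from start_date plus the "-MM-DD" tail of end_date, masked rows only
df.loc[mask, "end_date"] = (
    df.loc[mask, "start_date"].str[:4] + df.loc[mask, "end_date"].str[4:]
)
print(df)
```

Rows where the mask is False are never rewritten, so valid end dates stay exactly as they were.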