I have some dates stored as strings, like:
dates = ['052021','01052021','02112021']
Some dates include the day in the middle (%m%d%Y) and some do not (%m%Y). I need to strip the day so that the previous list becomes:
dates = ['052021','012021','022021']
I can only use a regex, and it has to be a single expression. I have tried the pattern '(\d{2}(?:(\d{2})\d{4})' without any luck.
CodePudding user response:
Maybe try:
import re
dates = ['052021','01052021','02112021']
new_dates = [re.sub(r'\B\d\d(\d{4})$', r'\1', s) for s in dates]
print(new_dates)
Prints:
['052021', '012021', '022021']
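\B - Non-word boundary; the match cannot start at the very beginning of the string, so six-digit strings are left untouched;
\d\d - Two digits, the day to drop (can also use '..' if you want to be less explicit);
(\d{4}) - Four digits in a capture group, the year (can also use '.{4}' again);
$ - End-of-line anchor.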
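To see why the \B matters, here is a small sketch that runs the same substitution with and without it. The variable names with_b and without_b are just for illustration:

```python
import re

dates = ['052021', '01052021', '02112021']

# With \B the match cannot begin at position 0, so a six-digit
# string (%m%Y) never matches and is returned unchanged.
with_b = [re.sub(r'\B\d\d(\d{4})$', r'\1', s) for s in dates]

# Without \B a six-digit string matches from position 0, and the
# month is stripped by mistake.
without_b = [re.sub(r'\d\d(\d{4})$', r'\1', s) for s in dates]

print(with_b)     # ['052021', '012021', '022021']
print(without_b)  # ['2021', '012021', '022021']
```

In an eight-digit string the first position where six digits remain before the end is index 2, which sits between two digits and therefore is a non-word boundary.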
Edit: With pandas, using str.replace():
import pandas as pd
df = pd.DataFrame({'dates': ['052021','01052021','02112021']})
df['dates'] = df['dates'].str.replace(r'\B\d\d(\d{4})$', r'\1', regex=True)
print(df)
Prints:
    dates
0  052021
1  012021
2  022021
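If relying on \B feels too implicit, a possible alternative (not part of the answer above) is to anchor both ends and keep the month and year in two capture groups. This is only a sketch; strip_day is a made-up helper name, and it assumes every input is either six or eight digits:

```python
import re

def strip_day(s):
    # Hypothetical helper: matches only full eight-digit strings
    # (%m%d%Y) and keeps group 1 (month) plus group 2 (year).
    # Anything else, such as a six-digit %m%Y string, is returned as is.
    return re.sub(r'^(\d{2})\d{2}(\d{4})$', r'\1\2', s)

dates = ['052021', '01052021', '02112021']
new_dates = [strip_day(s) for s in dates]
print(new_dates)  # ['052021', '012021', '022021']
```

It is still a single expression, and the same pattern and replacement should also work with pandas str.replace(..., regex=True).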