Python regex negative extract of an optional group

Time:08-03

I have some dates in strings like

dates = ['052021','01052021','02112021']

Some dates contain the day in the middle, as in %m%d%Y, and some do not, as in %m%Y. I need to drop the day where it is present, so that the previous list becomes

dates = ['052021','012021','022021']

I can only use regex, and only a single expression. I tried the pattern '(\d{2}(?:(\d{2})\d{4})' without any luck.

CodePudding user response:

Maybe try:

import re

dates = ['052021','01052021','02112021']
new_dates = [re.sub(r'\B\d\d(\d{4})$', r'\1', s) for s in dates]

print(new_dates)

Prints:

['052021', '012021', '022021']


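  • \B - Non-word boundary, so the match cannot start at the beginning of the string; this is what leaves the 6-digit dates untouched;
  • \d\d - Two digits, the day to drop (can also use '..' if you want to be less explicit);
  • (\d{4}) - Four digits, the year, in a capture group (can also use '.{4}');
  • $ - End-of-string anchor.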
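
To make the role of \B concrete, here is a small sketch that shows what the pattern matches in each input string. The explicitly anchored pattern ALT_PATTERN is not part of the answer above; it is a hypothetical alternative, included only for comparison, that captures the month and the year and skips the two digits between them.

```python
import re

# Pattern from the answer: drop two digits that sit right before the
# final four digits, but never at the very start of the string (\B).
PATTERN = re.compile(r'\B\d\d(\d{4})$')

# Hypothetical alternative (an assumption, not from the answer):
# anchor both ends, keep month (group 1) and year (group 2).
ALT_PATTERN = re.compile(r'^(\d{2})\d{2}(\d{4})$')

dates = ['052021', '01052021', '02112021']

for s in dates:
    m = PATTERN.search(s)
    # For 6-digit strings there is no match, so sub() returns s unchanged.
    matched = m.group(0) if m else None
    print(s, '| matched:', matched, '| result:', PATTERN.sub(r'\1', s))

# The anchored alternative only touches strings of exactly 8 digits.
alt_dates = [ALT_PATTERN.sub(r'\1\2', s) for s in dates]
print(alt_dates)
```

For '052021' the only position where six digits remain before the end is the start of the string, where \B fails, so nothing is replaced. For the 8-digit strings the match begins after the month, so only the day and year are consumed and the year is put back.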

Edit: With pandas using str.replace():

import pandas as pd

df = pd.DataFrame({'dates': ['052021','01052021','02112021']})
df['dates'] = df['dates'].str.replace(r'\B\d\d(\d{4})$', r'\1', regex=True)

print(df)

Prints:

    dates
0  052021
1  012021
2  022021