Invalid dates parser

Time:10-27

Is there an easy way to decrement an invalid date like "November 31st" to the last valid date of the month? November 31st does not exist as there aren't 31 days in November.

The date strings that I'm working with are very messy and inconsistent, so I want to avoid slicing the string or anything like that. parser.parse() works great for my use case as long as the dates are valid.

from dateutil import parser

datestrings_list = ["Nov 31, 1976", "11/31/76", "11/31/1976", "November 31st, 1976", "1/32/1976"]

date_list = []
for datestring in datestrings_list:
    date = parser.parse(datestring).date()
    date_list.append(date)


This raises an error:


ParserError: day is out of range for month: Nov 31, 1976

Desired value for date_list:

[datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 1, 31)]

CodePudding user response:

You could use a while loop that lowers the day by one each time parsing fails, until the string parses.

from dateutil import parser

datestring = "Nov 31, 1976"
date = None
while date is None:
    try:
        date = parser.parse(datestring).date()
    except parser.ParserError:
        # Expects "Mon DD, YYYY": keep only the digits of the day token and step back one day
        month, day_token, year = datestring.split()
        day = int("".join(x for x in day_token if x.isdigit())) - 1
        if day < 28:
            raise  # every month has at least 28 days, so the day is not the problem
        datestring = f"{month} {day}, {year}"

print(date)

This should cover strings in the "Mon DD, YYYY" format.
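The same decrement-and-retry idea can be made independent of the string layout. What follows is only a rough sketch: `parse_clamped`, `DAY_TOKEN`, and `parse_mdy` are hypothetical names, not dateutil functions. It assumes the invalid day shows up as a standalone number from 29 to 32 (optionally with an ordinal suffix) and that the parse function signals failure with `ValueError`, which dateutil's `ParserError` subclasses. The demo uses a standard-library stand-in parser so it runs without dateutil.

```python
import re
from datetime import datetime

# Hypothetical helper, not part of dateutil.
# Matches a standalone 29-32, optionally followed by an ordinal suffix.
DAY_TOKEN = re.compile(r"\b(29|30|31|32)(?:st|nd|rd|th)?\b")

def parse_clamped(text, parse_fn, max_steps=4):
    """Call parse_fn(text); on ValueError, lower the first 29-32 token by one and retry."""
    for _ in range(max_steps):
        try:
            return parse_fn(text)
        except ValueError:
            match = DAY_TOKEN.search(text)
            if match is None:
                raise  # nothing that looks like an out-of-range day
            day = int(match.group(1)) - 1
            text = text[:match.start()] + str(day) + text[match.end():]
    # Final attempt after the last decrement (e.g. 32 lowered all the way to 28)
    return parse_fn(text)

# Stand-in parser for the demo; dateutil's parser.parse could be passed instead.
def parse_mdy(text):
    return datetime.strptime(text, "%m/%d/%Y").date()

print(parse_clamped("11/31/1976", parse_mdy))  # 1976-11-30
print(parse_clamped("1/32/1976", parse_mdy))   # 1976-01-31
```

Because the retry count is bounded and the helper re-raises when no candidate day token is left, a string that is unparseable for some other reason fails fast instead of looping. Note the assumption that the first 29-32 token is the day: a string where a two-digit year in that range comes before the day would be handled incorrectly.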

UPDATE FOR QUESTION:

from dateutil import parser
import calendar

datestrings_list = ["Nov 31, 1976", "11/31/76", "11/31/1976", "November 31st, 1976","1/32/1976"]
# Map month abbreviation to month number, e.g. "Nov" -> 11
c = {month: index for index, month in enumerate(calendar.month_abbr) if month}

# Convert "Mon DD, YYYY"-style strings to "M/D/YYYY"; leave slash-separated strings alone
def standardise_list(date_list):
    lst = []
    for ls in date_list:
        if "/" not in ls:
            ds = ls.split()
            month = c[ds[0][:3]]
            # Keep only the digits of the day token ("31st," -> "31")
            day = "".join(x for x in ds[1] if x.isdigit())
            lst.append(f"{month}/{day}/{ds[2]}")
        else:
            lst.append(ls)
    return lst

# Fix out of range dates by stepping the day back until the string parses
def date_fix(datestring):
    date = None
    while date is None:
        try:
            date = parser.parse(datestring).date()
        except parser.ParserError:
            date_array = datestring.split("/")
            day = int(date_array[1]) - 1
            if day < 28:
                raise  # the day is not what makes this string unparseable
            date_array[1] = str(day)
            # Rejoin with "/" so the next iteration can split the string again
            datestring = "/".join(date_array)
    return date

standard_string_list = standardise_list(datestrings_list)

dates = [date_fix(ds) for ds in standard_string_list]
print(dates)

>>> [datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 1, 31)]

CodePudding user response:

I'm not sure there's an easy way to replace an invalid date with the last day of its month, but one approach is to use calendar.monthrange from the calendar module, which returns the number of days in a given month and year:

import calendar

# Mapping of month abbreviation to month index. Ex: 'Jan': 1
month_indices = {month: i for i, month in enumerate(calendar.month_abbr)}

datestring = "Nov 31, 1976"

month_abbr, day, yr = datestring.replace(',', '').split()
last_day_in_month = calendar.monthrange(int(yr), month_indices[month_abbr])[-1]

assert last_day_in_month == 30
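Building on that lookup, a small helper could return the corrected date directly. This is a sketch under a few assumptions: `clamp_named_date` and `MONTHS` are hypothetical names, the input follows the "Mon DD, YYYY" layout, and the month names are the English ones that `calendar` yields under the default locale.

```python
import calendar
from datetime import date

# Accept both "Nov" and "November"; keys are lowercased for lookup.
MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_abbr) if name}
MONTHS.update({name.lower(): i for i, name in enumerate(calendar.month_name) if name})

def clamp_named_date(text):
    """Parse 'Mon DD, YYYY' and cap the day at the month's last day."""
    month_token, day_token, year_token = text.replace(",", "").split()
    month = MONTHS[month_token.lower()]
    year = int(year_token)
    # Drop ordinal suffixes such as "st" in "31st"
    day = int("".join(ch for ch in day_token if ch.isdigit()))
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))

print(clamp_named_date("Nov 31, 1976"))         # 1976-11-30
print(clamp_named_date("November 31st, 1976"))  # 1976-11-30
```

Using `min(day, last_day)` means valid dates pass through unchanged and only overshooting days are pulled back.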

Alternatively, if you have a date string like 11/31/76, here's how you get the max day for this month and year:

datestring = "11/31/76"

month, day, year = map(int, datestring.split('/'))
# Check for a two-digit year such as `76`.
# `monthrange` takes the year literally (76 would mean the year 76, not 1976),
# so expand it to four digits first. This assumes the dates are from the 1900s.
if year < 100:
    year += 1900

assert calendar.monthrange(year, month)[-1] == 30
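The slash-separated case can be wrapped up the same way. In this sketch `clamp_slash_date` and its `century` parameter are hypothetical, the field order is assumed to be month/day/year, and two-digit years are assumed to belong to the 1900s.

```python
import calendar
from datetime import date

def clamp_slash_date(text, century=1900):
    """Parse 'M/D/YY' or 'M/D/YYYY' and cap the day at the month's last day."""
    month, day, year = map(int, text.split("/"))
    if year < 100:
        year += century  # assumption: two-digit years belong to the 1900s
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))

print([clamp_slash_date(s) for s in ["11/31/76", "11/31/1976", "1/32/1976"]])
# [datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 1, 31)]
```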