Is there an easy way to decrement an invalid date like "November 31st" to the last valid date of the month? November 31st does not exist as there aren't 31 days in November.
The date strings I'm working with are very messy and inconsistent, so I want to avoid slicing the string or anything like that. parser.parse() works great for my use case as long as the dates are valid.
from dateutil import parser

datestrings_list = ["Nov 31, 1976", "11/31/76", "11/31/1976", "November 31st, 1976", "1/32/1976"]
date_list = []
for datestring in datestrings_list:
    date = parser.parse(datestring).date()
    date_list.append(date)
This raises an error:
ParserError: day is out of range for month: Nov 31, 1976
Desired value for date_list:
[datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 1, 31)]
CodePudding user response:
You could use a while loop that decrements the day until the string parses.
from dateutil import parser

datestring = "Nov 31, 1976"
date = None
while date is None:
    date_array = datestring.split()
    try:
        date = parser.parse(datestring).date()
    except parser.ParserError:
        # Keep only the digits of the day token ("31," -> 31) and step back one day.
        day = int("".join(x for x in date_array[1] if x.isdigit())) - 1
        date_array[1] = f"{day},"
        datestring = f"{date_array[0]} {date_array[1]} {date_array[2]}"
print(date)
This should cover your needs.
UPDATE FOR QUESTION:
from dateutil import parser
import calendar

datestrings_list = ["Nov 31, 1976", "11/31/76", "11/31/1976", "November 31st, 1976", "1/32/1976"]
# Map month abbreviation to month number, e.g. "Nov" -> 11.
c = {month: index for index, month in enumerate(calendar.month_abbr) if month}

# Format string list
def standardise_list(date_list):
    lst = []
    for ls in date_list:
        if "/" not in ls:
            ds = ls.split()
            # Reduce "November" to "Nov".
            if len(ds[0]) > 3:
                ds[0] = ds[0][:3]
            # Keep only the digits of the day token ("31st," or "31," -> "31").
            ds[1] = "".join(ch for ch in ds[1] if ch.isdigit())
            nd = f"{c[ds[0]]}/{ds[1]}/{ds[2]}"
            lst.append(nd)
        else:
            lst.append(ls)
    return lst

# Fix out of range dates
def date_fix(datestring):
    date = None
    while date is None:
        date_array = datestring.split("/")
        try:
            date = parser.parse(datestring).date()
        except parser.ParserError:
            day = int("".join(x for x in date_array[1] if x.isdigit())) - 1
            date_array[1] = f"{day}"
            # Rejoin with "/" so the next iteration can split the string again.
            datestring = "/".join(date_array)
    return date

standard_string_list = standardise_list(datestrings_list)
dates = [date_fix(ds) for ds in standard_string_list]
print(dates)
>>> [datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 11, 30), datetime.date(1976, 1, 31)]
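One caveat with the loops above: they retry until parsing succeeds, so a string that fails for a reason unrelated to the day could keep looping. Below is a minimal sketch of a bounded version that works on already-extracted integers instead of the raw string. The helper name `roll_back_day` and the `max_steps` limit are illustrative assumptions, not part of dateutil.

```python
from datetime import date


def roll_back_day(year, month, day, max_steps=3):
    """Return the closest valid date at or before the requested day.

    Tries day, day - 1, ... for at most max_steps decrements, then gives up
    instead of looping forever. (Hypothetical helper for illustration.)
    """
    for step in range(max_steps + 1):
        try:
            return date(year, month, day - step)
        except ValueError:
            # The date is out of range; try the previous day.
            continue
    raise ValueError(f"no valid date within {max_steps} days of {year}-{month}-{day}")


print(roll_back_day(1976, 11, 31))  # 1976-11-30
print(roll_back_day(1976, 1, 32))   # 1976-01-31
```

Three decrements are enough for any day up to 31, since every month has at least 28 days; larger typos would need a bigger `max_steps`.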
CodePudding user response:
I'm not sure if there's an easy way to replace invalid dates with the max day for that month, but one approach could be to use the helper functions in the calendar module to get the max day for a given month and year:
import calendar
# Mapping of month abbreviation to month index. Ex: 'Jan': 1
month_indices = {month: i for i, month in enumerate(calendar.month_abbr)}
datestring = "Nov 31, 1976"
month_abbr, day, yr = datestring.replace(',', '').split()
last_day_in_month = calendar.monthrange(int(yr), month_indices[month_abbr])[-1]
assert last_day_in_month == 30
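The mapping above only knows three-letter abbreviations, so a string like "November 31st, 1976" would miss. A small sketch of a lookup that also accepts full month names, case-insensitively, follows. The names `MONTH_LOOKUP` and `month_number` are hypothetical, and it assumes the default English locale, since the names in `calendar` are locale dependent.

```python
import calendar

# Lower-cased abbreviations and full names mapped to month numbers,
# e.g. "nov" -> 11 and "november" -> 11. Index 0 is an empty string, so skip it.
MONTH_LOOKUP = {}
for number in range(1, 13):
    MONTH_LOOKUP[calendar.month_abbr[number].lower()] = number
    MONTH_LOOKUP[calendar.month_name[number].lower()] = number


def month_number(name):
    """Return 1-12 for an English month name or abbreviation (hypothetical helper)."""
    return MONTH_LOOKUP[name.strip().rstrip(".").lower()]


last_day = calendar.monthrange(1976, month_number("November"))[1]
print(last_day)  # 30
```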
Alternatively, if you have a date string like 11/31/76, here's how you get the max day for this month and year:
datestring = "11/31/76"
month, day, year = map(int, datestring.split('/'))
# Check for a two-digit year such as `76`.
# The year must have 4 digits, otherwise `monthrange` treats it
# literally as the year 76, which is not what we want.
if year < 100:
    year += 1900
assert calendar.monthrange(year, month)[-1] == 30
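To go from the max day to an actual date, the day can be clamped with `min`. Here is a rough sketch for the numeric "M/D/Y" form only; the function name `clamp_numeric_date` and the `century` parameter are assumptions made for illustration, and two-digit years are assumed to belong to the 1900s as in the snippet above.

```python
import calendar
from datetime import date


def clamp_numeric_date(datestring, century=1900):
    """Parse "M/D/Y" and clamp the day to the last valid day of that month."""
    month, day, year = map(int, datestring.split("/"))
    if year < 100:
        # Assumption: two-digit years belong to the given century.
        year += century
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


print(clamp_numeric_date("11/31/76"))   # 1976-11-30
print(clamp_numeric_date("1/32/1976"))  # 1976-01-31
```

Unlike a retry loop, this resolves the date in one step, and valid dates pass through unchanged.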