Python: Human-friendly string to datetime

I am working on a program that takes human-friendly input and converts it into Unix time (i.e. seconds since midnight UTC on 1 January 1970). For example:

2030y1d4M: Midnight on 1st April 2030
2030y1d4M5m: 00:05 on 1st April 2030
6h1d4M: 06:00 on 1st April in the current year

The user can enter some or all of the units, in any order. Here y is the year, M the month, d the day, h the hour and m the minute.

I've looked at similar questions and found some external sites that convert such units into a timedelta, but none suggested a way to turn them into a datetime object. strptime is too strict and doesn't (seem to) allow the units to appear in a different order.

It doesn't matter whether it converts into a datetime.datetime object and then into Unix time, or directly into Unix time. I don't necessarily need the exact code, but I would be glad to be pointed in the right direction.

Answer 1:

You could use a regex to parse the user input:

import re
import datetime as dt

user_date = '6h2030y1d4M5m'

# Parse the user input with a regex: 1-4 digits followed by one unit letter.
# Units follow the question: y, M (month), d, h, m (minute), plus s (second)
date_parts = {
    m[-1]: int(m[:-1])
    for m in re.findall(r'\d{1,4}[yMdhms]', user_date)
}
now = dt.datetime.now()

# Construct the datetime; any unit missing from the user string
# defaults to the corresponding field of datetime.now()
d_obj = dt.datetime(
    date_parts.get('y', now.year), date_parts.get('M', now.month),
    date_parts.get('d', now.day), date_parts.get('h', now.hour),
    date_parts.get('m', now.minute), date_parts.get('s', now.second)
)

print(d_obj)

Out (the seconds come from the current time, so they will vary):

2030-04-01 06:05:35
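Note that this answer fills every missing unit from datetime.now(), so 6h1d4M would also pick up the current minute and second, whereas the question expects midnight-style defaults with only the year taken from the current date. A minimal sketch of that behaviour, assuming the question's unit letters (M = month, m = minute, plus an assumed s for seconds) and a hypothetical helper named parse_human_date, starts from 1 January of the current year at 00:00 and overrides only the supplied fields with datetime.replace():

```python
import re
from datetime import datetime

# Unit letters as used in the question (M = month, m = minute); 's' is an assumed extra
UNITS = {"y": "year", "M": "month", "d": "day", "h": "hour", "m": "minute", "s": "second"}


def parse_human_date(text, now=None):
    """Hypothetical helper: parse e.g. '6h1d4M' into a naive datetime.

    Units the user leaves out default to 1 January, 00:00:00 of the
    current year, so only the year is taken from `now`.
    """
    if now is None:
        now = datetime.now()
    base = datetime(now.year, 1, 1)  # current year, 1 January, midnight
    fields = {UNITS[unit]: int(value)
              for value, unit in re.findall(r"(\d+)([yMdhms])", text)}
    # replace() validates the final combination, e.g. '31d4M' raises ValueError
    return base.replace(**fields)


fixed_now = datetime(2024, 7, 15, 12, 34, 56)  # pinned "now" for a reproducible example
for text in ["2030y1d4M", "2030y1d4M5m", "6h1d4M"]:
    print(text, "->", parse_human_date(text, now=fixed_now))
```

With the pinned date this prints:

2030y1d4M -> 2030-04-01 00:00:00
2030y1d4M5m -> 2030-04-01 00:05:00
6h1d4M -> 2024-04-01 06:00:00

Passing now explicitly is only there to make the example deterministic; in the real program it can be left out.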

Answer 2:

What you need first is to tokenize the input. Judging by your examples, it can be tokenized with a regular expression, for example:

import re

def tokenize(x):
    # Split into (digits, non-digits) pairs, e.g. '2030y' -> ('2030', 'y')
    return re.findall(r'(\d+)(\D+)', x)

d1 = "2030y1d4M"
d2 = "2030y1d4M5m"
d3 = "6h1d4M"
print(tokenize(d1))
print(tokenize(d2))
print(tokenize(d3))

Output:

[('2030', 'y'), ('1', 'd'), ('4', 'M')]
[('2030', 'y'), ('1', 'd'), ('4', 'M'), ('5', 'm')]
[('6', 'h'), ('1', 'd'), ('4', 'M')]

Explanation: the function tokenize converts the input string into a list of 2-tuples, each containing a value (as a string) and a unit (also a string). Beware, however, that this assumes the user input belongs to a Chomsky Type 3 (regular) language; if that does not hold, a regular expression will not suffice.
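One more caveat: re.findall silently skips anything that does not match, so malformed input such as 2030y1x4M or 12 produces a partial result instead of an error. A sketch of a stricter tokenizer, assuming the unit set yMdhms and using a hypothetical name tokenize_strict, first checks the whole string with re.fullmatch and then rejects repeated units:

```python
import re

# Assumed unit letters, following the question (M = month, m = minute)
TOKEN = r"(\d+)([yMdhms])"


def tokenize_strict(text):
    """Hypothetical strict variant of tokenize(): reject malformed input."""
    # The whole string must consist of one or more <digits><unit> pairs
    if not re.fullmatch(f"(?:{TOKEN})+", text):
        raise ValueError(f"not a valid date string: {text!r}")
    tokens = re.findall(TOKEN, text)
    units = [unit for _, unit in tokens]
    # Each unit may appear at most once; e.g. '1d2d' is ambiguous
    if len(units) != len(set(units)):
        raise ValueError(f"repeated unit in {text!r}")
    return tokens


print(tokenize_strict("2030y1d4M5m"))  # [('2030', 'y'), ('1', 'd'), ('4', 'M'), ('5', 'm')]
for bad in ["2030y1x4M", "12", "1d2d"]:
    try:
        tokenize_strict(bad)
    except ValueError as exc:
        print(exc)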

Answer 3:

Define the symbols and their meaning, e.g. y for year. Then use a regex to split the string into its units, e.g. [('2030', 'y'), ('1', 'd'), ('4', 'M')]. With those two pieces of information we can construct a datetime object directly.

from datetime import datetime, MINYEAR
import re

UNITS = {
    "y": "year",
    "M": "month",
    "d": "day",
    "h": "hour",
    "m": "minute",
}

# One or more digits followed by a single unit letter
dt_re = re.compile(r"(\d+)([A-Za-z])")

for text_date in [
    "2030y1d4M",
    "2030y1d4M5m",
    "6h1d4M",
    "23h1993y59m25d12M",
]:
    unit_list = dt_re.findall(text_date)
    dt_kwargs = {"year": MINYEAR, "month": 1, "day": 1}  # Defaults for the required arguments (a missing year becomes year 1)

    for unit in unit_list:
        dt_kwargs[UNITS[unit[1]]] = int(unit[0])

    dt = datetime(**dt_kwargs)
    print(dt)

Output

2030-04-01 00:00:00
2030-04-01 00:05:00
0001-04-01 06:00:00
1993-12-25 23:59:00
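This answer stops at a datetime, and a missing year falls back to MINYEAR (year 1, as the 0001-04-01 line shows) rather than the current year the question asks for. To get the Unix time, one option is calendar.timegm(), which interprets a time tuple as UTC; calling .timestamp() on a naive datetime would instead treat it as local time (attaching tzinfo=timezone.utc first is an equivalent alternative). A sketch under those assumptions (input meant as UTC, current year as the default, hypothetical helper name to_unix):

```python
import calendar
import re
from datetime import datetime

UNITS = {"y": "year", "M": "month", "d": "day", "h": "hour", "m": "minute"}
dt_re = re.compile(r"(\d+)([A-Za-z])")


def to_unix(text, year=None):
    """Hypothetical helper: parse `text` as a UTC date and return Unix seconds.

    `year` is the default for a missing 'y' unit; if not given it is the
    current year, as in the question's '6h1d4M' example.
    """
    if year is None:
        year = datetime.now().year
    dt_kwargs = {"year": year, "month": 1, "day": 1}
    for value, unit in dt_re.findall(text):
        dt_kwargs[UNITS[unit]] = int(value)
    # timegm() treats the struct_time as UTC (unlike time.mktime(), which uses local time)
    return calendar.timegm(datetime(**dt_kwargs).timetuple())


print(to_unix("2030y1d4M"))  # 1901232000
print(to_unix("1970y1d1M"))  # 0
```

If the user's input should be read as local time instead, datetime(**dt_kwargs).timestamp() gives that interpretation directly.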