Detect time string format in Python?

Time:05-05

I have an extremely large dataset with date/time columns in various formats. I have a validation function that detects the possible time string formats and handles both 24-hour and 12-hour time. The separator is always a colon (:). A sample of the code is below. However, after profiling my code, this function appears to be a bottleneck and is expensive in terms of execution time. My question is whether there is a faster way to do this.

import datetime
def validate_time(time_str: str):
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None

print(validate_time(time_str="9:21 PM"))

CodePudding user response:

Instead of trying to parse using every format string, you could split by colons to obtain the segments of your string that denote hours, minutes, and everything that remains. Then you can parse the result depending on the number of values the split returns:

def validate_time_new(time_str: str):
    time_vals = time_str.split(':')
    
    try:
        if len(time_vals) == 1: 
            # No split, so invalid time
            return None
        elif len(time_vals) == 2:
            if time_vals[-1][-2:].lower() in ["am", "pm"]:
                # if last element contains am or pm, try to parse as 12hr time
                return datetime.datetime.strptime(time_str, "%I:%M %p")
            else:
                # try to parse as 24h time
                return datetime.datetime.strptime(time_str, "%H:%M")
        elif len(time_vals) == 3:
            if "." in time_vals[-1]:
                # If the last element has a decimal point, try to parse microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
            else:
                # try to parse without microseconds
                return datetime.datetime.strptime(time_str, "%H:%M:%S")
        else:
            # More than two colons, so invalid time
            return None
    except ValueError:
        # If any of the attempts to parse throws an error, return None
        return None
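
The same idea can also be split into two steps: first guess the format from the shape of the string, then call strptime once with that guess. The following is only a sketch of that variant; the names detect_time_format and parse_time are hypothetical and not part of the answer above, and it assumes the same four formats as the question.

```python
import datetime
from typing import Optional


def detect_time_format(time_str: str) -> Optional[str]:
    """Guess a strptime format from the string's shape (hypothetical helper)."""
    parts = time_str.split(":")
    last = parts[-1]
    if len(parts) == 2:
        # "H:MM AM" / "H:MM PM" versus plain 24-hour "HH:MM"
        return "%I:%M %p" if last[-2:].lower() in ("am", "pm") else "%H:%M"
    if len(parts) == 3:
        # A decimal point in the last field signals fractional seconds
        return "%H:%M:%S.%f" if "." in last else "%H:%M:%S"
    return None  # no colon at all, or too many fields


def parse_time(time_str: str) -> Optional[datetime.datetime]:
    """Parse with the guessed format; return None if the guess does not fit."""
    time_format = detect_time_format(time_str)
    if time_format is None:
        return None
    try:
        return datetime.datetime.strptime(time_str, time_format)
    except ValueError:
        return None


for sample in ["12:24", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    print(sample, "->", detect_time_format(sample), parse_time(sample))
```

Note that the detected format is only a guess based on structure. A string such as "54:23:21" still gets a format back, and it is the single strptime call that rejects it.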

To test this, let's time both methods for a bunch of test strings:

import timeit
print("old\t\t\tnew\t\t\t\told/new\t\ttest_string")
for s in ["12:24", "12:23:42", "13:53", "1:53 PM", "12:24:43.220", "not a date", "54:23:21"]:
    t1 = timeit.timeit('validate_time(s)', 'from __main__ import datetime, validate_time, s', number=100)
    t2 = timeit.timeit('validate_time_new(s)', 'from __main__ import datetime, validate_time_new, s', number=100)
    print(f"{t1:.6f}\t{t2:.6f}\t\t{t1/t2:.6f}\t\t{s}")

Output:

old         new             old/new     test_string
0.001628    0.001143        1.424322        12:24
0.001567    0.001012        1.548661        12:23:42
0.000935    0.000979        0.955177        13:53
0.003004    0.000722        4.161657        1:53 PM
0.004523    0.001396        3.241204        12:24:43.220
0.002148    0.000025        84.897370       not a date
0.002262    0.000622        3.638629        54:23:21
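
With number=100 each measurement is very short, so individual rows can be noisy. For instance, "12:24" and "13:53" both match the first format in the old function, yet their ratios differ, which is likely noise or one-time warm-up. A possible way to get steadier numbers, sketched here under the assumption that taking the best of several repeats is acceptable, is timeit.repeat. The helper name best_time is hypothetical, and only the question's validate_time is repeated so the block runs on its own.

```python
import datetime
import timeit


def validate_time(time_str: str):
    # Baseline from the question: try each format until one parses
    for time_format in ["%H:%M", "%H:%M:%S", "%H:%M:%S.%f", "%I:%M %p"]:
        try:
            return datetime.datetime.strptime(time_str, time_format)
        except ValueError:
            continue
    return None


def best_time(func, arg, number=1000, repeat=5):
    """Best-of-`repeat` total time for `number` calls (hypothetical helper)."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings)


for s in ["12:24", "1:53 PM", "not a date"]:
    print(f"{best_time(validate_time, s):.6f}\t{s}")
```

The same helper can be called with validate_time_new to compare the two functions on identical inputs.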