How to fix ValueError in datetime.strptime in Python

I'm writing a programme where the initial step is to split a filename into two components. The files have the format: 12080103_20220809191000.nc where the number before the underscore is the file name and the string after is the date (2022/08/09 19:10:00 in this case).

I'm splitting the file as follows:

filename = os.path.basename(pathname)
sn_file = filename.split("_")
file_date = dt.datetime.strptime(sn_file[1], "%Y%m%d%H%M%S.nc")
sn=sn_file[0]

However, this gives the error: ValueError: time data '12070069' does not match format '%Y%m%d%H%M%S.nc' which shows that somehow the first string of characters is getting mixed up with the second.

I have no idea how or why this is happening. Any advice would be a great help.

EDIT: As requested, here is the full code:

import xarray as xr
import datetime as dt
import os
from pathlib import Path
import pandas as pd
import plotly.express as px
import numpy as np


def sn_date_fromfile(pathname):
    filename = os.path.basename(pathname)
    sn_file = filename.split("_")
    print(sn_file)
    file_date = dt.datetime.strptime(sn_file[1], "%Y%m%d%H%M%S.nc")
    sn=sn_file[0]

    return file_date, sn, pathname

def plot_single_cam(dat, stat, title=""):
    #dat[stat]=dat[stat].fillna(0.).where(dat[stat]>2000)
    #dat[stat]=dat[stat].fillna(0.).where(dat[stat]<500)
    arr = np.array(dat[stat])
    arr_max = np.quantile(arr, 0.95)
    arr_min = np.quantile(arr, 0.05)
    # awful - figure out how to do quantile
    arr_max = arr_max - (arr_max * 0.98)
    arr_min = arr_min + (arr_min * 1.02)
    arr[arr < 1000] = 1000
    arr[arr > 2000] = 2000
    dat[stat].values = arr
    fig = px.imshow((dat[stat] - 1000) / 10,
                    color_continuous_scale='temps',
                    origin='lower',
                    animation_frame="time",
                    aspect="equal",
                    contrast_rescaling="minmax",
                    width=750,
                    height=750,
                    title=title,
                    )
    return fig

hours_ago = 0.5
# inside /var/www/data/PI-160/ are the .nc files. Change the directory to match where the files are
paths = sorted(Path('/Volumes/1A/file1').iterdir(), key=os.path.getmtime)
file_meta = pd.DataFrame(
    [sn_date_fromfile(path) for path in paths],
    # column order must match the tuple returned by sn_date_fromfile
    columns=["file_time", "sn", "path_name"],
    )

file_meta = file_meta[
    file_meta.file_time >
    (dt.datetime.utcnow() - dt.timedelta(hours = hours_ago))
    ]

stat = "t_b_snapshot"
try:
    os.remove("/var/www/html/static_plots/static.html")
except FileNotFoundError:
    pass
with open("/var/www/html/static_plots/static.html", 'a') as f:
    for sn in file_meta["sn"].unique():
        file_meta_sn = file_meta[file_meta['sn'] == sn]
        dat = xr.open_mfdataset(file_meta_sn['path_name'], engine="netcdf4")
        fig = plot_single_cam(dat, stat, title=f"{sn} updated {dt.datetime.utcnow()} UTC")
        f.write(fig.to_html(full_html=False, include_plotlyjs='cdn'))

CodePudding user response：

Using Python 3.8.10 I cannot reproduce this: strptime correctly parses the time string for the example you gave. I suspect the problem lies in relying on the underscore split and on sn_file[1]. Could some filenames contain another underscore before the expected one? If so, sn_file[1] would be the number rather than the date. Try sn_file[-1] instead, which always takes the last element of the split, the part we want.
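
To make the extra-underscore idea concrete, here is a small sketch. The second filename is hypothetical (it does not come from the question); it imitates a name with a prefix before the number, for example the `._` sidecar files that macOS can write on some external volumes. Whether such files exist in your directory is an assumption you would need to check.

```python
import datetime as dt

FMT = "%Y%m%d%H%M%S.nc"

good = "12080103_20220809191000.nc"
# Hypothetical name with an extra underscore before the number
odd = "._12070069_20220809191000.nc"

# With the extra underscore, index 1 is the number, not the date
print(good.split("_"))
print(odd.split("_"))


def parse_last(filename):
    """Parse the date from the last underscore-separated part."""
    return dt.datetime.strptime(filename.split("_")[-1], FMT)


print(parse_last(good))
print(parse_last(odd))
```

Note that with the odd name, index 0 is no longer the number either, so sn_file[-2] would be the safer choice for sn.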

I'm assuming you're doing this for many files, only some of which have this problem. Wrap the parse in a try statement that catches the ValueError and prints filename and sn_file, e.g.

filename = os.path.basename(pathname)
sn_file = filename.split("_")
try:
    file_date = dt.datetime.strptime(sn_file[1], "%Y%m%d%H%M%S.nc")
except ValueError:
    # Show the offending name and how it was split, then re-raise
    print(f"error with filename '{filename}' {sn_file}")
    raise
sn = sn_file[0]
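
If the directory holds many files, it may be easier to collect every name that fails than to stop at the first one. A rough sketch of that idea follows; `find_bad_names` is a hypothetical helper, not part of the question's code, and the sample paths are made up.

```python
import datetime as dt
import os


def find_bad_names(pathnames, fmt="%Y%m%d%H%M%S.nc"):
    """Return (filename, parts) for every path whose date part fails to parse."""
    bad = []
    for pathname in pathnames:
        filename = os.path.basename(pathname)
        parts = filename.split("_")
        try:
            dt.datetime.strptime(parts[1], fmt)
        except (ValueError, IndexError):
            # ValueError: wrong chunk or bad date; IndexError: no underscore at all
            bad.append((filename, parts))
    return bad


# Made-up sample paths for illustration only
sample = [
    "data/12080103_20220809191000.nc",
    "data/._12070069_20220809191000.nc",
    "data/README",
]
for name, parts in find_bad_names(sample):
    print(f"cannot parse '{name}': {parts}")
```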

CodePudding user response：

The issue is with either the file naming or the splitting, not with the datetime parsing. Basically you're passing the wrong chunk of the filename to dt.datetime.strptime().

One potential fix would be to stop relying on splitting the filename at the _ and instead use a regex to look for the part of the filename that has the expected format. This also removes the need for strptime, since the captured groups can be passed straight to the datetime constructor.

For example:

import datetime as dt
import re

filename = '12080103_20220809191000.nc'

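# Raw string avoids invalid-escape warnings; anchoring on "_" and ".nc"
# ensures the digits come from the date part, not the leading number
match = re.search(r'_(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.nc$', filename)
if match is None:
    raise ValueError(f"no timestamp found in {filename!r}")

# Use a name that does not shadow the datetime module
file_date = dt.datetime(*(int(g) for g in match.groups()))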
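
Wrapped in a function, the same idea can also skip files that don't follow the naming scheme instead of failing on them. This is only a sketch: `parse_name` and `NAME_RE` are hypothetical names, and the pattern assumes the layout is exactly digits, an underscore, 14 digits, then ".nc", with nothing before the number.

```python
import datetime as dt
import re

# Assumed layout: <number>_<YYYYmmddHHMMSS>.nc, nothing before the number
NAME_RE = re.compile(r"^(\d+)_(\d{14})\.nc$")


def parse_name(filename):
    """Return (sn, file_date), or None if the name does not fit the layout."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    sn, stamp = m.groups()
    # strptime still raises ValueError for impossible dates such as month 13
    return sn, dt.datetime.strptime(stamp, "%Y%m%d%H%M%S")


names = ["12080103_20220809191000.nc", "._12070069_20220809191000.nc"]
parsed = [p for p in map(parse_name, names) if p is not None]
print(parsed)
```

In the question's script, this kind of filter would sit where the list of paths is turned into the DataFrame rows, so that stray files in the directory are ignored rather than parsed.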