Home > Blockchain >  How can I ensure that a pandas date range has even spacing?
How can I ensure that a pandas date range has even spacing?

Time:10-24

I am writing some code to interpolate some data with space (x, y) and time. The data needs to be on a regular grid. I cant seem to make a generalized function to find a date range with regular spacing. The range that fails for me is:

date_min = numpy.datetime64('2022-10-24T00:00:00.000000000')
date_max = numpy.datetime64('2022-11-03T00:00:00.000000000')

And it needs to roughly match the current values of times I have, which for this case is 44.

periods = 44

I tried testing if the time difference is divisible by 2 and then adding 1 to the number of periods, which worked for a lot of cases, but it doesn't seem to really work for this time range:

def unique_diff(x):
    return numpy.unique(numpy.diff(x))


unique_diff(pd.date_range(date_min, date_max, periods=periods))
Out[31]: array([20093023255813, 20093023255814], dtype='timedelta64[ns]')

unique_diff(pd.date_range(date_min, date_max, periods=periods 1))
Out[32]: array([19636363636363, 19636363636364], dtype='timedelta64[ns]')

unique_diff(pd.date_range(date_min, date_max, periods=periods-1))
Out[33]: array([20571428571428, 20571428571429], dtype='timedelta64[ns]')

However, it does work for 2:

unique_diff(pd.date_range(date_min, date_max, periods=periods 2))
Out[34]: array([19200000000000], dtype='timedelta64[ns]')

I could just keep trying different period deltas until I get a solution, but I would rather know why this is happening and how I can generalize this problem for any min/max times with a target number of periods

CodePudding user response:

Your date range doesn't divide evenly by the periods in nanosecond resolution:

# as the contains start and end, there's a step fewer than there are periods 
steps = periods - 1
int(date_max - date_min) / steps
# 20093023255813.953

A solution could be to round up (or down) your max date, to make it divide evenly in nanosecond resolution:

date_max_r = (date_min   
              int(numpy.ceil(int(date_max - date_min) / (steps)) * (steps)))

unique_diff(pd.date_range(date_min, date_max_r, periods=periods))
# array([20093023255814], dtype='timedelta64[ns]')
  • Related