Evenly spaced series of values from a list of (timestamp, value) tuples-CodePudding

I'm stuck on this because I'm not quite sure how to ask the question, so here's my best attempt!

I have a list of tuples which represent a temperature reading at a particular timestamp.

[
  (datetime.datetime(2022, 11, 30, 8, 25, 10, 261853), 19.82),
  (datetime.datetime(2022, 11, 30, 8, 27, 22, 479093), 20.01),
  (datetime.datetime(2022, 11, 30, 8, 27, 36, 984757), 19.96),
  (datetime.datetime(2022, 11, 30, 8, 36, 46, 651432), 21.25),
  (datetime.datetime(2022, 11, 30, 8, 41, 27, 230438), 21.42),
  ...
  (datetime.datetime(2022, 11, 30, 11, 57, 4, 689363), 17.8)
]

As you can see, the deltas between the records are all over the place - some are a few seconds apart, while others are minutes apart.

From these, I want to create a new list of tuples (or other data structure - I am happy to use NumPy or Pandas) where the timestamp value is exactly every 5 minutes, while the temperature reading is calculated as the assumed value at that timestamp given the data that is available. Something like this:

[
  (datetime.datetime(2022, 11, 30, 8, 25, 0, 0), ??),
  (datetime.datetime(2022, 11, 30, 8, 30, 0, 0), ??),
  (datetime.datetime(2022, 11, 30, 8, 35, 0, 0), ??),
  (datetime.datetime(2022, 11, 30, 8, 40, 0, 0), ??),
  ...
  (datetime.datetime(2022, 11, 30, 11, 30, 0, 0), ??),
]

My end goal is to be able to plot this data using PIL, but not MatPlotLib as I'm on very constrained hardware. I want to plot a smooth temperature line over a given time period, given the imperfect data I have on hand.

CodePudding user response：

Assuming lst the input list, you can use:

import pandas as pd

out = (
 pd.DataFrame(lst).set_index(0).resample('5min')
   .mean().interpolate('linear')
   .reset_index().to_numpy().tolist()
)

If you really want a list of tuples:

out = list(map(tuple, out))

Output:

[[Timestamp('2022-11-30 08:25:00'), 19.930000000000003],
 [Timestamp('2022-11-30 08:30:00'), 20.590000000000003],
 [Timestamp('2022-11-30 08:35:00'), 21.25],
 [Timestamp('2022-11-30 08:40:00'), 21.42],
 [Timestamp('2022-11-30 08:45:00'), 21.32717948717949],
 [Timestamp('2022-11-30 08:50:00'), 21.234358974358976],
 ...
 [Timestamp('2022-11-30 11:45:00'), 17.985641025641026],
 [Timestamp('2022-11-30 11:50:00'), 17.892820512820514],
 [Timestamp('2022-11-30 11:55:00'), 17.8]]

For datetime types:

out = (
 pd.DataFrame(lst).set_index(0).resample('5min')
   .mean().interpolate('linear')[1]
)

out = list(zip(out.index.to_pydatetime(), out))

Output:

[(datetime.datetime(2022, 11, 30, 8, 25), 19.930000000000003),
 (datetime.datetime(2022, 11, 30, 8, 30), 20.590000000000003),
 (datetime.datetime(2022, 11, 30, 8, 35), 21.25),
 (datetime.datetime(2022, 11, 30, 8, 40), 21.42),
 (datetime.datetime(2022, 11, 30, 8, 45), 21.32717948717949),
 (datetime.datetime(2022, 11, 30, 8, 50), 21.234358974358976),
 ...
 (datetime.datetime(2022, 11, 30, 11, 45), 17.985641025641026),
 (datetime.datetime(2022, 11, 30, 11, 50), 17.892820512820514),
 (datetime.datetime(2022, 11, 30, 11, 55), 17.8)]

Before/after resampling: