I have 5 columns in an array that are datetime type values. There are 100k values in each column
The values are the datetime for a timeline going from 2015-01-15 00:30 to 2020-12-31 23:00 in 30 minute increments.
Basically what I want to do is to loop through values in the array, and check if the current value is an exact 30 minute timestep from the last value. In an array with columns this would be the value above the current value being investigated
Theres probably a few ways to do this, but I've included a pseudocode sample of how I'm thinking about it
for row in _the_whole_array :
for cell in row:
if cell == to the 30 minute timestep of the cell above it
continue iterating
else:
store that value
return the smallest timestep found, and the biggest timestep found
I have looked at for loops as well as nditer
but I'm getting errors with iterating over datetimes, and I'm also wondering how I can find the cell value above the current cell value above it.
Any help hugely appreciated
CodePudding user response:
try using pandas dataFrame
import pandas
import numpy
# Creating Dataframe From NumPy Array
df = pd.DataFrame(yout_array)
#iterating through the dataframe
for index, row in df.iterrows():
#do something with index and row
print(type(index))
print(index)
print('~~~~~~')
print(type(row))
print(row)
print('------')
CodePudding user response:
i wasn't sure of your axis, even if you use rows on your pseudocode. but the overall idea is the same: if your dtype is datetime you can do basic arithmetic on it
x, y = _the_whole_array.shape
for row in range(x):
diff=_the_whole_array[row][1:]-_the_whole_array[row][:-1]
print(np.amin(diff), np.amax(diff))
for column in range(y):
diff=_the_whole_array[:,column][1:]-_the_whole_array[:,column][:-1]
print(np.amin(diff), np.amax(diff))
CodePudding user response:
As mentioned by @Heidiki, consider using pandas because it will be much easier to process large data.
To address your question, you can create a temporary variable to store values of previous row while looping. At every iteration, you calculate the differences and check for absolute min and max, exactly as your pseudocode.
Here are an example and test data, 3x5-matrix. Please check whether the output matches your requirement.
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
# test data
arr = np.array([
[datetime(2019, 1, 2, 13, 12), datetime(2019, 1, 2, 15, 19),
datetime(2019, 1, 2, 15, 59), datetime(2019, 1, 2, 17, 23),
datetime(2019, 1, 2, 15, 18)],
[datetime(2019, 1, 2, 13, 34), datetime(2019, 1, 2, 15, 57),
datetime(2019, 1, 2, 18, 53), datetime(2019, 1, 2, 17, 34),
datetime(2019, 1, 2, 15, 29)],
[datetime(2019, 1, 2, 13, 49), datetime(2019, 1, 2, 16, 35),
datetime(2019, 1, 2, 21, 18), datetime(2019, 1, 2, 17, 59),
datetime(2019, 1, 2, 15, 46)]
])
def timedelta_to_minutes(dt: timedelta) -> int:
return (dt.days * 24 * 60) (dt.seconds // 60)
def min_max_timestep(data: np.array) -> tuple:
prev_row = min_step = max_step = None
df = pd.DataFrame(data)
for idx, row in df.iterrows():
if not idx:
prev_row = row
continue
diff = row - prev_row
min_diff, max_diff = min(diff), max(diff)
if min_step is None or min_diff < min_step:
min_step = min_diff
if max_step is None or max_diff > max_step:
max_step = max_diff
prev_row = row
# convert timedelta to minutes
return timedelta_to_minutes(min_step), timedelta_to_minutes(max_step)
result = min_max_timestep(arr)