I have a data frame that looks like the one below:
DF.head(20):
time var1 var2 prob
12:30 10 12 85
12:31 15 45 85
12:32 18 12 85
12:33 17 26 85
12:34 11 14 85
12:35 14 65 85
12:36 19 29 92
12:37 15 32 92
12:38 13 44 92
12:39 15 33 92
12:40 11 15 92
12:41 15 45 92
12:42 13 44 94
12:43 15 33 94
12:44 11 15 94
12:45 15 45 94
12:46 13 44 92
12:47 15 33 92
12:48 11 15 92
12:49 15 45 92
I want to predict the value of prob from a sequence of the 6 previous values. In the example above, I would take two time series, var1 and var2, from 12:30 to 12:35 to predict prob at 12:35. As far as I know, the input shape for the LSTM should be (df.shape[0], 6, 1), but I do not know how to convert my input from 2 dimensions to 3 dimensions.
I also have a condition: I should only use the previous 6 time steps if they all share the same prob value. In the example, I cannot build a window for prob = 94, because 94 occurs only 4 times and that is not enough for 6 time steps.
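To see which rows can form a 6-step window, it helps to label each consecutive run of equal prob values. Below is a minimal pandas sketch that rebuilds the 20 rows shown above. The time column is left out, and the `run` column name is only illustrative.

```python
import pandas as pd

# The 20 rows from the question (time column omitted)
df = pd.DataFrame({
    'var1': [10, 15, 18, 17, 11, 14, 19, 15, 13, 15, 11, 15, 13, 15, 11, 15, 13, 15, 11, 15],
    'var2': [12, 45, 12, 26, 14, 65, 29, 32, 44, 33, 15, 45, 44, 33, 15, 45, 44, 33, 15, 45],
    'prob': [85] * 6 + [92] * 6 + [94] * 4 + [92] * 4,
})

# A new run starts whenever prob differs from the previous row; cumsum numbers the runs 1, 2, 3, ...
df['run'] = (df['prob'] != df['prob'].shift()).cumsum()

# One row per run: its prob value and how many consecutive rows it spans
run_sizes = df.groupby('run')['prob'].agg(['first', 'size'])
print(run_sizes)
```

Both the 94 run and the second 92 run (12:46 to 12:49) have only 4 rows. Grouping on `prob` directly would merge the two separate 92 runs into one group of 10 rows, which is why the sketch groups on run labels instead.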
My pseudocode looks like this:
for i in range(df.shape[0] - 1):  # loop across all rows
    if final_df.loc[i, 'prob'] == final_df.loc[i + 1, 'prob']:  # go until the value of prob changes
        ...  # make multiple non-overlapping dataframes of shape (6, 2)
    else:
        continue
I need help building the logic and preparing the input data for my LSTM.
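A runnable sketch of the logic described in the question, under a few assumptions: windows are non-overlapping, each window's target is the (constant) prob of its run, and leftover rows that cannot fill a full window are dropped. The helper name `make_windows` is hypothetical.

```python
import numpy as np
import pandas as pd


def make_windows(df, features=('var1', 'var2'), target='prob', steps=6):
    """Return X of shape (samples, steps, len(features)) and y of shape (samples,).

    Windows are non-overlapping and never cross a change in `target`.
    """
    xs, ys = [], []
    # A new run starts whenever the target differs from the previous row
    run_id = (df[target] != df[target].shift()).cumsum()
    for _, run in df.groupby(run_id, sort=False):
        values = run[list(features)].to_numpy()
        # Runs shorter than `steps` produce no windows; leftover rows are dropped
        for k in range(len(run) // steps):
            xs.append(values[k * steps:(k + 1) * steps])
            ys.append(run[target].iloc[0])  # prob is constant within a run
    # reshape keeps the 3-D layout even when no window was found
    X = np.array(xs, dtype=float).reshape(-1, steps, len(features))
    y = np.array(ys, dtype=float)
    return X, y


# The 20 rows from the question (time column left out)
df = pd.DataFrame({
    'var1': [10, 15, 18, 17, 11, 14, 19, 15, 13, 15, 11, 15, 13, 15, 11, 15, 13, 15, 11, 15],
    'var2': [12, 45, 12, 26, 14, 65, 29, 32, 44, 33, 15, 45, 44, 33, 15, 45, 44, 33, 15, 45],
    'prob': [85] * 6 + [92] * 6 + [94] * 4 + [92] * 4,
})

X, y = make_windows(df)
print(X.shape, y)
```

On this data it should yield two windows, rows 12:30 to 12:35 (prob 85) and 12:36 to 12:41 (prob 92). Neither 4-row run produces a window. Grouping by run label instead of comparing row i with row i + 1 avoids handling the run boundaries by hand.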
Answer:
Your question is not completely clear, but the input to an LSTM should have the form:
[samples, timesteps, features]
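For example, a batch of 32 samples, each with 10 timesteps and 8 features:
import tensorflow as tf
inputs = tf.random.normal([32, 10, 8])  # (samples, timesteps, features)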
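For non-overlapping windows, going from a 2-D (rows, features) array to the 3-D (samples, timesteps, features) layout is just a reshape, as long as the number of rows is a multiple of the window length. Here is a small NumPy sketch with made-up values:

```python
import numpy as np

# 12 rows x 2 features, standing in for var1 and var2 (illustrative values)
flat = np.arange(24).reshape(12, 2)

# Split the rows into consecutive blocks of 6: (samples, timesteps, features)
windows = flat.reshape(-1, 6, 2)
print(windows.shape)
```

Because NumPy reshapes in row-major order, each block of 6 consecutive rows stays together, so `windows[1]` is exactly rows 6 to 11 of `flat`.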
So in your case, each sample has shape (6, 2): 6 timesteps and 2 features (var1 and var2). You can build the windows with rolling or a simple for loop. Example:
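import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10), 'prob': np.random.randint(0, 10, 10)})
xs = []  # input windows, each of shape (6, 2)
ys = []  # targets
for i in range(6, len(df)):
    xs.append(df.iloc[i-6:i][['var1', 'var2']].to_numpy())  # rows i-6 .. i-1
    ys.append(df.iloc[i]['prob'])  # prob of the row right after the window
data = np.array(xs).reshape(-1, 6, 2)
print(data.shape)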
Output:
(4, 6, 2)
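If NumPy 1.20 or newer is available, the same overlapping windows can be built without a Python loop using `numpy.lib.stride_tricks.sliding_window_view`. It appends the window axis at the end, so a transpose is needed to get (samples, timesteps, features). This sketch reproduces the loop above, with the target taken from the row right after each window:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10),
                   'prob': np.random.randint(0, 10, 10)})
steps = 6

features = df[['var1', 'var2']].to_numpy()  # shape (10, 2)
# sliding_window_view gives shape (10 - steps + 1, 2, steps);
# transpose moves the window axis before the feature axis -> (5, 6, 2)
windows = sliding_window_view(features, steps, axis=0).transpose(0, 2, 1)

# Drop the last window: it has no following row to use as a target
data = windows[:-1]                       # shape (4, 6, 2)
targets = df['prob'].to_numpy()[steps:]   # shape (4,)
print(data.shape, targets.shape)
```

The result is a read-only view into `features`; call `.copy()` on it if the windows need to be modified in place.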
Based on the comment:
for i in range(6,20,6):
...