Preparing input data for LSTM layer with conditions

I have a data frame that looks like the one below:

DF.head(20):
time        var1       var2       prob     
12:30       10          12         85
12:31       15          45         85
12:32       18          12         85
12:33       17          26         85
12:34       11          14         85
12:35       14          65         85
12:36       19          29         92
12:37       15          32         92
12:38       13          44         92
12:39       15          33         92
12:40       11          15         92
12:41       15          45         92
12:42       13          44         94
12:43       15          33         94
12:44       11          15         94
12:45       15          45         94
12:46       13          44         92
12:47       15          33         92
12:48       11          15         92
12:49       15          45         92

I want to predict the value of prob from a sequence of the 6 previous values. For the example above, I would take two time series, var1 and var2, from 12:30 to 12:35 to predict prob at 12:35. As far as I know, the input shape that goes to the LSTM will be (df.shape[0], 6, 1), but I do not know how to convert my input from 2 dimensions to 3 dimensions. I also have a condition: I should use the previous 6 time steps only if they all share the same prob value. In the given example, I cannot take the previous 6 values for prob = 94, because 94 occurs only 4 times in a row and I cannot make 6 timesteps from that.

My pseudo code looks like this:

for i in range(df.shape[0]):        #loop across all rows
  if final_df[i,'prob'] == final_df[i + 1,'prob']:     #go until the value of prob changes
      make multiple non-overlapping dataframes of shape (6,2)
  else:
      continue

I need help building the logic and preparing the input data for my LSTM.

User response:

Your question is not completely clear, but the input to an LSTM should have the form:

[samples, timesteps, features]

For example:

inputs = tf.random.normal([32, 10, 8])

So in your case, each sample will have shape (6, 2). You can use a rolling window or a simple for loop to build the data. Example:

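import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10), 'prob': np.random.randint(0,10,10)})
xs = []
ys = []
for i in range(6,10):
    # rows i-6 .. i-1 form one window of 6 timesteps and 2 features
    xs.append(df[i-6:i][['var1', 'var2']].values)
    # target: prob of the row right after the window
    ys.append(df.iloc[i]['prob'])

# stack the windows into [samples, timesteps, features]
data = np.array(xs).reshape(-1,6,2)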

data.shape

Output:

(4, 6, 2)
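
The same overlapping windows can probably also be built without an explicit loop. The following is only a sketch, assuming NumPy 1.20 or newer (for sliding_window_view) and toy data made up for illustration. Unlike the loop above, it aligns each target with the last row of its window, which is how the question describes it (12:30 to 12:35 predicting prob at 12:35). The names timesteps, features, X and y are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

timesteps = 6  # assumed window length, taken from the question

# Toy data for illustration only (not the asker's real frame)
df = pd.DataFrame({
    "var1": np.arange(10),
    "var2": np.arange(10, 20),
    "prob": np.arange(100, 110),
})

features = df[["var1", "var2"]].to_numpy()  # shape (10, 2)

# sliding_window_view puts the window axis last: shape (5, 2, 6)
windows = sliding_window_view(features, window_shape=timesteps, axis=0)

# Reorder to [samples, timesteps, features]: shape (5, 6, 2)
X = windows.transpose(0, 2, 1)

# Target for each window is prob at the window's last row
y = df["prob"].to_numpy()[timesteps - 1:]

print(X.shape, y.shape)  # (5, 6, 2) (5,)
```

Note that X here is a read-only view of the original array; call X.copy() if the windows need to be modified later.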

Based on the comment:

for i in range(6,20,6):
...
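
One possible way to also respect the condition from the question (use 6 timesteps only when they all share the same prob value, with non-overlapping windows) is to label each run of consecutive equal prob values and cut windows inside each run. This is a sketch under those assumptions, not necessarily what the loop above was meant to become. The helper make_windows and the names run_id and timesteps are hypothetical, and the data is copied from the table in the question.

```python
import numpy as np
import pandas as pd

def make_windows(df, feature_cols=("var1", "var2"), target_col="prob", timesteps=6):
    """Hypothetical helper: cut non-overlapping windows inside runs of equal target."""
    # A new run starts whenever the target differs from the previous row
    run_id = (df[target_col] != df[target_col].shift()).cumsum()
    xs, ys = [], []
    for _, run in df.groupby(run_id, sort=False):
        values = run[list(feature_cols)].to_numpy()
        # Step by `timesteps` so windows do not overlap; short runs yield nothing
        for start in range(0, len(run) - timesteps + 1, timesteps):
            xs.append(values[start:start + timesteps])
            ys.append(run[target_col].iloc[0])
    X = np.array(xs, dtype=float).reshape(-1, timesteps, len(feature_cols))
    return X, np.array(ys)

# The 20 rows shown in the question
df = pd.DataFrame({
    "var1": [10, 15, 18, 17, 11, 14, 19, 15, 13, 15,
             11, 15, 13, 15, 11, 15, 13, 15, 11, 15],
    "var2": [12, 45, 12, 26, 14, 65, 29, 32, 44, 33,
             15, 45, 44, 33, 15, 45, 44, 33, 15, 45],
    "prob": [85] * 6 + [92] * 6 + [94] * 4 + [92] * 4,
})

X, y = make_windows(df)
print(X.shape)     # (2, 6, 2)
print(y.tolist())  # [85, 92]
```

The run of four 94s and the trailing run of four 92s are skipped because neither is long enough to fill 6 timesteps.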