Preparing input data for LSTM layer with conditions

Time:05-10

I have a data frame that looks like the one below:

DF.head(20):
time        var1       var2       prob     
12:30       10          12         85
12:31       15          45         85
12:32       18          12         85
12:33       17          26         85
12:34       11          14         85
12:35       14          65         85
12:36       19          29         92
12:37       15          32         92
12:38       13          44         92
12:39       15          33         92
12:40       11          15         92
12:41       15          45         92
12:42       13          44         94
12:43       15          33         94
12:44       11          15         94
12:45       15          45         94
12:46       13          44         92
12:47       15          33         92
12:48       11          15         92
12:49       15          45         92

I want to predict the value of prob from a sequence of the 6 previous values. For the given example, I will take two time series, var1 and var2, from 12:30 to 12:35 to predict prob for 12:35. As far as I know, the input shape that goes to the LSTM will be (df.shape[0], 6, 1), but I do not know how to convert my input from 2 dimensions to 3 dimensions. I also have a condition: I should use the previous 6 timesteps only if they all share the same prob value. In the given example, I cannot take the previous 6 values for prob = 94, because 94 occurs only 4 times and I cannot make 6 timesteps from that.

My pseudo code looks like this:

for i in range(df.shape[0]):        # loop across all rows
  if df[i, 'prob'] == df[i + 1, 'prob']:     # go until the value of prob changes
      make multiple non-overlapping dataframes of shape (6, 2)
  else:
      continue

I need help building the logic and preparing the input data for my LSTM.

CodePudding user response:

Your question is not completely clear, but the input to the LSTM should have the form:

[samples, timesteps, features]

For example:

inputs = tf.random.normal([32, 10, 8])

So in your case, each sample will have shape (6, 2). You can build the data with a rolling window or a simple for loop. Example:

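import numpy as np
import pandas as pd

# Toy data: two feature columns and a random integer target.
df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10), 'prob': np.random.randint(0, 10, 10)})
xs = []
ys = []
for i in range(6, 10):
    # Rows i-6 .. i-1 form one window of 6 timesteps x 2 features.
    xs.append(df[['var1', 'var2']].iloc[i-6:i].values)
    # The target is the prob value of the row right after the window.
    ys.append(df.iloc[i]['prob'])

data = np.array(xs).reshape(-1, 6, 2)  # (samples, timesteps, features)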

data.shape

Output:

(4, 6, 2)
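
The same windows can probably be built without an explicit loop. The following is a minimal sketch, assuming NumPy 1.20 or newer (for sliding_window_view) and the same toy data frame as above; the names features, windows and targets are only illustrative:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Same toy data as in the loop version.
df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10), 'prob': np.random.randint(0, 10, 10)})

features = df[['var1', 'var2']].to_numpy()                        # shape (10, 2)

# The window axis is appended last, so this has shape (5, 2, 6).
windows = sliding_window_view(features, window_shape=6, axis=0)

# Move the window axis to the middle and drop the last window,
# which has no following row to use as a target.
data = windows.transpose(0, 2, 1)[:-1]                            # shape (4, 6, 2)
targets = df['prob'].to_numpy()[6:]                               # shape (4,)

print(data.shape, targets.shape)
```

This mirrors the loop: window k covers rows k to k+5 and is paired with the prob value of row k+6. If the target should instead belong to the last row of the window, as in the 12:30 to 12:35 example from the question, keep all 5 windows and take the targets from row 5 onward.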

Based on the comment:

for i in range(6,20,6):
...
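
For the condition in the question (only use 6 timesteps that share the same prob value, with non-overlapping windows), one possible approach is to label each run of consecutive equal prob values and cut windows inside each run. This is a sketch under assumptions, not the code elided above: make_windows is a hypothetical helper name, and the data frame is rebuilt from the table in the question.

```python
import numpy as np
import pandas as pd

def make_windows(df, feature_cols=('var1', 'var2'), target_col='prob', timesteps=6):
    """Hypothetical helper: non-overlapping windows taken only inside runs of constant target."""
    # A new run starts whenever the target differs from the previous row.
    run_id = (df[target_col] != df[target_col].shift()).cumsum()
    xs, ys = [], []
    for _, run in df.groupby(run_id, sort=False):
        values = run[list(feature_cols)].to_numpy()
        # Step by `timesteps` so windows do not overlap; runs shorter than
        # `timesteps` produce no window at all.
        for start in range(0, len(run) - timesteps + 1, timesteps):
            xs.append(values[start:start + timesteps])
            ys.append(run[target_col].iloc[0])
    X = np.array(xs).reshape(-1, timesteps, len(feature_cols))
    y = np.array(ys)
    return X, y

# Data copied from the table in the question (12:30 to 12:49).
df = pd.DataFrame({
    'var1': [10, 15, 18, 17, 11, 14, 19, 15, 13, 15, 11, 15, 13, 15, 11, 15, 13, 15, 11, 15],
    'var2': [12, 45, 12, 26, 14, 65, 29, 32, 44, 33, 15, 45, 44, 33, 15, 45, 44, 33, 15, 45],
    'prob': [85] * 6 + [92] * 6 + [94] * 4 + [92] * 4,
})

X, y = make_windows(df)
print(X.shape)  # (2, 6, 2)
print(y)        # [85 92]
```

With this data only the first two runs (85 and the first 92) are long enough, so the 94 block and the trailing 92 block are skipped, which matches the condition described in the question.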