I have a DataFrame with the following structure:
Week_no Feature_1 Feature_2 Feature_3 Feature_4 Feature_5 Target
1 32456 342 16473 73453732 6346 363
2 56352 435 75673 53456275 3534 254
3 27276 342 35362 23466425 2367 262
4 88437 327 75653 47567465 4737 625
5 84947 114 45732 45347367 3735 425
6 44744 434 22265 86534563 4845 353
I am trying to pass this data to keras.SimpleRNN to predict the Target. From multiple sources I've read that I should drop the time aspect of the df, so after dropping it I split the data into training, validation and test datasets.
df.shape # (50, 6) after dropping the Week_no column
Splitting the data:
n = len(inputs)
train_df = inputs[0:int(n*0.7)].drop(columns = ['Target'])
train_ans = inputs[0:int(n*0.7)]['Target']
val_df = inputs[int(n*0.7):int(n*0.9)].drop(columns = ['Target'])
val_ans = inputs[int(n*0.7):int(n*0.9)]['Target']
test_df = inputs[int(n*0.9):].drop(columns = ['Target'])
test_ans = inputs[int(n*0.9):]['Target']
train_df.shape # (35, 5)
train_ans.shape # (35, )
val_df.shape # (10, 5)
test_df.shape # (5, 5)
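As a sanity check on those split sizes, here is a minimal sketch of the same 70/20/10 chronological split done on a NumPy array. The random 50×6 array and the helper name `chronological_split` are hypothetical stand-ins for `inputs.to_numpy()`, and it assumes Target is the last column, as in the table above. The boundaries are computed exactly as in the slicing code (`int(n*0.7)` and `int(n*0.9)`) rather than by adding fractions, because `0.7 + 0.2` is slightly less than `0.9` in floating point.

```python
import numpy as np

def chronological_split(data, train_end_frac=0.7, val_end_frac=0.9):
    # Hypothetical helper mirroring the question's slicing: rows stay in time
    # order (no shuffling); the last column is treated as the target.
    n = len(data)
    a, b = int(n * train_end_frac), int(n * val_end_frac)
    features, target = data[:, :-1], data[:, -1]
    return ((features[:a], target[:a]),
            (features[a:b], target[a:b]),
            (features[b:], target[b:]))

# Hypothetical stand-in for inputs.to_numpy(): 50 weeks, 5 features + Target.
rng = np.random.default_rng(0)
data = rng.random((50, 6))

(train_x, train_y), (val_x, val_y), (test_x, test_y) = chronological_split(data)
print(train_x.shape, train_y.shape)  # (35, 5) (35,)
print(val_x.shape, test_x.shape)     # (10, 5) (5, 5)
```

These are still 2D feature matrices, so they need one more axis before a SimpleRNN will accept them.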
I understand that the dataset is small, but this is only for the sake of understanding how to pass pandas DataFrames to the model.
The Keras documentation states (and I've also checked this answer) that the layer expects:
inputs: A 3D tensor, with shape [batch, timesteps, feature].
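To make the [batch, timesteps, feature] layout concrete: a 2D (35, 5) feature matrix can be turned into a 3D tensor in at least two ways, and which one is right depends on what a "timestep" should mean for this model. A minimal NumPy sketch, using a zero array as a hypothetical stand-in for `train_df.to_numpy()`:

```python
import numpy as np

x2d = np.zeros((35, 5))  # hypothetical stand-in for train_df.to_numpy()

# Option A: 5 timesteps with 1 feature per step (each column becomes a step).
# This matches an RNN built with input_shape=[None, 1].
x_steps = x2d[..., np.newaxis]
print(x_steps.shape)  # (35, 5, 1)

# Option B: 1 timestep with 5 features (each row is a single step).
# This would need input_shape=[None, 5] (or [1, 5]) instead.
x_single = x2d[:, np.newaxis, :]
print(x_single.shape)  # (35, 1, 5)
```

This kind of indexing works on NumPy arrays, not directly on a DataFrame, which is why the conversion with `to_numpy()` comes first.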
So, as I understand it, I can use this model:
from tensorflow import keras
model = keras.models.Sequential([
keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None, 1]),
keras.layers.SimpleRNN(5, return_sequences=True),
keras.layers.SimpleRNN(1)
])
model.compile(loss="mse", optimizer="adam")
history = model.fit(train_df, train_ans, epochs=20,
validation_data=(val_df, val_ans))
This currently produces the following error:
ValueError: Error when checking input: expected simple_rnn_75_input to have 3 dimensions, but got array with shape (35, 5)
I've tried reshaping both my train and val DataFrames into 3-dimensional shapes, but none of those attempts worked; I kept getting either the error above or this one:
ValueError: Error when checking input: expected simple_rnn_81_input to have shape (None, 1) but got array with shape (5, 5)
At this point I can't figure out the correct way to pass the data to the SimpleRNN Keras layer, or whether my approach is correct at all.
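Both errors come from comparing each sample's shape (everything after the batch axis) against `input_shape=[None, 1]`, where `None` accepts any length. The sketch below is a hypothetical simplification of that check, not Keras's actual implementation, and `input_compatible` is an invented name. It shows why a 2D array fails (there is no feature axis at all) and why a per-sample shape of (5, 5) fails (5 values in the feature position where 1 is expected). The (7, 5, 5) batch is only an illustrative reshape that yields such samples.

```python
def input_compatible(batch_shape, input_shape):
    # Hypothetical simplification of the shape check: drop the batch axis,
    # then require the same rank and matching sizes (None matches anything).
    sample_shape = batch_shape[1:]
    if len(sample_shape) != len(input_shape):
        return False  # wrong rank, e.g. a 2D array for a 3D-input RNN
    return all(exp is None or exp == got
               for exp, got in zip(input_shape, sample_shape))

expected = (None, 1)  # input_shape of the first SimpleRNN
print(input_compatible((35, 5), expected))     # False: only 2 dimensions
print(input_compatible((7, 5, 5), expected))   # False: 5 features, expected 1
print(input_compatible((35, 5, 1), expected))  # True
```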
Answer:
It really depends on what your model should be doing. You can, for example, add a features dimension like this:
train_df = train_df.to_numpy()[..., None]  # (35, 5) -> (35, 5, 1); a DataFrame can't be indexed with [..., None] directly
Your data will then have 5 timesteps, with each timestep holding one of the features as a single value, which matches input_shape=[None, 1]:
import pandas as pd
import tensorflow as tf
inputs = pd.DataFrame(data={"Week_no": [1,2, 3, 4, 5, 6],
"Feature_1": [1, 2, 3, 4, 5, 6],
"Feature_2": [1, 2, 3, 4, 5, 6],
"Feature_3": [1, 2, 3, 4, 5, 6],
"Feature_4": [1, 2, 3, 4, 5, 6],
"Feature_5": [1, 2, 3, 4, 5, 6],
"Target": [1, 2, 3, 4, 5, 6]})
n = len(inputs)
train_df = inputs[0:int(n*0.7)].drop(columns = ['Target', 'Week_no'])
train_ans = inputs[0:int(n*0.7)]['Target']
model = tf.keras.models.Sequential([
tf.keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None, 1]),
tf.keras.layers.SimpleRNN(5, return_sequences=True),
tf.keras.layers.SimpleRNN(1)
])
model.compile(loss="mse", optimizer="adam")
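# to_numpy()[..., None] turns the (4, 5) feature frame into a (4, 5, 1) array: [batch, timesteps, feature]
history = model.fit(train_df.to_numpy()[..., None], train_ans, epochs=20)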
Take a look at this post to understand how you should work with dataframes and time series data.
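If each week should instead be a timestep (the more usual time-series framing), one common approach is a sliding window: each sample is `window` consecutive weeks of all 5 features, and the label is the Target of the window's last week. This is a minimal NumPy sketch under those assumptions; `make_windows`, `window=4` and the "last week" label choice are hypothetical (predicting the following week instead would shift the label index by one), and the random arrays stand in for the real feature and Target columns. Arrays shaped like this would fit an RNN built with `input_shape=[window, 5]`.

```python
import numpy as np

def make_windows(features, target, window):
    # Hypothetical helper: stack `window` consecutive rows (weeks) into one
    # sample of shape (window, n_features); the label is the target value
    # of the window's last row.
    xs, ys = [], []
    for end in range(window, len(features) + 1):
        xs.append(features[end - window:end])
        ys.append(target[end - 1])
    return np.stack(xs), np.array(ys)

# Hypothetical data: 50 weeks x 5 features, plus a Target column.
rng = np.random.default_rng(0)
features = rng.random((50, 5))
target = rng.random(50)

x, y = make_windows(features, target, window=4)
print(x.shape, y.shape)  # (47, 4, 5) (47,)
```

With 50 weeks and a window of 4 there are 50 - 4 + 1 = 47 complete windows, so the batch axis shrinks accordingly.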