Passing pandas dataframe to keras.SimpleRNN layer

Time:07-29

I have a df which follows this structure:

Week_no     Feature_1       Feature_2       Feature_3       Feature_4       Feature_5       Target
1           32456           342             16473           73453732        6346            363
2           56352           435             75673           53456275        3534            254
3           27276           342             35362           23466425        2367            262
4           88437           327             75653           47567465        4737            625
5           84947           114             45732           45347367        3735            425
6           44744           434             22265           86534563        4845            353

I am trying to pass this data to keras.SimpleRNN to predict the Target. From multiple sources I've read that I should drop the time aspect of the df, so I dropped the Week_no column and then split the data into training, validation and test datasets.

df.shape # (50, 6) after dropping the Week_no column

Splitting the data:

n = len(inputs)
train_df = inputs[0:int(n*0.7)].drop(columns = ['Target'])
train_ans = inputs[0:int(n*0.7)]['Target']

val_df = inputs[int(n*0.7):int(n*0.9)].drop(columns = ['Target'])
val_ans = inputs[int(n*0.7):int(n*0.9)]['Target']

test_df = inputs[int(n*0.9):].drop(columns = ['Target'])
test_ans = inputs[int(n*0.9):]['Target']
train_df.shape # (35, 5)
train_ans.shape # (35, )

val_df.shape # (10, 5)
test_df.shape # (5, 5)

I understand that the dataset is small, but this is for the sake of understanding how to pass pandas dfs. The Keras documentation states the following (I've also checked this answer):

inputs: A 3D tensor, with shape [batch, timesteps, feature].

So to my understanding to use this model:

model = keras.models.Sequential([
    keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None, 1]),
    keras.layers.SimpleRNN(5, return_sequences=True),
    keras.layers.SimpleRNN(1)
])

model.compile(loss="mse", optimizer="adam")
history = model.fit(train_df, train_ans, epochs=20,
                    validation_data=(val_df, val_ans))

Which currently produces an error:

ValueError: Error when checking input: expected simple_rnn_75_input to have 3 dimensions, but got array with shape (35, 5)

I've tried reshaping both my train and val dataframes into 3-dimensional shapes, but none of the attempts worked and I kept getting one of two errors, either the one above or this:

ValueError: Error when checking input: expected simple_rnn_81_input to have shape (None, 1) but got array with shape (5, 5)

At this point I can't figure out the correct way of passing the data to the SimpleRNN keras layer and if my approach is correct.

CodePudding user response:

It really depends what your model should be doing. You can, for example, add a features dimension like this:

train_df = train_df.to_numpy()[..., None]

Your data will then have the shape (samples, 5, 1): 5 timesteps, where each timestep holds the value of one of the features:

import pandas as pd
import tensorflow as tf


inputs = pd.DataFrame(data={"Week_no": [1,2, 3, 4, 5, 6],
                        "Feature_1": [1, 2, 3, 4, 5, 6],
                        "Feature_2": [1, 2, 3, 4, 5, 6],
                        "Feature_3": [1, 2, 3, 4, 5, 6],
                        "Feature_4": [1, 2, 3, 4, 5, 6],
                        "Feature_5": [1, 2, 3, 4, 5, 6],
                        "Target": [1, 2, 3, 4, 5, 6]})

n = len(inputs)

train_df = inputs[0:int(n*0.7)].drop(columns = ['Target', 'Week_no'])
train_ans = inputs[0:int(n*0.7)]['Target']

model = tf.keras.models.Sequential([
    tf.keras.layers.SimpleRNN(5, return_sequences=True, input_shape=[None, 1]),
    tf.keras.layers.SimpleRNN(5, return_sequences=True),
    tf.keras.layers.SimpleRNN(1)
])
model.compile(loss="mse", optimizer="adam")
history = model.fit(train_df.to_numpy()[..., None], train_ans, epochs=20)
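
To see what the indexing trick does to the shapes, here is a small NumPy-only sketch. The array contents are placeholders. The second layout (a single timestep carrying all five features) is an alternative I am assuming you might want, not something from the question; with it, the first layer would need input_shape=[None, 5] instead of [None, 1].

```python
import numpy as np

# Placeholder data with the same layout as train_df: 4 rows, 5 feature columns.
features_2d = np.arange(20, dtype="float32").reshape(4, 5)

# Layout used above: each feature becomes its own timestep with a single value.
# Shape is (batch, timesteps, features) = (4, 5, 1), matching input_shape=[None, 1].
per_feature_steps = features_2d[..., None]

# Alternative layout (an assumption, not from the question): one timestep per row,
# holding all 5 features. Shape is (4, 1, 5), so input_shape would be [None, 5].
single_step = features_2d[:, None, :]

print(features_2d.shape, per_feature_steps.shape, single_step.shape)
```

Both results are 3D, so either gets past the "expected ... to have 3 dimensions" check; which one is right depends on what a timestep should mean in your model.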

Take a look at this post to understand how you should work with dataframes and time series data.
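
If the weeks themselves should act as the timesteps, one common option is to cut the table into overlapping windows of consecutive weeks. The following is only a rough sketch under my own assumptions: make_windows is a hypothetical helper, the window length of 3 is arbitrary, and each window is paired with the Target of its last week (you may prefer the following week instead). It is not necessarily what the linked post describes.

```python
import numpy as np

def make_windows(values, targets, window):
    """Stack `window` consecutive rows into one sample (hypothetical helper)."""
    values = np.asarray(values, dtype="float32")
    targets = np.asarray(targets, dtype="float32")
    n_samples = len(values) - window + 1
    if n_samples <= 0:
        raise ValueError("window is longer than the series")
    # x[i] covers rows i .. i+window-1; y[i] is the target of the last row in that window.
    x = np.stack([values[i:i + window] for i in range(n_samples)])
    y = targets[window - 1:]
    return x, y

# Placeholder data: 6 weeks with 5 features each (toy values, not real data).
weeks = np.tile(np.arange(1, 7).reshape(-1, 1), (1, 5))
target = np.arange(1, 7)

x, y = make_windows(weeks, target, window=3)
print(x.shape, y.shape)  # (4, 3, 5) (4,)
```

With this layout the first SimpleRNN layer would take input_shape=[None, 5], since every timestep now carries all five features.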
