How to define a Pandas column as a list

Time:11-11

I'm using Keras's ImageDataGenerator for data augmentation, specifically its flow_from_dataframe method. Documentation: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow_from_dataframe

# Create new dataframes for train and test

df_train = pd.DataFrame()
df_train['image'], df_train['labels'] = X_train, y_train

df_test = pd.DataFrame()
df_test['image'], df_test['labels'] = X_test, y_test

This is what one dataframe looks like:

      image                                      labels
4227  /Users/m/Documents/Machine Learning Pr...  [73, 0]
4676  /Users/m/Documents/Machine Learning Pr...  [36, 0]
800   /Users/m/Documents/Machine Learning Pr...  [26, 0]
3671  /Users/m/Documents/Machine Learning Pr...  [42, 0]

This is how I've imported and set up the data generators:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
    )

test_datagen = ImageDataGenerator(rescale=1./255.)

train_generator = datagen.flow_from_dataframe(
    dataframe=df_train,
    x_col="image",
    y_col="labels",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode='multi_output',
    target_size=(128, 128))

valid_generator = test_datagen.flow_from_dataframe(
    dataframe=df_test,
    x_col="image",
    y_col="labels",
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode='multi_output',
    target_size=(128, 128))

The function reads in a dataframe, but in the documentation it says the y_col specified must be a list:

y_col string or list, column/s in dataframe that has the target data.

Before I created the dataframe, the labels were a list, but now that they're a column in pandas they're no longer classed as a 'list', right? So why do I get this error message:

TypeError: If class_mode="multi_output", y_col must be a list. Received str.

I want to use the multi_output class mode as above, and the error says y_col must be a list, but it received a string. Why does it say it's a string? Is there any way to change the 'type' of a column within a dataframe, or am I misunderstanding?
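A quick way to see what the column actually holds, as a minimal sketch on a hypothetical toy frame (the file names are made up; the labels mirror the rows shown above): pandas keeps each list intact as a cell value, and the "str" in the error refers to the y_col argument itself, which was passed as the string "labels".

```python
import pandas as pd

# Toy frame mimicking df_train; file names are hypothetical.
df = pd.DataFrame(
    {"image": ["img_a.jpg", "img_b.jpg"], "labels": [[73, 0], [36, 0]]},
    index=[4227, 4676],
)

# The column itself has pandas' generic "object" dtype ...
print(df["labels"].dtype)
# ... but each cell is still an ordinary Python list.
print(type(df["labels"].iloc[0]))

# The error is about the argument, not the column contents:
# y_col="labels" is a str, not a list.
y_col = "labels"
print(isinstance(y_col, str), isinstance(y_col, list))
```

So changing the column's dtype would not help; what the generator checks is the type of the value passed as y_col.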

Answer:

'List' here means a list of column names, not a column whose values are lists.

As Zelemist has said, change your dataframe so that the labels are split across two columns rather than the single list column you have now.
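One way to do the split, as a minimal sketch assuming every entry in labels is a two-element list: build a new frame from the lists and concatenate it back. The toy rows, file names, and the column names col1/col2 are hypothetical placeholders; substitute your own.

```python
import pandas as pd

# Toy stand-in for df_train; file names are hypothetical, the labels
# mirror the first rows shown in the question.
df = pd.DataFrame(
    {
        "image": ["img_a.jpg", "img_b.jpg", "img_c.jpg"],
        "labels": [[73, 0], [36, 0], [26, 0]],
    },
    index=[4227, 4676, 800],
)

# Expand each two-element list into its own column. Passing index=df.index
# keeps the new rows aligned with the original (shuffled) index.
split = pd.DataFrame(df["labels"].tolist(), index=df.index, columns=["col1", "col2"])

# Replace the list column with the two scalar columns.
df = pd.concat([df.drop(columns="labels"), split], axis=1)
print(df)
```

Reusing the original index matters here because train_test_split leaves a shuffled index (4227, 4676, 800, ...); building the new frame without it would misalign the labels during the concat.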

Then pass a list of those column names to y_col, for example:

y_col = ['col1', 'col2']
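To illustrate what a list-valued y_col means, here is a small sketch that assumes the labels are already in two columns named col1 and col2 (the answer's placeholder names; the frame is hypothetical). It checks that every listed name is a real column, then emulates with plain pandas/NumPy how multi_output groups targets: one array per name in y_col, sliced to the current batch. This is an illustration, not Keras code; as far as I can tell Keras's DataFrameIterator organizes multi_output targets this way, but verify against your Keras version. It also implies the model needs one output per listed column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are already split into two columns.
df = pd.DataFrame(
    {
        "image": ["img_a.jpg", "img_b.jpg", "img_c.jpg"],
        "col1": [73, 36, 26],
        "col2": [0, 0, 0],
    },
    index=[4227, 4676, 800],
)

y_col = ["col1", "col2"]  # a list of column names, as multi_output expects

# Sanity check: every name in y_col must be an existing column.
missing = [name for name in y_col if name not in df.columns]
assert not missing, f"missing columns: {missing}"

# Rough emulation (not Keras code): one target array per name in y_col,
# each sliced to the rows (by position) of the current batch.
batch_positions = np.array([0, 2])
batch_y = [df[name].to_numpy()[batch_positions] for name in y_col]
print(batch_y)
```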

Hope it makes sense now.
