Scikit-Learn ColumnTransformer gives "TypeError: zip argument #1 must support iteration"


I am attempting to transform some columns of my DataFrame with MinMaxScaler() from Scikit-Learn. The data looks like this:

[screenshot of the DataFrame]

The columns I wish to transform:

ct_columns = ['Number_of_Cigarettes', 'Nicotine_Content', 'Tar_Content', 'Price', 'Units_Sold_Per_Week', 'Profits_Per_Week']

Pass them to a column transformer:

ct = ColumnTransformer( (MinMaxScaler(),
                       ct_columns)
)

Assign the input features and label, then pass them to train_test_split:

X = one_hot_cigarette_df.drop('Units_Sold_Per_Week', axis=1)
y = one_hot_cigarette_df['Units_Sold_Per_Week']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=RANDOM_STATE) 

But when I call the column transformer's fit method, I get the following error:

  ct.fit(X_train)

    271             return
    272 
--> 273         names, transformers, _ = zip(*self.transformers)
    274 
    275         # validate names

TypeError: zip argument #1 must support iteration

CodePudding user response:

ColumnTransformer expects its transformers argument to be a list of (name, transformer, columns) tuples. With a bare (MinMaxScaler(), ct_columns) tuple, the internal call zip(*self.transformers) ends up trying to iterate over the MinMaxScaler instance itself, which is what raises the TypeError. You should replace ct = ColumnTransformer((MinMaxScaler(), ct_columns)) with

ct = ColumnTransformer([('scaler', MinMaxScaler(), ct_columns)])
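
As an aside, if you prefer not to name the step yourself, make_column_transformer from sklearn.compose builds the same transformer from plain (transformer, columns) pairs and generates the step names for you. A minimal equivalent would be:

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler

# equivalent to ColumnTransformer([('minmaxscaler', MinMaxScaler(), ct_columns)])
ct = make_column_transformer((MinMaxScaler(), ct_columns))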

You should also drop the label 'Units_Sold_Per_Week' from ct_columns if you plan to apply the transformer only to the feature matrix. Here is a complete example, with randomly generated data standing in for your DataFrame:

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
np.random.seed(0)

# generate the data
df = pd.DataFrame(
    columns=['Units_Sold_Per_Week', 'Number_of_Cigarettes', 'Nicotine_Content', 'Tar_Content', 'Price', 'Profits_Per_Week'],
    data=np.random.lognormal(1, 0.5, (100, 6))
)

# extract the features and target
X = df.drop('Units_Sold_Per_Week', axis=1)
y = df['Units_Sold_Per_Week']

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=100)

# scale the features
ct_columns = ['Number_of_Cigarettes', 'Nicotine_Content', 'Tar_Content', 'Price', 'Profits_Per_Week']
ct = ColumnTransformer([('scaler', MinMaxScaler(), ct_columns)])
ct.fit(X_train)

X_train_scaled = ct.transform(X_train)
X_test_scaled = ct.transform(X_test)
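
Two remarks on the example above. First, ColumnTransformer drops any column that is not listed in one of its transformers by default, so if your one-hot encoded columns should pass through unchanged, set remainder='passthrough'. Second, transform returns a NumPy array rather than a DataFrame; on reasonably recent scikit-learn versions you can rebuild one with get_feature_names_out. A sketch of both, assuming your feature matrix also contains one-hot columns that should not be scaled:

ct = ColumnTransformer(
    [('scaler', MinMaxScaler(), ct_columns)],
    remainder='passthrough'  # keep the non-scaled columns instead of dropping them
)
ct.fit(X_train)

# rebuild DataFrames using the transformer's output column names
X_train_scaled = pd.DataFrame(ct.transform(X_train), columns=ct.get_feature_names_out(), index=X_train.index)
X_test_scaled = pd.DataFrame(ct.transform(X_test), columns=ct.get_feature_names_out(), index=X_test.index)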