Home > Mobile >  Need help transforming 1 categorical column into separate categorical columns in Python
Need help transforming 1 categorical column into separate categorical columns in Python

Time:09-16

I have a large csv file named 'grocery store transactions' loaded into Python that contains only 2 columns: a) transaction # and b) products bought during each transaction.

I am trying to do some Association Rule Mining, but before I can do that, I have to transform the dataset below:

txns = pd.read_csv('transactions.csv')
txns.head(15)

enter image description here

*** My dilemma is that I am trying to transform the one "Product" column into its corresponding categorical columns with 1s and 0s ***

*** Does anyone know how to transform my "Product" column into separate binary categorical columns?

Below is an example of how this should look like but using a different dataset from class. This is a bank csv file that had a single 'bank features' column transformed into separate categorical binary columns. 1 represents the customer had the bank feature and 0 means they do not.

enter image description here

Any help would be greatly appreciated !!

CodePudding user response:

You can use either pandas or scikit-learn.

Pandas:

pd.get_dummies(df.Product, prefix='Product')

Scikit Learn:

from sklearn.preprocessing import LabelBinarizer
y = LabelBinarizer().fit_transform(df.Product)

For multicolumn data you can also use scikit-learn's OneHotEncoder, but I won't give an example here since you don't need it.

  • Related