The data available is as follows:
bread milk butter jam nutella cheese chips
0 bread NaN butter jam nutella NaN NaN
1 NaN NaN butter jam nutella NaN chips
2 NaN milk NaN NaN NaN cheese NaN
3 bread milk butter jam nutella cheese chips
4 bread milk NaN NaN nutella NaN NaN
5 bread milk butter jam NaN cheese chips
6 bread milk NaN NaN nutella NaN NaN
7 bread NaN butter NaN NaN cheese NaN
8 bread NaN butter jam nutella NaN NaN
9 NaN milk butter jam NaN cheese NaN
10 bread NaN NaN jam nutella cheese chips
11 bread milk butter jam nutella NaN NaN
12 bread NaN butter NaN nutella cheese NaN
13 bread NaN butter jam nutella cheese chips
14 bread milk butter jam nutella cheese chips
15 NaN milk butter jam nutella cheese NaN
16 NaN milk NaN jam nutella cheese NaN
17 bread milk butter jam nutella cheese chips
18 bread NaN butter jam nutella cheese NaN
19 bread milk butter NaN nutella cheese NaN
20 NaN milk NaN NaN NaN NaN chips
I want to one-hot encode every column of the entire dataset to get something like this:
bread | milk | butter | jam | nutella | cheese | chips |
---|---|---|---|---|---|---|
1 | 0 | 1 | 1 | 1 | 0 | 0 |
0 | 0 | 1 | 1 | 1 | 0 | 1 |
Can someone please help me with the code?
I tried the following code:
pd.get_dummies(book_data, columns = ['bread', 'milk','butter','jam', 'nutella','cheese','chips'])
I got the following error:
KeyError: "['bread', 'cheese'] not in index"
Answer:
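You can use a trick with pandas.Series.name: in each column, every non-null cell contains the column's own name, so you can replace that name with 1 and then use fillna(0) to turn the missing values into 0.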
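To see why the trick works, here is the same step on a single column. This is a minimal sketch on a hypothetical three-row Series shaped like the question's data (the column's name in non-null cells, NaN elsewhere); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical column shaped like the question's data: the item name where
# the item is present, NaN where it is missing (dtype=object keeps both)
col = pd.Series(["bread", np.nan, "bread"], name="bread", dtype=object)

# col.name is "bread": matching cells become 1, then the NaN cells become 0
encoded = col.replace(col.name, 1).fillna(0).astype(int)
print(encoded.tolist())  # [1, 0, 1]
```

Applying this with DataFrame.apply simply repeats it for every column, each time using that column's own name.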
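First, strip stray whitespace from the column names. The KeyError most likely means that some headers (here 'bread' and 'cheese') are stored with leading or trailing spaces, so they do not match the names you pass to get_dummies:
book_data.columns = book_data.columns.str.strip()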
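A small reproduction of the likely cause, assuming (this is not confirmed by the question) that the headers were read as " bread" and "cheese " with extra spaces. The three-column frame is hypothetical. It also shows that, once the names are clean, pd.get_dummies names its output columns as column_value, e.g. bread_bread, which is one reason the trick above fits the desired output better.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two headers carry stray whitespace (an assumption)
book_data = pd.DataFrame(
    {" bread": ["bread", np.nan], "milk": [np.nan, "milk"], "cheese ": [np.nan, "cheese"]},
    dtype=object,
)

try:
    pd.get_dummies(book_data, columns=["bread", "milk", "cheese"])
except KeyError as err:
    # The padded names " bread" and "cheese " do not match "bread" and "cheese"
    print("KeyError:", err)

# After stripping the headers the same call succeeds
book_data.columns = book_data.columns.str.strip()
dummies = pd.get_dummies(book_data, columns=["bread", "milk", "cheese"])
print(list(dummies.columns))  # ['bread_bread', 'milk_milk', 'cheese_cheese']
```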
It is also worth removing whitespace from the cell values, so that each non-null cell matches its column name exactly:
book_data = book_data.replace(r"\s+", "", regex=True)
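Why the value cleanup matters: the replace trick only turns a cell into 1 when it equals the column name exactly. This sketch uses a hypothetical one-column frame whose values carry stray spaces (an assumption about the real data) and shows that the regex leaves NaN untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical column with padded values (assumed, not taken from the question)
book_data = pd.DataFrame({"bread": [" bread", np.nan, "bread "]}, dtype=object)

# Remove every whitespace run; NaN cells are not strings and stay NaN
book_data = book_data.replace(r"\s+", "", regex=True)
print(book_data["bread"].tolist())  # ['bread', nan, 'bread']
```

Note that \s+ removes all whitespace, including inside a value, so a multi-word item such as "peanut butter" would become "peanutbutter". If such items could occur, stripping only the ends of each value is the safer choice.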
Then try this:
# In each column, replace the column's own name with 1, turn NaN into 0, and cast to int
out = book_data.apply(lambda x: x.replace(x.name, 1), axis=0).fillna(0).astype(int)
print(out)
# Output:
bread milk butter jam nutella cheese chips
0 1 0 1 1 1 0 0
1 0 0 1 1 1 0 1
2 0 1 0 0 0 1 0
3 1 1 1 1 1 1 1
4 1 1 0 0 1 0 0
5 1 1 1 1 0 1 1
6 1 1 0 0 1 0 0
7 1 0 1 0 0 1 0
8 1 0 1 1 1 0 0
9 0 1 1 1 0 1 0
10 1 0 0 1 1 1 1
11 1 1 1 1 1 0 0
12 1 0 1 0 1 1 0
13 1 0 1 1 1 1 1
14 1 1 1 1 1 1 1
15 0 1 1 1 1 1 0
16 0 1 0 1 1 1 0
17 1 1 1 1 1 1 1
18 1 0 1 1 1 1 0
19 1 1 1 0 1 1 0
20 0 1 0 0 0 0 1
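A possible alternative, sketched here on only the first two rows of the question's data: since a missing item is NaN and a present item is any non-null value, DataFrame.notna() gives the same indicator matrix without relying on the cell text matching the column name. This is a different technique from the answer's, under the assumption that every non-null cell means "item present".

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: only its first two rows (0 and 1)
cols = ["bread", "milk", "butter", "jam", "nutella", "cheese", "chips"]
book_data = pd.DataFrame(
    [["bread", np.nan, "butter", "jam", "nutella", np.nan, np.nan],
     [np.nan, np.nan, "butter", "jam", "nutella", np.nan, "chips"]],
    columns=cols,
    dtype=object,
)

# The answer's approach: replace each column's own name with 1, NaN with 0
via_replace = book_data.apply(lambda x: x.replace(x.name, 1), axis=0).fillna(0).astype(int)

# Alternative: any non-missing cell counts as the item being present
via_notna = book_data.notna().astype(int)

print(via_notna)
print(via_replace.equals(via_notna))  # True
```

Because notna() ignores the cell text, stray spaces in the values do not matter for it. The trade-off is that any non-null value counts as present, so if blanks were stored as empty strings rather than NaN they would be encoded as 1.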