Home > Software design >  How to turn the item in the column to multiple columns?
How to turn the item in the column to multiple columns?

Time:05-14

I am trying to do this. So, currently my df look like this.

col_names = ['movie_id', 'movie_title', 'genres']
df = pd.read_csv('/content/drive/MyDrive/testing/ml-1m/movies.csv', sep='::', names=col_names, encoding='latin-1')

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller

while the columns are such "movie_id, name, and genre".

I want it to look like this.

The columns would be movie_id, name,Action,Adventure,Animation,Children's,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western


1::Toy Story (1995)::0::0::1::1::1::0::0::0::0::0::0::0::0::0::0::0::0::0::0
2::Jumanji (1995)::0::1::0::1::0::0::0::0::1::0::0::0::0::0::0::0::0::0::0
.
.
...

Basically, I want to turn genre column to multiple columns and 1 will be there if it match the columns.

Is there anyway to do this with pandas?

CodePudding user response:

IIUC you can do it like that:

res = (df
       .join(df['genre'].str.get_dummies())
       .drop('genre',axis=1))

print(res)

print(res)

   movie_id                                name  Action  Adventure  Animation  Children's  Comedy  Crime  Drama  Fantasy  Romance  Thriller
0         1                    Toy Story (1995)       0          0          1           1       1      0      0        0        0         0
1         2                      Jumanji (1995)       0          1          0           1       0      0      0        1        0         0
2         3             Grumpier Old Men (1995)       0          0          0           0       1      0      0        0        1         0
3         4            Waiting to Exhale (1995)       0          0          0           0       1      0      1        0        0         0
4         5  Father of the Bride Part II (1995)       0          0          0           0       1      0      0        0        0         0
5         6                         Heat (1995)       1          0          0           0       0      1      0        0        0         1
6         7                      Sabrina (1995)       0          0          0           0       1      0      0        0        1         0
7         8                 Tom and Huck (1995)       0          1          0           1       0      0      0        0        0         0
8         9                 Sudden Death (1995)       1          0          0           0       0      0      0        0        0         0
9        10                    GoldenEye (1995)       1          1          0           0       0      0      0        0        0         1

If you don't want to delete the original genre column, just remove the last part with drop

  • Related