I have a DataFrame with a 'genre' column containing strings like 'drama, comedy, action'.
I want to split each string into separate elements, 'drama', 'comedy', 'action', so I used:
Genre=[]
for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
books['genres_1']=Genre
But the result has a leading space before every genre except the first, like 'drama', '_comedy', '_action'. (I've used an underscore to represent the space because it's otherwise hard to see.)
So I tried:
Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean
But the space remains. What am I doing wrong?
My full code is below:
import pandas as pd
# Creating sample dataframes
books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']
# Splitting genre
Genre=[]
for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
books['genres_1']=Genre
# trying to remove the space
Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean
Answer:
Avoid explicit loops and list comprehensions with pandas when a built-in method does the job. Look up the pandas-specific equivalent for what you want to do; it is usually more concise, and for many operations faster. For string columns, that means the methods under the .str accessor.
See: pandas str functions
import pandas as pd

books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']
# Split on ', ' (comma plus space) so no element keeps a leading space
books.genre = books.genre.str.split(', ')
print(books)
Output:
                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]
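Splitting on ', ' assumes exactly one space after every comma. If the spacing in the real data is less regular, one option is to split on the comma alone and trim each piece. The following is only a sketch in plain Python: the helper name split_genres is hypothetical, the second sample row has deliberately messy spacing that is not in the question's data, and every value is assumed to be a string.

```python
def split_genres(text):
    """Split a comma-separated string and trim whitespace around each part."""
    # Empty pieces (e.g. from a trailing comma) are dropped.
    return [part.strip() for part in text.split(',') if part.strip()]


# Made-up rows for illustration; the second one has irregular spacing.
rows = ['drama, comedy, action', 'romance,sci-fi ,  drama', 'horror']
genre_lists = [split_genres(row) for row in rows]
print(genre_lists)
```

Applied to the column, this would be books['genre'].map(split_genres), which for the sample data in the question gives the same lists as the split above.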
If you want this as a string, you can join the list again with:
books.genre = books.genre.str.join(',')
# Or, all at once:
# books.genre = books.genre.str.split(', ').str.join(',')
# Or, just remove every space (note this also removes spaces inside multi-word genres):
# books.genre = books.genre.str.replace(' ', '')
print(books)
# Output:
                  genre
0   drama,comedy,action
1  romance,sci-fi,drama
2                horror
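As for why the original attempt left the spaces in place: each cell of 'genres_1' holds a list, and astype('str') converts the whole list into one string. strip(' ') then only trims spaces at the two ends of that string, which are the brackets, so the spaces inside are never touched. A small plain-Python sketch of what happens to a single cell (the variable names are made up for illustration):

```python
# One cell of the 'genres_1' column, as produced by split(',')
cell = 'drama, comedy, action'.split(',')
print(cell)

# astype('str') turns the whole list into its printed form, one single string
as_text = str(cell)
print(as_text)

# strip(' ') only looks at the ends of that string, which are '[' and ']'
unchanged = as_text.strip(' ')
print(unchanged == as_text)

# Stripping each element of the list is what actually removes the spaces
cleaned = [item.strip() for item in cell]
print(cleaned)
```

So the fix is either to strip every element of each list, or to avoid creating the leading spaces in the first place by splitting on ', '.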