Splitting a string into elements creates a space that I can't remove

Time:07-08

I have a data frame with a 'genre' column containing strings like 'drama, comedy, action'.

I want to split each string into separate elements, like 'drama', 'comedy', 'action', so I used:

Genre=[]

for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

But the result contains a leading space in every genre after the first one, like 'drama', '_comedy', '_action'. (I used an underscore to represent the space because otherwise it's hard to see.)

So I tried:

Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean

But the space remains. What am I doing wrong?

My full code is below:

import pandas as pd

# Creating sample dataframes
books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']

# Splitting genre
Genre=[]
for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

# trying to remove the space
Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean

CodePudding user response:

Don't use traditional loops or list comprehensions with pandas. Look up the equivalent pandas-specific function for whatever you want to do; it is usually more concise and often faster. Otherwise, there's little reason to use pandas.

See: pandas str functions

import pandas as pd

books = pd.DataFrame()
books['genre'] = ['drama, comedy, action', 'romance, sci-fi, drama', 'horror']

# Split on comma-plus-space so no leading space is left in the elements
books.genre = books.genre.str.split(', ')
print(books)

Output:

                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]
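
Note that split(', ') assumes every comma is followed by exactly one space. If the spacing in the real data varies, one possible sketch is to trim each string and split on a comma with optional surrounding whitespace. The messy sample strings and the pattern below are illustrative assumptions, not part of the original data, and this relies on str.split treating a multi-character pattern as a regular expression:

```python
import pandas as pd

# Hypothetical sample with inconsistent spacing around the commas
books = pd.DataFrame({'genre': ['drama, comedy,action',
                                'romance ,  sci-fi, drama',
                                ' horror ']})

# Trim the ends of each string, then split on a comma with any
# whitespace on either side of it
books['genres_1'] = books['genre'].str.strip().str.split(r'\s*,\s*')
print(books['genres_1'].tolist())
```

Each row then holds a list of genres with no leading or trailing spaces, regardless of how the commas were spaced.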

If you want this as a string, you can join the list again with:

books.genre = books.genre.str.join(',')
# Or, all at once:
# books.genre = books.genre.str.split(', ').str.join(',')
# Or, just replace spaces with nothing (this also removes spaces inside a genre name):
# books.genre = books.genre.str.replace(' ', '')
print(books)

Output:

                  genre
0   drama,comedy,action
1  romance,sci-fi,drama
2                horror
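
As for why the original strip attempt left the spaces in place: each cell of genres_1 holds a list, and astype('str') converts that whole list into one string, brackets and quotes included. strip(' ') then only trims the ends of that string, where there are no spaces. A minimal plain-Python sketch of what seems to happen to a single cell (the variable names here are illustrative):

```python
# What one cell of books['genres_1'] holds after the split(',') loop
cell = 'drama, comedy, action'.split(',')
print(cell)

# astype('str') applies str() to the list, giving one long string
as_text = str(cell)
print(as_text)

# strip(' ') only looks at the two ends of that string, which are
# '[' and ']', so nothing is removed
print(as_text.strip(' ') == as_text)

# Stripping each element of the list is what actually removes the spaces
cleaned = [part.strip() for part in cell]
print(cleaned)
```

Stripping has to happen per element, or the split has to consume the space in the first place, as split(', ') does.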