in my dataframe I have a column of strings like:
**type**
game/design_game
game/art_game
design/product design
fashion/accessories
games/tabletop games
art/digital art
art/public art
And I want to split it into two from / like:
main_cat subcat
game design_game
game art_game
I am applying split function:
df.column.str.split('/',n=1, expand = True)
But I get only the main_cat column and not subcats
Also I tried alternatively:
# new data frame with split value columns
new = df["column"].str.split("/", n = 1, expand = True)
# making separate first name column from new data frame
df["subcat"]= new[1]
# making separate last name column from new data frame
df["main_cat"]= new[0]
# df display
df.head(2)
But get keyerror for new[1]
Can someone please help me. Thanks!
CodePudding user response:
You could use a regex with named groups. Thus you don't need to worry about missing data.
The first part [^/]
matches any string without /
, then .*
matches the rest of the string:
df['type'].str.extract('(?P<main_cat>[^/] )/(?P<subcat>.*)')
output:
main_cat subcat
0 game design_game
1 game art_game
2 design product design
3 fashion accessories
4 games tabletop games
5 art digital art
6 art public art
NB. if you expect to have lines with only a main_cat and want to catch it:
df['type'].str.extract('(?P<main_cat>[^/] )/?(?P<subcat>.*)')
example if the last line was 'art' only:
main_cat subcat
6 art