Home > OS >  split string in a column into two columns
split string in a column into two columns

Time:09-30

in my dataframe I have a column of strings like:

**type**
game/design_game
game/art_game
design/product design 
fashion/accessories 
games/tabletop games 
art/digital art      
art/public art    

And I want to split it into two from / like:

main_cat       subcat        
game           design_game        
game           art_game      

I am applying split function:

   df.column.str.split('/',n=1, expand = True)

But I get only the main_cat column and not subcats

Also I tried alternatively:

 # new data frame with split value columns
 new = df["column"].str.split("/", n = 1, expand = True)

 # making separate first name column from new data frame
 df["subcat"]= new[1]

 # making separate last name column from new data frame
 df["main_cat"]= new[0]

 # df display
 df.head(2)

But get keyerror for new[1]

Can someone please help me. Thanks!

CodePudding user response:

You could use a regex with named groups. Thus you don't need to worry about missing data.

The first part [^/] matches any string without /, then .* matches the rest of the string:

df['type'].str.extract('(?P<main_cat>[^/] )/(?P<subcat>.*)')

output:

  main_cat          subcat
0     game     design_game
1     game        art_game
2   design  product design
3  fashion     accessories
4    games  tabletop games
5      art     digital art
6      art      public art

NB. if you expect to have lines with only a main_cat and want to catch it:

df['type'].str.extract('(?P<main_cat>[^/] )/?(?P<subcat>.*)')

example if the last line was 'art' only:

  main_cat          subcat
6      art                
  • Related