Home > front end >  Split one column into two columns with python pandas
Split one column into two columns with python pandas

Time:12-08

I have a df of cities, that show as:

| id | location         |
|----|------------------|
| 1  | New York (NY)    |
| 2  | Los Angeles (CA) |
| 3  | Houston (TX)     |

And I wish use some kind of split/strip that give me something like

| id | city             | state |
|----|------------------|-------|
| 1  | New York         |   NY  |
| 2  | Los Angeles      |   CA  |
| 3  | Houston          |   TX  |

Or even if three columns, been one original and other 2 made by code. I already try something like:

df[['city', 'state']] = df['location'].str.split("(", expand=True)
df['state'] = df['state'].str.strip(")")

Which works, but not so much 'cause there's a blank space after every city name and shouldn't. If I search a city, like:

df[df['city'] == 'Houston'] 

It will return nothing, but I have to code something like:

df[df['city'] == 'Houston '] # note the empty space after code

To give me something usefull, but that way gonna give me a headache when I made a merge or something like it.

So, someone have some kind of tips to hadle with this piece of code? I'm don't find nothing usefull in internet. It's always a simple split, or a simple strip. But I have faith that there's a more inteligent mode to do it.

CodePudding user response:

Well yeah why not df['city'] = df['city'].strip()?

CodePudding user response:

Use str.extract:

df = df.join(df.pop('location').str.extract(r'(.*)\s*\((.*)\)')
               .rename(columns={0: 'location', 1: 'state'}))
print(df)

# Output
   id      location state
0   1     New York     NY
1   2  Los Angeles     CA
2   3      Houston     TX
  • Related