I have a df of cities, that show as:
| id | location |
|----|------------------|
| 1 | New York (NY) |
| 2 | Los Angeles (CA) |
| 3 | Houston (TX) |
And I wish use some kind of split/strip that give me something like
| id | city | state |
|----|------------------|-------|
| 1 | New York | NY |
| 2 | Los Angeles | CA |
| 3 | Houston | TX |
Or even if three columns, been one original and other 2 made by code. I already try something like:
df[['city', 'state']] = df['location'].str.split("(", expand=True)
df['state'] = df['state'].str.strip(")")
Which works, but not so much 'cause there's a blank space after every city name and shouldn't. If I search a city, like:
df[df['city'] == 'Houston']
It will return nothing, but I have to code something like:
df[df['city'] == 'Houston '] # note the empty space after code
To give me something usefull, but that way gonna give me a headache when I made a merge or something like it.
So, someone have some kind of tips to hadle with this piece of code? I'm don't find nothing usefull in internet. It's always a simple split, or a simple strip. But I have faith that there's a more inteligent mode to do it.
CodePudding user response:
Well yeah why not df['city'] = df['city'].strip()
?
CodePudding user response:
Use str.extract
:
df = df.join(df.pop('location').str.extract(r'(.*)\s*\((.*)\)')
.rename(columns={0: 'location', 1: 'state'}))
print(df)
# Output
id location state
0 1 New York NY
1 2 Los Angeles CA
2 3 Houston TX