I have a dataframe with location column
A | location |
---|---|
a | Rehovot 2 |
b | Tel Aviv,, |
I want to extract only city names and remove all unnecessary characters. So final output will be
A | location |
---|---|
a | Rehovot |
b | Tel Aviv |
I have tried this code so far:
df['location'] = df['location'].str.extract(r'[a-zA-Z] \s?([a-zA-Z] )?')
but its giving the error that pattern contains no matching group.
Thanks
CodePudding user response:
The first line of code removes any special characters from the column you want(while keeping white spaces) and the second line of code removes and numbers from the column you want
df['location'] = df['location'].str.replace('[^\w\s]','')
df['location'] = df['location'].str.replace('\d ', '')
CodePudding user response:
you can do this in one line using list comprehension like :
df['location'] = [
''.join([el for el in city if el.isalpha() or el == ' ']) for city in df['location']
]
output:
A location
0 a Rehovot
1 b Tel Aviv
The .isalpha()
filters the string by keeping only alpha like characters (a, A, b, B...).