I have a dataframe containing addresses.
Addressess |
---|
10 Pentland Drive, Comiston, Edinburgh, EH10 6PX. |
Moray Place, Edinburgh, EH3 |
Carlton Street, Edinburgh |
The Bourse Apartments, 47 Timber Bush, Leith EH6 6QH |
I wish to write code in python and pandas that identifies if there is an 'EH' in the row and then moves this and all subsequent characters into another column. Thus achieving this:
Addressess | Post Code |
---|---|
10 Pentland Drive, Comiston, Edinburgh, . | EH10 6PX |
Moray Place, Edinburgh, | EH3 |
Carlton Street, Edinburgh | |
The Bourse Apartments, 47 Timber Bush, Leith EH6 6QH | EH6 6QH |
can anyone help?
CodePudding user response:
You can use str.extract
:
df[['Addressess', 'Post Code']] = df['Addressess'].str.extract(r'(.*?)\s*(\bEH\d [\s\w]*)?\W*$')
Or str.split
, if there is at least one row with a post code:
df[['Addressess', 'Post Code']] = df['Addressess'].str.split(r'\s*(?=\bEH\d*)', n=1, expand=True)
Output:
Addressess Post Code
0 10 Pentland Drive, Comiston, Edinburgh, EH10 6PX
1 Moray Place, Edinburgh, EH3
2 Carlton Street, Edinburgh NaN
3 The Bourse Apartments, 47 Timber Bush, Leith EH6 6QH
CodePudding user response:
With simple regex matching:
df['Post Code'] = df['Addressess'].str.extract('(EH. )').fillna('')
Addressess Post Code
0 10 Pentland Drive, Comiston, Edinburgh, EH10 6PX. EH10 6PX.
1 Moray Place, Edinburgh, EH3 EH3
2 Carlton Street, Edinburgh
3 The Bourse Apartments, 47 Timber Bush, Leith E... EH6 6QH
CodePudding user response:
Here is a way using a regex positive lookahead:
df['Addressess'].str.split(r'[0-9A-Za-z,] (?=EH)',expand=True).rename({0:'Addressess',1:'Post Code'},axis=1)