move string starting with specific characters into another column in pandas df-CodePudding

I have a dataframe containing addresses.

Addressess
10 Pentland Drive, Comiston, Edinburgh, EH10 6PX.
Moray Place, Edinburgh, EH3
Carlton Street, Edinburgh
The Bourse Apartments, 47 Timber Bush, Leith EH6 6QH

I wish to write code in python and pandas that identifies if there is an 'EH' in the row and then moves this and all subsequent characters into another column. Thus achieving this:

Addressess	Post Code
10 Pentland Drive, Comiston, Edinburgh, .	EH10 6PX
Moray Place, Edinburgh,	EH3
Carlton Street, Edinburgh
The Bourse Apartments, 47 Timber Bush, Leith EH6 6QH	EH6 6QH

can anyone help?

CodePudding user response：

You can use str.extract:

df[['Addressess', 'Post Code']] = df['Addressess'].str.extract(r'(.*?)\s*(\bEH\d [\s\w]*)?\W*$')

regex demo

Or str.split, if there is at least one row with a post code:

df[['Addressess', 'Post Code']] = df['Addressess'].str.split(r'\s*(?=\bEH\d*)', n=1, expand=True)

regex demo

Output:

                                     Addressess Post Code
0       10 Pentland Drive, Comiston, Edinburgh,  EH10 6PX
1                       Moray Place, Edinburgh,       EH3
2                     Carlton Street, Edinburgh       NaN
3  The Bourse Apartments, 47 Timber Bush, Leith   EH6 6QH

CodePudding user response：

With simple regex matching:

df['Post Code'] = df['Addressess'].str.extract('(EH. )').fillna('')

                                          Addressess  Post Code
0  10 Pentland Drive, Comiston, Edinburgh, EH10 6PX.  EH10 6PX.
1                        Moray Place, Edinburgh, EH3        EH3
2                          Carlton Street, Edinburgh           
3  The Bourse Apartments, 47 Timber Bush, Leith E...    EH6 6QH

CodePudding user response：

Here is a way using a regex positive lookahead:

df['Addressess'].str.split(r'[0-9A-Za-z,] (?=EH)',expand=True).rename({0:'Addressess',1:'Post Code'},axis=1)