I got the following df
housenumber | housenumber_extra |
---|---|
1 | |
2 | a |
55a | a |
55-4 | -4 |
3455 C |
How can i get the following df
housenumber | housenumber_extra |
---|---|
1 | |
2 | A |
55 | A |
55 | -4 |
3455 | C |
I filtered the alphabetic values from it with replace function. And then using the update function but its not working. Does someone know a better way
housenumber = df['huisnummer'].str.replace('[^a-zA-Z.,-/]', '')
df.set_index('serial', inplace=True)
df.update(huisnummer)
df.reset_index(inplace=True) # to recover the initial structure
CodePudding user response:
Use .str.extract
:
df['housenumber_extra'] = df['housenumber'].str.extract(r'\d ([^\d]. )$')[0].str.strip().fillna(df['housenumber_extra']).str.upper()
Output:
>>> df
housenumber housenumber_extra
0 1 NaN
1 2 A
2 55a A
3 55-4 -4
4 3455 C C
CodePudding user response:
It seems that you're actually not using the second column and can extract all info directly from the first column:
df[['housenumber', 'housenumber_extra']] = df['housenumber'].str.extract('(\d )\W*(\w*)')
output:
housenumber housenumber_extra
0 1
1 2
2 55 a
3 55 4
4 3455 C
Or even without assignment using named capturing groups:
df['housenumber'].str.extract('(?P<housenumber>\d )\W*(?P<housenumber_extra>\w*)')