I have a pandas dataframe of postcodes which have been concatenated with the two-letter country code. Some of these are Brazilian postcodes and I want to replace the last three characters of any postcode which starts with 'BR' with '000'.
import pandas as pd
data = ['BR86037-890', 'GBBB7', 'BR86071-570','BR86200-000','BR86026-480','BR86082-701', 'GBCW9', 'NO3140']
df = pd.DataFrame(data, columns=['postcode'])
I have tried the below, but it is not changing any of the postcodes:
if df['postcode'].str.startswith('BR').all(): df["postcode"] = df["postcode"].str.replace(r'.{3}$', '000')
CodePudding user response:
Use str.replace
with a capturing group:
df['postcode'] = df['postcode'].str.replace(r'(BR.*)...', r'\g<1>000', regex=True)
# or, more generic
df['postcode'] = df['postcode'].str.replace(r'(BR.*).{3}', r'\g<1>' '0'*3, regex=True)
Output:
postcode
0 BR86037-000
1 GBBB7
2 BR86071-000
3 BR86200-000
4 BR86026-000
5 BR86082-000
6 GBCW9
7 NO3140
CodePudding user response:
The code is not working because df['postcode'].str.startswith('BR').all()
will return a boolean value indicating whether all postcodes in the column start with 'BR'.
try this
data = ['BR86037-890', 'GBBB7', 'BR86071-570','BR86200-000','BR86026-480','BR86082-701', 'GBCW9', 'NO3140']
df = pd.DataFrame(data, columns=['postcode'])
mask = df['postcode'].str.startswith('BR')
df.loc[mask, 'postcode'] = df.loc[mask, 'postcode'].str.replace(r'.{3}$', '000')