How to replace the last three characters of a string in a column if it starts with a given prefix

I have a pandas DataFrame of postcodes that have been prefixed with the two-letter country code. Some of these are Brazilian postcodes, and I want to replace the last three characters of any postcode that starts with 'BR' with '000'.

import pandas as pd

data = ['BR86037-890', 'GBBB7', 'BR86071-570','BR86200-000','BR86026-480','BR86082-701', 'GBCW9', 'NO3140']

df = pd.DataFrame(data, columns=['postcode'])

I have tried the below, but it is not changing any of the postcodes:

if df['postcode'].str.startswith('BR').all():
    df["postcode"] = df["postcode"].str.replace(r'.{3}$', '000')

Answer 1:

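Use str.replace with an anchored regex and a capturing group: ^(BR.*) captures everything up to the last three characters of values that start with 'BR', and the final three characters (... or .{3} before $) are the ones replaced.

df['postcode'] = df['postcode'].str.replace(r'^(BR.*)...$', r'\g<1>000', regex=True)

# or, more generic (build the replacement from a count; note the + is required,
# because r'\g<1>' '0'*3 would concatenate the literals first and repeat the whole thing)
df['postcode'] = df['postcode'].str.replace(r'^(BR.*).{3}$', r'\g<1>' + '0'*3, regex=True)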

Output:

      postcode
0  BR86037-000
1        GBBB7
2  BR86071-000
3  BR86200-000
4  BR86026-000
5  BR86082-000
6        GBCW9
7       NO3140
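Why the ^ anchor matters: re.sub searches anywhere in the string, so an unanchored (BR.*)... also rewrites values that merely contain 'BR' after the first character. The sketch below compares both patterns; 'GBR12345' is a hypothetical value added only to show the difference and is not from the question's data.

```python
import pandas as pd

# 'GBR12345' is a hypothetical postcode that contains, but does not start with, 'BR'
s = pd.Series(['BR86037-890', 'GBBB7', 'GBR12345', 'NO3140'])

# Unanchored: the search finds 'BR' at index 1 of 'GBR12345' and rewrites it too
unanchored = s.str.replace(r'(BR.*)...', r'\g<1>000', regex=True)

# Anchored: only values starting with 'BR' match, and exactly the last three
# characters are replaced
anchored = s.str.replace(r'^(BR.*).{3}$', r'\g<1>000', regex=True)

print(unanchored.tolist())  # ['BR86037-000', 'GBBB7', 'GBR12000', 'NO3140']
print(anchored.tolist())    # ['BR86037-000', 'GBBB7', 'GBR12345', 'NO3140']
```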


Answer 2:

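The code is not working because df['postcode'].str.startswith('BR').all() returns a single boolean that is True only if every postcode in the column starts with 'BR'. Since some postcodes do not, it evaluates to False and the replacement line never runs. What you need is the element-wise result, used as a boolean mask to select only the 'BR' rows.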
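A small sketch of the difference between the collapsed .all() result and the element-wise mask, on a subset of the question's data. It also shows a second pitfall worth knowing: since pandas 2.0, Series.str.replace defaults to regex=False, so the pattern '.{3}$' is treated as a literal string unless regex=True is passed.

```python
import pandas as pd

data = ['BR86037-890', 'GBBB7', 'BR86071-570', 'NO3140']
df = pd.DataFrame(data, columns=['postcode'])

starts_br = df['postcode'].str.startswith('BR')

# .all() collapses the element-wise result into one bool. It is False here
# because 'GBBB7' and 'NO3140' do not start with 'BR', so an if-block
# guarded by it never executes.
print(starts_br.all())      # False

# The element-wise Series is what selects the rows to modify.
print(starts_br.tolist())   # [True, False, True, False]

# With regex=False (the default since pandas 2.0), '.{3}$' is searched for
# as a literal substring, so nothing changes either.
literal = df['postcode'].str.replace(r'.{3}$', '000', regex=False)
print(literal.equals(df['postcode']))  # True
```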

Try this, which replaces only the rows selected by the mask:

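import pandas as pd

data = ['BR86037-890', 'GBBB7', 'BR86071-570','BR86200-000','BR86026-480','BR86082-701', 'GBCW9', 'NO3140']
df = pd.DataFrame(data, columns=['postcode'])

# Element-wise boolean mask: True for rows whose postcode starts with 'BR'
mask = df['postcode'].str.startswith('BR')
# regex=True is required in pandas >= 2.0, where str.replace defaults to regex=False
df.loc[mask, 'postcode'] = df.loc[mask, 'postcode'].str.replace(r'.{3}$', '000', regex=True)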
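If you prefer to avoid regular expressions altogether, the same masked assignment can be done with positional slicing through the .str accessor. This is a sketch of an alternative, not part of either answer above; it assumes every 'BR' postcode has at least three characters after the prefix (a bare 'BR' would become '000').

```python
import pandas as pd

data = ['BR86037-890', 'GBBB7', 'BR86071-570', 'BR86200-000',
        'BR86026-480', 'BR86082-701', 'GBCW9', 'NO3140']
df = pd.DataFrame(data, columns=['postcode'])

mask = df['postcode'].str.startswith('BR')

# Drop the last three characters with .str[:-3] and append '000'.
# No regex is involved, so the regex default of str.replace does not matter.
df.loc[mask, 'postcode'] = df.loc[mask, 'postcode'].str[:-3] + '000'

print(df)
```

Slicing makes the "last three characters" intent explicit and sidesteps the regex default change between pandas versions.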