Add a dot at the 3rd position of a string with regex in pandas

Here is a sample of the output I get when running this code:

df['formatted_codes']=df['dx_code'].str.replace(r'(^\w{3}(?!$))',r'\1.',regex=True)

dx_id	dx_code	formatted_codes
1	A00	A00.
2	A000	A00.0
3	A001	A00.1
4	A009	A00.9
5	A01	A01.
6	S92113	S92.113
7	S92113D	S92.113D
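On clean strings, the pattern in the code above should not add a dot to a three-character code. After `A00` the string ends, so the `(?!$)` lookahead fails and nothing is replaced. Below is a minimal check, assuming object-dtype strings. The last value is a hypothetical code with a trailing space. It shows one possible, unconfirmed explanation for the dangling dots in the table: extra characters after the code.

```python
import pandas as pd

# Sample codes from the question, plus a hypothetical "A00 " with a trailing space.
# object dtype keeps the regex on Python's re engine (lookaheads supported).
df = pd.DataFrame({"dx_code": ["A00", "A000", "S92113D", "A00 "]}, dtype=object)

# The question's original pattern
df["formatted_codes"] = df["dx_code"].str.replace(r"(^\w{3}(?!$))", r"\1.", regex=True)

print(df["formatted_codes"].tolist())
# ['A00', 'A00.0', 'S92.113D', 'A00. ']
```

If the real data does contain trailing whitespace, running `.str.strip()` on `dx_code` first would make the original pattern behave as intended.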

But I want the '.' added only when the code has more than 3 characters. The output I want is:

dx_id	dx_code	formatted_codes
1	A00	A00
2	A000	A00.0
3	A001	A00.1
4	A009	A00.9
5	A01	A01
6	S92113	S92.113
7	S92113D	S92.113D

Can anyone help me adjust the regex, or suggest another way to add the '.' at the desired position?
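Here is one non-regex alternative, sketched with plain string slicing. It splits each code after its 3rd character and joins the parts with a dot, but only where the code is longer than 3 characters. This assumes `dx_code` holds strings. The `.str.strip()` call is a hypothetical guard against stray whitespace and is not part of the question's code.

```python
import pandas as pd

df = pd.DataFrame(
    {"dx_code": ["A00", "A000", "A001", "A009", "A01", "S92113", "S92113D"]},
    dtype=object,
)

# Hypothetical guard: drop leading/trailing whitespace before measuring length
codes = df["dx_code"].str.strip()

# Build "first 3 chars" + "." + "rest", then keep it only for codes longer than 3
dotted = codes.str[:3] + "." + codes.str[3:]
df["formatted_codes"] = dotted.where(codes.str.len() > 3, codes)

print(df["formatted_codes"].tolist())
# ['A00', 'A00.0', 'A00.1', 'A00.9', 'A01', 'S92.113', 'S92.113D']
```

`Series.where` keeps the dotted value where the condition is True and falls back to the original code elsewhere.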

Answer:

Use str.rstrip to remove trailing dots from the formatted_codes column:

df["formatted_codes"] = df["formatted_codes"].str.rstrip('.')
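Applied to the output shown in the question, this removes only the dangling dot and leaves codes such as `A00.0` intact. Below is a minimal sketch that rebuilds the question's `formatted_codes` column by hand, assuming plain string values.

```python
import pandas as pd

# formatted_codes exactly as shown in the question's output table
df = pd.DataFrame(
    {"formatted_codes": ["A00.", "A00.0", "A00.1", "A00.9", "A01.", "S92.113", "S92.113D"]},
    dtype=object,
)

# rstrip('.') removes only '.' characters at the very end of each string
df["formatted_codes"] = df["formatted_codes"].str.rstrip(".")

print(df["formatted_codes"].tolist())
# ['A00', 'A00.0', 'A00.1', 'A00.9', 'A01', 'S92.113', 'S92.113D']
```

Because `rstrip('.')` stops at the first non-dot character from the right, a value like `'A00. '` (dot followed by a space) would be left unchanged.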

Answer:

You need to use

df['formatted_codes'] = df['dx_code'].str.replace(r'^\w{3}(?!$)', r'\g<0>.', regex=True)

The ^\w{3}(?!$) regex matches the first three word chars of the string, but only when they are not at the end of the string (the (?!$) negative lookahead). It replaces the match with the same text followed by a dot. The \g<0> backreference refers to the whole match value, so no extra capturing group around the whole pattern is needed.
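Below is a quick comparison on the question's sample codes, assuming object-dtype strings. It shows why the match must be anchored at the start of the string, or limited to a single replacement with the `n` parameter of `str.replace`. Without either, the substitution keeps scanning after the first match, and a 7-character code gets a second dot.

```python
import pandas as pd

codes = pd.Series(
    ["A00", "A000", "A001", "A009", "A01", "S92113", "S92113D"], dtype=object
)

# Unanchored: every later run of 3 word chars not at the end also gets a dot
unanchored = codes.str.replace(r"\w{3}(?!$)", r"\g<0>.", regex=True)

# Anchored with ^: only the first three characters can ever match
anchored = codes.str.replace(r"^\w{3}(?!$)", r"\g<0>.", regex=True)

# Alternative: keep the unanchored pattern but allow at most one replacement
first_only = codes.str.replace(r"\w{3}(?!$)", r"\g<0>.", n=1, regex=True)

print(unanchored.tolist())
# ['A00', 'A00.0', 'A00.1', 'A00.9', 'A01', 'S92.113', 'S92.113.D']
print(anchored.tolist())
# ['A00', 'A00.0', 'A00.1', 'A00.9', 'A01', 'S92.113', 'S92.113D']
```

The `^` anchor is the more robust choice. With `n=1`, the first match could still land later in the string if the code started with a non-word character.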