Here is a sample table of the output I got while running this code:
df['formatted_codes']=df['dx_code'].str.replace(r'(^\w{3}(?!$))',r'\1.',regex=True)
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00. |
2 | A000 | A00.0 |
3 | A001 | A00.1 |
4 | A009 | A00.9 |
5 | A01 | A01. |
6 | S92113 | S92.113 |
7 | S92113D | S92.113D |
But I want the '.' to be added only when the code has more than 3 characters. The output I want looks like this:
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00 |
2 | A000 | A00.0 |
3 | A001 | A00.1 |
4 | A009 | A00.9 |
5 | A01 | A01 |
6 | S92113 | S92.113 |
7 | S92113D | S92.113D |
If anyone can help me adjust the regex, that would be helpful. If there is another way to add the '.' at my desired location, please tell me.
CodePudding user response:
Use str.rstrip to remove trailing dots from the formatted_codes column:
df["formatted_codes"] = df["formatted_codes"].str.rstrip('.')
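As a minimal sketch of this two-step approach, the snippet below rebuilds the sample data from the question, adds a dot after the first three characters of every code, and then strips the dot left dangling on the three-character codes. The first step (the `^(\w{3})` pattern) is an assumption made for illustration, not necessarily the exact code the asker ran.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dx_id": [1, 2, 3, 4, 5, 6, 7],
    "dx_code": ["A00", "A000", "A001", "A009", "A01", "S92113", "S92113D"],
})

# Step 1 (assumed for illustration): put a dot after the first three
# characters of every code, including the three-character ones.
df["formatted_codes"] = df["dx_code"].str.replace(r"^(\w{3})", r"\1.", regex=True)

# Step 2: strip the trailing dot that step 1 leaves on codes like "A00.".
df["formatted_codes"] = df["formatted_codes"].str.rstrip(".")

print(df["formatted_codes"].tolist())
```

Note that rstrip('.') removes every trailing dot, so this only suits data where a code never legitimately ends in a dot.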
CodePudding user response:
You need to use
df['formatted_codes']=df['dx_code'].str.replace(r'^\w{3}(?!$)', r'\g<0>.', regex=True)
The ^\w{3}(?!$) regex matches three consecutive word chars at the start of the string, but only if they are not immediately followed by the end of the string, and replaces the match with the same text plus a dot char. The \g<0> backreference refers to the whole match value, so there is no need for an extra capturing group around the whole pattern. The ^ anchor matters: without it, a later run of three word chars can match too, and S92113D would become S92.113.D.
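To check how the pattern behaves on the sample codes, here is a small sketch using Python's re module directly (as far as I know, pandas relies on re for regex=True replacements on ordinary string columns). The helper name add_dot is hypothetical; it compares the pattern anchored with ^ against the same pattern without the anchor.

```python
import re

# Sample codes copied from the question.
codes = ["A00", "A000", "A001", "A009", "A01", "S92113", "S92113D"]

# Three word chars at the start of the string, not followed by end of string.
ANCHORED = re.compile(r"^\w{3}(?!$)")
# Same pattern without the start anchor, for comparison.
UNANCHORED = re.compile(r"\w{3}(?!$)")


def add_dot(code, pattern=ANCHORED):
    """Append a dot to every match of `pattern`; \\g<0> is the whole match."""
    return pattern.sub(r"\g<0>.", code)


for code in codes:
    print(code, add_dot(code), add_dot(code, UNANCHORED))
```

The two patterns agree on every sample code except S92113D, where the unanchored one also matches 113 and inserts a second dot.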