This is a small example taken from a bigger dataframe. Suppose I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({"ID":["4SSS50FX","2TT1897FA"],
"VALUE":[13, 56]})
df
Out[2]:
          ID  VALUE
0   4SSS50FX     13
1  2TT1897FA     56
I would like to insert "-" into the strings in df["ID"] every time the characters change from number to text or from text to number. So the output should look like:
             ID  VALUE
0   4-SSS-50-FX     13
1  2-TT-1897-FA     56
I could write specific conditions for each case, but I would like to automate it for all the samples. Could anyone help me?
CodePudding user response:
You can use a regular expression with lookarounds.
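df['ID'] = df['ID'].str.replace(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)', '-', regex=True)
The regexp matches an empty string that is either preceded by a digit and followed by an uppercase letter, or vice versa. This empty string is then replaced with -. Passing regex=True matters because recent pandas versions treat the pattern as a literal string by default.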
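To see what the lookaround pattern does outside of pandas, here is a minimal sketch using the standard `re` module on plain strings. The helper name `insert_dashes` and the constant `BOUNDARY` are made up for illustration; the same pattern string can be passed to `Series.str.replace` with `regex=True`.

```python
import re

# Zero-width boundary: a digit before and an uppercase letter after,
# or an uppercase letter before and a digit after.
BOUNDARY = re.compile(r'(?<=\d)(?=[A-Z])|(?<=[A-Z])(?=\d)')

def insert_dashes(text):
    """Insert '-' at every digit/uppercase-letter transition (hypothetical helper)."""
    return BOUNDARY.sub('-', text)

ids = ["4SSS50FX", "2TT1897FA"]
dashed = [insert_dashes(s) for s in ids]
print(dashed)  # ['4-SSS-50-FX', '2-TT-1897-FA']
```

Because the match is empty, nothing from the original string is consumed; only the dash is added. Note that `[A-Z]` covers uppercase ASCII letters only, so lowercase IDs would be left unchanged unless the character class is widened.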
CodePudding user response:
Use a regex.
>>> df['ID'].str.replace(r'(\d+(?=\D)|\D+(?=\d))', r'\1-', regex=True)
0     4-SSS-50-FX
1    2-TT-1897-FA
Name: ID, dtype: object
\d+(?=\D) means one or more digits followed by a non-digit.
\D+(?=\d) means one or more non-digits followed by a digit.
Either of those is replaced with itself plus a - character.
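The capturing-group approach can also be tried on plain strings with the standard `re` module. This is only a sketch, assuming the quantified form of the pattern (`\d+` and `\D+`); the names `RUN` and `dash_after_runs` are hypothetical. It also shows that `\D` matches any non-digit, not just letters.

```python
import re

# Capture a run of digits that is followed by a non-digit, or a run of
# non-digits that is followed by a digit. The lookahead consumes nothing.
RUN = re.compile(r'(\d+(?=\D)|\D+(?=\d))')

def dash_after_runs(text):
    """Append '-' to each captured run (hypothetical helper name)."""
    return RUN.sub(r'\1-', text)

print(dash_after_runs("4SSS50FX"))  # 4-SSS-50-FX
print(dash_after_runs("ab12cd"))    # ab-12-cd  (\D also matches lowercase letters)
print(dash_after_runs("A-1"))       # A--1      (\D also matches '-')
```

If the IDs can contain punctuation or are already partly dashed, a narrower class such as `[A-Za-z]` in place of `\D` may be the safer choice.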