Regex replace first two letters within column in python

I have a dataframe such as

COL1
A_element_1_ _none 
C_BLOCA_element 
D_element_3
element_'
BasaA_bloc
B_basA_bloc
BbasA_bloc

and I would like to remove the first two characters of each value in COL1, but only if they match one of the entries in this list:

the_list =['A_','B_','C_','D_']

Then I should get the following output:

COL1
element_1_ _none 
BLOCA_element 
element_3
element_'
BasaA_bloc
basA_bloc
BbasA_bloc

So far I have tried the following:

df['COL1']=df['COL1'].str.replace("A_","")
df['COL1']=df['COL1'].str.replace("B_","")
df['COL1']=df['COL1'].str.replace("C_","")
df['COL1']=df['COL1'].str.replace("D_","")

But this also removes the pattern elsewhere in the string (for example the A_ inside BLOCA_element in the second row) instead of removing only the first two characters.

CodePudding user response：

If the values in the_list always have that format, you could also use str.replace with a simple pattern that matches an uppercase character A-D followed by an underscore at the start of the string: ^[A-D]_

import pandas as pd

strings = [
    "A_element_1_ _none ",
    "C_BLOCA_element ",
    "D_element_3",
    "element_'",
    "BasaA_bloc",
    "B_basA_bloc",
    "BbasA_bloc"
]

df = pd.DataFrame(strings, columns=["COL1"])
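# regex=True is required in pandas >= 2.0, where str.replace matches literally by default
df['COL1'] = df['COL1'].str.replace(r"^[A-D]_", "", regex=True)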

print(df)

Output

                COL1
0  element_1_ _none 
1     BLOCA_element 
2          element_3
3          element_'
4         BasaA_bloc
5          basA_bloc
6         BbasA_bloc
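
If the prefixes do not form a tidy character range, the same idea can be applied by building the pattern from the_list itself. The following is only a sketch: the variable name pattern is my own, and it assumes every entry should be matched literally at the start of the string.

```python
import re

import pandas as pd

the_list = ["A_", "B_", "C_", "D_"]

# Escape each prefix so regex metacharacters (".", "+", ...) are matched
# literally, then anchor the alternation at the start of the string.
pattern = "^(?:" + "|".join(re.escape(prefix) for prefix in the_list) + ")"

df = pd.DataFrame({
    "COL1": [
        "A_element_1_ _none ",
        "C_BLOCA_element ",
        "D_element_3",
        "element_'",
        "BasaA_bloc",
        "B_basA_bloc",
        "BbasA_bloc",
    ]
})

# Only a prefix at position 0 is removed; later occurrences are kept.
df["COL1"] = df["COL1"].str.replace(pattern, "", regex=True)
print(df)
```

If some prefixes were themselves prefixes of others (say "A_" and "A_B_"), it would probably be safer to sort the_list by length, longest first, before joining.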

CodePudding user response：

You can also use the apply() function from pandas: if the string starts with one of the relevant prefixes, drop the first two characters; otherwise return the whole string.

df["COL1"] = df["COL1"].apply(lambda x: x[2:] if x.startswith(("A_","B_","C_","D_")) else x)
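
One caveat with this approach: the lambda raises an error if the column contains missing values, because NaN has no startswith method. A possible variant is sketched below. The helper name strip_prefix and the sample data are made up for illustration, and it assumes all prefixes are exactly two characters long, as in the question.

```python
import pandas as pd

prefixes = ("A_", "B_", "C_", "D_")


def strip_prefix(value):
    # Non-string cells (e.g. None / NaN) are returned unchanged.
    if isinstance(value, str) and value.startswith(prefixes):
        # All prefixes are assumed to be two characters long.
        return value[2:]
    return value


df = pd.DataFrame({"COL1": ["A_element", "BbasA_bloc", None, "B_basA_bloc"]})
df["COL1"] = df["COL1"].apply(strip_prefix)
print(df)
```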