Regex replace first two letters within column in python

Time:11-10

I have a dataframe such as

COL1
A_element_1_ _none 
C_BLOCA_element 
D_element_3
element_'
BasaA_bloc
B_basA_bloc
BbasA_bloc

and I would like to remove the first two characters of each row of COL1, but only if they match one of the entries in this list:

the_list =['A_','B_','C_','D_'] 

Then I should get the following output:

COL1
element_1_ _none 
BLOCA_element 
element_3
element_'
BasaA_bloc
basA_bloc
BbasA_bloc

So far I tried the following :

df['COL1']=df['COL1'].str.replace("A_","")
df['COL1']=df['COL1'].str.replace("B_","")
df['COL1']=df['COL1'].str.replace("C_","")
df['COL1']=df['COL1'].str.replace("D_","")

But this also removes the pattern when it occurs elsewhere in the string (such as the A_ inside BLOCA_element in row 2), instead of removing only the first two characters.

CodePudding user response:

If the values in the_list always have that format (a single uppercase letter from A to D followed by an underscore), you could also use str.replace with a simple pattern that is anchored to the start of the string: ^[A-D]_

import pandas as pd

strings = [
    "A_element_1_ _none ",
    "C_BLOCA_element ",
    "D_element_3",
    "element_'",
    "BasaA_bloc",
    "B_basA_bloc",
    "BbasA_bloc"
]

df = pd.DataFrame(strings, columns=["COL1"])
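# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['COL1'] = df['COL1'].str.replace(r"^[A-D]_", "", regex=True)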

print(df)

Output

                COL1
0  element_1_ _none 
1     BLOCA_element 
2          element_3
3          element_'
4         BasaA_bloc
5          basA_bloc
6         BbasA_bloc
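
If the prefixes in the_list do not fit a simple character class, the same anchored pattern can probably be built from the list itself. The following is a minimal sketch using only the standard re module on a plain Python list; the names pattern, values and cleaned are made up for illustration. Assuming the same df as above, the resulting pattern.pattern string could be passed to df['COL1'].str.replace(..., regex=True).

```python
import re

the_list = ["A_", "B_", "C_", "D_"]

# Escape each prefix so any regex metacharacters are matched literally,
# then anchor the alternation to the start of the string with ^.
pattern = re.compile("^(?:" + "|".join(map(re.escape, the_list)) + ")")

values = [
    "A_element_1_ _none ",
    "C_BLOCA_element ",
    "D_element_3",
    "element_'",
    "BasaA_bloc",
    "B_basA_bloc",
    "BbasA_bloc",
]

# Only a prefix at position 0 is removed; "A_" inside "BLOCA_element" is kept.
cleaned = [pattern.sub("", v) for v in values]
print(cleaned)
```

Because of the ^ anchor, at most one prefix is removed per string, so occurrences later in the value are left untouched.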

CodePudding user response:

You can also use the apply() function from pandas. If the string starts with one of the patterns concerned, we omit the first two characters; otherwise we return the whole string.

df["COL1"] = df["COL1"].apply(lambda x: x[2:] if x.startswith(("A_","B_","C_","D_")) else x)
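
The startswith check above hard-codes the prefixes and assumes every value is a string. A slightly more defensive variant might look like the sketch below; strip_prefix is a hypothetical helper name, and it assumes the prefixes come from the_list in the question. It should be usable as df["COL1"].apply(strip_prefix).

```python
the_list = ["A_", "B_", "C_", "D_"]

# str.startswith accepts a tuple of prefixes, not a list.
prefixes = tuple(the_list)

def strip_prefix(value):
    """Return value without its leading prefix, if it starts with one."""
    # Non-string values (e.g. missing data) are passed through unchanged.
    if isinstance(value, str):
        for prefix in prefixes:
            if value.startswith(prefix):
                # Slice by the prefix length rather than a fixed 2,
                # so prefixes of other lengths would also work.
                return value[len(prefix):]
    return value

print(strip_prefix("B_basA_bloc"))
print(strip_prefix("BbasA_bloc"))
```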