I have a dataframe such as:
COL1
A_element_1_ _none
C_BLOCA_element
D_element_3
element_'
BasaA_bloc
B_basA_bloc
BbasA_bloc
I would like to remove the first two characters of each value in COL1,
but only if they match one of the prefixes in this list:
the_list =['A_','B_','C_','D_']
The expected output is:
COL1
element_1_ _none
BLOCA_element
element_3
element_'
BasaA_bloc
basA_bloc
BbasA_bloc
So far I have tried the following:
df['COL1']=df['COL1'].str.replace("A_","")
df['COL1']=df['COL1'].str.replace("B_","")
df['COL1']=df['COL1'].str.replace("C_","")
df['COL1']=df['COL1'].str.replace("D_","")
But this also removes the pattern wherever it appears later in the string (e.g. the A_ in row 2, C_BLOCA_element),
instead of removing only the first two characters.
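To see why, here is a small reproduction of the attempt above on the sample data. It uses regex=False to make the literal matching explicit; for these prefixes the result is the same either way because "A_" to "D_" contain no regex metacharacters. Each str.replace call removes every occurrence of the substring, not just a leading one.

```python
import pandas as pd

df = pd.DataFrame({"COL1": ["A_element_1_ _none", "C_BLOCA_element", "D_element_3",
                            "element_'", "BasaA_bloc", "B_basA_bloc", "BbasA_bloc"]})

# Chained replacements as in the question: each call strips the substring
# everywhere it occurs in the value, not only at the start
out = df["COL1"]
for prefix in ["A_", "B_", "C_", "D_"]:
    out = out.str.replace(prefix, "", regex=False)

print(out.tolist())
# ['element_1_ _none', 'BLOCelement', 'element_3', "element_'", 'Basabloc', 'basbloc', 'Bbasbloc']
```

Rows 2, 5, 6 and 7 lose an inner A_ that should have been kept.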
Answer:
If the values in the_list
always have that format (one uppercase letter from A to D followed by an underscore), you can use str.replace with a regular expression that only matches at the start of the string: ^[A-D]_
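import pandas as pd

strings = [
    "A_element_1_ _none",
    "C_BLOCA_element",
    "D_element_3",
    "element_'",
    "BasaA_bloc",
    "B_basA_bloc",
    "BbasA_bloc"
]
df = pd.DataFrame(strings, columns=["COL1"])
# ^ anchors the match at the start of the string; regex=True is required
# because str.replace treats the pattern literally by default in pandas 2.0+
df['COL1'] = df['COL1'].str.replace(r"^[A-D]_", "", regex=True)
print(df)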
Output:
COL1
0 element_1_ _none
1 BLOCA_element
2 element_3
3 element_'
4 BasaA_bloc
5 basA_bloc
6 BbasA_bloc
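If the prefixes do not all follow the [A-D]_ shape, a sketch along the same lines can build the anchored pattern from the_list itself. This assumes the entries are plain strings to be matched literally, so each one is passed through re.escape; sorting by length (longest first) is an added precaution so that a longer prefix wins when one entry is a prefix of another.

```python
import re
import pandas as pd

the_list = ['A_', 'B_', 'C_', 'D_']
df = pd.DataFrame({"COL1": ["A_element_1_ _none", "C_BLOCA_element", "D_element_3",
                            "element_'", "BasaA_bloc", "B_basA_bloc", "BbasA_bloc"]})

# Escape each prefix so it is matched literally, try longer prefixes first,
# and anchor the whole alternation at the start of the string with ^
alternatives = "|".join(map(re.escape, sorted(the_list, key=len, reverse=True)))
pattern = "^(?:" + alternatives + ")"

df["COL1"] = df["COL1"].str.replace(pattern, "", regex=True)
print(df["COL1"].tolist())
# ['element_1_ _none', 'BLOCA_element', 'element_3', "element_'", 'BasaA_bloc', 'basA_bloc', 'BbasA_bloc']
```

Because ^ only matches at position 0 (no re.MULTILINE), at most one prefix is removed per value.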
Answer:
You can also use the apply()
method from pandas: if the string starts with one of the prefixes, drop its first two characters; otherwise return the whole string unchanged.
# str.startswith accepts a tuple of prefixes (a list raises TypeError)
df["COL1"] = df["COL1"].apply(lambda x: x[2:] if x.startswith(tuple(the_list)) else x)
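A vectorized variant of the same idea, avoiding a Python-level lambda per row, could compare the first two characters against the list with isin and slice only where they match. This is a sketch that assumes every entry in the_list is exactly two characters long, as in the question.

```python
import pandas as pd

the_list = ['A_', 'B_', 'C_', 'D_']
df = pd.DataFrame({"COL1": ["A_element_1_ _none", "C_BLOCA_element", "D_element_3",
                            "element_'", "BasaA_bloc", "B_basA_bloc", "BbasA_bloc"]})

# True where the first two characters are one of the listed prefixes
# (assumes all prefixes in the_list have length 2)
mask = df["COL1"].str[:2].isin(the_list)

# Series.where keeps values where the condition is True and takes the
# sliced string (first two characters dropped) everywhere else
df["COL1"] = df["COL1"].where(~mask, df["COL1"].str[2:])
print(df["COL1"].tolist())
# ['element_1_ _none', 'BLOCA_element', 'element_3', "element_'", 'BasaA_bloc', 'basA_bloc', 'BbasA_bloc']
```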