I have a dataframe like this:
import pandas as pd

data={'well':['10B23','10B23','10B23','10B23','10B23','10B23'],
'tag':['15B22|TestSep_OutletFlow','15B22|TestSep_GasOutletFlow','15B22|TestSep_WellNum','15B22|TestSep_GasPresValve','15B22|TestSep_Temp','WHT']}
df=pd.DataFrame(data)
df
well tag
0 10B23 15B22|TestSep_OutletFlow
1 10B23 15B22|TestSep_GasOutletFlow
2 10B23 15B22|TestSep_WellNum
3 10B23 15B22|TestSep_GasPresValve
4 10B23 15B22|TestSep_Temp
5 10B23 WHT
Now I'd like to replace everything before the | in the tag column with a string such as 11A22, so that the dataframe looks like this afterwards:
well tag
0 10B23 11A22|TestSep_OutletFlow
1 10B23 11A22|TestSep_GasOutletFlow
2 10B23 11A22|TestSep_WellNum
3 10B23 11A22|TestSep_GasPresValve
4 10B23 11A22|TestSep_Temp
5 10B23 WHT
I am thinking of using a regular expression with a group and replacing the group with a string. What I have in mind looks like this:
df['tag2']=df['tag'].str.replace(r'([a-z0-9]*)|TestSep_[a-z0-9]*','11A22',regex=True)
Then I got this result:
well tag tag2
0 10B23 15B22|TestSep_OutletFlow 11A2211A22B11A2211A22|11A2211A2211A22O11A2211A...
1 10B23 15B22|TestSep_GasOutletFlow 11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
2 10B23 15B22|TestSep_WellNum 11A2211A22B11A2211A22|11A2211A2211A22W11A2211A...
3 10B23 15B22|TestSep_GasPresValve 11A2211A22B11A2211A22|11A2211A2211A22G11A2211A...
4 10B23 15B22|TestSep_Temp 11A2211A22B11A2211A22|11A2211A2211A22T11A2211A22
5 10B23 WHT 11A22W11A22H11A22T11A22
Thanks for your help
CodePudding user response:
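| is a special character in regex (it means alternation), so you need to escape it as \|. Unescaped, your pattern reads as "[a-z0-9]* or TestSep_[a-z0-9]*", and because [a-z0-9]* also matches the empty string, 11A22 gets inserted at every position in the tag. Anchor the pattern at the start of the string and escape the pipe: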
df["tag2"] = df["tag"].str.replace(r"^\w*\|", "11A22|", regex=True)
Output:
print(df)
well tag tag2
0 10B23 15B22|TestSep_OutletFlow 11A22|TestSep_OutletFlow
1 10B23 15B22|TestSep_GasOutletFlow 11A22|TestSep_GasOutletFlow
2 10B23 15B22|TestSep_WellNum 11A22|TestSep_WellNum
3 10B23 15B22|TestSep_GasPresValve 11A22|TestSep_GasPresValve
4 10B23 15B22|TestSep_Temp 11A22|TestSep_Temp
5 10B23 WHT WHT
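To see why the original pattern misbehaved, here is a minimal sketch using Python's re module directly, on the assumption that str.replace with regex=True follows the same matching rules (the result shown in the question is consistent with that). The names broken, fixed and replace_prefix are only illustrative and do not come from pandas.

```python
import re

# Pattern from the question: the unescaped | splits it into two alternatives,
# and the first one, [a-z0-9]*, is happy to match the empty string.
broken = r"([a-z0-9]*)|TestSep_[a-z0-9]*"

# Pattern from the answer: ^ anchors at the start, \| is a literal pipe.
fixed = r"^\w*\|"


def replace_prefix(tag, new_prefix="11A22"):
    """Swap everything up to and including the first | for new_prefix + '|'.

    Assumes new_prefix contains no backslashes, since re.sub would
    interpret them as escapes in the replacement string.
    """
    return re.sub(fixed, new_prefix + "|", tag)


# "WHT" has no lowercase letters or digits, so the broken pattern only
# finds empty matches: one before each character and one at the end.
print(re.sub(broken, "11A22", "WHT"))        # 11A22W11A22H11A22T11A22

print(replace_prefix("15B22|TestSep_Temp"))  # 11A22|TestSep_Temp
print(replace_prefix("WHT"))                 # WHT (no pipe, so left unchanged)
```

Tags without a | are left untouched because the fixed pattern requires a literal pipe after the leading word characters, which is what keeps the WHT row as it is.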