I would like to remove certain target strings from two columns of a DataFrame. Here is my data:
import pandas as pd

classes = [
    ('2.7.2.3', 'a primary alcohol', 'an aldehyde'),
    ('2.7.1.3', 'a secondary alcohol', 'a ketone'),
    ('3.1.1.3', 'an aldehyde NADP( )', 'a 3-oxoacyl-[ACP] NADPH'),
    ('3.1.1.3', '3-oxoacyl-[ACP] NAD( )', '2,3-dioxo-L-gulonate NADH'),
    ('2.7.2.3', 'D-ribitol 5-phosphate NADP( )', 'a primary alcohol H( )'),
    ('1.7.99.4', '2,3-dioxo-L-gulonate NAD( )', 'D-ribulose 5-phosphate NADH'),
    ('1.1.1.304', 'L-iditol NAD( )', ' H( ) keto-L-sorbose NADH'),
    ('2.7.4.3', 'H2O', 'oxidized coenzyme F420-1'),
    ('4.1.1.68', 'myo-inositol NAD( )', ' H( ) NADH a secondary alcohol'),
]
labels = ['Ko_EC', 'From', 'to']
alls = pd.DataFrame.from_records(classes, columns=labels)
I want to replace all
and some specific substrings, namely S = ['H2O', 'NADP( )', 'NADPH', 'NAD( )', 'NADH', 'H( )']. My code is:
alls['From'] = alls['From'].str.replace(" ", "")
alls['to'] = alls['to'].str.replace(" ", "")
S = ['H2O','NADP()','NADPH','NAD()', 'NADH', 'H()']
alls
However, it raised re.error: nothing to repeat at position 2.
The expected result, with every target in the list S removed, is:
   Ko_EC      From                   to
0  2.7.2.3    a primary alcohol      an aldehyde
1  2.7.1.3    a secondary alcohol    a ketone
2  3.1.1.3    an aldehyde            a 3-oxoacyl-[ACP]
3  3.1.1.3    3-oxoacyl-[ACP]        2,3-dioxo-L-gulonate
4  2.7.2.3    D-ribitol 5-phosphate  a primary alcohol
5  1.7.99.4   2,3-dioxo-L-gulonate   D-ribulose 5-phosphate
6  1.1.1.304  L-iditol               keto-L-sorbose
7  2.7.4.3                           oxidized coenzyme F420-1
8  4.1.1.68   myo-inositol           a secondary alcohol
CodePudding user response:
For the strings in the list, you can split each value on spaces and keep only the tokens that are not in S:
S = ['H2O','NADP()','NADPH','NAD()', 'NADH', 'H()']

def list_remove(x):
    # Keep only the space-separated tokens that are not in S.
    return ' '.join([el for el in x.split(' ') if el not in S])

alls['From'] = alls['From'].apply(list_remove)
alls['to'] = alls['to'].apply(list_remove)
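Token filtering only works when every target is a single space-delimited token. As a rough sketch of an alternative, the targets can be removed as literal substrings with `regex=False`, which sidesteps regex parsing entirely. The helper name `strip_targets` and the sample series are made up for illustration, and the target spellings assume the parentheses contain a single space, as in the question's data.

```python
import pandas as pd

# Targets spelled as in the question's data (assumption: a single space
# sits between the parentheses).
TARGETS = ['H2O', 'NADP( )', 'NADPH', 'NAD( )', 'NADH', 'H( )']

def strip_targets(series, targets):
    """Hypothetical helper: remove each target literally, then tidy whitespace."""
    # Longest first is a defensive habit in case one target contains another.
    for target in sorted(targets, key=len, reverse=True):
        series = series.str.replace(target, '', regex=False)
    # Collapse runs of whitespace and trim both ends.
    return series.str.replace(r'\s+', ' ', regex=True).str.strip()

demo = pd.Series(['an aldehyde NADP( )', ' H( ) keto-L-sorbose NADH', 'H2O'])
cleaned = strip_targets(demo, TARGETS)
```

Note that literal substring removal would also hit a target embedded in a longer name, so it is worth checking the data for such cases before relying on it.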
CodePudding user response:
import re

S = ['H2O','NADP( )','NADPH','NAD( )', 'NADH', 'H( )', ' ']
# Escape the regex special characters in each entry of S,
# then join the entries with "|" to build one alternation pattern.
pattern = '|'.join(map(re.escape, S))
alls['From'] = alls['From'].str.replace(pattern, "", regex=True)
alls['to'] = alls['to'].str.replace(pattern, "", regex=True)
alls
   Ko_EC      From                   to
0  2.7.2.3    a primary alcohol      an aldehyde
1  2.7.1.3    a secondary alcohol    a ketone
2  3.1.1.3    an aldehyde            a 3-oxoacyl-[ACP]
3  3.1.1.3    3-oxoacyl-[ACP]        2,3-dioxo-L-gulonate
4  2.7.2.3    D-ribitol 5-phosphate  a primary alcohol
5  1.7.99.4   2,3-dioxo-L-gulonate   D-ribulose 5-phosphate
6  1.1.1.304  L-iditol               keto-L-sorbose
7  2.7.4.3                           oxidized coenzyme F420-1
8  4.1.1.68   myo-inositol           a secondary alcohol
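To illustrate why escaping matters, here is a small sketch using only the standard `re` module. The string `'H(+)'` is a hypothetical target chosen because it contains regex metacharacters; it is not taken from the post. Compiling it unescaped fails because `+` has nothing before it to repeat, while `re.escape` turns each metacharacter into a literal.

```python
import re

# Hypothetical target containing regex metacharacters (not from the post).
raw = 'H(+)'

try:
    re.compile(raw)              # '+' directly after '(' has nothing to repeat
    error_message = None
except re.error as exc:
    error_message = str(exc)

escaped = re.escape(raw)         # backslash-escape every metacharacter
pattern = re.compile(escaped)    # now matches the literal text 'H(+)'
result = pattern.sub('', 'a primary alcohol H(+)').strip()
```

The same idea carries over to pandas: either pass an escaped pattern with `regex=True`, or pass the raw string with `regex=False`.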