I would like to separate a string that includes parentheses

This is one of the lists in my CSV file.

Name                      Combined Score
NDUFAF7 (pp) TRMT10C      0.911
NDUFAF7 (pp) PPARGC1A     0.846

The two names are separated by (pp), and I would like to split them into two columns.

df[['preferredName_A','preferredName_B']] = df['name'].str.split('\s+\(.*$', expand=True)

With the code above I only get the name in front; the name after (pp) is gone.

I also tried

df[['preferredName_A','preferredName_B']] =df['name'].str.split('(pp)', expand=True)

and I get an error.

CodePudding user response：

Your second attempt is very close:

df[["new_1", "new_2"]] = df["Name"].str.split(r"\s*\(pp\)\s*", expand=True)

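escape the parentheses so they match literally instead of forming a group
allow optional whitespace on either side with \s*
use a raw string (the r prefix) so Python does not treat the backslashes as escape sequences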

to get

                    Name  Combined Score    new_1     new_2
0   NDUFAF7 (pp) TRMT10C           0.911  NDUFAF7   TRMT10C
1  NDUFAF7 (pp) PPARGC1A           0.846  NDUFAF7  PPARGC1A
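
To see why the escaping matters, here is a small sketch that uses Python's re module directly on one sample value. It assumes pandas treats a multi-character split pattern as a regular expression by default; the variable names are only for illustration.

```python
import re

value = "NDUFAF7 (pp) TRMT10C"  # sample value from the question

# Unescaped: the parentheses form a capture group around "pp", so the
# split happens on the bare letters "pp" and the captured text is kept.
unescaped = re.split("(pp)", value)

# Escaped: the parentheses are matched literally, and \s* absorbs the
# spaces on either side of "(pp)".
escaped = re.split(r"\s*\(pp\)\s*", value)

print(unescaped)
print(escaped)
```

The unescaped pattern produces three pieces rather than two, which is a likely reason the assignment to two columns fails.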

CodePudding user response：

You could use str.extract here:

df[["preferredName_A", "preferredName_B"]] = df["Name"].str.extract(r'(\S+(?:\s+\S+)*) \(pp\) (\S+(?:\s+\S+)*)')
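
A self-contained sketch of the str.extract approach on the two sample rows from the question. It assumes each \S and \s in the pattern is meant to carry a + quantifier, and it builds the DataFrame by hand instead of reading the CSV file.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["NDUFAF7 (pp) TRMT10C", "NDUFAF7 (pp) PPARGC1A"],
    "Combined Score": [0.911, 0.846],
})

# Two capture groups: the text before " (pp) " and the text after it.
# (?:\s+\S+)* lets either side contain several space-separated words.
pattern = r"(\S+(?:\s+\S+)*) \(pp\) (\S+(?:\s+\S+)*)"
df[["preferredName_A", "preferredName_B"]] = df["Name"].str.extract(pattern)

print(df)
```

Unlike str.split, rows that do not contain " (pp) " end up with NaN in both new columns, which can help in spotting malformed entries.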