Here is part of the data in my CSV file:
Name                   Combined Score
NDUFAF7 (pp) TRMT10C   0.911
NDUFAF7 (pp) PPARGC1A  0.846
The two names are separated by (pp), and I would like to split them into two columns.
df[['preferredName_A', 'preferredName_B']] = df['Name'].str.split(r'\s+\(.*$', expand=True)
With the code above, I only get the name in front of (pp); the name after it is lost.
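The loss can be reproduced with Python's re module, which splits on a regex in much the same way pandas does (a sketch; the sample string is one row from the question). The first attempt's pattern, \s+\(.*$, matches from the space before "(" all the way to the end of the string, so the second name becomes part of the separator and is discarded.

```python
import re

name = "NDUFAF7 (pp) TRMT10C"

# The pattern matches " (pp) TRMT10C": the whitespace, the "(" and then
# ".*$" up to the end of the string. Everything it matches is treated as
# the separator, so only the text before it survives.
parts = re.split(r"\s+\(.*$", name)
print(parts)  # ['NDUFAF7', '']
```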
I also tried
df[['preferredName_A', 'preferredName_B']] = df['Name'].str.split('(pp)', expand=True)
but I get an error.
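One likely cause, sketched here with Python's re module since the actual error message is not shown: pandas treats a split pattern longer than one character as a regular expression, so the parentheses in '(pp)' form a capturing group that matches the letters "pp". re.split keeps captured text in its output, so each row splits into three pieces, which cannot be assigned to two columns.

```python
import re

name = "NDUFAF7 (pp) TRMT10C"

# "(pp)" is a capturing group around "pp", not the literal text "(pp)".
# re.split includes captured groups in the result, giving three pieces:
# the text before, the captured "pp", and the text after.
parts = re.split(r"(pp)", name)
print(parts)  # ['NDUFAF7 (', 'pp', ') TRMT10C']
```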
Answer:
Your second attempt is very close:
df[["new_1", "new_2"]] = df["Name"].str.split(r"\s*\(pp\)\s*", expand=True)
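- escaped the parentheses, so (pp) is matched literally instead of being read as a regex capturing group
- allowed for surrounding spaces with \s*
- used a raw string (r"...") so Python does not treat the backslashes as string escapes
This gives: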
                    Name  Combined Score    new_1     new_2
0   NDUFAF7 (pp) TRMT10C           0.911  NDUFAF7   TRMT10C
1  NDUFAF7 (pp) PPARGC1A           0.846  NDUFAF7  PPARGC1A
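A self-contained version of this answer, as a sketch: the two-row DataFrame is rebuilt by hand from the sample in the question, and n=1 is an addition not in the original answer. It stops each row after the first separator, so a name that happened to contain a second "(pp)" could not produce a third column.

```python
import pandas as pd

# Sample rows copied from the question; real data would be loaded with
# pd.read_csv on the actual file.
df = pd.DataFrame(
    {
        "Name": ["NDUFAF7 (pp) TRMT10C", "NDUFAF7 (pp) PPARGC1A"],
        "Combined Score": [0.911, 0.846],
    }
)

# Split on a literal "(pp)" plus any surrounding whitespace. str.split treats
# a pattern longer than one character as a regular expression.
# n=1 limits each row to one split, so at most two columns come back.
parts = df["Name"].str.split(r"\s*\(pp\)\s*", n=1, expand=True)
df[["preferredName_A", "preferredName_B"]] = parts

print(df)
```

The new columns are assigned by position, so the two unnamed columns that expand=True returns land in preferredName_A and preferredName_B in order.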
Answer:
You could also use str.extract here:
df[["preferredName_A", "preferredName_B"]] = df["Name"].str.extract(r'(\S+(?:\s+\S+)*) \(pp\) (\S+(?:\s+\S+)*)')
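str.extract takes the groups from the first match of the pattern in each string and returns one column per capturing group. The pattern can be checked on its own with Python's re module (a sketch; the second string is a made-up example, not data from the question). Each group is one or more whitespace-separated tokens, so names that contain spaces are captured whole.

```python
import re

# Pattern from the str.extract answer: two capturing groups, each made of
# non-space tokens separated by whitespace, joined by a literal " (pp) ".
pattern = re.compile(r"(\S+(?:\s+\S+)*) \(pp\) (\S+(?:\s+\S+)*)")

samples = [
    "NDUFAF7 (pp) TRMT10C",    # row from the question
    "GENE ONE (pp) GENE TWO",  # hypothetical multi-word names
]

# search() returns the first match; groups() gives the two values that
# str.extract would place in the two new columns.
results = [pattern.search(s).groups() for s in samples]
print(results)  # [('NDUFAF7', 'TRMT10C'), ('GENE ONE', 'GENE TWO')]
```

A row without " (pp) " has no match: pattern.search returns None in this sketch, and str.extract fills both new columns with NaN for that row.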