pandas/regex: Remove the string after the hyphen or parenthesis character (including) carry string a

Time:12-04

I have a DataFrame with one column in which each cell holds several strings separated by commas. In each of those strings I want to remove everything from the hyphen onward (including the hyphen). In some cases there is no hyphen but there is an opening parenthesis, and I want to remove everything from that point as well, keeping only the leading part of each comma-separated item. How can I do this? You can see this case in the last row.

import pandas as pd

dd = pd.DataFrame()
dd['sin'] = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

Expected output

dd['sin']
# output 
U147 U35
P01 P02
P3 P032
P034
P23F5 P04

I want to keep only the part of each string that comes before the hyphen, parenthesis, or any other special character.

CodePudding user response:

The following code extracts the part you want from each item, with one item per row:

dd['sin'] = dd['sin'].str.split(", ")
dd = dd.explode('sin').reset_index()
dd['sin'] = dd['sin'].str.replace(r'\W.*', '', regex=True)

Which gives dd['sin'] as:

0     U147
1      U35
2      P01
3      P02
4       P3
5     P032
6     P034
7    P23F5
8      P04
Name: sin, dtype: object

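The call to .reset_index() in the second line is optional. After explode(), every piece keeps the index label of the row it came from, so the labels repeat; .reset_index() replaces them with a fresh 0..n-1 index and moves the original labels into an 'index' column, so you can still tell which row each piece came from.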
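If you want one string per original row, as in the expected output of the question, the cleaned pieces can be joined back together by their original index label. This is only a minimal sketch, assuming the items are always separated by ", "; the names parts and cleaned are illustrative and not part of the answer above:

```python
import pandas as pd

dd = pd.DataFrame({'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
                           'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
                           'P23F5(PCM), P04-00(ECM)']})

# One item per row; explode() repeats the original index label for each item.
parts = dd['sin'].str.split(', ').explode()

# Drop everything from the first non-word character onward.
cleaned = parts.str.replace(r'\W.*', '', regex=True)

# Group on the repeated index labels and join each row's items with a space.
dd['sin'] = cleaned.groupby(level=0).agg(' '.join)
print(dd)
```

Because the assignment aligns on the index, each joined string lands back in the row it came from.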

CodePudding user response:

You can use the following regex:

r"-\d{2}|\([EBP]CM\)|\s"


Here is the code:

import pandas as pd

sin = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

dd = pd.DataFrame()
dd['sin'] = sin
dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True)
print(dd)

OUTPUT:

         sin
0   U147,U35
1    P01,P02
2    P3,P032
3       P034
4  P23F5,P04



EDIT

Or use this line to replace the commas with spaces:

dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True).str.replace(',',' ')

OUTPUT:

         sin
0   U147 U35
1    P01 P02
2    P3 P032
3       P034
4  P23F5 P04
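
Note that the pattern above only matches a two-digit suffix after the hyphen and the three module names ECM, BCM and PCM. If other suffixes can occur, one possible variant is to capture the leading run of word characters of each comma-separated item instead of listing what to delete. A sketch only, assuming every item starts with the code you want to keep and contains no comma of its own:

```python
import pandas as pd

dd = pd.DataFrame({'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
                           'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
                           'P23F5(PCM), P04-00(ECM)']})

# (?:^|,\s*) matches the start of the string or a comma plus optional spaces;
# (\w+) captures the word characters that follow, stopping at '-', '(' and so on.
# With a single capture group, findall returns a list of the captured codes.
dd['sin'] = dd['sin'].str.findall(r'(?:^|,\s*)(\w+)').str.join(' ')
print(dd)
```

This keeps one string per row, with the codes separated by a single space.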