Remove last 3 characters from strings if they fit specific pattern in pandas

Time:12-07

In my data frame, I have a lot of string values in Column A that are very inconsistent.

Specifically, if the last 3 characters of a value are a dash (-) followed by two digits, I would like to remove the dash and the two digits.

So something like:

2X-VA-0561001-SBJ02-NI-01 would become 2X-VA-0561001-SBJ02-NI

Something like:

A.2-FW-74174-KB02-0000232-HT would remain the same

Ideally, I'd like to create a new column, Column B, to hold these new values while keeping Column A.

I think something like this would work, based on something I've done previously, but I can't quite figure it out:

df['Column B'] = df['Column A'].str.replace(r'SOMETHING GOES HERE', '', regex=True)

CodePudding user response:

Use the regex -\d{2}$ where - matches the literal dash, \d{2} matches exactly two digits, and $ anchors the match to the end of the string:

df['Column B'] = df['Column A'].str.replace(r'-\d{2}$', '', regex=True)
print(df)
                       Column A                      Column B
0     2X-VA-0561001-SBJ02-NI-01        2X-VA-0561001-SBJ02-NI
1  A.2-FW-74174-KB02-0000232-HT  A.2-FW-74174-KB02-0000232-HT
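
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same replacement. The two extra rows ("AB-123" and a missing value) are made-up test cases, not from the question; they are only there to show that a dash followed by three digits is left alone and that missing values stay missing.

```python
import pandas as pd

# Sample data: the first two rows come from the question;
# the last two rows are hypothetical edge cases added for illustration.
df = pd.DataFrame({
    "Column A": [
        "2X-VA-0561001-SBJ02-NI-01",     # ends in dash + two digits: trimmed
        "A.2-FW-74174-KB02-0000232-HT",  # ends in letters: unchanged
        "AB-123",                        # dash + three digits: unchanged
        None,                            # missing value: stays missing
    ]
})

# Remove a trailing "-NN" (dash, exactly two digits, end of string).
# Column A is left untouched; the result goes into a new column.
df["Column B"] = df["Column A"].str.replace(r"-\d{2}$", "", regex=True)

print(df)
```

Because the pattern is anchored with $, only a suffix at the very end of the string is removed; a "-01" in the middle of a value is not touched.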