In my data frame, I have a lot of string values in Column A that are very inconsistent.
If the last three characters of a value are a dash (-) followed by two digits, I would like to remove the dash and the two digits.
So something like:
2X-VA-0561001-SBJ02-NI-01
would become 2X-VA-0561001-SBJ02-NI
Something like:
A.2-FW-74174-KB02-0000232-HT
would remain the same
I'd ideally like to create a new column, Column B, to hold these new values while keeping Column A unchanged.
I think something like this would work, based on something I've done previously, but I can't quite figure it out:
df['Column B'] = df['Column A'].str.replace(r'SOMETHING GOES HERE', '', regex=True)
CodePudding user response:
Use the regex -\d{2}$. Here - matches the literal dash, \d{2} matches exactly two digits, and $ anchors the match to the end of the string:
df['Column B'] = df['Column A'].str.replace(r'-\d{2}$', '', regex=True)
print(df)
                       Column A                      Column B
0     2X-VA-0561001-SBJ02-NI-01        2X-VA-0561001-SBJ02-NI
1  A.2-FW-74174-KB02-0000232-HT  A.2-FW-74174-KB02-0000232-HT
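To see which values the pattern touches before running it on the whole column, it can be tried on a few strings with Python's built-in re module. This is only a sketch: the first two samples come from the question, while 'AB-123' and 'AB-01-02' are made-up values added to probe edge cases.

```python
import re

# Same pattern as in the answer: a dash, exactly two digits, then end of string.
PATTERN = re.compile(r'-\d{2}$')

samples = [
    '2X-VA-0561001-SBJ02-NI-01',     # from the question: suffix is removed
    'A.2-FW-74174-KB02-0000232-HT',  # from the question: no digit suffix
    'AB-123',                        # hypothetical: three digits after the dash
    'AB-01-02',                      # hypothetical: two dash-digit groups
]

# Strip the suffix from each sample and show the before/after pair.
cleaned = [PATTERN.sub('', s) for s in samples]
for before, after in zip(samples, cleaned):
    print(f'{before} -> {after}')
```

A value ending in a dash and three digits is left alone, because the dash must be followed by exactly two digits before the end of the string. When a value ends in two such groups, only the last one is removed.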