Imagine I have a few values like
test_val1 = 'E 18TH ST AND A AVE'
test_val2 = 'E 31ST ST AND A AVE'
I want to find the 18TH, 31ST, etc., and replace them with 18, 31, i.e. remove the ordinal suffix but keep the rest of the string as it is.
Expected values:
test_val1 = 'E 18 ST AND A AVE'
test_val2 = 'E 31 ST AND A AVE'
Please note that I do not want to remove the "St" which corresponds to 'street', so a blind replacement is not possible.
My approach was to use the code below (only for 'TH' at the moment), but it doesn't work since the function cannot keep the value/text in memory to return it.
import regex as re
test_val1.replace('\d{1,}TH', '\d{1,}', regex=True)
I have a column full of these values, so a solution that I can run/apply on a pandas column would be really helpful.
CodePudding user response:
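You mentioned that it doesn't work since the function cannot keep the value/text in memory to return it. Is it mandatory not to store the value in a different variable? If not, you can split each address, clean the second token, and join the pieces back together: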
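import re

column = ['E 18TH ST AND A AVE', 'E 31ST ST AND A AVE']  # stand-in for the dataframe column
cleaned = []
for t1 in column:  # t1 is one address from the column
    t2 = t1.split()
    # strip the ordinal suffix from the second token only, so the later 'ST' (street) is untouched
    t2[1] = re.sub(r'(TH|ST|ND|RD)$', '', t2[1])
    cleaned.append(' '.join(t2))  # collect the result instead of discarding it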
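Since the question asks for something that runs on a pandas column, here is a minimal sketch of a vectorized alternative to the loop. It assumes the addresses live in a column named "address" (a hypothetical name, as is "address_clean") and that the suffix always follows the digits directly. It uses Series.str.replace with regex=True and a capture group, rather than relying on the number being the second token.

```python
import pandas as pd

# Hypothetical column name "address"; sample values taken from the question.
df = pd.DataFrame({"address": ["E 18TH ST AND A AVE", "E 31ST ST AND A AVE"]})

# (\d+) captures the digits so they can be written back as \1.
# The suffix only matches when it directly follows digits, so the
# standalone "ST" (street) is left alone. \b requires the token to end there.
df["address_clean"] = df["address"].str.replace(
    r"(\d+)(?:ST|ND|RD|TH)\b", r"\1", regex=True
)

print(df["address_clean"].tolist())
```

Unlike the loop, this does not depend on where the number sits in the address, but it would also strip suffixes from any other digit-plus-suffix token in the string, which may or may not be what you want.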
CodePudding user response:
I think I can help with the regex replacement. It seems like the function that you want to use is actually sub instead of replace.
This is the function signature:
re.sub(pattern, repl, string, count=0, flags=0)
Check the official documentation.
Also here is an outstanding answer to a similar question.
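To make the "keep the value in memory" part concrete, here is a small sketch of how re.sub can reuse what it matched: the digits go into a capture group and the replacement string refers back to them with \1. The helper name strip_ordinal_suffix and the ND/RD alternatives are my own additions for illustration, not something from the question.

```python
import re

# Group 1 captures the digits; the ordinal suffix is matched but not kept.
ORDINAL = re.compile(r"(\d+)(?:ST|ND|RD|TH)\b")

def strip_ordinal_suffix(text):
    """Remove ordinal suffixes that directly follow digits (hypothetical helper)."""
    return ORDINAL.sub(r"\1", text)

test_val1 = strip_ordinal_suffix("E 18TH ST AND A AVE")
test_val2 = strip_ordinal_suffix("E 31ST ST AND A AVE")
print(test_val1)
print(test_val2)
```

Because the suffix has to follow a digit with no space in between, the separate "ST" for street never matches.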