Regex: if string ends with numerical values - drop it, OR if string ends with single letter

Here is a sample of a dataset I have:

ID	Project
1	21st Townhouse 318
2	The Residences 6
3	Villanova Tower B
4	The Hills H
5	City Park

I need to transform 'Project' column so that:

if a row ends with numeric values, they should be dropped
if a row ends with single letter, it should be dropped
else, leave as it is

Here is how I want it to look like:

ID	Project
1	21st Townhouse
2	The Residences
3	Villanova Tower
4	The Hills
5	City Park

I tried to search for some solution, and found this(for first condition with numeric values only):

df['Project']=df.Project[~((df.Project.astype(str).str.match("(.*\d)")) & (df.Project.astype(str).str.len() > 1))]

It worked, however, I tried to apply it for the second condition as well:

df['Project']=df.Project[~((df.Project.astype(str).str.match("(.*\w)")) & (df.Project.astype(str).str.len() == 1))]

But, It failed

Can you help me, please? Thank you!

CodePudding user response：

You can use

df['Project']=df['Project'].str.replace(r'\s (?:\d |[A-Za-z])$', '', regex=True)

See the regex demo.

Details:

\s - one or more whitespaces
(?:\d |[A-Za-z]) - a non-capturing group matching either
- \d - one or more digits
- | - or
- [A-Za-z] - an ASCII letter (use [^\W\d_] to match any Unicode letter)
$ - end of string.

If your strings have no whitespace before the last number/letter, you may use

df['Project'].str.replace(r'\b(?:\d |[A-Za-z])$', '', regex=True).str.rstrip()

The \b matches a word boundary making sure there is no letter, digit or _ right before the final digits/letter, and .str.rstrip() will strip the trailing whitespace.