Use regex to replace words before any digit with nothing

Time:05-12

I want to remove the words and spaces that appear before a digit in a string. For example, for the string 'Juice of 1/2', I want to return '1/2'. I tried the following, but it did not work.

string = "Juice of 1/2"
new = string.replace(r"^.+?(?=\d)", "")

I am also trying to apply this to every cell in a list of columns using the following code. How would I incorporate the new regex pattern into the existing pattern r"\(|\)|,"?

df[pd.Index(cols2) + "_clean"] = (
    df[cols2]
    .apply(lambda col: col.str.replace(r"\(|\)|,", "", regex=True))
)

CodePudding user response:

You might be able to phrase this using str.extract:

df["col2"] = df["col2"].str.extract(r'([0-9/-]+)')
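A minimal sketch of the str.extract approach, assuming a hypothetical DataFrame with a col2 column and made-up sample values. Passing expand=False returns a Series rather than a one-column DataFrame. Note that extract keeps only the first run of digits, slashes, and hyphens, so any text after the number is dropped too, and rows with no match become NaN.

```python
import pandas as pd

# Hypothetical sample data; the column name "col2" follows the answer above.
df = pd.DataFrame({"col2": ["Juice of 1/2", "Zest of 2 lemons", "Pinch of salt"]})

# Capture the first run of digits, slashes, or hyphens in each cell.
# expand=False yields a Series, which is written to a new column here
# so the original values stay available for comparison.
df["col2_clean"] = df["col2"].str.extract(r"([0-9/-]+)", expand=False)

print(df)
```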

CodePudding user response:

.+? will match anything, including other digits. It will also match the / in 1/2. Since you only want to replace letters and spaces, use [a-z\s]+.

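You also have to use re.sub(), not string.replace(), because the built-in str.replace() treats its first argument as a literal string. (In pandas, .str.replace() interprets the pattern as a regular expression only when regex=True is passed; that was the default before pandas 2.0.)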

new = re.sub(r'[a-z\s]+(?=\d)', '', string, flags=re.I)
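To fold this into the column-wide cleanup from the question, one option is to add the new pattern as another alternative in the existing one. The sketch below assumes hypothetical column names and sample values, and uses join with add_suffix instead of the index assignment from the question; it is one way to do it, not the only one.

```python
import re

import pandas as pd

# Hypothetical sample data and column list, for illustration only.
df = pd.DataFrame({
    "amount": ["Juice of 1/2", "(1/2)"],
    "note": ["Zest of 2, grated", "1 tsp"],
})
cols2 = ["amount", "note"]

# Letters/spaces directly before a digit, OR a parenthesis, OR a comma.
pattern = r"[a-z\s]+(?=\d)|\(|\)|,"

cleaned = df[cols2].apply(
    lambda col: col.str.replace(pattern, "", flags=re.IGNORECASE, regex=True)
)

# Attach the cleaned columns next to the originals with a "_clean" suffix.
df = df.join(cleaned.add_suffix("_clean"))

print(df)
```

One caveat: all alternatives are applied in a single pass, so the lookahead sees the original text. In a value such as "Juice of (1/2)" the words are followed by "(" rather than a digit, so only the parentheses would be removed. Running the two replacements one after the other avoids that.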

CodePudding user response:

Maybe something like this would work.

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[A-Za-z\s]+"

test_str = "Juice of 1/2 hede"

subst = ""

# You can limit the number of replacements by changing count (0 means replace all)
result = re.sub(regex, subst, test_str, count=0)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
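Note that this pattern has no lookahead, so it removes every letter and space in the string, including text after the number. A short comparison with the lookahead version, using the same test string:

```python
import re

test_str = "Juice of 1/2 hede"

# No lookahead: every run of letters and whitespace is removed.
strip_all = re.sub(r"[A-Za-z\s]+", "", test_str)

# With the lookahead: only runs immediately followed by a digit are removed.
strip_before_digit = re.sub(r"[A-Za-z\s]+(?=\d)", "", test_str)

print(strip_all)           # 1/2
print(strip_before_digit)  # 1/2 hede
```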
