I want to remove the words and spaces that appear before a digit in a string. For example, for string = 'Juice of 1/2', I want to return '1/2'. I tried the following, but it did not work.
string = "Juice of 1/2"
new = string.replace(r"^.+?(?=\d)", "")
I am also trying to apply this to every cell in a list of columns using the following code. How would I incorporate the new regex pattern into the existing pattern r"\(|\)|,"?
df[pd.Index(cols2) + "_clean"] = (
df[cols2]
.apply(lambda col: col.str.replace(r"\(|\)|,", "", regex=True))
)
CodePudding user response:
You might be able to phrase this using str.extract:
df["col2"] = df["col2"].str.extract(r'([0-9/-]+)')
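As a rough illustration of the str.extract approach, here is a minimal sketch on made-up data. The sample values and the "col2_clean" column name are assumptions for the example, not taken from the question. Note that str.extract keeps only the first match in each cell and yields NaN where nothing matches.

```python
import pandas as pd

# Hypothetical sample data; the column name "col2" follows the answer above.
df = pd.DataFrame({"col2": ["Juice of 1/2", "Zest of 1", "2-3 cloves"]})

# Capture the first run of digits, slashes, and hyphens in each cell.
# expand=False returns a Series instead of a one-column DataFrame.
df["col2_clean"] = df["col2"].str.extract(r"([0-9/-]+)", expand=False)

print(df)
```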
CodePudding user response:
.+? will match anything, including other digits. It will also match the / in 1/2. Since you only want to replace letters and spaces, use [a-z\s]+ instead.
You also have to use re.sub(), not string.replace(), because the built-in str.replace() only does literal replacement (in pandas, .str.replace() handles regular expressions when regex=True is passed; versions before 2.0 did so by default).
new = re.sub(r'[a-z\s]+(?=\d)', '', string, flags=re.I)
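To fold this into the existing pattern from the question, one option is to add it as another alternative with |. The sketch below uses plain re on made-up sample strings (the samples are assumptions for illustration) so the behaviour is easy to check.

```python
import re

# Existing pattern from the question (parentheses and commas) plus one new
# alternative: a run of letters/whitespace immediately followed by a digit.
pattern = re.compile(r"\(|\)|,|[a-z\s]+(?=\d)", flags=re.I)

# Hypothetical sample strings.
samples = ["Juice of 1/2", "(Zest of 1), grated", "2 cups"]
cleaned = [pattern.sub("", s) for s in samples]

print(cleaned)  # ['1/2', '1 grated', '2 cups']
```

In the pandas code from the question, the same pattern string should work as col.str.replace(r"\(|\)|,|[a-z\s]+(?=\d)", "", regex=True, flags=re.I); text that is not followed by a digit, such as " grated" above, is left in place.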
CodePudding user response:
Something like this might work.
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"[A-Za-z\s]+"
test_str = "Juice of 1/2 hede"
subst = ""
# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
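Note that this pattern removes every letter and space, not only those before a digit. A small comparison, using the same test string, may make the difference clearer; the variable names are just for this sketch.

```python
import re

text = "Juice of 1/2 hede"

# Removes every run of letters and whitespace, wherever it appears.
strip_all = re.sub(r"[A-Za-z\s]+", "", text)

# Removes a run only when it is immediately followed by a digit.
strip_before_digit = re.sub(r"[A-Za-z\s]+(?=\d)", "", text)

print(repr(strip_all))           # '1/2'
print(repr(strip_before_digit))  # '1/2 hede'
```

Which one is appropriate depends on whether text after the number should be kept.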