Sample data:
/SomeText/2016-11-11
/SomeText/2016-11-11/13.40.48
/SomeText/15.T-06-26/00.00.00
I tried to compile a list of regex patterns, but it did not work. I am not very experienced with regex. What I am trying to do is build a list of regex patterns that match anything after:
/SomeText/
then remove it from the column and store the results in a new pandas column.
CodePudding user response:
There are various options, but I believe a simple approach is to use Series.apply() (there is no pd.apply()).
So you want to create a function that handles a single value, apply it to every row of the column, and store the result in a new column, such as:
import re
df["<new_column>"] = df["<column_name>"].apply(lambda x: re.sub(r"\/[a-zA-Z]*\/", "", x))
To explain what this is doing, it applies a one-off anonymous function...
lambda x: re.sub(r"\/[a-zA-Z]*\/", "", x)
to every x in the column that is supplied.
The re.sub call matches a forward slash (with "\/"; the backslash is optional in Python), then zero or more ASCII letters (with "[a-zA-Z]*"), and then another forward slash. It replaces anything that matches this with an empty string, so the "/SomeText/" prefix is removed and the rest of the value is kept.
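As a rough, self-contained sketch of the approach above, using the sample data from the question. The column names "path" and "cleaned" are placeholders I made up, and the second assignment shows pandas' vectorized str.replace as an alternative that should give the same result here:

```python
import re

import pandas as pd

# Sample data from the question; "path" is a hypothetical column name.
df = pd.DataFrame({"path": [
    "/SomeText/2016-11-11",
    "/SomeText/2016-11-11/13.40.48",
    "/SomeText/15.T-06-26/00.00.00",
]})

# Same pattern as above: a slash, zero or more letters, another slash.
pattern = re.compile(r"/[a-zA-Z]*/")

# Row-by-row version: apply the substitution to each value
# and keep the result in a new column.
df["cleaned"] = df["path"].apply(lambda x: pattern.sub("", x))

# Vectorized alternative using the pandas string accessor.
df["cleaned_vec"] = df["path"].str.replace(r"/[a-zA-Z]*/", "", regex=True)

print(df[["cleaned", "cleaned_vec"]])
```

Both versions leave the original column untouched, so you can compare the result against the input before dropping anything.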
CodePudding user response:
The correct regex to match "/SomeText/" is \/SomeText\/. Note that I am assuming here that you do have a specific string you want to clean out, rather than cleaning out all alphabetic characters, etc.
Let's say your data is stored in a column called "orig" of dataframe df. You can separate out "/SomeText/" and the text that follows like this:
df['orig'].str.extract(r"(\/SomeText\/)(.*)")
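Here is a small sketch of what that extract call gives you, assuming the sample data from the question. The group names "prefix" and "rest" are my own labels (named groups just make the resulting columns easier to read), and the new column name "rest" is likewise only for illustration:

```python
import pandas as pd

# Sample data from the question, stored in the "orig" column.
df = pd.DataFrame({"orig": [
    "/SomeText/2016-11-11",
    "/SomeText/2016-11-11/13.40.48",
    "/SomeText/15.T-06-26/00.00.00",
]})

# Two capture groups produce a two-column DataFrame; naming the groups
# makes pandas use those names as the column labels.
parts = df["orig"].str.extract(r"(?P<prefix>/SomeText/)(?P<rest>.*)")

# Keep only the text that follows "/SomeText/" in a new column.
df["rest"] = parts["rest"]

print(parts)
```

Rows that do not contain "/SomeText/" would come back as NaN in both columns, which is worth checking for on real data.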
CodePudding user response:
Use str.extract:
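import pandas as pd
df = pd.DataFrame({'Text': ['/SomeText/2016-11-11',
                            '/SomeText/2016-11-11/13.40.48',
                            '/SomeText/15.T-06-26/00.00.00']})
# Keep everything up to and including "SomeText", dropping the slash and what follows
df['Text'] = df['Text'].str.extract(r'(.*SomeText)/')
print(df)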
# Output:
        Text
0  /SomeText
1  /SomeText
2  /SomeText
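Since the question asks for the result in a new column, here is a minimal variation on the same idea that leaves 'Text' as it is. The column name 'Prefix' is just a placeholder, and expand=False is used so that a single capture group returns a Series rather than a one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Text': ['/SomeText/2016-11-11',
                            '/SomeText/2016-11-11/13.40.48',
                            '/SomeText/15.T-06-26/00.00.00']})

# Same pattern as above, but written to a new (hypothetical) column
# so the original values are preserved.
df['Prefix'] = df['Text'].str.extract(r'(.*SomeText)/', expand=False)

print(df)
```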