I am studying other people's code and have run into a problem similar to this, where the values in df are joined together with no separator:
Names
--------
NurseJohn
SoldierJohn
TeacherJohn
DriverJohn
CEOJohn
How can I remove the words before John?
They can be removed like this, but I don't understand how it works:
df['Names'] = df['Names'].str.replace(".*(?=John)", "", regex=True)
Can someone explain what happens in (".*(?=John)", "", regex=True)? Also, is there another, more straightforward way to do this?
CodePudding user response:
Actually, the regex pattern you should have used is:
.*(?=John$)
This pattern says to match all content, greedily, until hitting the content John at the very end of each value in the Names column. Note that it does not consume John; it only asserts that John follows, before stopping the match.
Your updated code:
df["Names"] = df["Names"].str.replace(r'.*(?=John$)', '', regex=True)
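To see what the lookahead actually does, here is a minimal sketch using Python's built-in re module on plain strings instead of a DataFrame. The value "NurseJohnson" is a made-up example, not part of the question's data; it is only there to show what the $ anchor changes.

```python
import re

names = ["NurseJohn", "SoldierJohn", "TeacherJohn", "DriverJohn", "CEOJohn"]

# .* greedily matches any characters; (?=John$) is a lookahead that only
# checks that "John" followed by end-of-string comes next, without consuming it.
match = re.search(r".*(?=John$)", "NurseJohn")
matched_prefix = match.group()  # the part that gets replaced by ""

# Apply the same substitution to every sample value from the question.
stripped = [re.sub(r".*(?=John$)", "", name) for name in names]

# Hypothetical value (not from the question): "John" is not at the end here.
unanchored = re.sub(r".*(?=John)", "", "NurseJohnson")
anchored = re.sub(r".*(?=John$)", "", "NurseJohnson")

print(matched_prefix)        # Nurse
print(stripped)              # ['John', 'John', 'John', 'John', 'John']
print(unanchored, anchored)  # Johnson NurseJohnson
```

Without the $, the pattern strips the prefix from any string that merely contains John; with it, only strings that end in John are changed.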
CodePudding user response:
Yeah, so you're using regex. Regex (short for Regular Expression) is a tool that every language I've worked with uses to search strings (text). Here you are using regex to match anything before "John" and then replace it with "", which is an empty string.
So, to read it from left to right:
- select the 'Names' column of the DataFrame
- for each string in the column, replace everything before "John" (.* matches any run of characters) with an empty string (""), using regex
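The steps above can be sketched as a small runnable example. The DataFrame is rebuilt here from the sample values in the question. The slicing variant at the end is only one possible alternative, and it assumes every value ends with the four characters "John".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"Names": ["NurseJohn", "SoldierJohn", "TeacherJohn", "DriverJohn", "CEOJohn"]}
)

# Regex approach: drop everything that comes before a trailing "John".
# regex=True is passed explicitly so the pattern is not treated as literal text.
cleaned = df["Names"].str.replace(r".*(?=John$)", "", regex=True)

# Alternative without regex (assumption: every value ends with "John"):
# keep only the last four characters of each string.
sliced = df["Names"].str[-4:]

print(cleaned.tolist())  # ['John', 'John', 'John', 'John', 'John']
print(sliced.tolist())   # ['John', 'John', 'John', 'John', 'John']
```

The slicing version is shorter, but it blindly keeps the last four characters, so it would give wrong results for any value that does not end in John.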