I have a column in my dataset that contains the following sample data:
player |
---|
David Johnson* \JohnDa08 |
Kareem Hunt*\HuntKa00 |
Melvin Gordon\GordMe00 |
and I'm trying to make it look like this using Python:
player |
---|
David Johnson |
Kareem Hunt |
Melvin Gordon |
Please help.
CodePudding user response:
In Python, when you have a str and you want to remove a substring, you can use the .replace method:
>>> a = "Hello!"
>>> a.replace("Hello", '')
'!'
>>> a = a.replace("Hello", '')
In your case, the simplest thing to do is this:
>>> s = "Kareem Hunt*\\HuntKa00"
>>> s = s.replace("*\\", ' ').replace("00", '')
Or, to make sure the 00 is removed only from the end of the string (str.removesuffix requires Python 3.9 or later):
>>> s = s.replace("*\\", ' ').removesuffix("00")
Since not all of your strings end with 00 (some end with 08, for example), I would suggest this:
>>> s = s.replace("*\\", ' ')[:-2]
which drops the last two characters from the string.
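Note that replacing `*\` with a space keeps the identifier part (such as `HuntKa`) in the string. If the goal is only the bare name, as in the question, one possible sketch is to cut the string at the first backslash and then strip a trailing asterisk or space. The helper name `clean_player` is hypothetical, and the sample values are taken from the question:

```python
def clean_player(raw: str) -> str:
    """Return the player name without the trailing marker and ID (hypothetical helper)."""
    # Keep only the text before the first backslash, if there is one.
    name = raw.split("\\", 1)[0]
    # Drop any trailing asterisks and spaces left in front of the backslash.
    return name.rstrip("* ")


samples = [
    "David Johnson* \\JohnDa08",
    "Kareem Hunt*\\HuntKa00",
    "Melvin Gordon\\GordMe00",
]
cleaned = [clean_player(s) for s in samples]
print(cleaned)
```

This works whether or not the asterisk is present, and it does not depend on how many characters the identifier has.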
CodePudding user response:
You can split on the first special character and take the first chunk:
df['player'] = df['player'].str.split(r'[^\w ]', n=1).str[0]
Or, using replace:
df['player'] = df['player'].str.replace(r'[^\w ].*$', '', regex=True)
Output:
player
0 David Johnson
1 Kareem Hunt
2 Melvin Gordon
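To see what the pattern `[^\w ].*$` does without needing a DataFrame, the same regex can be tried with the standard-library `re` module. This is only a sketch on the three sample values from the question. The names `PATTERN`, `samples`, and `cleaned` are illustrative:

```python
import re

# Same pattern as in the pandas calls above: the first character that is
# neither a word character nor a space, plus everything after it.
PATTERN = re.compile(r"[^\w ].*$")

samples = [
    "David Johnson* \\JohnDa08",
    "Kareem Hunt*\\HuntKa00",
    "Melvin Gordon\\GordMe00",
]
cleaned = [PATTERN.sub("", s) for s in samples]
print(cleaned)
```

One caveat: the pattern cuts at the first character that is not a letter, digit, underscore, or space, so a name containing an apostrophe or a hyphen would be truncated at that character. No such names appear in the sample, so whether this matters depends on the rest of the data.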