A strange symbol appears after some of the words/tokens in my text. The symbol is represented as \x9d.
Thus far, I have removed it with str.replace, but doing this individually for each word quickly becomes tedious.
My current Python code is shown below:
import re
text = ['unstable',
'people\x9d.',
'pattern',
'real',
'thought',
'fearful',
'represent',
'contrarians\x9d',
'greedy',
'interesting',
'behaviour',
'opposite']
text = [k.replace('basket\x9d.', 'basket') for k in text]
text = [k.replace('people\x9d.', 'people') for k in text]
text = [k.replace('portfolios.\x9d', 'portfolios') for k in text]
I have tried to match the pattern using re.sub, but have not been successful:
text = [re.sub('\x9d', '', str(k)) for k in text]
This code will remove the word completely.
CodePudding user response:
Here, you need to remove a sequence of two chars: \x9d followed by a period (.).
You can use a simple str.replace in a list comprehension:
text = [k.replace('\x9d.', '') for k in text]
See the Python demo:
text = ['unstable','people\x9d.','pattern','real','thought','fearful','represent','contrarians\x9d','greedy','interesting','behaviour','opposite']
text = [k.replace('\x9d.', '') for k in text]
print(text)
# => ['unstable', 'people', 'pattern', 'real', 'thought', 'fearful', 'represent', 'contrarians\x9d', 'greedy', 'interesting', 'behaviour', 'opposite']
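Note that the output above still contains 'contrarians\x9d', because there \x9d is not followed by a period. If the goal is to drop \x9d wherever it occurs, together with a period directly before or after it (as in 'portfolios.\x9d' from the question), a regex may be a better fit. The following is only a sketch under that assumption; the names MARKER and strip_marker are made up for illustration:

```python
import re

# Assumption: the stray character is always \x9d, optionally with a single
# period immediately before and/or after it.
MARKER = re.compile(r'\.?\x9d\.?')

def strip_marker(tokens):
    """Return a new list with the marker and any adjacent period removed."""
    return [MARKER.sub('', token) for token in tokens]

text = ['unstable', 'people\x9d.', 'contrarians\x9d', 'portfolios.\x9d', 'opposite']
print(strip_marker(text))
# => ['unstable', 'people', 'contrarians', 'portfolios', 'opposite']
```

Be aware that this also removes a period that legitimately belongs to a token (for example an abbreviation) when it sits right next to \x9d. If that matters for your data, k.replace('\x9d', '') removes only the symbol itself.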