Removing consecutive symbols/characters found after a word for multiple tokens

Time:10-04

A weird icon is repeated after different words/tokens. An example is shown below:

[Image: the symbol]

So far I have removed it with str.replace, but this becomes tedious when it has to be done individually for each word.

The symbol shown in the image is represented as \x9d. The current Python code is shown below:

import re
text = ['unstable',
 'people\x9d.',
 'pattern',
 'real',
 'thought',
 'fearful',
 'represent',
 'contrarians\x9d',
 'greedy',
 'interesting',
 'behaviour',
 'opposite']
text = [k.replace('basket\x9d.', 'basket') for k in text]
text = [k.replace('people\x9d.', 'people') for k in text]
text = [k.replace('portfolios.\x9d', 'portfolios') for k in text]

I have tried to detect patterns using re.sub but have not been successful.

text = [re.sub('\x9d', '', str(k)) for k in text] 

This code will remove the word completely.

CodePudding user response:

Here, you need to remove a sequence of two characters: \x9d followed by a period (.).

You can use a simple str.replace in a list comprehension:

text = [k.replace('\x9d.', '') for k in text]

See the Python demo:

text = ['unstable','people\x9d.','pattern','real','thought','fearful','represent','contrarians\x9d','greedy','interesting','behaviour','opposite']
text = [k.replace('\x9d.', '') for k in text]  # strips only the exact two-character sequence
print(text)
# => ['unstable', 'people', 'pattern', 'real', 'thought', 'fearful', 'represent', 'contrarians\x9d', 'greedy', 'interesting', 'behaviour', 'opposite']
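
Note that the replace call above leaves 'contrarians\x9d' untouched, because there the symbol is not followed by a period. If the goal is to drop \x9d whether or not a period sits next to it (the question also mentions 'portfolios.\x9d'), one possible approach is a single regex with an optional period on either side. This is only a sketch: the name STRAY and the shortened sample list are made up for illustration, and it assumes a period adjacent to \x9d should always be removed as well.

```python
import re

# Hypothetical pattern: \x9d with an optional period before and/or after it.
STRAY = re.compile(r'\.?\x9d\.?')

# Shortened sample covering the three variants seen in the question.
text = ['unstable', 'people\x9d.', 'contrarians\x9d', 'portfolios.\x9d', 'greedy']

# Remove every match from each token; tokens without \x9d are unchanged.
cleaned = [STRAY.sub('', k) for k in text]
print(cleaned)
# => ['unstable', 'people', 'contrarians', 'portfolios', 'greedy']
```

If the unwanted characters only ever appear at the end of a token, k.rstrip('\x9d.') is a regex-free alternative, though it would also strip a legitimate trailing period.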