Replacing replaces in a faster way

Time:07-27

I'm filtering lots of tweets and while I was doing tests on how to filter each character I ended up with this:

import re

x = open(string, encoding='utf-8')
text = x.read()
text = re.sub(r'http\S+' + '\n', '', text)
text = re.sub(r'http\S+', '', text)  # removes links
text = re.sub(r'@\S+' + '\n', '', text)
text = re.sub(r'@\S+', '', text)  # removes usernames
text = text.replace('0', '').replace('1', '').replace('2', '').replace('3', '') \
    .replace('4', '').replace('5', '').replace('6', '').replace('7', '').replace('8', '').replace('9', '') \
    .replace(',', '').replace('"', '').replace('“', '').replace('?', '').replace('¿', '').replace(':', '') \
    .replace(';', '').replace('-', '').replace('!', '').replace('¡', '').replace('.', '').replace('ℹ', '') \
    .replace('\'', '').replace('[', '').replace(']', '').replace('   ', '').replace('  ', '').replace('”', '') \
    .replace('º', '').replace(' ', '').replace('#', '').replace('\n', '').replace('·', '\n')
text = remove_emoji(text).lower()
x.close()

This was useful while I was experimenting, but I don't expect to change it any more, so it's ready to be optimized. How can I make it faster? Every replace substitutes the empty string except .replace('·', '\n').

CodePudding user response:

Not necessarily faster, but way easier to read would be something like this:

for char in "#<>$ %!&`*|?=/{}:\\@ ';." + '"':
    string = string.replace(char, '')

CodePudding user response:

You can achieve most of this with the str.maketrans and str.translate methods. They let you define a mapping from any single character to any replacement string (an empty string deletes the character).

s = "asd123.?fgh"

translations = {"1":"", "2":"", "3":"", ".":"\n", "?": ""}
print(s.translate(s.maketrans(translations)))

This makes all the changes in a single pass over the string, which is usually much faster than a long chain of replace calls.
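Applied to the question, the same idea might look like the sketch below. It uses the three-argument form of str.maketrans, where the third argument lists characters to delete. The DELETE string is copied from the single-character replaces in the question; the names DELETE, TABLE and clean are made up for this example, and the multi-space replaces are left out because translate only handles single characters.

```python
# Single characters the question removes outright (copied from its replace chain).
DELETE = '0123456789,"“”?¿:;-!¡.ℹ\'[]º#\n'

# str.maketrans(x, y, z): each char in x maps to the char at the same
# position in y, and every char in z is deleted.
TABLE = str.maketrans('·', '\n', DELETE)

def clean(text):
    """Delete the unwanted characters and turn '·' into a newline in one pass."""
    return text.translate(TABLE)

sample = "¡Hola! 123 #tag·next: line"
print(repr(clean(sample)))
```

Building the table once at module level and reusing it for every tweet avoids rebuilding the mapping on each call. Whether it beats the replace chain on your data is worth checking with timeit.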

CodePudding user response:

Taken from this solution.
The re module (part of Python's standard library, so nothing needs to be installed) works for this.

For example,

import re
string = "abccbdac"
re.sub('b|c', '', string) #ada

In this case, re.sub('b|c', '', string) returns "ada". The pipe (|) is the regex alternation operator; here it separates the characters to remove.
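When every alternative is a single character, a character class such as [bc] is an equivalent way to write the pattern, and precompiling it avoids re-parsing on each call. A minimal sketch (PATTERN and strip_chars are names invented for this example):

```python
import re

# [bc] matches one 'b' or one 'c', equivalent to 'b|c' for single characters.
PATTERN = re.compile(r'[bc]')

def strip_chars(s):
    """Remove every character matched by PATTERN from s."""
    return PATTERN.sub('', s)

print(strip_chars("abccbdac"))  # ada
```

Inside a character class most punctuation needs no escaping, but ], \, ^ and - can have special meaning there; re.escape can help when building the class from an arbitrary list of characters.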
