How can I make my code faster? It performs poorly on large inputs.
Input: texts (a list of lists)
texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
Output: every string in texts that appears fewer than 3 times is replaced with 'R'
modified_texts = ['R', 'R', 'THE', 'THE', 'THE']
This is what I have so far:
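def replace_rare(texts):
    # Flatten the list of lists into a single list
    flat_texts = [item for sublist in texts for item in sublist]
    modified_texts = flat_texts
    for x in set(flat_texts):
        # Replace x everywhere if it occurs fewer than 3 times
        if flat_texts.count(x) < 3:
            modified_texts = ["R" if item == x else item for item in modified_texts]
    return modified_texts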
CodePudding user response:
You can use collections.Counter and an ... if ... else ... clause:
from collections import Counter
texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
texts_flat = [x for sublist in texts for x in sublist]
counter = Counter(texts_flat)
output = [x if counter[x] >= 3 else 'R' for x in texts_flat]
print(output) # ['R', 'R', 'THE', 'THE', 'THE']
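The speedup comes from counting once: Counter tallies every string in a single pass, while calling list.count inside a loop rescans the whole list for each string. A rough sketch to check this on your own machine follows; the function names, the threshold parameter, and the random test data are made up for illustration, and the timings will vary:

```python
import random
from collections import Counter
from timeit import timeit


def replace_rare_count(flat, threshold=3):
    # Slow: list.count rescans the entire list for every element
    return [x if flat.count(x) >= threshold else 'R' for x in flat]


def replace_rare_counter(flat, threshold=3):
    # Fast: one counting pass, then constant-time dictionary lookups
    counter = Counter(flat)
    return [x if counter[x] >= threshold else 'R' for x in flat]


random.seed(0)
# Hypothetical test data: 3,000 strings drawn from 1,000 distinct values
data = [f"w{random.randrange(1000)}" for _ in range(3000)]

# Both versions must agree before the timings mean anything
assert replace_rare_count(data) == replace_rare_counter(data)

print("list.count:", timeit(lambda: replace_rare_count(data), number=1))
print("Counter:   ", timeit(lambda: replace_rare_counter(data), number=1))
```

The gap should widen as the input grows, since the list.count version does work proportional to the square of the list length.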
CodePudding user response:
You could use itertools.chain to flatten the list and collections.Counter to count the occurrences, then a list comprehension to change the elements:
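from itertools import chain
from collections import Counter
texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
# Count every string across all sublists without building a flat list
c = Counter(chain.from_iterable(texts))
# Keep the nested structure; replace strings seen fewer than 3 times
texts2 = [['R' if c[e] < 3 else e for e in sublist]
          for sublist in texts]
print(texts2) # [['R', 'R', 'THE', 'THE', 'THE']]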
Variant with flat output:
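from itertools import chain
from collections import Counter
texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
c = Counter(chain.from_iterable(texts))
# Same replacement rule, but the result is a single flat list
texts2 = ['R' if c[e] < 3 else e for sublist in texts
          for e in sublist]
print(texts2) # ['R', 'R', 'THE', 'THE', 'THE']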
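If you need both variants, they could be folded into one helper. This is only a sketch: the name replace_rare and the parameters min_count, placeholder and flatten are hypothetical, and it assumes texts is a list of lists that can be iterated twice (not a generator):

```python
from collections import Counter
from itertools import chain


def replace_rare(texts, min_count=3, placeholder='R', flatten=False):
    # Count each string once across all sublists
    counts = Counter(chain.from_iterable(texts))
    if flatten:
        # Single flat list of results
        return [e if counts[e] >= min_count else placeholder
                for e in chain.from_iterable(texts)]
    # Same nesting as the input
    return [[e if counts[e] >= min_count else placeholder for e in sublist]
            for sublist in texts]


texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
print(replace_rare(texts))                # [['R', 'R', 'THE', 'THE', 'THE']]
print(replace_rare(texts, flatten=True))  # ['R', 'R', 'THE', 'THE', 'THE']
```

Note that counts are taken over all sublists together, so a string that appears once in each of three sublists is kept.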