How to make list comprehension faster?

How can I make my code faster? It performs poorly on large inputs.

Input: texts (list of lists)

texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]

Output: a flat list in which every string that appears fewer than 3 times in texts is replaced with 'R'

modified_texts = ['R', 'R', 'THE', 'THE', 'THE']

This is what I have so far:

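def modify_texts(texts):
    flat_texts = [item for sublist in texts for item in sublist]
    modified_texts = flat_texts
    for x in set(flat_texts):
        # list.count rescans the whole list for every distinct string
        if flat_texts.count(x) < 3:
            # compare whole strings; str.replace would also rewrite substrings
            modified_texts = ["R" if item == x else item for item in modified_texts]
    return modified_texts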

CodePudding user response:

You can use collections.Counter together with a conditional expression (... if ... else ...) inside the list comprehension:

from collections import Counter

texts = [['HAMLET', 'HAMLET', 'THE', 'THE', 'THE']]
texts_flat = [x for sublist in texts for x in sublist]

counter = Counter(texts_flat)
output = [x if counter[x] >= 3 else 'R' for x in texts_flat]
print(output) # ['R', 'R', 'THE', 'THE', 'THE']
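To see why this helps, here is a rough sketch comparing the two counting strategies. The function names, the threshold parameter, and the sample data are made up for illustration. list.count rescans the whole list on every call, while Counter tallies everything in one pass and then answers each lookup in constant time on average. Actual timings will depend on your machine and data.

```python
from collections import Counter
import timeit

def slow_replace(flat, threshold=3):
    # list.count walks the entire list once per element: quadratic overall
    return [x if flat.count(x) >= threshold else "R" for x in flat]

def fast_replace(flat, threshold=3):
    # Counter counts every string in a single pass over the list
    counts = Counter(flat)
    return [x if counts[x] >= threshold else "R" for x in flat]

# Hypothetical sample: one frequent string plus 1,000 strings that occur once
sample = ["THE"] * 1000 + [f"w{i}" for i in range(1000)]

# Both strategies must agree on the result
assert slow_replace(sample) == fast_replace(sample)

print("list.count:", timeit.timeit(lambda: slow_replace(sample), number=3))
print("Counter:   ", timeit.timeit(lambda: fast_replace(sample), number=3))
```

The gap should widen as the list grows, since the list.count version does work proportional to the square of the list length.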

CodePudding user response:

You could use itertools.chain to flatten the list and collections.Counter to count the occurrences, then a list comprehension to replace the elements. This version keeps the nested structure:

from itertools import chain
from collections import Counter

c = Counter(chain.from_iterable(texts))
texts2 = [['R' if c[e]<3 else e for e in l]
          for l in texts]

Output: [['R', 'R', 'THE', 'THE', 'THE']]

Flat output variant:

from itertools import chain
from collections import Counter

c = Counter(chain.from_iterable(texts))
texts2 = ['R' if c[e]<3 else e for l in texts
          for e in l]

Output: ['R', 'R', 'THE', 'THE', 'THE']
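If you need this in more than one place, the nested variant could be wrapped in a small helper. This is only a sketch: the name replace_rare and the min_count and placeholder parameters are my own, not something from the question. Counts are taken across all sublists, as in the snippets above.

```python
from collections import Counter
from itertools import chain

def replace_rare(texts, min_count=3, placeholder="R"):
    """Replace strings seen fewer than min_count times across all sublists."""
    # Count every string once, over the flattened input
    counts = Counter(chain.from_iterable(texts))
    # Rebuild each sublist so the nesting of the input is preserved
    return [[e if counts[e] >= min_count else placeholder for e in sub]
            for sub in texts]

texts = [['HAMLET', 'HAMLET', 'THE'], ['THE', 'THE', 'KING']]
print(replace_rare(texts))
# [['R', 'R', 'THE'], ['THE', 'THE', 'R']]
```

For a flat result, pass the output through list(chain.from_iterable(...)).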