Home > OS >  How could I make this sub-string finder more efficient and cleaner?
How could I make this sub-string finder more efficient and cleaner?

Time:09-26

This was the first solution that came into my mind, but I can't really imagine another one.

strings = [input("String: ") for i in range(int(input("How many strings?")))]
num = 0

for i, subA in enumerate(strings):
    for j,subB in enumerate(strings):
        if (i != j) and (subB in subA):
            num  = 1
print("There are", num, "substrings")

CodePudding user response:

strings = [input("String: ") for i in range(int(input("How many strings?")))]
num = 0

for i, subA in enumerate(strings[:-1]):
    for subB in strings[i 1:]:
        if subB in subA or subA in subB:
            num  = 1
print("There are", num, "substrings")

Now, I'm not exactly sure that I understand what you're trying to do, but this ensures that each string is compared to all others only once. It's untested so don't take my word for it though.

I'm assuming that, given an array of strings, you want to find how many of these strings are contain the any other string as substring. Note that if this is the case there is a the question of what to do with identical strings in the list.

CodePudding user response:

In your implementation you're reinventing itertools.permutations() which returns permutations of elements with given length. You can compare lists generated with nested for loop (from your code sample) and permutations():

from itertools import permutations

strings = ["a", "aa", "b"]
res = []
for i, subA in enumerate(strings):
    for j, subB in enumerate(strings):
        if i != j:
            res.append((subA, subB))
print("Nested loop:", res)
res = list(permutations(strings, 2))
print("permutations():", res)

You need to check whether each of element is a substring of another element, so you can iterate over pairs returned from permutations() and test does first element contain second (or vise versa). Let's do it with simple list comprehension:

from itertools import permutations

strings = ["a", "aa", "b"]
res = [a in b for a, b in permutations(strings, 2)]
# will return [True, False, False, False, False, False]

In python True is 1 and False is 0 (docs). So to count how many strings are substrings we can pass a generator expression into a sum().

from itertools import permutations

strings = ["a", "aa", "b"]
num = sum(a in b for a, b in permutations(strings, 2))
# will return 1

You can also use itertools.starmap() to call operator.contains() (same as a in b) with every pair returned by permutations().

from operator import contains
from itertools import permutations, starmap

strings = ["a", "aa", "b"]
num = sum(starmap(contains, permutations(strings, 2)))

Here is a bit improved version of your code:

from operator import contains
from itertools import permutations, starmap

count = input("How many strings? ")
if count.isdecimal() and (count := int(count)):
    strings = []
    while count:
        item = input(f"String ({count} left): ")
        if item:  # skip empty strings
            strings.append(item)
            count -= 1
    num = sum(starmap(contains, permutations(strings, 2)))
    print("There", "are" if num > 1 else "is", num or "no",
          "substring"   "s" * (num != 1))
else:
    print(f'"{count}" is not a valid positive number')

P.S. Some notes on performance.

Because of the method how sum() process iterable you can make some patches on code with generator expression to work faster.

sum([1 for a, b in permutations(strings, 2) if a in b])

will be slightly faster than

sum(a in b for a, b in permutations(strings, 2))

Why? Take a look on next questions:

  1. Why is summing list comprehension faster than generator expression?;
  2. Why is any (True for ... if cond) much faster than any (cond for ...)?.
  • Related