How can I keep only the strings in a list that contain only certain characters in Python?

Time:11-07

For the project I'm working on, I have a list of strings that all share the same length (that length can vary from run to run), and I only want to keep the strings whose characters can all be found in a string I specify.

I'll elaborate below, but here is the expected result:

# list of strings
["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

# some strings are left out of the new list because they contain
# characters other than a/b/c/d
# -> resulting list of strings
["aa", "ab", "ac", "bd", "db", "cb"]
# in other words, "ct", "ra", and "pq" are excluded

However, when I print the new list of valid strings with the code below, every string except "pq" is included. The result looks like what the any() function would give, which would be fine if that were what I wanted.

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

def isValidString(seq):
    for substring in seq:
        if substring in "abcd":
            return True

valid_strings = []

for sequence in list_of_strings:
    if isValidString(sequence):
        valid_strings.append(sequence)

print(valid_strings)
# Output : ['aa', 'ct', 'ab', 'ac', 'bd', 'ra', 'db', 'cb']
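The question's isValidString returns True as soon as it sees one character from "abcd", and otherwise falls off the end and returns None, which is falsy. That is exactly the test any() performs; the desired filter is all(). A minimal side-by-side comparison, reusing the question's list (`allowed` is a name chosen here for illustration):

```python
list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]
allowed = "abcd"

# Keeps a string if AT LEAST ONE character is allowed (what the question's loop does)
any_match = [s for s in list_of_strings if any(ch in allowed for ch in s)]

# Keeps a string only if EVERY character is allowed (the intended behaviour)
all_match = [s for s in list_of_strings if all(ch in allowed for ch in s)]

print(any_match)  # ['aa', 'ct', 'ab', 'ac', 'bd', 'ra', 'db', 'cb']
print(all_match)  # ['aa', 'ab', 'ac', 'bd', 'db', 'cb']
```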

Answer:

Your approach is almost right, but you should only stop checking early (and return False) when you find a character you want to exclude. Otherwise, keep checking the rest of the sequence, and return True only after every character has passed.

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

def isValidString(seq):
    for substring in seq:
        if substring not in "abcd":
            return False
    return True
    

valid_strings = []

for sequence in list_of_strings:
    if isValidString(sequence):
        valid_strings.append(sequence)

print(valid_strings)
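An alternative to the character-by-character loop is a set comparison: a string is valid when the set of its distinct characters is a subset of the allowed characters. A minimal sketch; `allowed_chars` and `is_valid_string` are hypothetical names used only for illustration:

```python
list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]
allowed_chars = set("abcd")  # hypothetical name for the permitted characters

def is_valid_string(seq):
    # set(seq) <= allowed_chars is True when every distinct character of seq
    # also appears in allowed_chars (a subset test)
    return set(seq) <= allowed_chars

valid_strings = [s for s in list_of_strings if is_valid_string(s)]
print(valid_strings)  # ['aa', 'ab', 'ac', 'bd', 'db', 'cb']
```

Unlike the loop, this always builds a set from the whole string instead of stopping at the first bad character, which hardly matters for short strings.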

Answer:

If this is the output you're asking for:

['aa', 'ab', 'ac', 'bd', 'db', 'cb']

then you can use this code:

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

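def isValidString(seq):
    # Try every two-letter combination of a/b/c/d
    # (this only works for strings of length 2)
    for substring in ["a", "b", "c", "d"]:
        for substring2 in ["a", "b", "c", "d"]:
            if substring + substring2 in seq:
                return True
    return False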

valid_strings = []

for sequence in list_of_strings:
    if isValidString(sequence):
        valid_strings.append(sequence)

print(valid_strings)
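The nested loops above hard-code two letters, so they only handle strings of length 2. A sketch that generalizes the same idea to any length n uses itertools.product to build every allowed combination up front; `allowed_combinations` is a hypothetical helper name. The number of combinations grows as len(allowed) ** n, so this only makes sense for short strings.

```python
from itertools import product

def allowed_combinations(allowed, n):
    # Every string of length n built only from characters in `allowed`,
    # e.g. n=2 gives "aa", "ab", ..., "dd" (the pairs the nested loops try)
    return {"".join(chars) for chars in product(allowed, repeat=n)}

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]
n = len(list_of_strings[0])  # the question says all strings share one length
combos = allowed_combinations("abcd", n)

valid_strings = [s for s in list_of_strings if s in combos]
print(valid_strings)  # ['aa', 'ab', 'ac', 'bd', 'db', 'cb']
```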

Answer:

Using the yield keyword:

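list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

def string_match(list_of_strings):
    for data in list_of_strings:
        for char_data in data:
            # Stop at the first character outside "abcd"
            if char_data not in "abcd":
                break
        else:
            # The loop finished without break: every character is allowed
            yield data

for data in string_match(list_of_strings):
    print(data)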
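The same lazy filtering can be written as a generator that takes the allowed characters as a parameter instead of hard-coding "abcd". A sketch, with `filter_allowed` as a hypothetical name; building a set once makes each membership test constant time on average.

```python
def filter_allowed(strings, allowed):
    allowed_set = set(allowed)  # build the lookup set once, not per string
    for s in strings:
        # all() stops at the first character not in allowed_set
        if all(ch in allowed_set for ch in s):
            yield s

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

# Nothing is computed until the generator is iterated
for s in filter_allowed(list_of_strings, "abcd"):
    print(s)
```

Wrapping the call in list() gives the filtered list directly if you don't need lazy evaluation.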

Answer:

Counting the number of matches:

list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

def isValidString(seq):
    nb_matches = 0

    for substring in seq:
        if substring in "abcd":
            nb_matches += 1
    # Valid only if every character matched
    return nb_matches == len(seq)

valid_strings = []

for sequence in list_of_strings:
    if isValidString(sequence):
        valid_strings.append(sequence)

print(valid_strings)
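The counting idea above can be condensed with sum(), since True counts as 1 and False as 0 when summed. A minimal sketch; `count_matches` is a hypothetical helper name. Unlike early-exit loops, this always scans the whole string.

```python
list_of_strings = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

def count_matches(seq, allowed="abcd"):
    # Each `ch in allowed` is a bool; summing bools counts the True values
    return sum(ch in allowed for ch in seq)

# A string is valid when every one of its characters matched
valid_strings = [s for s in list_of_strings if count_matches(s) == len(s)]
print(valid_strings)  # ['aa', 'ab', 'ac', 'bd', 'db', 'cb']
```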

Answer:

data = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]
searchstring = ["aa", "ab", "ac", "bd", "db", "cb"]

newdata = [s for s in data if s in searchstring]
print(newdata)

which prints:

['aa', 'ab', 'ac', 'bd', 'db', 'cb']
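This relies on the expected strings being typed into searchstring by hand. To derive them instead, one alternative is a regular expression: re.fullmatch succeeds only if the pattern covers the entire string, and the character class [abcd]+ accepts one or more characters drawn from a, b, c, d. A sketch:

```python
import re

data = ["aa", "ct", "ab", "ac", "bd", "ra", "db", "pq", "cb"]

# [abcd]+ : one or more characters, each of which must be a, b, c or d.
# fullmatch requires the pattern to cover the entire string.
pattern = re.compile(r"[abcd]+")

newdata = [s for s in data if pattern.fullmatch(s)]
print(newdata)  # ['aa', 'ab', 'ac', 'bd', 'db', 'cb']
```

Because of the +, an empty string does not match; use * instead if empty strings should be kept.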