Exhaustive search over a list of complex strings without modifying original input

Time:01-17

I am trying to write a minimal algorithm that exhaustively searches a list of strings for duplicates and removes them by index, without changing the case (and therefore the meaning) of any word.

The caveat is that the list contains words such as Blood, blood, DNA, ACTN4, 34-methyl-O-carboxy, Brain, brain-facing-mouse, BLOOD, and so on.

I only want to remove the duplicate 'blood' entries, keep the first occurrence (the one with the first letter capitalized), and leave the case of every other word untouched. Any suggestions on how I should proceed?

Here is my code:

def remove_duplicates(list_of_strings):
    """ function that takes input of a list of strings,
    uses index to iterate over each string lowers each string
    and returns a list of strings with no duplicates, does not modify the original strings
    an exhaustive search to remove duplicates using index of list and list of string"""

    list_of_strings_copy = list_of_strings
    try:
        for i in range(len(list_of_strings)):
            list_of_strings_copy[i] = list_of_strings_copy[i].lower()
            word = list_of_strings_copy[i]
            for j in range(len(list_of_strings_copy)):
                if word == list_of_strings_copy[j]:
                    list_of_strings.pop(i)
                    j = 1
    except Exception as e:
        print(e)
    return list_of_strings

CodePudding user response:

Make a dictionary keyed by the lowercased text, {text.lower(): text, ...}, so that only the first spelling seen for each word is stored.

d = {}
for item in list_of_strings:
    if item.lower() not in d:
        d[item.lower()] = item

d.values() should be what you want.
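
For completeness, here is a minimal sketch of the dictionary idea wrapped in a function and run on the example words from the question. The name first_seen is just an illustrative choice, and the sketch assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def remove_duplicates(list_of_strings):
    """Return a new list without case-insensitive duplicates.

    The first spelling seen for each word is kept unchanged, and the
    input list itself is not modified.
    """
    first_seen = {}  # lowercased text -> original spelling (illustrative name)
    for item in list_of_strings:
        key = item.lower()
        if key not in first_seen:
            first_seen[key] = item
    # Dicts preserve insertion order, so first occurrences stay in order.
    return list(first_seen.values())


strings = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy",
           "Brain", "brain-facing-mouse", "BLOOD"]
print(remove_duplicates(strings))
# ['Blood', 'DNA', 'ACTN4', '34-methyl-O-carboxy', 'Brain', 'brain-facing-mouse']
```

A side note on the code in the question: list_of_strings_copy = list_of_strings only binds a second name to the same list, so lowercasing through the "copy" also changes the original. list(list_of_strings) or list_of_strings.copy() would make a real (shallow) copy.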

CodePudding user response:

I think something like the following would do what you need:

def remove_duplicates(list_of_strings):
    new_list = [] # create empty return list
    for string in list_of_strings: # iterate through list of strings
        string = string[0].capitalize() + string[1:].lower() # ensure first letter is capitalized and the rest are lowercase
        if string not in new_list: # check string is not a duplicate in the returned list
            new_list.append(string) # if string not in list append to returned list
    return new_list # return end list
    
strings = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy", "Brain", "brain-facing-mouse", "BLOOD"]
returned_strings = remove_duplicates(strings)
print(returned_strings)

(For reference this was written in Python 3.10)
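
Note that this approach normalizes the case of every string, so DNA would come back as Dna and ACTN4 as Actn4. If the original spellings have to survive, one possible variation (a sketch, not the answer author's code) keeps the same loop but compares on a case-folded key and appends the untouched string. The function name remove_duplicates_keep_case and the seen set are assumptions introduced here for illustration.

```python
def remove_duplicates_keep_case(list_of_strings):
    """Drop case-insensitive duplicates, keeping each first spelling as-is."""
    seen = set()   # case-folded forms that have already been kept
    new_list = []  # result; the input list is never modified
    for string in list_of_strings:
        key = string.casefold()  # used for comparison only
        if key not in seen:
            seen.add(key)
            new_list.append(string)  # append the original, unmodified string
    return new_list


strings = ["Blood", "blood", "DNA", "ACTN4", "34-methyl-O-carboxy",
           "Brain", "brain-facing-mouse", "BLOOD"]
print(remove_duplicates_keep_case(strings))
# ['Blood', 'DNA', 'ACTN4', '34-methyl-O-carboxy', 'Brain', 'brain-facing-mouse']
```

Checking membership in a set instead of in new_list also avoids rescanning the result list for every word, which matters once the input gets long.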
