String comparison between elements in list

Time:12-02

I am trying to compare the elements of two lists. One list is a predefined pattern against which new lists have to be compared. The comparison should be done between elements at the same index: list1[0] is compared only with list2[0], list1[1] only with list2[1], and so on. The result should be True only if all elements match. The issue I am facing is that one element of the predefined pattern has a dynamic part, which has to be ignored when comparing. How can I achieve this?

pattern = ['Hi', 'my' , 'name is <xxxxxxxxxxx> age <yy>']

This is the defined pattern. The content inside <> is dynamic and has to be ignored.

Comparing against list2 = ['Hi', 'my', 'name is soku age 21'] should give True.
Comparing against list3 = ['Hi', 'my', 'soku'] should give False.

How can I achieve this, given that a plain element-to-element string comparison won't work?

Another example:

pattern = ['A', 'B', 'C_<xxxx>_AB']
list1 = ['A', 'B', 'C_aaaa112=22_AB']

This should give True.

CodePudding user response:

One approach is to use all and re.fullmatch:

import re

pattern = ['Hi', 'my', r'name is .+ age \d{2}']
list2 = ['Hi', 'my', 'name is soku age 21']
list3 = ['Hi', 'my', 'soku']

print(all(re.fullmatch(p, l) for p, l in zip(pattern, list2)))
print(all(re.fullmatch(p, l) for p, l in zip(pattern, list3)))

Output

True
False

As an alternative you could use the following pattern:

pattern = ['Hi', 'my', r'name is \S+ age \d{2}']

to avoid matching whitespace characters in the name.

The pattern:

.+

matches one or more of any character, including whitespace (but, by default, not a newline), while

\S+

matches one or more characters that are not whitespace. Moreover, the pattern:

\d{2}

will match two contiguous digits.
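To make the difference between the two patterns concrete, here is a small sketch. The test strings `one_word`, `two_words` and `one_digit` are made up for illustration and do not come from the question:

```python
import re

any_chars = r'name is .+ age \d{2}'   # name may contain spaces
no_spaces = r'name is \S+ age \d{2}'  # name must be a single token

one_word = 'name is soku age 21'
two_words = 'name is so ku age 21'
one_digit = 'name is soku age 7'

print(bool(re.fullmatch(any_chars, one_word)))   # True
print(bool(re.fullmatch(any_chars, two_words)))  # True: .+ also consumes the space
print(bool(re.fullmatch(no_spaces, two_words)))  # False: \S+ stops at the space
print(bool(re.fullmatch(any_chars, one_digit)))  # False: \d{2} needs two digits
```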

To build the pattern dynamically from user input, you could do something like below:

pattern = ['Hi', 'my', 'name is <xxxxxxxxxxx> age <yy>']
regex_pattern = [re.sub(r"<.+?>", r".+", s) for s in pattern]
print(all(re.fullmatch(p, l) for p, l in zip(regex_pattern, list2)))
print(all(re.fullmatch(p, l) for p, l in zip(regex_pattern, list3)))

Output

True
False
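Two things may be worth guarding against in the dynamic version: zip stops at the shorter list, so a candidate with fewer elements could still pass, and any regex metacharacters in the fixed part of a template would be interpreted as regex syntax. A possible sketch that handles both is below; the helper names `to_regex` and `matches_pattern` are hypothetical, and it assumes every placeholder has the form <...> and stands for one or more characters:

```python
import re

def to_regex(template):
    # Split the template on <...> placeholders and escape the literal pieces,
    # so characters such as '.' or '(' in the template are matched literally.
    literal_parts = re.split(r"<[^<>]+>", template)
    return ".+".join(re.escape(part) for part in literal_parts)

def matches_pattern(pattern, candidate):
    # zip() would silently ignore extra elements, so compare lengths first.
    if len(pattern) != len(candidate):
        return False
    return all(re.fullmatch(to_regex(p), c) for p, c in zip(pattern, candidate))

pattern = ['Hi', 'my', 'name is <xxxxxxxxxxx> age <yy>']
print(matches_pattern(pattern, ['Hi', 'my', 'name is soku age 21']))  # True
print(matches_pattern(pattern, ['Hi', 'my', 'soku']))                 # False
print(matches_pattern(pattern, ['Hi', 'my']))                         # False
print(matches_pattern(['A', 'B', 'C_<xxxx>_AB'],
                      ['A', 'B', 'C_aaaa112=22_AB']))                 # True
```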

CodePudding user response:

If you want to do this in pure Python, without libraries, you can store the name and the age in variables, then use replace to substitute them in the last string of each list with a fixed placeholder character of your choice. You then have lists of strings in which the variable parts have all been replaced by the same placeholder, so the variable parts are treated as equal. After that, you can check normally whether the strings are the same.
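A minimal sketch of this idea follows. It assumes the dynamic values are known in advance, and that the pattern is written with the same placeholder. The helper `normalise` and the placeholder "*" are hypothetical choices for illustration:

```python
def normalise(items, dynamic_values, placeholder="*"):
    # Replace every known dynamic value with the placeholder in each string.
    result = []
    for item in items:
        for value in dynamic_values:
            item = item.replace(value, placeholder)
        result.append(item)
    return result

# The pattern uses the placeholder where the dynamic parts would be.
pattern = ['Hi', 'my', 'name is * age *']
list2 = ['Hi', 'my', 'name is soku age 21']
list3 = ['Hi', 'my', 'soku']

print(normalise(list2, ['soku', '21']) == pattern)  # True
print(normalise(list3, ['soku', '21']) == pattern)  # False
```

Note that replace substitutes every occurrence of a value, so this only behaves as intended when the dynamic values do not also appear in the fixed parts of the strings.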
