Home > Software engineering >  Pattern matching for finding siblings
Pattern matching for finding siblings

Time:12-08

Consider this scenario were I have some sequence number in form of list of String, where a dot represent a level of tree.

I want to find the count of siblings of every sequence number. I tried and below is my code


list = ["33.1", "33.5", "33.15", "33.1.1", "33.1.2", "33.1.11", "34.5", "33.2.9"] 

for item in list:
    numberOfSiblings = 0
    splitText = item.rsplit(".", 1)
    
    j = 0
    while( j < len(list)):
        if( list[j].startswith(splitText[0]) and len(item) == len(list[j]) and item != list[j] ):
            numberOfSiblings  =1
        j =1
    
    print(numberOfSiblings)

Here according to items in list..starting at zero index.. "33.1" will have sibling "33.5" and "33.15". (Note : "34.5" will not be counted as it does not start with "33") , so count will be 2. Similarly count of siblings for "33.5" and "33.15" will be 2 also.

Moving at index 3, element "33.1.1" will have siblings "33.1.2" and "33.1.11" only and so on for rest of the items accordingly.

Output should be like:

element => No.Of.Siblings

33.1 => 2  
33.5 => 2  
33.15 => 2  
33.1.1 => 2  
33.1.2 => 2  
33.1.11 => 2  
34.5 => 0  
33.2.9 => 0 

Here problem is with the length check in while block. Please help to implement this scenario..Thanks in advance

CodePudding user response:

Easiest way to do this is split each 2 strings into separate parts, and compare all parts piece by piece except the last one element (which can differ)

l = ["33.1", "33.5", "33.15", "33.1.1", "33.1.2", "33.1.11", "34.5", "33.2.9"] 

for item in l:
    numberOfSiblings = 0
    splitText = item.split(".")
    for otherItem in l:
        otherSplitText = otherItem.split(".")
        if item != otherItem and len(splitText)==len(otherSplitText) and all(a == b for a, b in zip(splitText[:-1], otherSplitText[:-1])):
            numberOfSiblings  = 1
    print(f"{item} {numberOfSiblings}")

EDIT: Added O(n) solution based on Counter

from collections import Counter

l = ["33.1", "33.5", "33.15", "33.1.1", "33.1.2", "33.1.11", "34.5", "33.2.9"] 
c = Counter()
for item in l:
    head, tail = item.rsplit(".", 1)
    c[head]  = 1

for item in l:
    head, tail = item.rsplit(".", 1)
    print(f"{item} {c[head] - 1}")

CodePudding user response:

Since you consider the sibling to end at the last dot we can remove the last number of each string and count how many of them appear in the same list after the removal of the last digit. Remove 1 because of the self count.

This approach uses O(n^2) complexity.

lst = ["33.1", "33.5", "33.15", "33.1.1", "33.1.2", "33.1.11", "34.5", "33.2.9"] 
d_s = ['.'.join(x.split('.')[:-1]) for x in lst]
d = {i:0 for i in lst}
for item in d:
    d[item] = d_s.count('.'.join(item.split('.')[:-1])) - 1

d
{'33.1': 2,
 '33.5': 2,
 '33.15': 2,
 '33.1.1': 2,
 '33.1.2': 2,
 '33.1.11': 2,
 '34.5': 0,
 '33.2.9': 0}
  • Related