Word Matching from one list to other using regex python

Time:04-26

I have two lists like the ones below. I want to match the entries of one against the other and extract the value for each characteristic:

characteristic = [
    ['length', 'width', 'height', 'Thread length', 'space'],
    ['fname', 'lname','length','space']
]
value = [
    ['length 34','width ab23','Thread length 8ah ajf','space','height 0av'],
    ['fname avd', 'lname ash','space fat','length ere']
]

I would be really thankful to anyone who can solve this. The output I want looks like this:

Characteristic  Value
length          34
width           ab23
height          0av
Thread length   8ah ajf
space           none
fname           avd
lname           ash
space           fat
length          ere

I tried to solve this with a for loop, but it finds length twice in value, presumably because 'length' is also a substring of 'Thread length 8ah ajf':

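# Walk the paired sublists; the original loop compared whole lists
for chars, vals in zip(characteristic, value):
    for x in chars:
        for z in vals:
            # Substring test: 'length' also matches 'Thread length 8ah ajf',
            # which is why it is reported twice
            if x in z:
                temp_str = z.replace(x, '').strip()
                print(x, temp_str)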

CodePudding user response:

Another solution, without re:

characteristic = [
    ["length", "width", "height", "Thread length", "space"],
    ["fname", "lname"],
]
value = [
    ["length 34", "width ab23", "Thread length 8ah ajf", "space", "height 0av"],
    ["fname avd", "lname ash"],
]

ch = [c for l in characteristic for c in l]
vals = [v for l in value for v in l]

out = []
for c in ch:
    for v in vals:
        if v.startswith(c) and v[len(c) :].strip() != "":
            out.append((c, v[len(c) :].strip()))
            break
    else:
        out.append((c, None))
print(out)

Prints:

[
    ("length", "34"),
    ("width", "ab23"),
    ("height", "0av"),
    ("Thread length", "8ah ajf"),
    ("space", None),
    ("fname", "avd"),
    ("lname", "ash"),
]

Output as a dataframe:

import pandas as pd

df = pd.DataFrame(out, columns=["Characteristic", "Value"])
print(df)

Prints:

  Characteristic    Value
0         length       34
1          width     ab23
2         height      0av
3  Thread length  8ah ajf
4          space     None
5          fname      avd
6          lname      ash
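
The example above uses shortened lists, so the repeated length and space entries from the question are not exercised. As a rough sketch (not the answerer's code), the same startswith idea can be applied to each pair of sublists separately so that repeated names get their own values. The helper name match_pairs is made up for illustration, and the check that the name is followed by whitespace or the end of the string is an added assumption:

```python
characteristic = [
    ["length", "width", "height", "Thread length", "space"],
    ["fname", "lname", "length", "space"],
]
value = [
    ["length 34", "width ab23", "Thread length 8ah ajf", "space", "height 0av"],
    ["fname avd", "lname ash", "space fat", "length ere"],
]


def match_pairs(names, strings):
    """Pair each name with the rest of the first string that starts with it."""
    pairs = []
    for name in names:
        for s in strings:
            rest = s[len(name):]
            # Assumption: the name must be followed by whitespace or nothing,
            # so 'length' would not match a string such as 'lengthy 3'.
            if s.startswith(name) and (rest == "" or rest[0].isspace()):
                pairs.append((name, rest.strip() or None))
                break
        else:
            # No string started with this name
            pairs.append((name, None))
    return pairs


out = []
for names, strings in zip(characteristic, value):
    out.extend(match_pairs(names, strings))
print(out)
```

The pairs come out in the order of characteristic, so the second sublist yields fname, lname, length, space.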

CodePudding user response:

You could try the following:

import re

characteristic = [
    ['length', 'width', 'height', 'Thread length', 'space'],
    ['fname', 'lname', 'length', 'space']
]
value = [
    ['length 34', 'width ab23', 'Thread length 8ah ajf', 'space', 'height 0av'],
    ['fname avd', 'lname ash', 'space fat', 'length ere']
]

result = []
for char, val in zip(characteristic, value):
    char = sorted(char, key=len, reverse=True)
    pattern = "(" + "|".join(char) + r")\s*(.+)?"
    pattern = re.compile(pattern)
    result.extend(pattern.search(string).groups() for string in val)

Regex pattern for the first sublist of characteristic:

(Thread length|length|height|width|space)\s*(.+)?
  • (Thread length|length|height|width|space): first capture group, an alternation of the names, longest first
  • \s*: as much whitespace as possible (possibly none)
  • (.+)?: second capture group, made optional by the ?. The .+ matches everything up to the end of the string and needs at least one character, so the group is None when nothing follows the name
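
To see why the names are sorted longest first, here is a small sketch using two hypothetical names where one is a prefix of the other (len and length are not in the question's data). The helper build_pattern and the use of re.escape are additions for illustration, not part of the answer above:

```python
import re


def build_pattern(names, longest_first=True):
    """Build the two-group pattern; optionally put longer names first."""
    if longest_first:
        names = sorted(names, key=len, reverse=True)
    # re.escape keeps names with regex metacharacters from breaking the pattern
    alternation = "|".join(re.escape(n) for n in names)
    return re.compile("(" + alternation + r")\s*(.+)?")


names = ["len", "length"]  # hypothetical names, one a prefix of the other

# Alternatives are tried left to right, so 'len' wins if it comes first
naive = build_pattern(names, longest_first=False).search("length 34").groups()
ordered = build_pattern(names).search("length 34").groups()
print(naive)    # ('len', 'gth 34')
print(ordered)  # ('length', '34')

# The optional second group is None when nothing follows the name
bare = build_pattern(["space"]).search("space").groups()
print(bare)     # ('space', None)
```

With the question's own names the ordering only starts to matter once one name is a prefix of another, since search always takes the leftmost match.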

Result:

[('length', '34'),
 ('width', 'ab23'),
 ('Thread length', '8ah ajf'),
 ('space', None),
 ('height', '0av'),
 ('fname', 'avd'),
 ('lname', 'ash'),
 ('space', 'fat'),
 ('length', 'ere')]
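
If the two-column layout from the question is wanted, the list of tuples can be formatted as plain text. This is only a sketch: format_table is a made-up helper, and printing "none" for missing values is taken from the question's desired output:

```python
result = [
    ("length", "34"),
    ("width", "ab23"),
    ("Thread length", "8ah ajf"),
    ("space", None),
    ("height", "0av"),
]


def format_table(pairs, missing="none"):
    """Render (name, value) pairs as a left-aligned two-column table."""
    header = "Characteristic"
    width = max([len(header)] + [len(name) for name, _ in pairs])
    lines = [f"{header:<{width}}  Value"]
    for name, val in pairs:
        shown = val if val is not None else missing
        lines.append(f"{name:<{width}}  {shown}")
    return "\n".join(lines)


table = format_table(result)
print(table)
```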