I have two lists like the ones below, and I want to match them against each other and extract the values:
characteristic = [
['length', 'width', 'height', 'Thread length', 'space'],
['fname', 'lname','length','space']
]
value = [
['length 34','width ab23','Thread length 8ah ajf','space','height 0av'],
['fname avd', 'lname ash','space fat','length ere']
]
This is the output I want.
Note: I would be really thankful to anyone who can solve this.
Characteristic | Value |
---|---|
length | 34 sd |
Width | ab23 |
height | 0av |
Thread length | 8ah ajf |
space | none |
fname | avd |
lname | ash |
space | fat |
length | ere |
I am trying to solve the problem with a for loop, but it finds length twice in value.
temp_str = {}
for x in characteristic:
    for z in value:
        if x in z:
            temp_str = z.replace(x,'')
            temp_str = ','
        #print(x)
        print(temp_str)
CodePudding user response:
Another solution, without re:
characteristic = [
["length", "width", "height", "Thread length", "space"],
["fname", "lname"],
]
value = [
["length 34", "width ab23", "Thread length 8ah ajf", "space", "height 0av"],
["fname avd", "lname ash"],
]
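# Flatten both nested lists
ch = [c for l in characteristic for c in l]
vals = [v for l in value for v in l]
out = []
for c in ch:
    for v in vals:
        # Accept the entry only if it starts with the name and a value follows
        if v.startswith(c) and v[len(c) :].strip() != "":
            out.append((c, v[len(c) :].strip()))
            break
    else:
        # The inner loop finished without break: no value found for this name
        out.append((c, None))
print(out)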
Prints:
[
("length", "34"),
("width", "ab23"),
("height", "0av"),
("Thread length", "8ah ajf"),
("space", None),
("fname", "avd"),
("lname", "ash"),
]
Output as a dataframe:
import pandas as pd

df = pd.DataFrame(out, columns=["Characteristic", "Value"])
print(df)
Prints:
  Characteristic    Value
0         length       34
1          width     ab23
2         height      0av
3  Thread length  8ah ajf
4          space     None
5          fname      avd
6          lname      ash
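Note that the lists above were shortened: flattening both lists would pair every "length" with the first matching entry, which is the duplicate problem described in the question. A minimal sketch of one way around this, assuming the i-th sublist of characteristic always belongs to the i-th sublist of value (match_pairs is a hypothetical helper name, not part of the answer above):

```python
# Sketch: pair each characteristic sublist with its own value sublist,
# so a name like "length" that occurs in both sublists is matched separately.
characteristic = [
    ["length", "width", "height", "Thread length", "space"],
    ["fname", "lname", "length", "space"],
]
value = [
    ["length 34", "width ab23", "Thread length 8ah ajf", "space", "height 0av"],
    ["fname avd", "lname ash", "space fat", "length ere"],
]


def match_pairs(names, entries):  # hypothetical helper
    """Return (name, value) tuples for one sublist, in the order of names."""
    pairs = []
    for name in names:
        for entry in entries:
            rest = entry[len(name):].strip()
            if entry.startswith(name) and rest:
                pairs.append((name, rest))
                break
        else:  # no entry carried a value for this name
            pairs.append((name, None))
    return pairs


out = []
for names, entries in zip(characteristic, value):
    out.extend(match_pairs(names, entries))

for pair in out:
    print(pair)
```

The result keeps the order of characteristic, so it can be passed to pd.DataFrame in the same way as before.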
CodePudding user response:
You could try the following:
import re
characteristic = [
['length', 'width', 'height', 'Thread length', 'space'],
['fname', 'lname', 'length', 'space']
]
value = [
['length 34', 'width ab23', 'Thread length 8ah ajf', 'space', 'height 0av'],
['fname avd', 'lname ash', 'space fat', 'length ere']
]
result = []
for char, val in zip(characteristic, value):
    # Longest names first, so the longer alternative is tried first
    char = sorted(char, key=len, reverse=True)
    pattern = "(" + "|".join(char) + r")\s*(.+)?"
    pattern = re.compile(pattern)
    result.extend(pattern.search(string).groups() for string in val)
Regex pattern for the first sublist of characteristic:
(Thread length|length|height|width|space)\s*(.+)?
(Thread length|length|height|width|space): first capture group, with an or-pattern inside.
\s*: as much whitespace as possible.
(.+)?: second capture group, optional (the ?), containing everything up to the end of the string, but at least one character.
Result:
[('length', '34'),
('width', 'ab23'),
('Thread length', '8ah ajf'),
('space', None),
('height', '0av'),
('fname', 'avd'),
('lname', 'ash'),
('space', 'fat'),
('length', 'ere')]
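To see how the two capture groups behave on single strings, here is a small sketch. It also wraps each name in re.escape, which is an addition not present in the answer above; it only matters if a characteristic name contains regex metacharacters such as ( or +.

```python
import re

names = ["length", "width", "height", "Thread length", "space"]

# Longest names first, so a name that is a prefix of another cannot shadow it.
# re.escape (an addition here) guards against regex metacharacters in a name.
alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
pattern = re.compile("(" + alternatives + r")\s*(.+)?")

for text in ["Thread length 8ah ajf", "space"]:
    # groups() returns (name, value); value is None when nothing follows the name
    print(pattern.search(text).groups())
```

The second string has nothing after the name, so the optional second group does not participate in the match and comes back as None.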