Python for loop matching list items at different indexes

Time:05-28

I have data structured like so:

data=
[
(120,150,150,160,"word1"),
(152,150,170,160,"word2"),
(172,150,200,160,"word3"),
(202,290,240,300,"word4"),
(300,150,350,160,"word5"),
(202,200,240,210,"word6"),
(242,200,260,210,"word7")
]

I want to group consecutive items in data where the difference between the 3rd number of the current item and the 1st number of the next item is less than 5 AND the difference between the 4th number of the current item and the 4th number of the next item is less than 2. Each run of matching items should go into its own list, and I then want to append all those lists to a master list.

So this would be the result of the function applied to data:

final=
[[
(120,150,150,160,"word1"),
(152,150,170,160,"word2"),
(172,150,200,160,"word3")
],
[
(202,200,240,210,"word6"),
(242,200,260,210,"word7")
]]

word4 is not included because abs(data[2][3]-data[3][3])>2

word5 is not included because abs(data[3][2]-data[4][0])>5

My current attempt handles 90% of the words correctly but combines words that don't fulfill the requirements on occasion:

temp=[]
final=[]
for i,j in enumerate(data[:-1]):
    if (j[2]-data[i+1][0]<5) and (j[3]-data[i+1][3]<2):
        if len(temp)<1:
            temp.append(j[0:4])
        temp.append(data[i+1][0:4])
    else:
        final.append(temp)
        temp=[]
if temp:
    final.append(temp)

EDIT: Here is a real world example of the above algorithm failing:

data=
[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'), 
(250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]

Expected output:

final=
[[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary')
]]

Actual output:

final=
[[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'),
(250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]]

CodePudding user response:

You need to compare the absolute differences by wrapping each subtraction in abs(), since the raw difference can be negative and a negative number always passes a "less than" check. I also tidied your code a bit:

data = [
    (38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),
    (94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'),
    (250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]

temp = []
final = []
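for index, item in enumerate(data[:-1]):
    next_item = data[index + 1]
    # abs() makes the check independent of the sign of the difference
    if abs(item[2] - next_item[0]) < 5 and abs(item[3] - next_item[3]) < 2:
        if len(temp) < 1:
            temp.append(item[0:4])
        temp.append(next_item[0:4])
    elif temp:
        # only store non-empty groups, so unmatched items don't add [] to final
        final.append(temp)
        temp = []
if temp:
    final.append(temp)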

print(final)
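
If you would rather keep the word in each tuple (as in the expected output in the question), the same idea can be wrapped in a small function. This is only a sketch: the name group_adjacent and the parameters max_gap and max_offset are made up for illustration, and it assumes every item is a 5-tuple laid out like the ones in the question. It pairs each item with its successor using zip instead of indexing, and it never stores an empty group.

```python
def group_adjacent(items, max_gap=5, max_offset=2):
    """Group consecutive items that sit close to each other.

    Hypothetical helper: two neighbours belong together when
    abs(current[2] - next[0]) < max_gap and
    abs(current[3] - next[3]) < max_offset.
    """
    groups = []
    current = []
    # zip(items, items[1:]) yields each item together with its successor
    for prev, nxt in zip(items, items[1:]):
        close = (abs(prev[2] - nxt[0]) < max_gap
                 and abs(prev[3] - nxt[3]) < max_offset)
        if close:
            if not current:
                current.append(prev)  # start a new group with the first item
            current.append(nxt)
        elif current:
            # the run ended: store it and start over
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


data = [
    (120, 150, 150, 160, "word1"),
    (152, 150, 170, 160, "word2"),
    (172, 150, 200, 160, "word3"),
    (202, 290, 240, 300, "word4"),
    (300, 150, 350, 160, "word5"),
    (202, 200, 240, 210, "word6"),
    (242, 200, 260, 210, "word7"),
]

final = group_adjacent(data)
print(final)
```

With the first data set from the question this should give two groups (word1 to word3, and word6 with word7), and with the 'Contract' / 'Summary' / 'This' example it should leave 'This' out.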