I have data structured like so:
data=
[
(120,150,150,160,"word1"),
(152,150,170,160,"word2"),
(172,150,200,160,"word3"),
(202,290,240,300,"word4"),
(300,150,350,160,"word5"),
(202,200,240,210,"word6"),
(242,200,260,210,"word7")
]
I want to group consecutive words in data where the difference between the 3rd number of the current item and the 1st number of the next item is less than 5 AND the difference between the 4th number of the current item and the 4th number of the next item is less than 2. Each such run of words should go into its own list, and I then want to append all those lists to a master list.
So this would be the result of the function applied to data:
final=
[[
(120,150,150,160,"word1"),
(152,150,170,160,"word2"),
(172,150,200,160,"word3")
],
[
(202,200,240,210,"word6"),
(242,200,260,210,"word7")
]]
word4 is not included because abs(data[2][3]-data[3][3])>2
word5 is not included because abs(data[3][2]-data[4][0])>5
My current attempt handles 90% of the words correctly but combines words that don't fulfill the requirements on occasion:
temp=[]
final=[]
for i,j in enumerate(data[:-1]):
    if (j[2]-data[i+1][0]<5) and (j[3]-data[i+1][3]<2):
        if len(temp)<1:
            temp.append(j[0:4])
        temp.append(data[i+1][0:4])
    else:
        final.append(temp)
        temp=[]
if temp:
    final.append(temp)
EDIT: Here is a real world example of the above algorithm failing:
data=
[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),
(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'),
(250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]
Expected output:
final=
[[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),
(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary')
]]
Actual output:
final=
[[
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),
(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'),
(250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]]
Answer:
You need to compare the absolute values of the differences by wrapping them in abs(), since a difference can be negative and a negative number always passes a "less than" check. I also tidied up your code a bit:
data = [
(38.0, 296.7943420410156, 90.86400604248047, 310.7943420410156, 'Contract'),
(94.7560043334961, 296.7943420410156, 154.6480102, 310.7943420, 'Summary'),
(250.64453125, 317.38818359375, 266.743530, 325.88818359375, 'This')
]
temp = []
final = []
for index, item in enumerate(data[:-1]):
    # abs() makes the check independent of the sign of the difference
    if abs(item[2] - data[index + 1][0]) < 5 and abs(item[3] - data[index + 1][3]) < 2:
        if len(temp) < 1:
            temp.append(item[0:4])
        temp.append(data[index + 1][0:4])
    elif temp:
        # close the current group; skip it when nothing was collected
        final.append(temp)
        temp = []
if temp:
    final.append(temp)
print(final)
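If you want to reuse this, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: the name group_adjacent and the parameters max_gap and max_offset are made up for illustration, the full tuples (including the word) are kept as in your expected output, and words that end up alone are dropped. It is run here on the first data set from the question:

```python
def group_adjacent(data, max_gap=5, max_offset=2):
    """Group consecutive items whose boxes are close to each other.

    Hypothetical helper: max_gap is the horizontal threshold (3rd number of
    the current item vs. 1st number of the next), max_offset the vertical
    one (4th number of both items).
    """
    groups = []
    current = [data[0]] if data else []
    for prev, item in zip(data, data[1:]):
        close = (abs(prev[2] - item[0]) < max_gap
                 and abs(prev[3] - item[3]) < max_offset)
        if close:
            current.append(item)
        else:
            # Assumption: a word with no neighbour is not a group.
            if len(current) > 1:
                groups.append(current)
            current = [item]
    if len(current) > 1:
        groups.append(current)
    return groups


data = [
    (120, 150, 150, 160, "word1"),
    (152, 150, 170, 160, "word2"),
    (172, 150, 200, 160, "word3"),
    (202, 290, 240, 300, "word4"),
    (300, 150, 350, 160, "word5"),
    (202, 200, 240, 210, "word6"),
    (242, 200, 260, 210, "word7"),
]
final = group_adjacent(data)
print(final)
```

Pairing each item with its successor via zip(data, data[1:]) avoids the index arithmetic, which is where the original comparison was easy to get wrong.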