I'm pretty new to Python in general, but hopefully my question will make sense.
I have a list that contains lists of irregular lengths, and I am trying to convert it into a DataFrame structure so I can save it as a CSV. I also want a way to keep track of which parent list each inner item belongs to by adding IDs, so I can combine them later.
An overview: I ran easyocr on several images to extract text from them, so I want an image ID that lets me relate each piece of text back to the image it came from.
An example of data looks like this:
[[],
[],
[([[163, 165], [219, 165], [219, 185], [163, 185]],
'HXRESLAGET',
0.6614451489762804)],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[([[185, 47], [257, 47], [257, 63], [185, 63]],
'Ahabngue',
0.0021960531212282365),
([[330.01941932430907, 26.803883864861817],
[375.8479983040051, 14.47000105999682],
[378.98058067569093, 29.196116135138183],
[334.1520016959949, 40.52999894000318]],
'',
0.0),
([[281.0, 40.0],
[331.87056283872016, 26.507942743332098],
[335.0, 40.0],
[284.12943716127984, 53.4920572566679]],
'o',
0.026024344610394934)],
[],
[],
[([[71.60127388858555, 57.08292994374676],
[116.99796908286363, 67.93629984577466],
[113.39872611141445, 81.91707005625324],
[68.00203091713637, 71.06370015422534]],
'eiao]',
0.04165130315364712)],
[]]
The structure that I am seeking as an end goal is this:
image number | boundingbox | letters | confidence |
---|---|---|---|
1 | [281.0, 40.0],[331.87056283872016, 26.507942743332098] | Ahabngue | 0.026024344610394934 |
2 | [334.1520016959949, 40.52999894000318] | eiao | 0.6614451489762804 |
3 | [330.01941932430907, 26.803883864861817] | fgss | 0.026024344610394934 |
3 | [375.8479983040051, 14.47000105999682] | sewqw | 0.04165130315364712 |
3 | [378.98058067569093, 29.196116135138183] | o | 0.08534330315364712 |
4 | [375.8479983040051, 14.47000105999682] | HXRESLAGET | 0.5130315364712 |
5 | [330.01941932430907, 26.803883864861817] | dsd | 0.15364712 |
I tried:
results_bound = []
for i in results_:
    for j in i:
        results_bound.append(results_[i][j][0])
and received the error below. I know this clearly doesn't work, as I was only trying to handle one column at a time (the bounding box column), whereas my main goal is to combine them all together:
TypeError Traceback (most recent call last)
/tmp/ipykernel_18959/2194324409.py in <module>
2 for i in results_:
3 for j in range(1,20):
----> 4 results_bound.append(results_[i][j][0])
TypeError: list indices must be integers or slices, not list
Hopefully the question is OK; let me know if you need any more details or information.
Answer:
First, you could use print() to see what you have in your variables; it helps to see what the code needs.
A for-loop (without range()) gives you the elements themselves, not indexes, so in your loop i is an inner list and results_[i][j][0] is wrong.
If you have empty lists, you should skip them, e.g. with if i: ... to run code only when i is not empty.
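To illustrate the point about for-loops, here is a minimal sketch on a small made-up list (sample and its contents are hypothetical, not taken from the question). Iterating directly yields the inner lists themselves, while enumerate() yields a position together with each inner list.

```python
# Hypothetical miniature of the OCR results: one inner list per image.
sample = [[], [("box-a", "HELLO", 0.9)], [], [("box-b", "x", 0.1), ("box-c", "y", 0.2)]]

# Direct iteration: `row` is the inner list itself, so sample[row] would raise
# "TypeError: list indices must be integers or slices, not list".
non_empty_rows = [row for row in sample if row]  # empty lists are falsy, so they are skipped

# enumerate() also gives the position, which is what indexing needs.
positions = [index for index, row in enumerate(sample) if row]

print(non_empty_rows)
print(positions)
```

Only the positions from enumerate() are valid list indexes; the loop variable of a plain for-loop is already the element, so no further indexing into the outer list is needed.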
data = [[],
[],
[([[163, 165], [219, 165], [219, 185], [163, 185]],
'HXRESLAGET',
0.6614451489762804)],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[([[185, 47], [257, 47], [257, 63], [185, 63]],
'Ahabngue',
0.0021960531212282365),
([[330.01941932430907, 26.803883864861817],
[375.8479983040051, 14.47000105999682],
[378.98058067569093, 29.196116135138183],
[334.1520016959949, 40.52999894000318]],
'',
0.0),
([[281.0, 40.0],
[331.87056283872016, 26.507942743332098],
[335.0, 40.0],
[284.12943716127984, 53.4920572566679]],
'o',
0.026024344610394934)],
[],
[],
[([[71.60127388858555, 57.08292994374676],
[116.99796908286363, 67.93629984577466],
[113.39872611141445, 81.91707005625324],
[68.00203091713637, 71.06370015422534]],
'eiao]',
0.04165130315364712)],
[]]
# ---
number = 0  # running counter, one per detected text item
output_data = []
for row in data:
    #print(row)
    if row:  # skip empty lists
        for item in row:
            #print(item)
            number += 1  # advance the counter for every item
            # each item is a (boundingbox, letters, confidence) tuple
            boundingbox, letters, confidence = item
            print('image number:', number)
            print('boundingbox:', boundingbox)
            print('letters:', letters)
            print('confidence:', confidence)
            print('----')
            output_data.append([number, boundingbox, letters, confidence])
# ---
import pandas as pd
df = pd.DataFrame(output_data, columns=['number', 'boundingbox', 'letters', 'confidence'])
print(df.to_string())
Result
image number: 1
boundingbox: [[163, 165], [219, 165], [219, 185], [163, 185]]
letters: HXRESLAGET
confidence: 0.6614451489762804
----
image number: 2
boundingbox: [[185, 47], [257, 47], [257, 63], [185, 63]]
letters: Ahabngue
confidence: 0.0021960531212282365
----
image number: 3
boundingbox: [[330.01941932430907, 26.803883864861817], [375.8479983040051, 14.47000105999682], [378.98058067569093, 29.196116135138183], [334.1520016959949, 40.52999894000318]]
letters:
confidence: 0.0
----
image number: 4
boundingbox: [[281.0, 40.0], [331.87056283872016, 26.507942743332098], [335.0, 40.0], [284.12943716127984, 53.4920572566679]]
letters: o
confidence: 0.026024344610394934
----
image number: 5
boundingbox: [[71.60127388858555, 57.08292994374676], [116.99796908286363, 67.93629984577466], [113.39872611141445, 81.91707005625324], [68.00203091713637, 71.06370015422534]]
letters: eiao]
confidence: 0.04165130315364712
----
number boundingbox letters confidence
0 1 [[163, 165], [219, 165], [219, 185], [163, 185]] HXRESLAGET 0.661445
1 2 [[185, 47], [257, 47], [257, 63], [185, 63]] Ahabngue 0.002196
2 3 [[330.01941932430907, 26.803883864861817], [375.8479983040051, 14.47000105999682], [378.98058067569093, 29.196116135138183], [334.1520016959949, 40.52999894000318]] 0.000000
3 4 [[281.0, 40.0], [331.87056283872016, 26.507942743332098], [335.0, 40.0], [284.12943716127984, 53.4920572566679]] o 0.026024
4 5 [[71.60127388858555, 57.08292994374676], [116.99796908286363, 67.93629984577466], [113.39872611141445, 81.91707005625324], [68.00203091713637, 71.06370015422534]] eiao] 0.041651
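Note that the number column above counts detected items, not images. If the position of each inner list corresponds to one image (an assumption; the question does not say how the list was built), enumerate() can carry that position into every row so the text can be related back to its image. The sketch below shows this on a shortened copy of the data; the names rows_for_dataframe, sample_data, image_id and first_id are made up for illustration.

```python
import pandas as pd

def rows_for_dataframe(results, first_id=1):
    """Flatten per-image results into one row per detected text item."""
    rows = []
    for image_id, detections in enumerate(results, start=first_id):
        # An image with no detected text has an empty list and adds no rows.
        for boundingbox, letters, confidence in detections:
            rows.append([image_id, boundingbox, letters, confidence])
    return rows

# Shortened copy of the data from the question (assumed: one inner list per image).
sample_data = [
    [],
    [([[163, 165], [219, 165], [219, 185], [163, 185]], 'HXRESLAGET', 0.6614451489762804)],
    [([[185, 47], [257, 47], [257, 63], [185, 63]], 'Ahabngue', 0.0021960531212282365),
     ([[281.0, 40.0], [331.87056283872016, 26.507942743332098],
       [335.0, 40.0], [284.12943716127984, 53.4920572566679]], 'o', 0.026024344610394934)],
]

df = pd.DataFrame(rows_for_dataframe(sample_data),
                  columns=['image_id', 'boundingbox', 'letters', 'confidence'])

# Without a path, to_csv() returns the CSV as a string; pass a file name to write a file.
csv_text = df.to_csv(index=False)
print(df[['image_id', 'letters', 'confidence']].to_string())
```

With this numbering, two items found in the same image share one image_id, which matches the repeated image number in the table you are aiming for. Images without any text simply do not appear in the DataFrame.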