Loop, extract and save a list within list as a dataframe

Time:11-23

I'm pretty new to Python in general, but hopefully my question will make sense.

I have a list that contains lists of irregular lengths, and I am trying to convert it into a data frame so I can save it as a CSV. I also want to keep track of which outer list each inner item came from by adding IDs, so I can combine them later.

An overview: I ran easyocr on several images to extract text from them, so I want an image ID for each piece of text so I can relate it back to the image it came from.

An example of data looks like this:

[[],
 [],
 [([[163, 165], [219, 165], [219, 185], [163, 185]],
   'HXRESLAGET',
   0.6614451489762804)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [([[185, 47], [257, 47], [257, 63], [185, 63]],
   'Ahabngue',
   0.0021960531212282365),
  ([[330.01941932430907, 26.803883864861817],
    [375.8479983040051, 14.47000105999682],
    [378.98058067569093, 29.196116135138183],
    [334.1520016959949, 40.52999894000318]],
   '',
   0.0),
  ([[281.0, 40.0],
    [331.87056283872016, 26.507942743332098],
    [335.0, 40.0],
    [284.12943716127984, 53.4920572566679]],
   'o',
   0.026024344610394934)],
 [],
 [],
 [([[71.60127388858555, 57.08292994374676],
    [116.99796908286363, 67.93629984577466],
    [113.39872611141445, 81.91707005625324],
    [68.00203091713637, 71.06370015422534]],
   'eiao]',
   0.04165130315364712)],
 []]

The structure that I am seeking as an end goal is this:

image number  boundingbox                                             letters     confidence
1             [281.0, 40.0],[331.87056283872016, 26.507942743332098]  Ahabngue    0.026024344610394934
2             [334.1520016959949, 40.52999894000318]                  eiao        0.6614451489762804
3             [330.01941932430907, 26.803883864861817]                fgss        0.026024344610394934
3             [375.8479983040051, 14.47000105999682]                  sewqw       0.04165130315364712
3             [378.98058067569093, 29.196116135138183]                o           0.08534330315364712
4             [375.8479983040051, 14.47000105999682]                  HXRESLAGET  0.5130315364712
5             [330.01941932430907, 26.803883864861817]                dsd         0.15364712

I tried:

results_bound = []
for i in results_:
    for j in i:
      results_bound.append(results_[i][j][0])

and received the error below. I know this clearly doesn't work, as I was only trying to extract one field at a time (the bounding box column), but my main goal is to combine them all together:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_18959/2194324409.py in <module>
      2 for i in results_:
      3     for j in range(1,20):
----> 4       results_bound.append(results_[i][j][0])

TypeError: list indices must be integers or slices, not list

Hopefully the question is OK; let me know if you need any more details or information.

Answer:

First, you can use print() to see what you have in your variables; it helps to see what the code needs to do.

A for-loop written without range() gives you the elements themselves (here, the inner lists), not indexes, so results_[i][j][0] is wrong: i is already a list, which is why you get "list indices must be integers or slices, not list".

Empty lists should be skipped, e.g. with if i: ... so that the code runs only when i is not empty.
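As a minimal sketch of the two points above (the loop variable is the element, and empty lists are skipped), enumerate() can supply the position alongside each inner list. The shortened results_ list and the rounded confidence value here are made up for illustration; only the nesting matches the question.

```python
# Hypothetical, shortened stand-in for the nested easyocr results.
results_ = [
    [],
    [([[163, 165], [219, 165], [219, 185], [163, 185]], 'HXRESLAGET', 0.66)],
    [],
]

results_bound = []
for image_index, detections in enumerate(results_):
    # `detections` is the inner list itself, not a position in results_,
    # so it must not be used as an index.
    if not detections:
        continue  # nothing was detected in this image
    # Each detection is a tuple: (boundingbox, letters, confidence).
    for boundingbox, letters, confidence in detections:
        results_bound.append((image_index, boundingbox))

print(results_bound)
```

Here image_index is the zero-based position in the outer list, so the single detection is recorded with index 1.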

data = [[],
 [],
 [([[163, 165], [219, 165], [219, 185], [163, 185]],
   'HXRESLAGET',
   0.6614451489762804)],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [([[185, 47], [257, 47], [257, 63], [185, 63]],
   'Ahabngue',
   0.0021960531212282365),
  ([[330.01941932430907, 26.803883864861817],
    [375.8479983040051, 14.47000105999682],
    [378.98058067569093, 29.196116135138183],
    [334.1520016959949, 40.52999894000318]],
   '',
   0.0),
  ([[281.0, 40.0],
    [331.87056283872016, 26.507942743332098],
    [335.0, 40.0],
    [284.12943716127984, 53.4920572566679]],
   'o',
   0.026024344610394934)],
 [],
 [],
 [([[71.60127388858555, 57.08292994374676],
    [116.99796908286363, 67.93629984577466],
    [113.39872611141445, 81.91707005625324],
    [68.00203091713637, 71.06370015422534]],
   'eiao]',
   0.04165130315364712)],
 []]

# ---

number = 0  # running counter, increased once per detected text item

output_data = []

for row in data:
    #print(row)

    if row: # skip empty list

        for item in row:
            #print(item)

            number += 1
            # each item is a tuple: (boundingbox, letters, confidence)
            boundingbox, letters, confidence = item

            print('image number:', number)
            print('boundingbox:', boundingbox)
            print('letters:', letters)
            print('confidence:', confidence)
            print('----')

            output_data.append( [number, boundingbox, letters, confidence] )
            
# ---

import pandas as pd

df = pd.DataFrame(output_data, columns=['number', 'boundingbox', 'letters', 'confidence'])

print(df.to_string())

Result

image number: 1
boundingbox: [[163, 165], [219, 165], [219, 185], [163, 185]]
letters: HXRESLAGET
confidence: 0.6614451489762804
----
image number: 2
boundingbox: [[185, 47], [257, 47], [257, 63], [185, 63]]
letters: Ahabngue
confidence: 0.0021960531212282365
----
image number: 3
boundingbox: [[330.01941932430907, 26.803883864861817], [375.8479983040051, 14.47000105999682], [378.98058067569093, 29.196116135138183], [334.1520016959949, 40.52999894000318]]
letters: 
confidence: 0.0
----
image number: 4
boundingbox: [[281.0, 40.0], [331.87056283872016, 26.507942743332098], [335.0, 40.0], [284.12943716127984, 53.4920572566679]]
letters: o
confidence: 0.026024344610394934
----
image number: 5
boundingbox: [[71.60127388858555, 57.08292994374676], [116.99796908286363, 67.93629984577466], [113.39872611141445, 81.91707005625324], [68.00203091713637, 71.06370015422534]]
letters: eiao]
confidence: 0.04165130315364712
----



   number                                                                                                                                                           boundingbox     letters  confidence
0       1                                                                                                                      [[163, 165], [219, 165], [219, 185], [163, 185]]  HXRESLAGET    0.661445
1       2                                                                                                                          [[185, 47], [257, 47], [257, 63], [185, 63]]    Ahabngue    0.002196
2       3  [[330.01941932430907, 26.803883864861817], [375.8479983040051, 14.47000105999682], [378.98058067569093, 29.196116135138183], [334.1520016959949, 40.52999894000318]]                0.000000
3       4                                                      [[281.0, 40.0], [331.87056283872016, 26.507942743332098], [335.0, 40.0], [284.12943716127984, 53.4920572566679]]           o    0.026024
4       5    [[71.60127388858555, 57.08292994374676], [116.99796908286363, 67.93629984577466], [113.39872611141445, 81.91707005625324], [68.00203091713637, 71.06370015422534]]       eiao]    0.041651
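Note that the loop above gives every detection its own number. If all detections from one image should share an ID, as in the table the question asks for, one option is to take the ID from the position in the outer list with enumerate(). This is only a sketch: it assumes the outer list is in the same order as the images, and the shortened data, the rounded values and the column name image_number are illustrative, not from the original output.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the nested easyocr results.
data = [
    [],
    [([[163, 165], [219, 165], [219, 185], [163, 185]], 'HXRESLAGET', 0.66)],
    [([[185, 47], [257, 47], [257, 63], [185, 63]], 'Ahabngue', 0.002),
     ([[281.0, 40.0], [331.9, 26.5], [335.0, 40.0], [284.1, 53.5]], 'o', 0.026)],
    [],
]

rows = []
for image_number, detections in enumerate(data):
    # An empty inner list simply produces no rows, so no explicit check is needed.
    for boundingbox, letters, confidence in detections:
        rows.append([image_number, boundingbox, letters, confidence])

df = pd.DataFrame(rows, columns=['image_number', 'boundingbox', 'letters', 'confidence'])

# With no path, to_csv() returns the CSV text; pass a filename to write a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Both detections from the third inner list get image_number 2, so they can be grouped or merged back to their image later. The bounding box is stored as the text form of the list in the CSV; splitting it into separate coordinate columns would be a further step.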