Filtering values from generator object

Time:06-10

I have data in the form of a generator:

type(head)
----------
generator

Its values look like this:

for x in head:
    print(x)

  Record {
  field1         '2022060611121280041700000070046713963'
  field2         '2022-06-06 01:11:29'
  field3         'NIL'
  }

I'm wondering whether it's possible to convert this to a DataFrame. I could probably write a script that loops over the contents of each Record, but I'm hoping there's a cleaner way.

Answer:

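As long as the generated contents fit into memory, and each yielded record is something pandas understands as a row (for example a dict, tuple, or namedtuple), pandas can consume them directly: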

from pandas import DataFrame

# head is a generator; list() materialises every record so pandas can build the frame
df = DataFrame(list(head))
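If the objects yielded by head are not dicts (the question prints them as Record objects whose type isn't specified), one option is to convert each record to a dict before handing the rows to pandas. The sketch below assumes, hypothetically, that each Record exposes its fields as attributes named field1, field2 and field3. Record, make_head and records_to_frame are illustrative stand-ins, not part of any library, and the second sample record is made up.

```python
import pandas as pd


# Hypothetical stand-in for the objects the question's generator yields;
# the real Record type is unknown, so attribute access is an assumption.
class Record:
    def __init__(self, field1, field2, field3):
        self.field1 = field1
        self.field2 = field2
        self.field3 = field3


FIELDS = ["field1", "field2", "field3"]


def make_head():
    """Hypothetical generator mimicking `head` from the question."""
    yield Record("2022060611121280041700000070046713963", "2022-06-06 01:11:29", "NIL")
    yield Record("sample-id-2", "2022-06-06 01:12:00", "NIL")  # made-up sample row


def records_to_frame(records, fields):
    # Convert each Record into a dict keyed by field name, then build the frame.
    rows = ({f: getattr(r, f) for f in fields} for r in records)
    return pd.DataFrame(list(rows), columns=fields)


head = make_head()
df = records_to_frame(head, FIELDS)
print(df)
```

The generator expression converts each Record only as it is consumed, but list() still materialises every row, so this version also assumes the data fits in memory.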

If the generator's contents are too large to fit into memory, you can iterate over chunks of records (using toolz) and write each chunk to disk, e.g. to a CSV file:

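from pandas import DataFrame
from toolz import partition_all

n_elements = 100  # number of records per chunk

# partition_all yields tuples of up to n_elements records; the last may be shorter
for n, x in enumerate(partition_all(n_elements, head)):
    df = DataFrame(x)
    if n == 0:
        # first chunk: create/overwrite the file and write the header
        df.to_csv('test.csv', index=False, mode='w')
    else:
        # later chunks: append rows without repeating the header
        df.to_csv('test.csv', index=False, mode='a', header=False)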
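The same chunked pattern can be sketched without the toolz dependency, using itertools.islice from the standard library as a stand-in for partition_all. This is a minimal sketch assuming the generator yields dict records; chunked, generator_to_csv and make_head are hypothetical helpers, and the CSV is written to a temporary directory rather than test.csv.

```python
import os
import tempfile
from itertools import islice

import pandas as pd


def chunked(iterable, size):
    """Yield lists of up to `size` items; a stdlib stand-in for toolz.partition_all."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def generator_to_csv(head, path, chunk_size=100):
    """Write dict records from a generator to CSV one chunk at a time; return row count."""
    n_rows = 0
    for n, chunk in enumerate(chunked(head, chunk_size)):
        df = pd.DataFrame(chunk)
        # First chunk overwrites the file and writes the header; later chunks append.
        df.to_csv(path, index=False, mode="w" if n == 0 else "a", header=(n == 0))
        n_rows += len(df)
    return n_rows


def make_head(count):
    """Hypothetical generator of dict records standing in for `head`."""
    for i in range(count):
        yield {"field1": i, "field2": f"row {i}", "field3": "NIL"}


path = os.path.join(tempfile.mkdtemp(), "test.csv")
written = generator_to_csv(make_head(250), path, chunk_size=100)
result = pd.read_csv(path)
print(written, result.shape)
```

Passing header=(n == 0) folds the if/else into a single to_csv call. On Python 3.12 and later, itertools.batched provides similar chunking (yielding tuples). If the generator yields nothing, no file is created, so callers that always need an output file should handle that case separately.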