Extract all key values with Python pandas from a big JSON file

Good day, everyone!

I'm reading and processing a very large JSON file with the Python pandas library; here's my code:

import pandas as pd
file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
for c in chunks:
    print(c)

This prints all keys and values. However, I only want the list of keys that are present anywhere in my data.

For example, given

{name: john, surname: white, country: USA}
{name: alex, country: UK}
{surname: red, e: [email protected], country: France}
{name: tracy, surname: blue, country: UK}

my code should return:

[name, surname, e, country]

Thank you for your help

CodePudding user response：

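You can use a set: each chunk is a DataFrame whose columns are the keys seen in that chunk, so you can accumulate them across chunks.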

import pandas as pd

file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
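setOfKeys = set()

for c in chunks:
    # Each chunk is a DataFrame; keys() returns its column labels,
    # i.e. the top-level JSON keys seen in that chunk.
    setOfKeys |= set(c.keys())

# A set is unordered, so the printed order of the keys is arbitrary.
print(list(setOfKeys))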

CodePudding user response：

Ishan Shishodiya's comment pointed me in the right direction by mentioning the DataFrame. Here is the updated code in case someone needs it:

import pandas as pd
file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
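listOfKeys = []
for c in chunks:
    # c is a DataFrame; its keys are the column labels of this chunk.
    for key in c.keys():
        # Using a list keeps the keys in the order they were first seen.
        if key not in listOfKeys:
            listOfKeys.append(key)

print(listOfKeys)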
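
For comparison, here is a minimal sketch of the same idea using only the standard library `json` module instead of pandas. This is not what either answer above uses. It assumes the file is in JSON Lines format (one JSON object per line, as `lines=True` implies) and that only top-level keys are wanted. The function name `collect_keys` and the sample records, including the email value, are made up for illustration.

```python
import io
import json


def collect_keys(lines):
    """Return the top-level keys seen across JSON Lines records, in first-seen order."""
    seen = {}  # a dict preserves insertion order and gives fast membership checks
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # assumption: every record is a JSON object
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


# Hypothetical sample mirroring the records in the question (the email is a placeholder).
sample = io.StringIO(
    '{"name": "john", "surname": "white", "country": "USA"}\n'
    '{"name": "alex", "country": "UK"}\n'
    '{"surname": "red", "e": "red@example.com", "country": "France"}\n'
    '{"name": "tracy", "surname": "blue", "country": "UK"}\n'
)
keys = collect_keys(sample)
print(keys)  # ['name', 'surname', 'country', 'e']
```

For the real file, passing an open file handle (for example `with open(file, encoding="utf-8") as f: collect_keys(f)`) should read one line at a time, so memory use stays small and no DataFrame is built per chunk.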