Good day, everyone!
I'm reading and processing a very large JSON file with the Python pandas module; here's my code:
import pandas as pd
file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
for c in chunks:
    print(c)
This prints all keys and values; however, I only want the list of keys that are present in my data.
For example, given:
{name: john, surname: white, country: USA}
{name: alex, country: UK}
{surname: red, e: [email protected], country: France}
{name: tracy, surname: blue, country: UK}
my code should return:
[name, surname, e, country]
Thank you for your help
CodePudding user response:
You can use a set to collect the keys from each chunk:
import pandas as pd
file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
setOfKeys = set()
for c in chunks:
    setOfKeys |= set(c.keys())
print(list(setOfKeys))
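To check the set-based approach on a small scale, here is a minimal sketch that runs it on the four sample records from the question. It assumes the records are valid JSON Lines (quoted keys and values), the email value is a placeholder, and `collect_keys` and `SAMPLE` are names made up for this illustration.

```python
import io

import pandas as pd

# Sample records from the question, written as JSON Lines.
# The email address is a placeholder, not the original value.
SAMPLE = """\
{"name": "john", "surname": "white", "country": "USA"}
{"name": "alex", "country": "UK"}
{"surname": "red", "e": "red@example.com", "country": "France"}
{"name": "tracy", "surname": "blue", "country": "UK"}
"""


def collect_keys(source, chunksize=2):
    """Return the set of column names seen across all chunks of a JSON Lines source."""
    keys = set()
    for chunk in pd.read_json(source, lines=True, chunksize=chunksize):
        # Each chunk is a DataFrame; its columns are the keys found in that chunk.
        keys.update(chunk.columns)
    return keys


keys = collect_keys(io.StringIO(SAMPLE))
print(sorted(keys))
```

A set does not preserve the order in which keys were first seen, which is why the result is sorted before printing.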
CodePudding user response:
Ishan Shishodiya's comment pointed me in the right direction by mentioning the DataFrame. Here is the updated code in case someone needs it:
import pandas as pd
file='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(file, lines=True, chunksize = 100)
listOfKeys = []
for c in chunks:
    for key in c.keys():
        if key not in listOfKeys:
            listOfKeys.append(key)
print(listOfKeys)
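If only the key names are needed, one possible alternative is to skip building DataFrames and parse each line with the standard `json` module. This is a different technique from the answers above, sketched under the assumption that the file holds one JSON object per line; `unique_keys` is a hypothetical helper name. A dict is used as an ordered set, so the keys come back in first-seen order without the linear `not in` check on a list.

```python
import json


def unique_keys(lines):
    """Return top-level keys in first-seen order from an iterable of JSON Lines strings."""
    seen = {}  # dicts preserve insertion order, so this acts as an ordered set
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if isinstance(record, dict):
            for key in record:
                seen.setdefault(key, None)
    return list(seen)


# Placeholder records modelled on the question's example.
sample_lines = [
    '{"name": "john", "surname": "white", "country": "USA"}',
    '{"name": "alex", "country": "UK"}',
    '{"surname": "red", "e": "red@example.com", "country": "France"}',
    '{"name": "tracy", "surname": "blue", "country": "UK"}',
]
print(unique_keys(sample_lines))  # ['name', 'surname', 'country', 'e']
```

For a real file, passing the open file object works the same way, since iterating over it yields one line at a time: `with open(file, encoding="utf-8") as fh: print(unique_keys(fh))`.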