Write column header to csv from dataframe if it doesn't exist pandas-CodePudding

I'm fetching data from an API and trying to store it in CSV. The API returns an array of objects like this:

[
 {name:'Alice',age:22, address:'123 Downtown'},
 {name:'Bob',age:25, contact:'1111'}
]

To parse and store, this is what I'm currently doing:

    while not allFetched:
        response = callApi()
        x = []
        for item in response['data']:
            df = pd.DataFrame.from_dict([item])
            x.append(df)
        new = pd.concat(x, ignore_index=True)
        new.to_csv("Data.csv", mode='a', index=False, header=fileExists)

This adds the header once (if not fileExists) but messes up the column because the headers mismatch. I want to create the column header if it doesn't exist and have Nan in missing values as follows:

name,age,address,contact
Alice,22,123 Downtown,Nan
Bob,25,Nan,1111

CodePudding user response：

You can try this:

import pandas as pd


data = [
    {'name':'Alice','age':22, 'address':'123 Downtown'},
    {'name':'Bob','age':25, 'contact':'1111'}
]

df = pd.json_normalize(data)
df = df.fillna('Nan')

print(df)

df.to_csv("Data.csv", mode='a', index=False)

Terminal Output:

    name  age       address contact
0  Alice   22  123 Downtown     Nan
1    Bob   25           Nan    1111

CSV Output:

name,age,address,contact
Alice,22,123 Downtown,Nan
Bob,25,Nan,1111

CodePudding user response：

What you are getting is a json response. You can parse json with such structure using pd.json_normalize() method:

response = callApi()
dta = pd.json_normalize(response['data'])

Result:

|    | name   |   age | address      |   contact |
|---:|:-------|------:|:-------------|----------:|
|  0 | Alice  |    22 | 123 Downtown |       nan |
|  1 | Bob    |    25 | nan          |      1111 |

Then you can save the data frame to csv:

dta.to_csv("Data.csv", mode='a', index=False)