Convert DataFrame to JSON in Python

I have a DataFrame that I would like to convert to JSON, keeping only selected columns. Because it has many rows, I can't do the conversion by hand.

My DataFrame looks like this (the snippet builds only the first five rows of the table below):

import numpy as np
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}

df = pd.DataFrame(cars, columns=['Brand', 'Price', 'Liscence Plate'])


            Brand  Price Liscence Plate
0     Honda Civic  22000        ABC 123
1  Toyota Corolla  25000        XYZ 789
2      Ford Focus  27000        CBA 321
3         Audi A4  35000        ZYX 987
4             NaN  29000        DEF 456
5            Ford  27000        DEF 466
6         Audi A1  35000        ABC 123

I need to convert it to this:

data = {"form": [
    {"Liscence Plate": "ABC 123",
     "Brand": ["Honda Civic", "Audi A1"],
     "Price": ["22000", "35000"]},
    {"Liscence Plate": "XYZ 789",
     "Brand": ["Toyota Corolla"],
     "Price": ["25000"]},
    {"Liscence Plate": "CBA 321",
     "Brand": ["Ford Focus"],
     "Price": ["27000"]},
    {"Liscence Plate": "ZYX 987",
     "Brand": ["Audi A4"],
     "Price": ["35000"]},
    {"Liscence Plate": "DEF 456",
     "Brand": ["NaN", "Ford"],
     "Price": ["29000", "27000"]}
]}

CodePudding user response：

Have a look at DataFrame.to_json(). It converts a DataFrame to JSON directly, and the orient argument controls the layout of the output.
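To make the effect of orient concrete, here is a small sketch on a two-row frame (the frame is made up for illustration and is not the asker's data). The results are parsed back with json.loads only so the shapes are easy to compare.

```python
import json

import pandas as pd

# Two illustrative rows are enough to compare layouts.
df = pd.DataFrame({"Brand": ["Honda Civic", "Audi A4"], "Price": [22000, 35000]})

# to_json returns a string when no path is given; json.loads turns it back
# into Python objects so the structure is visible.
records = json.loads(df.to_json(orient="records"))  # list with one dict per row
columns = json.loads(df.to_json(orient="columns"))  # {column: {index label: value}}
split = json.loads(df.to_json(orient="split"))      # separate "columns", "index", "data"
```

With orient="records" each row becomes its own object, which is the layout closest to the "form" list in the question.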

This works well enough, but on its own it will not give you lists for the Brand and Price keys. For more flexibility, call .to_dict() with the same orient argument, modify the resulting Python objects, and then serialize them with json.dumps() (or json.dump() to write to a file).
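A minimal sketch of the to_dict() route, assuming the five-row sample from the question. The names records, payload and json_text are just illustrative, and wrapping the rows under "form" stands in for whatever change you need to make.

```python
import json

import numpy as np
import pandas as pd

# Sample data from the question (first five rows only).
df = pd.DataFrame({
    "Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4", np.nan],
    "Price": [22000, 25000, 27000, 35000, 29000],
    "Liscence Plate": ["ABC 123", "XYZ 789", "CBA 321", "ZYX 987", "DEF 456"],
})

# One plain dict per row; these are ordinary Python objects you can edit freely.
records = df.to_dict(orient="records")

# Example change: wrap the rows under a top-level "form" key, as in the question.
payload = {"form": records}

# json.dumps returns a string. By default it writes a missing value as the
# bare token NaN, which strict JSON parsers may reject.
json_text = json.dumps(payload, indent=2)
```

If the consumer needs strict JSON, one option is to replace the missing values before serializing, for example with df.fillna("NaN") as in the question's desired output.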

Edit: Based on your edit, I think you want to group by the license plate first? In that case you can do:

df.groupby('Liscence Plate').agg(list).reset_index().to_json(orient='records')

This aggregates the remaining columns into lists per license plate and converts the result to JSON.
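Putting the pieces together, the following sketch gets close to the structure asked for in the question. It uses the seven rows exactly as shown in the question's table, so "Ford" (plate DEF 466) ends up in its own group. Converting the values to strings and turning the missing brand into the text "NaN" are assumptions read off the desired output.

```python
import json

import numpy as np
import pandas as pd

# Seven rows as shown in the question's table.
df = pd.DataFrame({
    "Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4",
              np.nan, "Ford", "Audi A1"],
    "Price": [22000, 25000, 27000, 35000, 29000, 27000, 35000],
    "Liscence Plate": ["ABC 123", "XYZ 789", "CBA 321", "ZYX 987",
                       "DEF 456", "DEF 466", "ABC 123"],
})

# The desired output holds strings, so convert first (assumption).
as_text = df.assign(
    Brand=df["Brand"].fillna("NaN"),
    Price=df["Price"].astype(str),
)

# sort=False keeps plates in order of first appearance instead of sorting them.
grouped = (
    as_text.groupby("Liscence Plate", sort=False)[["Brand", "Price"]]
    .agg(list)
    .reset_index()
)

# Wrap the per-plate records under the "form" key used in the question.
data = {"form": grouped.to_dict(orient="records")}
json_text = json.dumps(data, indent=2)
```

Going through to_dict() rather than to_json() here is only so the "form" wrapper can be added before serializing.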

CodePudding user response：

So you want this?

df.to_json(orient='records')

Outputs:

[{
    "Brand": "Honda Civic",
    "Price": 22000,
    "Liscence Plate": "ABC 123"
}, {
    "Brand": "Toyota Corolla",
    "Price": 25000,
    "Liscence Plate": "XYZ 789"
}, {
    "Brand": "Ford Focus",
    "Price": 27000,
    "Liscence Plate": "CBA 321"
}, {
    "Brand": "Audi A4",
    "Price": 35000,
    "Liscence Plate": "ZYX 987"
}, {
    "Brand": null,
    "Price": 29000,
    "Liscence Plate": "DEF 456"
}]

Edit: to collect Brand and Price into lists per license plate, group first:

df = df.groupby('Liscence Plate').agg({'Brand': list, 'Price': list}).reset_index()
df.to_json(orient='records')

Outputs:

[{
    "Liscence Plate": "ABC 123",
    "Brand": ["Honda Civic"],
    "Price": [22000]
}, {
    "Liscence Plate": "CBA 321",
    "Brand": ["Ford Focus"],
    "Price": [27000]
}, {
    "Liscence Plate": "DEF 456",
    "Brand": [null, "Ford F-150"],
    "Price": [29000, 33000]
}, {
    "Liscence Plate": "XYZ 789",
    "Brand": ["Toyota Corolla"],
    "Price": [25000]
}, {
    "Liscence Plate": "ZYX 987",
    "Brand": ["Audi A4"],
    "Price": [35000]
}]

CodePudding user response：

Use pandas.DataFrame.iterrows and build the result manually.

data = {'form' : [
            {k:[str(s[k])] if t == list else str(s[k])
                for k, t in (("Liscence Plate", str), ("Brand", list), ("Price", list))}
            for _, s in df.iterrows()]
       }

>>> data
{'form': [
    {'Brand': ['Honda Civic'], 'Liscence Plate': 'ABC 123', 'Price': ['22000']},
    {'Brand': ['Toyota Corolla'], 'Liscence Plate': 'XYZ 789', 'Price': ['25000']},
    {'Brand': ['Ford Focus'], 'Liscence Plate': 'CBA 321', 'Price': ['27000']},
    {'Brand': ['Audi A4'], 'Liscence Plate': 'ZYX 987', 'Price': ['35000']},
    {'Brand': ['nan'], 'Liscence Plate': 'DEF 456', 'Price': ['29000']}
    ]
}

This is quite close to what you are looking for, although rows that share a license plate are not merged into one entry.
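If the rows should also be merged per license plate while staying with the manual approach, one possible sketch is to accumulate entries in a dict keyed by plate. The merged dict is a hypothetical helper, and the data is the seven-row table from the question.

```python
import numpy as np
import pandas as pd

# Seven rows as shown in the question's table.
df = pd.DataFrame({
    "Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4",
              np.nan, "Ford", "Audi A1"],
    "Price": [22000, 25000, 27000, 35000, 29000, 27000, 35000],
    "Liscence Plate": ["ABC 123", "XYZ 789", "CBA 321", "ZYX 987",
                       "DEF 456", "DEF 466", "ABC 123"],
})

# plate -> entry; a dict keeps plates in order of first appearance.
merged = {}
for _, row in df.iterrows():
    plate = row["Liscence Plate"]
    # Create the entry the first time a plate is seen, then reuse it.
    entry = merged.setdefault(plate, {"Liscence Plate": plate, "Brand": [], "Price": []})
    # str() turns the missing brand into the text 'nan', as in the output above.
    entry["Brand"].append(str(row["Brand"]))
    entry["Price"].append(str(row["Price"]))

data = {"form": list(merged.values())}
```

For large frames iterrows is slow, so the groupby route is usually the better fit. This version is mainly useful when each entry needs custom handling.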