I have a DataFrame that I would like to convert to JSON, selecting specific columns. Since it has a lot of rows, I can't do this by hand.
The DataFrame looks like this:
import numpy as np
import pandas as pd

Cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4', np.nan],
        'Price': [22000, 25000, 27000, 35000, 29000],
        'Liscence Plate': ['ABC 123', 'XYZ 789', 'CBA 321', 'ZYX 987', 'DEF 456']}
df = pd.DataFrame(Cars, columns=['Brand', 'Price', 'Liscence Plate'])
            Brand  Price Liscence Plate
0     Honda Civic  22000        ABC 123
1  Toyota Corolla  25000        XYZ 789
2      Ford Focus  27000        CBA 321
3         Audi A4  35000        ZYX 987
4             NaN  29000        DEF 456
5            Ford  27000        DEF 466
6         Audi A1  35000        ABC 123
And I need to convert it to this:
data = {"form": [
    {"Liscence Plate": "ABC 123",
     "Brand": ["Honda Civic", "Audi A1"],
     "Price": ["22000", "35000"]},
    {"Liscence Plate": "XYZ 789",
     "Brand": ["Toyota Corolla"],
     "Price": ["25000"]},
    {"Liscence Plate": "CBA 321",
     "Brand": ["Ford Focus"],
     "Price": ["27000"]},
    {"Liscence Plate": "ZYX 987",
     "Brand": ["Audi A4"],
     "Price": ["35000"]},
    {"Liscence Plate": "DEF 456",
     "Brand": ["NaN", "Ford"],
     "Price": ["29000", "27000"]}
]}
CodePudding user response:
Have a look at the .to_json() method. It lets you convert a DataFrame to JSON easily, and you can change the schema of the JSON by supplying the orient argument.
This works well enough, but it will not give you lists for the Brand and Price keys. If you want more flexibility, you can first call .to_dict() with the same orient argument, make your changes, and then convert to JSON using json.dump().
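Edit: Based on your edit, I think you want to group by the license plate first. In that case you can do:
df.groupby('Liscence Plate').agg(list).reset_index().to_json(orient='records')
to aggregate to lists and convert to JSON. Note that orient must be passed by keyword, because the first positional argument of .to_json() is the output path.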
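As a rough end-to-end sketch of the grouping idea above, the snippet below also wraps the records in the "form" key and turns prices and missing brands into strings, as the question's target shows. The sample rows are an assumption: they follow the printed table, with the "Ford" row placed under plate "DEF 456" as the desired output implies. The names prepared, grouped and json_text are only illustrative.

```python
import json

import numpy as np
import pandas as pd

# Sample data (assumed): the question's table, with "Ford" filed under "DEF 456"
cars = {
    "Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4",
              np.nan, "Ford", "Audi A1"],
    "Price": [22000, 25000, 27000, 35000, 29000, 27000, 35000],
    "Liscence Plate": ["ABC 123", "XYZ 789", "CBA 321", "ZYX 987",
                       "DEF 456", "DEF 456", "ABC 123"],
}
df = pd.DataFrame(cars)

# Make every value a plain string so the JSON matches the requested shape
prepared = df.assign(
    Brand=df["Brand"].fillna("NaN"),
    Price=df["Price"].astype(str),
)

# One row per plate, with the other columns collected into lists;
# sort=False keeps plates in order of first appearance
grouped = (
    prepared.groupby("Liscence Plate", sort=False)
            .agg(list)
            .reset_index()
)

# Convert to plain Python objects, add the outer key, then serialise
data = {"form": grouped.to_dict(orient="records")}
json_text = json.dumps(data, indent=2)
print(json_text)
```

Going through .to_dict() rather than calling .to_json() directly is what makes it possible to add the outer "form" key before serialising.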
CodePudding user response:
So you want this?
df.to_json(orient='records')
Outputs:
[{
"Brand": "Honda Civic",
"Price": 22000,
"Liscence Plate": "ABC 123"
}, {
"Brand": "Toyota Corolla",
"Price": 25000,
"Liscence Plate": "XYZ 789"
}, {
"Brand": "Ford Focus",
"Price": 27000,
"Liscence Plate": "CBA 321"
}, {
"Brand": "Audi A4",
"Price": 35000,
"Liscence Plate": "ZYX 987"
}, {
"Brand": null,
"Price": 29000,
"Liscence Plate": "DEF 456"
}]
Edit: to group by licence plate and collect the other columns into lists:
df = df.groupby('Liscence Plate').agg({'Brand': list, 'Price': list}).reset_index()
df.to_json(orient='records')
Outputs:
[{
"Liscence Plate": "ABC 123",
"Brand": ["Honda Civic"],
"Price": [22000]
}, {
"Liscence Plate": "CBA 321",
"Brand": ["Ford Focus"],
"Price": [27000]
}, {
"Liscence Plate": "DEF 456",
"Brand": [null, "Ford F-150"],
"Price": [29000, 33000]
}, {
"Liscence Plate": "XYZ 789",
"Brand": ["Toyota Corolla"],
"Price": [25000]
}, {
"Liscence Plate": "ZYX 987",
"Brand": ["Audi A4"],
"Price": [35000]
}]
CodePudding user response:
Using pandas.DataFrame.iterrows and building the result 'manually':
data = {'form': [
    {k: [str(s[k])] if t is list else str(s[k])
     for k, t in (("Liscence Plate", str), ("Brand", list), ("Price", list))}
    for _, s in df.iterrows()]
}
>>> data
{'form': [
{'Brand': ['Honda Civic'], 'Liscence Plate': 'ABC 123', 'Price': ['22000']},
{'Brand': ['Toyota Corolla'], 'Liscence Plate': 'XYZ 789', 'Price': ['25000']},
{'Brand': ['Ford Focus'], 'Liscence Plate': 'CBA 321', 'Price': ['27000']},
{'Brand': ['Audi A4'], 'Liscence Plate': 'ZYX 987', 'Price': ['35000']},
{'Brand': ['nan'], 'Liscence Plate': 'DEF 456', 'Price': ['29000']}
]
}
This is quite close to what you are looking for, although rows that share a licence plate are not merged into a single entry.
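If the entries should be merged per plate while keeping the manual iterrows style, one possible sketch is to accumulate them in a dict keyed by plate. The sample rows are an assumption: they follow the question's table, with the "Ford" row placed under "DEF 456" as the desired output implies. The name by_plate is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): the question's table, with "Ford" filed under "DEF 456"
df = pd.DataFrame({
    "Brand": ["Honda Civic", "Toyota Corolla", "Ford Focus", "Audi A4",
              np.nan, "Ford", "Audi A1"],
    "Price": [22000, 25000, 27000, 35000, 29000, 27000, 35000],
    "Liscence Plate": ["ABC 123", "XYZ 789", "CBA 321", "ZYX 987",
                       "DEF 456", "DEF 456", "ABC 123"],
})

by_plate = {}  # plate -> entry; dicts keep insertion order
for _, row in df.iterrows():
    plate = row["Liscence Plate"]
    # Create the entry the first time a plate is seen, reuse it afterwards
    entry = by_plate.setdefault(
        plate, {"Liscence Plate": plate, "Brand": [], "Price": []}
    )
    # Missing brands become the string "NaN", as in the requested output
    entry["Brand"].append("NaN" if pd.isna(row["Brand"]) else str(row["Brand"]))
    entry["Price"].append(str(row["Price"]))

data = {"form": list(by_plate.values())}
```

This is slower than groupby on large frames, but each step is explicit and easy to adapt.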