My starting DataFrame looks like this:
import pandas as pd

my_list = [
    ['Japan', 'Flowers'],
    ['United States', 'Guns', 'yes'],
    ['Japan', 'Sushi'],
    ['South Korea', 'Sunscreen']
]
df = pd.DataFrame(my_list, columns=["country", "sector", "flag"])
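Because the rows have different lengths, pandas pads the shorter ones with missing values, so flag is set only on the United States row. The quick check below copies the data from above and shows which rows have no flag.

```python
import pandas as pd

my_list = [
    ['Japan', 'Flowers'],
    ['United States', 'Guns', 'yes'],
    ['Japan', 'Sushi'],
    ['South Korea', 'Sunscreen'],
]
df = pd.DataFrame(my_list, columns=["country", "sector", "flag"])

# Rows shorter than the column list are padded with missing values
print(df["flag"].isna().tolist())  # [True, False, True, True]
```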
I need the output in exactly the following dictionary format for an API request, but I can't get it into that shape with the to_dict() method. Each "ids" list holds the row numbers (index labels) of the DataFrame rows where that data is located.
{"groups": [
    {
        "name": "Japan",
        "groups": [
            {"name": "Flowers", "ids": [0]},
            {"name": "Sushi", "ids": [2]}
        ]
    },
    {
        "name": "United States",
        "groups": [
            {
                "name": "Guns",
                "groups": [
                    {"name": "yes", "ids": [1]}
                ]
            }
        ]
    },
    {
        "name": "South Korea",
        "groups": [
            {"name": "Sunscreen", "ids": [3]}
        ]
    }
]}
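The row numbers that go into "ids" are the DataFrame's index labels, and pandas can already report them per group through GroupBy.groups, which maps each group key to an Index of row labels. A minimal sketch, assuming the default RangeIndex so that labels equal row numbers:

```python
import pandas as pd

my_list = [
    ['Japan', 'Flowers'],
    ['United States', 'Guns', 'yes'],
    ['Japan', 'Sushi'],
    ['South Korea', 'Sunscreen'],
]
df = pd.DataFrame(my_list, columns=["country", "sector", "flag"])

# Maps each (country, sector) tuple to the index labels of its rows
groups = df.groupby(["country", "sector"]).groups
for key, labels in groups.items():
    print(key, labels.tolist())
```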
Answer:
Try:
out = []
for idx, g in df.groupby("country"):
    out.append({"name": idx})
    # Collect the row labels for each sector within this country
    ids = {}
    for i, s in g["sector"].items():  # .iteritems() was removed in pandas 2.0
        ids.setdefault(s, []).append(i)
    out[-1]["groups"] = [{"name": k, "ids": v} for k, v in ids.items()]
out = {"groups": out}
print(out)
Prints (reformatted here for readability):
{
    "groups": [
        {
            "name": "Japan",
            "groups": [
                {"name": "Flowers", "ids": [0]},
                {"name": "Sushi", "ids": [2]},
            ],
        },
        {"name": "South Korea", "groups": [{"name": "Sunscreen", "ids": [3]}]},
        {"name": "United States", "groups": [{"name": "Guns", "ids": [1]}]},
    ]
}
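The output above sorts countries alphabetically (groupby's default) and stops at the sector level, so the United States flag "yes" does not appear. A sketch of one way to add the optional flag level and keep countries in order of first appearance is below. The function name build_groups is hypothetical, and it assumes that all rows of a given sector either have a flag or have none; mixed sectors would need a format decision the question doesn't specify.

```python
import pandas as pd

my_list = [
    ['Japan', 'Flowers'],
    ['United States', 'Guns', 'yes'],
    ['Japan', 'Sushi'],
    ['South Korea', 'Sunscreen'],
]
df = pd.DataFrame(my_list, columns=["country", "sector", "flag"])


def build_groups(frame):
    """Hypothetical helper: nest rows as country -> sector -> optional flag."""
    countries = []
    # sort=False keeps countries in order of first appearance
    for country, cg in frame.groupby("country", sort=False):
        sectors = []
        for sector, sg in cg.groupby("sector", sort=False):
            if sg["flag"].isna().all():
                # No flag: the sector is a leaf holding its row labels
                sectors.append({"name": sector, "ids": sg.index.tolist()})
            else:
                # Flag present (assumed on every row of this sector):
                # add one more level keyed by the flag value
                flagged = sg.dropna(subset=["flag"])
                sub = [
                    {"name": flag, "ids": fg.index.tolist()}
                    for flag, fg in flagged.groupby("flag", sort=False)
                ]
                sectors.append({"name": sector, "groups": sub})
        countries.append({"name": country, "groups": sectors})
    return {"groups": countries}


print(build_groups(df))
```

Grouping with sort=False at every level keeps the requested order (Japan, United States, South Korea) without a separate sorting step.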