I have a task to transform data from csv file to nested JSON file using python for web usage. I tried the Python code in this article. The desired output would be one member_id
shows one time in the JSON file, same for the tag_name
under one member_id
. The problem is when I tried groupby
with only member_id
, the tag_name
'm1' would show multiple times under 'abc123' for example. If I tried groupby
with both member_id
and tag_name
, 'abc123' would show up two times for tag 'm1' and 'm2'. I've been googling for a while but most resolutions only for one nested(not sure if I use the right term). Please let me know if any possible way to do this.
Sample Code:
import json
import pandas as pd
df = pd.read_csv('../detail.csv', sep=',', header=0
, index_col=False
, dtype = {'member_id':str,'tag_name':str,'detail_name':str,'detail_value':str} )
group = df.groupby(['member_id','tag_name'])
finalList, finalDict = [], {}
for key, value in group:
dictionary, dictionary1, dictList, dictList1 = {}, {}, [], []
j = group.get_group(key).reset_index(drop=True)
dictionary['member_id'] = j.at[0,'member_id']
dictionary1['tag_name'] = j.at[0,'tag_name']
for i in j.index:
anotherDict = {}
anotherDict['detail_name'] = j.at[i,'detail_name']
anotherDict['detail_value'] = j.at[i,'detail_value']
dictList1.append(anotherDict.copy())
dictionary1['detail'] = dictList1
dictList.append(dictionary1)
dictionary['tag'] = dictList
finalList.append(dictionary)
json.dumps(finalList,ensure_ascii = False)
detail.csv:
member_id, tag_name, detail_name, detail_value
-------------------------------------------------------
abc123, m1, Service_A, 20
abc123, m1, Service_B, 20
abc123, m2, Service_C, 10
xyz456, m3, Service A, 5
Desired Output JSON:
{ "member_id": "abc123",
"tag":[ {"tag_name": "m1",
"detail":[{ "detail_name": "Service_A",
"detail_value": "20"},
{ "detail_name": "Service_B",
"detail_value": "20"}]},
{"tag_name": "m2",
"detail":[{ "detail_name": "Service_C",
"detail_value": "10"}]}]},
{ "member_id": "xyz456",
"tag":[{"tag_name": "m3",
"detail":[{ "detail_name": "Service_A",
"detail_value": "5"}]}]}
CodePudding user response:
I'm not aware of pandas functions allowing to do achieve that directly. Moreover, you introduce keys that are not part of your initial dataframe (tag
, detail
). So implementing a generic solution seems difficult.
However, if you don't have more columns than stated in your question, you can go through your dataframe, grouping column after column:
result = []
for member_id, member_df in df.groupby('member_id'):
member_dict = {'member_id': member_id}
member_dict['tag'] = []
for tag_name, tag_df in member_df.groupby('tag_name'):
tag_dict = {'tag_name': tag_name}
tag_dict['detail'] = []
for detail_name, detail_df in tag_df.groupby('detail_name'):
detail_dict = {'detail_name': detail_name}
detail_dict['detail_value'] = detail_df.detail_value.mean() # should be only one value, taking 'mean' just in case
tag_dict['detail'].append(detail_dict)
member_dict['tag'].append(tag_dict)
result.append(member_dict)
print(json.dumps(result, indent=4))
Output:
[
{
"member_id": "abc123",
"tag": [
{
"tag_name": "m1",
"detail": [
{
"detail_name": "Service_A",
"detail_value": 20.0
},
{
"detail_name": "Service_B",
"detail_value": 20.0
}
]
},
{
"tag_name": "m2",
"detail": [
{
"detail_name": "Service_C",
"detail_value": 10.0
}
]
}
]
},
{
"member_id": "xyz456",
"tag": [
{
"tag_name": "m3",
"detail": [
{
"detail_name": "Service A",
"detail_value": 5.0
}
]
}
]
}
]