I need to convert data from a CSV file with a lot of records (about 3 thousand) into a list of objects/dicts. I started with Pandas, but now I am not sure it was a good choice. The file contains 5 columns. Its structure looks like this:
readTimestamp school_subject graduate full_name term
1611658200000 mathematics 3 Edd Ston 2
1611658200000 physics 5 Edd Ston 2
1611658200000 foreign language 5 Edd Ston 2
1611658200000 geography 4 Edd Ston 2
1611658200000 history 3 Edd Ston 2
1611658200000 Informatics 4 Kate Slow 1
1611658200000 chemistry 5 Kate Slow 1
1611658200000 mathematics 5 Kate Slow 1
1611658200000 foreign language 5 Kate Slow 1
I need to get a structure like this:
[
    {
        "readTimestamp": 123123123,
        "full_name": "Edd Ston",
        "term": 2,
        "schools_subject": [
            {
                "mathematics": 3,
                "physics": 5,
                "foreign language": 5,
                "geography": 4,
                "history": 3
            }
        ]
    },
    {
        "readTimestamp": 345345345,
        "full_name": "Kate Slow",
        "term": 1,
        "schools_subject": [
            {
                "Informatics": 4,
                "chemistry": 3,
                "mathematics": 5,
                "foreign language": 5
            }
        ]
    }
]
This is what I have so far, followed by the result it produces:
df = df.groupby(['readTimestamp','full_name','term']).apply(lambda x: x[['school_subject', 'graduate']].to_dict(orient='records')).to_dict()
{(1611658200000, 'Edd Ston', 2): [{'school_subject': 'mathematics', 'graduate': 3}, {'school_subject': 'physics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}, {'school_subject': 'geography', 'graduate': 4}, {'school_subject': 'history', 'graduate': 3}], (1611658200000, 'Kate Slow', 1): [{'school_subject': 'Informatics', 'graduate': 4}, {'school_subject': 'chemistry', 'graduate': 5}, {'school_subject': 'mathematics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}]}
I would be grateful for your help and for an explanation of where I made a mistake.
CodePudding user response:
I think your solution needs only a small change: create one dictionary per group, then convert the result to a list of dicts with orient='records':
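d = (df.groupby(['readTimestamp', 'full_name', 'term'])
       # build one {subject: grade} dict per group
       .apply(lambda x: x.set_index('school_subject')['graduate'].to_dict())
       # turn the group keys back into columns and name the dict column
       .reset_index(name='schools_subject')
       .to_dict(orient='records'))
print(d)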
[{
    'readTimestamp': 1611658200000,
    'full_name': 'Edd Ston',
    'term': 2,
    'schools_subject': {
        'mathematics': 3,
        'physics': 5,
        'foreign language': 5,
        'geography': 4,
        'history': 3
    }
}, {
    'readTimestamp': 1611658200000,
    'full_name': 'Kate Slow',
    'term': 1,
    'schools_subject': {
        'Informatics': 4,
        'chemistry': 5,
        'mathematics': 5,
        'foreign language': 5
    }
}]
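Note that the question asks for schools_subject as a one-element list holding the dict, while the output above holds the dict directly. With pandas, wrapping the lambda's return value in square brackets should give that shape. Since the question also wonders whether pandas is needed at all, here is a minimal sketch of the same grouping using only the standard library. It is an illustration under assumptions, not a drop-in solution: the file's real delimiter is not shown in the question, so a tab is assumed, and the names SAMPLE and group_records are hypothetical.

```python
import csv
import io

# Sample rows copied from the question. The real delimiter is not shown
# there, so a tab-separated layout is ASSUMED for this sketch.
SAMPLE = (
    "readTimestamp\tschool_subject\tgraduate\tfull_name\tterm\n"
    "1611658200000\tmathematics\t3\tEdd Ston\t2\n"
    "1611658200000\tphysics\t5\tEdd Ston\t2\n"
    "1611658200000\tforeign language\t5\tEdd Ston\t2\n"
    "1611658200000\tgeography\t4\tEdd Ston\t2\n"
    "1611658200000\thistory\t3\tEdd Ston\t2\n"
    "1611658200000\tInformatics\t4\tKate Slow\t1\n"
    "1611658200000\tchemistry\t5\tKate Slow\t1\n"
    "1611658200000\tmathematics\t5\tKate Slow\t1\n"
    "1611658200000\tforeign language\t5\tKate Slow\t1\n"
)


def group_records(lines, delimiter="\t"):
    """Group rows by (readTimestamp, full_name, term) into nested dicts.

    Hypothetical helper; `lines` is any iterable of text lines,
    such as an open file or io.StringIO.
    """
    grouped = {}  # dicts preserve insertion order, so groups stay in file order
    for row in csv.DictReader(lines, delimiter=delimiter):
        key = (int(row["readTimestamp"]), row["full_name"], int(row["term"]))
        # Collect {subject: grade} pairs for this group.
        subjects = grouped.setdefault(key, {})
        subjects[row["school_subject"]] = int(row["graduate"])
    return [
        {
            "readTimestamp": ts,
            "full_name": name,
            "term": term,
            # Wrapped in a list to match the structure requested in the question.
            "schools_subject": [subjects],
        }
        for (ts, name, term), subjects in grouped.items()
    ]


records = group_records(io.StringIO(SAMPLE))
print(records)
```

For a real file, passing the handle from open(path, newline='') in place of io.StringIO(SAMPLE) should work the same way, with delimiter adjusted to whatever the file actually uses. All values come back as plain Python ints and strings, so the result can be passed to json.dumps directly.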