Convert data from csv to list of dict

Time:09-22

I need to convert data from CSV files with a lot of records (about 3 thousand) into a list of objects / dicts. I started with Pandas, but now I am not sure it was a good choice. The files contain 5 columns. The structure of the CSV files looks like this:

readTimestamp   school_subject  graduate    full_name   term
1611658200000   mathematics 3   Edd Ston    2
1611658200000   physics 5   Edd Ston    2
1611658200000   foreign language    5   Edd Ston    2
1611658200000   geography   4   Edd Ston    2
1611658200000   history 3   Edd Ston    2
1611658200000   Informatics 4   Kate Slow   1
1611658200000   chemistry   5   Kate Slow   1
1611658200000   mathematics 5   Kate Slow   1
1611658200000   foreign language    5   Kate Slow   1

I need to get a structure like this:

[
  {
    "readTimestamp": 123123123,
    "full_name": "Edd Ston",
    "term": 2,
    "schools_subject": [
      {
        "mathematics": 3,
        "physics": 5,
        "foreign language": 5,
        "geography": 4,
        "history": 3
      }
    ]
  },
  {
    "readTimestamp": 345345345,
    "full_name": "Kate Slow",
    "term": 1,
    "schools_subject": [
      {
        "Informatics": 4,
        "chemistry": 3,
        "mathematics": 5,
        "foreign language": 5
      }
    ]
  }
]

So far I have this code and this result:

df = df.groupby(['readTimestamp','full_name','term']).apply(lambda x: x[['school_subject', 'graduate']].to_dict(orient='records')).to_dict()    


{(1611658200000, 'Edd Ston', 2): [{'school_subject': 'mathematics', 'graduate': 3}, {'school_subject': 'physics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}, {'school_subject': 'geography', 'graduate': 4}, {'school_subject': 'history', 'graduate': 3}], (1611658200000, 'Kate Slow', 1): [{'school_subject': 'Informatics', 'graduate': 4}, {'school_subject': 'chemistry', 'graduate': 5}, {'school_subject': 'mathematics', 'graduate': 5}, {'school_subject': 'foreign language', 'graduate': 5}]}

I would be grateful for your help and for an explanation of where I went wrong.

CodePudding user response:

I think your solution needs only a small change: create one dictionary per group, then convert the result to a list of dicts with orient='records':

d = (df.groupby(['readTimestamp','full_name','term'])
       .apply(lambda x: x.set_index('school_subject')['graduate'].to_dict())
       .reset_index(name='schools_subject')
       .to_dict(orient='records'))

print(d)

[{
    'readTimestamp': 1611658200000,
    'full_name': 'Edd Ston',
    'term': 2,
    'schools_subject': {
        'mathematics': 3,
        'physics': 5,
        'foreign language': 5,
        'geography': 4,
        'history': 3
    }
}, {
    'readTimestamp': 1611658200000,
    'full_name': 'Kate Slow',
    'term': 1,
    'schools_subject': {
        'Informatics': 4,
        'chemistry': 5,
        'mathematics': 5,
        'foreign language': 5
    }
}]
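
Since the question asks whether pandas was the right choice, here is a minimal plain-Python sketch of the same grouping using only the standard csv module. It assumes the file is tab-delimited (the question does not say), and the function name group_grades is only illustrative. It also wraps each subject dict in a list, to match the structure requested in the question:

```python
import csv
import io

# A few sample rows from the question; the tab delimiter is an assumption.
SAMPLE = (
    "readTimestamp\tschool_subject\tgraduate\tfull_name\tterm\n"
    "1611658200000\tmathematics\t3\tEdd Ston\t2\n"
    "1611658200000\tphysics\t5\tEdd Ston\t2\n"
    "1611658200000\tInformatics\t4\tKate Slow\t1\n"
    "1611658200000\tchemistry\t5\tKate Slow\t1\n"
)


def group_grades(lines, delimiter="\t"):
    """Group rows by (readTimestamp, full_name, term) into one record each."""
    grouped = {}  # dicts keep insertion order, so groups stay in file order
    for row in csv.DictReader(lines, delimiter=delimiter):
        key = (int(row["readTimestamp"]), row["full_name"], int(row["term"]))
        # Create the subject dict for a new group, then add this row's grade.
        subjects = grouped.setdefault(key, {})
        subjects[row["school_subject"]] = int(row["graduate"])
    return [
        {
            "readTimestamp": ts,
            "full_name": name,
            "term": term,
            # Wrapped in a list to match the structure asked for in the question.
            "schools_subject": [subjects],
        }
        for (ts, name, term), subjects in grouped.items()
    ]


records = group_grades(io.StringIO(SAMPLE))
print(records)
```

For a real file, an open file object can be passed instead of the StringIO, for example `open(path, newline="")`, where path is whatever your file is called. With about 3 thousand rows either approach should be fast enough, so the choice is mostly about which one you find easier to read.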