How to convert a dictionary with multiple key-value pairs to a DataFrame

Time:11-14

I am trying to clean my data with this code:

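import pandas as pd

# lophoc_clean is a DataFrame defined earlier (not shown here).
empty = {}
mess = lophoc_clean.query("lop_diemquatrinh.notnull()")[['lop_id', 'lop_diemquatrinh']]
keys = []
values = []
for index, rows in mess.iterrows():
    # Keep only rows whose lop_diemquatrinh string is longer than 4 characters.
    if len(rows['lop_diemquatrinh']) > 4:
        values.append(rows['lop_diemquatrinh'])
        keys.append(rows['lop_id'])
# One row per lop_id, with the raw JSON string in a single 'data' column.
df = pd.DataFrame(dict(zip(keys, values)), index=[0]).transpose()
df.columns = ['data']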

The result, shown as a dictionary, looks like this:

     {'data': {37: '[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',
      38: '[{"date_update":"11-03-2022","diemquatrinh":"6.25"}]',
      44: '[{"date_update":"25-12-2021","diemquatrinh":"6.0"},{"date_update":"28-04-2022","diemquatrinh":"6.25"},{"date_update":"28-07-2022","diemquatrinh":"6.5"}]',
      1095: '[{"date_update":null,"diemquatrinh":null}]'}}

However, I don't know how to turn this into a DataFrame with 3 columns like the one below. Please help me. Thank you!

id updated_at diemquatrinh
38 11-03-2022 6.25
44 25-12-2021 6.0
44 28-04-2022 6.25
44 28-07-2022 6.5
1095 null null

CodePudding user response:

Here you go.

from json import loads
from pprint import pp

import pandas as pd


def get_example_data():
    return [
        dict(id=38, updated_at="2022-03-11", diemquatrinh=6.25),
        dict(id=44, updated_at="2021-12-25", diemquatrinh=6),
        dict(id=44, updated_at="2022-04-28", diemquatrinh=6.25),
        dict(id=1095, updated_at=None),
    ]


df = pd.DataFrame(get_example_data())
df["updated_at"] = pd.to_datetime(df["updated_at"])

print(df.dtypes, "\n")
pp(loads(df.to_json()))
print()
print(df, "\n")
pp(loads(df.to_json(orient="records")))

It produces this output:

id                       int64
updated_at      datetime64[ns]
diemquatrinh           float64
dtype: object 

{'id': {'0': 38, '1': 44, '2': 44, '3': 1095},
 'updated_at': {'0': 1646956800000,
                '1': 1640390400000,
                '2': 1651104000000,
                '3': None},
 'diemquatrinh': {'0': 6.25, '1': 6.0, '2': 6.25, '3': None}}

     id updated_at  diemquatrinh
0    38 2022-03-11          6.25
1    44 2021-12-25          6.00
2    44 2022-04-28          6.25
3  1095        NaT           NaN 

[{'id': 38, 'updated_at': 1646956800000, 'diemquatrinh': 6.25},
 {'id': 44, 'updated_at': 1640390400000, 'diemquatrinh': 6.0},
 {'id': 44, 'updated_at': 1651104000000, 'diemquatrinh': 6.25},
 {'id': 1095, 'updated_at': None, 'diemquatrinh': None}]

Either of these JSON data structures would be acceptable input for creating a new DataFrame from scratch.
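The example above starts from hand-written records, so here is a minimal sketch of how the JSON strings from the question could be turned into that record form first. It assumes every value is a JSON array of objects with the keys date_update and diemquatrinh, and that the dates are in day-month-year format, as in the sample shown. The names raw and rows are only for illustration.

```python
import json

import pandas as pd

# Sample input copied from the question: lop_id -> JSON string holding a list of records.
raw = {
    37: '[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',
    38: '[{"date_update":"11-03-2022","diemquatrinh":"6.25"}]',
    44: '[{"date_update":"25-12-2021","diemquatrinh":"6.0"},'
        '{"date_update":"28-04-2022","diemquatrinh":"6.25"},'
        '{"date_update":"28-07-2022","diemquatrinh":"6.5"}]',
    1095: '[{"date_update":null,"diemquatrinh":null}]',
}

# Parse each string and emit one flat row per record, carrying the id along.
rows = [
    {
        "id": lop_id,
        "updated_at": record["date_update"],
        "diemquatrinh": record["diemquatrinh"],
    }
    for lop_id, text in raw.items()
    for record in json.loads(text)
]

df = pd.DataFrame(rows, columns=["id", "updated_at", "diemquatrinh"])

# Assumption: dates are day-month-year; JSON null becomes NaT / NaN.
df["updated_at"] = pd.to_datetime(df["updated_at"], format="%d-%m-%Y")
df["diemquatrinh"] = pd.to_numeric(df["diemquatrinh"])

print(df)
```

If the strings live in the data column of the DataFrame built in the question, something like df["data"].to_dict() should give a mapping of the same shape as raw.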
