I try to clean the data with this code
empty = {}
mess = lophoc_clean.query("lop_diemquatrinh.notnull()")[['lop_id', 'lop_diemquatrinh']]
keys = []
values = []
for index, rows in mess.iterrows():
if len(rows['lop_diemquatrinh']) >4:
values.append(rows['lop_diemquatrinh'])
keys.append(rows['lop_id'])
df = pd.DataFrame(dict(zip(keys, values)), index = [0]).transpose()
df.columns = ['data']
The result is a dictionary like this
{'data': {37: '[{"date_update":"31-03-2022","diemquatrinh":"6.0"}]',
38: '[{"date_update":"11-03-2022","diemquatrinh":"6.25"}]',
44: '[{"date_update":"25-12-2021","diemquatrinh":"6.0"},{"date_update":"28-04-2022","diemquatrinh":"6.25"},{"date_update":"28-07-2022","diemquatrinh":"6.5"}]',
1095: '[{"date_update":null,"diemquatrinh":null}]'}}
However, I don't know how to make them into a DataFrame with 3 columns like this. Please help me. Thank you!
id | updated_at | diemquatrinh |
---|---|---|
38 | 11-03-2022 | 6.25 |
44 | 25-12-2021 | 6.0 |
44 | 28-04-2022 | 6.25 |
44 | 28-07-2022 | 6.5 |
1095 | null | null |
CodePudding user response:
Here you go.
from json import loads
from pprint import pp
import pandas as pd
def get_example_data():
return [
dict(id=38, updated_at="2022-03-11", diemquatrinh=6.25),
dict(id=44, updated_at="2021-12-25", diemquatrinh=6),
dict(id=44, updated_at="2022-04-28", diemquatrinh=6.25),
dict(id=1095, updated_at=None),
]
df = pd.DataFrame(get_example_data())
df["updated_at"] = pd.to_datetime(df["updated_at"])
print(df.dtypes, "\n")
pp(loads(df.to_json()))
print()
print(df, "\n")
pp(loads(df.to_json(orient="records")))
It produces this output:
id int64
updated_at datetime64[ns]
diemquatrinh float64
dtype: object
{'id': {'0': 38, '1': 44, '2': 44, '3': 1095},
'updated_at': {'0': 1646956800000,
'1': 1640390400000,
'2': 1651104000000,
'3': None},
'diemquatrinh': {'0': 6.25, '1': 6.0, '2': 6.25, '3': None}}
id updated_at diemquatrinh
0 38 2022-03-11 6.25
1 44 2021-12-25 6.00
2 44 2022-04-28 6.25
3 1095 NaT NaN
[{'id': 38, 'updated_at': 1646956800000, 'diemquatrinh': 6.25},
{'id': 44, 'updated_at': 1640390400000, 'diemquatrinh': 6.0},
{'id': 44, 'updated_at': 1651104000000, 'diemquatrinh': 6.25},
{'id': 1095, 'updated_at': None, 'diemquatrinh': None}]
Either of the JSON datastructures would be acceptable input for creating a new DataFrame from scratch.