I have a dictionary with structure:
Level 1:
- id (int)
- username (str)
- meta (contain a string of Kpi_info)
This is a dictionary:
dict = {'id': 206, 'username': 'hantran','meta': '{"kpi_info":\
{"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}},\
"2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}'
}
My desire result is a DataFame like this:
id | username | Year | Month | revenue | kpi | result |
---|---|---|---|---|---|---|
206 | hantran | 2021 | 1 | 2000 | 2100 | 0 |
206 | hantran | 2021 | 2 | 2500 | 2000 | 1 |
206 | hantran | 2022 | 1 | 3000 | 2500 | 1 |
206 | hantran | 2022 | 2 | 2500 | 3000 | 0 |
Apparently, similar question has been discussed here. However, the solution only work for 3-level dictionary. I don't know how to make it work for my 1-level dictionary with most of the needed information is in a string.
CodePudding user response:
If the string in your dictionary is valid json, it can easily be converted into a dictionary:
from json import loads
d = {'id': 206, 'username': 'hantran', 'meta': '{"kpi_info": {"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}}, "2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}}'}
d['meta'] = loads(d['meta'])
However, the representation of the dictionary in your code is not a valid dictionary, as it is missing a closing }
. There's no easy way to deal with errors like these, so you should check if your actual data has this problem, or whether you should check the code you share more carefully.
Note that you shouldn't call a dictionary dict
, since doing so will shadow the actual dict
type and you won't be able to access it normally after that.
With the dictionary d
it's now fairly easy to construct a DataFrame as needed:
from pandas import DataFrame
df = DataFrame([
{
'id': d['id'], 'username': d['username'],
'year': int(k1), 'month': int(k2),
'revenue': d2['revenue'], 'kpi': d2['kpi'], 'result': d2['result']
}
for k1, d1 in d['meta']['kpi_info'].items()
for k2, d2 in d1.items()
])
print(df)
Result:
id username year month revenue kpi result
0 206 hantran 2021 1 2000 2100 0
1 206 hantran 2021 2 2500 2000 1
2 206 hantran 2022 1 3000 2500 1
3 206 hantran 2022 2 2500 3000 0