I have a table with a column that holds a dictionary. Each dictionary has a single key whose value is a list of dictionaries, as follows:
id | dict_vals |
---|---|
123 | {'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},...,{'a_t':67,'b_t':41,'c_t':6}]} |
345 | {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6},...{'a_t':34,'b_t':23,'c_t':}]} |
Inside the nested list, all the dictionaries share the same keys, except for the last one (e.g., a_t, b_t, ...). I am trying to drop the last dictionary and convert everything else to columns, as follows:
id | a | b | c |
---|---|---|---|
123 | 1 | 8 | 7 |
123 | 14 | 6 | 8 |
123 | 9 | 4 | 9 |
345 | 5 | 82 | 72 |
345 | 4 | 64 | 81 |
345 | 5 | 3 | 6 |
For the last dictionary (e.g., a_t, b_t), I have been able to separate it out and convert it with the following code:
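import json
import pandas as pd

values = table.dict_vals.str.replace("'", '"').apply(json.loads).tolist()
df = pd.DataFrame(values)  # one column, key_a, holding the lists
df.key_a = df.key_a.apply(lambda x: x[-1])  # keep only the last dictionary
data_split = df["key_a"].apply(pd.Series)  # expand that dictionary into columns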
Unfortunately, I am not sure how this method can be used to separate all the other dictionaries. Any help is appreciated! Thank you!
CodePudding user response:
Try this:
- To remove the last element of a list, return the list sliced with [:-1].
- You can build a DataFrame from each dict and use pd.merge to merge all the DataFrames.
- You can use functools.reduce to merge multiple DataFrames.
- You can use DataFrame.apply with axis=1 to iterate over each row, find the id in the original df, and add it to the created DataFrame.
from functools import reduce
import ast
import pandas as pd
# ast.literal_eval expects dict_vals to hold strings; drop that call if the column already holds dicts.
df_res = reduce(
    lambda x, y: pd.merge(x, y, how='outer'),
    df.apply(lambda x: pd.DataFrame(ast.literal_eval(x['dict_vals'])['key_a'][:-1]).assign(id=x['id']), axis=1),
).set_index('id')
print(df_res)
Output:
a b c
id
123 1 8 7
123 14 6 8
123 9 4 9
345 5 82 72
345 4 64 81
345 5 3 6
Input DataFrame:
df = pd.DataFrame({
'id' : [123, 345],
'dict_vals' : [{'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9}, {'a_t':67,'b_t':41,'c_t':6}]},
{'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':9}]}]
})
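As a variation on the steps above, here is a minimal sketch that swaps reduce and pd.merge for a single pd.concat. It is not the code from this answer, just an alternative under a few assumptions: the helper to_dict and the name frames are made up for illustration, and the helper exists only so the sketch accepts dict_vals as either real dicts (as in the input DataFrame above) or their string form (as in the question). An outer merge on all columns can also reorder rows depending on the pandas version, whereas concat keeps them in input order.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "id": [123, 345],
    "dict_vals": [
        {"key_a": [{"a": 1, "b": 8, "c": 7}, {"a": 14, "b": 6, "c": 8},
                   {"a": 9, "b": 4, "c": 9}, {"a_t": 67, "b_t": 41, "c_t": 6}]},
        {"key_a": [{"a": 5, "b": 82, "c": 72}, {"a": 4, "b": 64, "c": 81},
                   {"a": 5, "b": 3, "c": 6}, {"a_t": 34, "b_t": 23, "c_t": 9}]},
    ],
})


def to_dict(value):
    # Hypothetical helper: accept a dict or its string representation.
    return ast.literal_eval(value) if isinstance(value, str) else value


# One small DataFrame per row: drop the last dict with [:-1], then tag with the id.
frames = [
    pd.DataFrame(to_dict(vals)["key_a"][:-1]).assign(id=row_id)
    for row_id, vals in zip(df["id"], df["dict_vals"])
]

# Stack the per-row frames in their original order.
df_res = pd.concat(frames, ignore_index=True).set_index("id")
print(df_res)
```

Because concat only stacks rows, identical records are kept as separate rows rather than being matched against each other.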
CodePudding user response:
First, add the data:
import pandas as pd
keys = [
{'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},{'a_t':67,'b_t':41,'c_t':6}]},
{'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':2}]}
]
Then set up some variables. There should be one id for each entry in the keys list:
ids = [123, 345]
id, a, b, c, = [], [], [], []
The outer loop walks over the ids. For each id, we take the list stored under key_a as values, and the inner loop then unpacks every dict in that list except the last one:
for idIndex in range(len(ids)):
    values = keys[idIndex]["key_a"]  # list of dicts for this id
    for i in range(len(values) - 1):  # stop before the last dict, which drops it
        value = values[i]
        id.append(ids[idIndex])  # repeat the id for every row
        a.append(value["a"])
        b.append(value["b"])
        c.append(value["c"])
Finally, create the DataFrame:
df = pd.DataFrame({"id":id, "a": a, "b": b, "c": c}) # create data frame
print(df)
Output:
id a b c
0 123 1 8 7
1 123 14 6 8
2 123 9 4 9
3 345 5 82 72
4 345 4 64 81
5 345 5 3 6
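The same loop can probably be written more compactly. The following is only a sketch, assuming the keys and ids from above; the names rows and df_out are made up for illustration. It builds one record per inner dict and lets pandas infer the columns, so a, b and c do not need to be listed by hand.

```python
import pandas as pd

keys = [
    {"key_a": [{"a": 1, "b": 8, "c": 7}, {"a": 14, "b": 6, "c": 8},
               {"a": 9, "b": 4, "c": 9}, {"a_t": 67, "b_t": 41, "c_t": 6}]},
    {"key_a": [{"a": 5, "b": 82, "c": 72}, {"a": 4, "b": 64, "c": 81},
               {"a": 5, "b": 3, "c": 6}, {"a_t": 34, "b_t": 23, "c_t": 2}]},
]
ids = [123, 345]

# One flat record per inner dict; [:-1] skips the last dict of each list.
rows = [
    {"id": row_id, **inner}
    for row_id, outer in zip(ids, keys)
    for inner in outer["key_a"][:-1]
]

df_out = pd.DataFrame(rows)
print(df_out)
```

This also avoids reusing id as a variable name, which would otherwise shadow the Python built-in.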