Python: Extracting all data from dictionary columns and putting them into columns

I have a table with a column that holds a dictionary. Each dictionary has a key whose value is a list of dictionaries, as follows:

id	dict_vals
123	{'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},...,{'a_t':67,'b_t':41,'c_t':6}]}
345	{'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6},...{'a_t':34,'b_t':23,'c_t':}]}

Inside the nested list of dictionaries, all the keys are the same, except for the last dictionary (e.g., a_t, b_t...). What I am trying to do is drop the last dictionary and convert everything else to columns, as follows:

id	a	b	c
123	1	8	7
123	14	6	8
123	9	4	9
345	5	82	72
345	4	64	81
345	5	3	6

For the last dictionary (e.g., a_t, b_t), I have been able to separate it out and convert it with the following code:

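import json
import pandas as pd

# Parse each dict_vals string into a dict (single quotes swapped for valid JSON)
values = table.dict_vals.str.replace("'", '"').apply(json.loads).tolist()
df = pd.DataFrame(values)
df.key_a = df.key_a.apply(lambda x: x[-1])  # keep only the last dictionary
data_split = df["key_a"].apply(pd.Series)  # df has no dict_vals column, only key_a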

Unfortunately, I am not sure how this method can be used to separate all the other dictionaries. Any help is appreciated! Thank you!

CodePudding user response：

Try this:

To remove the last element of a list, slice it with [:-1].
Build a DataFrame from each row's list of dicts, then combine the DataFrames with pd.merge.
functools.reduce applies pd.merge across any number of DataFrames.
DataFrame.apply with axis=1 visits each row, so the id from the original df can be added to each new DataFrame.
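
The slicing and per-row DataFrame steps above can be sketched for a single row as follows. The values are copied from the first row of the input DataFrame in this answer; row_id, cell, records, and frame are illustrative names only, and the cell is assumed to be a real dict rather than a string.

```python
import pandas as pd

# One row's worth of data (assumed to be a real dict, not its string form)
row_id = 123
cell = {'key_a': [{'a': 1, 'b': 8, 'c': 7}, {'a': 14, 'b': 6, 'c': 8},
                  {'a': 9, 'b': 4, 'c': 9}, {'a_t': 67, 'b_t': 41, 'c_t': 6}]}

# Slicing with [:-1] drops the trailing dictionary with the *_t keys
records = cell['key_a'][:-1]

# One output row per remaining dict, tagged with the id of the source row
frame = pd.DataFrame(records).assign(id=row_id)
print(frame)
```
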

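from functools import reduce
import ast
import pandas as pd

def parse(v):
    # dict_vals may hold real dicts (as in the input below) or their string form
    return ast.literal_eval(v) if isinstance(v, str) else v

df_res = reduce(lambda x, y: pd.merge(x, y, how='outer'),
                df.apply(lambda x: pd.DataFrame(parse(x['dict_vals'])['key_a'][:-1]).assign(id=x['id']), axis=1)).set_index('id')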


print(df_res)

Output:

      a   b   c
id             
123   1   8   7
123  14   6   8
123   9   4   9
345   5  82  72
345   4  64  81
345   5   3   6

Input DataFrame:

df = pd.DataFrame({
    'id' : [123, 345],
    'dict_vals' : [{'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9}, {'a_t':67,'b_t':41,'c_t':6}]},
                   {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':9}]}]
})
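
As a possible alternative to chaining pd.merge through reduce, the per-row DataFrames could be stacked with pd.concat, which keeps the rows in their original order. This is only a sketch under the assumption that dict_vals holds real dicts, as in the input above (strings would need parsing first, e.g. with ast.literal_eval); frames and df_concat are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 345],
    'dict_vals': [
        {'key_a': [{'a': 1, 'b': 8, 'c': 7}, {'a': 14, 'b': 6, 'c': 8},
                   {'a': 9, 'b': 4, 'c': 9}, {'a_t': 67, 'b_t': 41, 'c_t': 6}]},
        {'key_a': [{'a': 5, 'b': 82, 'c': 72}, {'a': 4, 'b': 64, 'c': 81},
                   {'a': 5, 'b': 3, 'c': 6}, {'a_t': 34, 'b_t': 23, 'c_t': 9}]},
    ],
})

# One small frame per source row: drop the last dict, then tag with the row's id
frames = [
    pd.DataFrame(cell['key_a'][:-1]).assign(id=row_id)
    for row_id, cell in zip(df['id'], df['dict_vals'])
]

# Stack the frames top to bottom and use id as the index
df_concat = pd.concat(frames, ignore_index=True).set_index('id')
print(df_concat)
```
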

CodePudding user response：

First, set up the data:

import pandas as pd

keys = [
    {'key_a':[{'a':1,'b':8,'c':7},{'a':14,'b':6,'c':8},{'a':9,'b':4,'c':9},{'a_t':67,'b_t':41,'c_t':6}]},
    {'key_a':[{'a':5,'b':82,'c':72},{'a':4,'b':64,'c':81},{'a':5,'b':3,'c':6}, {'a_t':34,'b_t':23,'c_t':2}]}
]

Then define the ids and one empty list per output column. There should be exactly one id for each entry in the keys list:

ids = [123, 345]
id, a, b, c, = [], [], [], []

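The outer loop walks over the ids. For each id, values holds the list of dicts stored under key_a at the same position in keys. The inner loop skips the last dict and appends the a, b, and c values of each remaining dict to the matching lists: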

for idIndex in range(len(ids)):
    values = keys[idIndex]["key_a"]  # list of dicts belonging to ids[idIndex]
    for i in range(len(values) - 1):  # stop one short to drop the last dict
        value = values[i]

        id.append(ids[idIndex])  # repeat the id once per output row
        a.append(value["a"])
        b.append(value["b"])
        c.append(value["c"])

Finally, create the DataFrame:

df = pd.DataFrame({"id":id, "a": a, "b": b, "c": c})  # create data frame
print(df)

Output:

    id   a   b   c
0  123   1   8   7
1  123  14   6   8
2  123   9   4   9
3  345   5  82  72
4  345   4  64  81
5  345   5   3   6
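
The same loop could also be written more compactly by building one record per inner dict and handing the list to pd.DataFrame. This is a sketch, not a replacement for the code above; records, row_id, entry, and df_records are illustrative names, and it assumes every dict except the last shares the same keys.

```python
import pandas as pd

keys = [
    {'key_a': [{'a': 1, 'b': 8, 'c': 7}, {'a': 14, 'b': 6, 'c': 8},
               {'a': 9, 'b': 4, 'c': 9}, {'a_t': 67, 'b_t': 41, 'c_t': 6}]},
    {'key_a': [{'a': 5, 'b': 82, 'c': 72}, {'a': 4, 'b': 64, 'c': 81},
               {'a': 5, 'b': 3, 'c': 6}, {'a_t': 34, 'b_t': 23, 'c_t': 2}]},
]
ids = [123, 345]

# zip pairs each id with its entry; [:-1] drops the last dict of each list;
# {"id": ..., **value} merges the id into a copy of each inner dict
records = [
    {"id": row_id, **value}
    for row_id, entry in zip(ids, keys)
    for value in entry["key_a"][:-1]
]

df_records = pd.DataFrame(records)
print(df_records)
```

Because the column names come from the dict keys, this version does not need a separate list per column.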