How to create a dataframe from a list of dictionaries whose values are lists

I have a list of dictionaries whose values are lists of lists, and I would like to create a dataframe from it. For example, the data looks like this:

lst = [{'France': [[12548, 'ABC'], [45681, 'DFG'], [45684, 'HJK']]},
 {'USA': [[84921, 'HJK'], [28917, 'KLESA']]},
 {'Japan': [[38292, 'ASF'], [48902, 'DSJ']]}]

And this is the dataframe I'm trying to create:

Country      Amount    Code
France       12548     ABC
France       45681     DFG
France       45684     HJK
USA          84921     HJK
USA          28917     KLESA
Japan        38292     ASF
Japan        48902     DSJ

As you can see, the dictionary keys become the values of the Country column, and the numbers and strings become the Amount and Code columns. I thought I could use something like the following, but it's not working.

df = pd.DataFrame(lst)

CodePudding user response:

You probably need to transform the data into a format that Pandas can read.
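To see why the direct call does not give the desired layout, here is a minimal sketch of what `pd.DataFrame(lst)` builds from the question's data. The name `wide` is only illustrative. pandas treats each dictionary as one row and each key as a column, so the nested lists end up inside single cells:

```python
import pandas as pd

lst = [
    {"France": [[12548, "ABC"], [45681, "DFG"], [45684, "HJK"]]},
    {"USA": [[84921, "HJK"], [28917, "KLESA"]]},
    {"Japan": [[38292, "ASF"], [48902, "DSJ"]]},
]

# Each dict is treated as one row and each key as a column name.
wide = pd.DataFrame(lst)

print(wide.shape)          # (3, 3)
print(list(wide.columns))  # ['France', 'USA', 'Japan']

# Cells with no matching key in that row's dict are filled with NaN.
print(int(wide.isna().sum().sum()))  # 6
```

The result has one column per country instead of Country, Amount and Code columns, which is why the data has to be flattened first.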

Original data

data = [
    {"France": [[12548, "ABC"], [45681, "DFG"], [45684, "HJK"]]},
    {"USA": [[84921, "HJK"], [28917, "KLESA"]]},
    {"Japan": [[38292, "ASF"], [48902, "DSJ"]]},
]

Transforming the data

new_data = []
for country_data in data:
    for country, values in country_data.items():
        new_data += [{"Country": country, "Amount": amt, "Code": code} for amt, code in values]

Create the dataframe

import pandas as pd
df = pd.DataFrame(new_data)

Output

  Country  Amount   Code
0  France   12548    ABC
1  France   45681    DFG
2  France   45684    HJK
3     USA   84921    HJK
4     USA   28917  KLESA
5   Japan   38292    ASF
6   Japan   48902    DSJ

CodePudding user response:

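# One single-column frame per country; each cell holds an [amount, code] pair
df = pd.concat([pd.DataFrame(elem) for elem in lst])
# Drop the NaN padding in each column, then stack into a Series indexed by (row, country)
df = df.apply(lambda x: pd.Series(x.dropna().values)).stack()
# Keep only the country level of the index
df = df.reset_index(level=[0], drop=True).to_frame(name='vals')
# Split each [amount, code] pair into two columns, keeping the country as the index
df = pd.DataFrame(df["vals"].to_list(), index=df.index, columns=['Amount', 'Code']).sort_index()
print(df)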

Output

        Amount   Code
France   12548    ABC
USA      84921    HJK
Japan    38292    ASF
France   45681    DFG
USA      28917  KLESA
Japan    48902    DSJ
France   45684    HJK
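The frame above keeps the country names in the index, while the question asks for a Country column. A minimal sketch of one way to move them over, assuming a frame indexed by country like the one printed above (the two-row `indexed` frame is a hand-built stand-in, not the answer's actual result):

```python
import pandas as pd

# Stand-in for a frame whose index holds the country names.
indexed = pd.DataFrame(
    {"Amount": [12548, 84921], "Code": ["ABC", "HJK"]},
    index=["France", "USA"],
)

# Name the index "Country", then turn it into a regular column.
with_country = indexed.rename_axis("Country").reset_index()

print(list(with_country.columns))  # ['Country', 'Amount', 'Code']
```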

CodePudding user response:

Use a nested list comprehension to flatten the data, then pass the result to the DataFrame constructor:

lst = [
    {"France": [[12548, "ABC"], [45681, "DFG"], [45684, "HJK"]]},
    {"USA": [[84921, "HJK"], [28917, "KLESA"]]},
    {"Japan": [[38292, "ASF"], [48902, "DSJ"]]},
]

L = [(country, *x) for country_data in lst 
                   for country, values in country_data.items() 
                   for x in values]

df = pd.DataFrame(L, columns=['Country','Amount','Code'])
print(df)
  Country  Amount   Code
0  France   12548    ABC
1  France   45681    DFG
2  France   45684    HJK
3     USA   84921    HJK
4     USA   28917  KLESA
5   Japan   38292    ASF
6   Japan   48902    DSJ
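The `(country, *x)` expression in the comprehension relies on iterable unpacking: `*x` spreads the elements of the inner `[amount, code]` list into the tuple, so each row comes out flat. A small sketch with one pair taken from the data (the names `row` and `same_row` are only illustrative):

```python
country = "France"
x = [12548, "ABC"]

# *x spreads the elements of x into the surrounding tuple.
row = (country, *x)
print(row)  # ('France', 12548, 'ABC')

# The same tuple written out without unpacking.
same_row = (country, x[0], x[1])
print(row == same_row)  # True
```

Unpacking works for inner lists of any length, but the `columns` list passed to the DataFrame constructor assumes every pair has exactly two elements.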

CodePudding user response:

Build a new dictionary that combines the individual dicts into one, before concatenating the dataframes:

new_dict = {}
for ent in lst:
    for key, value in ent.items():
        new_dict[key] = pd.DataFrame(value, columns = ['Amount', 'Code'])

pd.concat(new_dict, names=['Country']).droplevel(1).reset_index()

  Country  Amount   Code
0  France   12548    ABC
1  France   45681    DFG
2  France   45684    HJK
3     USA   84921    HJK
4     USA   28917  KLESA
5   Japan   38292    ASF
6   Japan   48902    DSJ