Nested dictionary to DataFrame in the most efficient way possible

I have a nested dictionary like this one:

my_dict[user_profile][user_id][level] = [[9999, 'Heavy Purchaser', 340, 'Star_chest', 999, 1000],
   [9999, 'Heavy Purchaser', 340, 'Star_chest', 998, 5],
   [9999, 'Heavy Purchaser', 340, 'Star_chest', 3, 1],
   [9999, 'Heavy Purchaser', 340, 'Star_chest', 4, 1]]

Basically, for each user_profile and user_id, I'm collecting the rewards received per level. The number of lists in my_dict[user_profile][user_id][level] varies; it is not fixed.

A reward looks like this : [9999, 'Heavy Purchaser', 340, 'Star_chest', 999, 1000]

I want to create a DF of rewards using the most efficient and fastest solution. In the end this is what I want:

ID    user_profile       user_id  Chest_type    item_code  amount
9999  'Heavy Purchaser'  340      'Star_chest'  999        1000
9999  'Heavy Purchaser'  340      'Star_chest'  4          1
9999  'Heavy Purchaser'  340      'Star_chest'  3          1

I tried appending each list with df.loc[df.shape[0]] = list_with_rewards, but it takes too much time. Any suggestions?

CodePudding user response：

The data you show is not itself a nested dictionary; it is a nested list (a list of lists), even if it is stored inside one. You may want to consider transitioning to a nested dictionary, which would seem to make more sense for the type of data you are gathering... But that is another question. :)

In pandas, generally the last thing you want to do is add to a DataFrame row by row, or do anything row by row in general. If you look through the docs for DataFrame, there are several ways to create one from data, depending on the data structure or file type and the data orientation. Your data is a "list of lists" where each inner list can be interpreted as a "record": one row in a DataFrame or database. So you can just use the from_records() constructor. Behold:

In [7]: import pandas as pd

In [8]: data = [[9999, 'Heavy Purchaser', 340, 'Star_chest', 999, 1000],
   ...:    [9999, 'Heavy Purchaser', 340, 'Star_chest', 998, 5],
   ...:    [9999, 'Heavy Purchaser', 340, 'Star_chest', 3, 1],
   ...:    [9999, 'Heavy Purchaser', 340, 'Star_chest', 4, 1]]

In [9]: type(data)
Out[9]: list

In [10]: pd.DataFrame.from_records(data, columns=['ID', 'user', 'user_id', 'chest', 'count', 'amount'])
Out[10]: 
     ID             user  user_id       chest  count  amount
0  9999  Heavy Purchaser      340  Star_chest    999    1000
1  9999  Heavy Purchaser      340  Star_chest    998       5
2  9999  Heavy Purchaser      340  Star_chest      3       1
3  9999  Heavy Purchaser      340  Star_chest      4       1
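
Since the full structure in the question is my_dict[user_profile][user_id][level], the same idea should extend to the whole dictionary: flatten every reward into one plain list first, then build the DataFrame once. Here is a minimal sketch of that. The level keys 1 and 2 are made up for illustration, and the column names are taken from the desired output in the question:

```python
import pandas as pd

# Hypothetical sample shaped like my_dict[user_profile][user_id][level];
# the level keys (1 and 2) are assumptions, not from the question.
my_dict = {
    'Heavy Purchaser': {
        340: {
            1: [[9999, 'Heavy Purchaser', 340, 'Star_chest', 999, 1000],
                [9999, 'Heavy Purchaser', 340, 'Star_chest', 998, 5]],
            2: [[9999, 'Heavy Purchaser', 340, 'Star_chest', 3, 1],
                [9999, 'Heavy Purchaser', 340, 'Star_chest', 4, 1]],
        },
    },
}

# Column names from the desired output shown in the question.
columns = ['ID', 'user_profile', 'user_id', 'Chest_type', 'item_code', 'amount']

# Walk the three dictionary levels and collect every reward in one flat list.
rows = [
    reward
    for users in my_dict.values()
    for levels in users.values()
    for rewards in levels.values()
    for reward in rewards
]

# Build the DataFrame in a single call instead of appending row by row.
df = pd.DataFrame.from_records(rows, columns=columns)
print(df)
```

The point is that the looping happens over plain Python lists, which is cheap, and pandas is only asked to allocate the frame once. Assigning through df.loc one row at a time makes pandas enlarge the frame on every assignment, which is likely where the time was going.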