I have a dictionary of dictionaries. The "outer" dictionary is a time series with 110 entries, where each key is a date. Each value is another dictionary whose keys are group names, and the number of groups varies by date. Each group is a DataFrame with a fixed number of columns (15 variables), but the number of rows differs between groups. I want to export only 4 of the variables to Excel.
The task is to create an .xlsx file in Python, name the file, and loop over the keys (dates), creating one Excel sheet per date with the date (key) as the sheet name. Then I want to extract the data from each group: all the groups for a given date should be listed in the sheet as rows, with the 4 variables as columns.
Data example:
dict_1 = {dict: 110}
    '2014-03-01' = {dict: 11}
    '2014-04-01' = {dict: 10}
        'group_1' = {DataFrame: (4, 15)}
        'group_2' = {DataFrame: (2, 15)}
I imagine a nested loop like this:
import pandas as pd

writer = pd.ExcelWriter('Output.xlsx')
for key in dict_1:
    # (Here I want to add new sheet and name it based on date/key)
    # (then loop over groups to extract data to sheet)
    pass  # placeholder for the two steps above
I am new to Python and working with dictionaries. Hope to get a push in the right direction. Just enough to get me started.
Thank you!
CodePudding user response:
The comments in the code explain each step.
import pandas as pd

# replace these with the names of the 4 variables you want to keep
columns_of_interest = ['col1', 'col2', 'col3', 'col4']
writer = pd.ExcelWriter('Output.xlsx')
for date, subdict in dict_1.items():
    # subdict.values() has all the dataframes for this date, we concatenate them
    df = pd.concat(list(subdict.values()))
    # then we pick our columns
    df = df[columns_of_interest]
    # and save it as a sheet named after the date
    df.to_excel(writer, sheet_name=str(date))
# closing the writer saves the file to disk
writer.close()
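If you also want each row to show which group it came from, here is a possible variation. It is only a sketch under a few assumptions: the helper names (`sheet_name_for`, `build_sheets`, `write_workbook`), the `demo` data and the column names `col1`/`col2` are made up for illustration and are not from your data. It passes the inner dictionary itself to `pd.concat`, so the group names become the outer index level, and it formats date-like keys as `YYYY-MM-DD`, because Excel does not allow `:` in sheet names and `str()` of a pandas Timestamp includes a time part.

```python
import pandas as pd


def sheet_name_for(key):
    """Format a date-like key as YYYY-MM-DD (hypothetical helper)."""
    # str(pd.Timestamp(...)) would include '00:00:00', and ':' is not
    # allowed in Excel sheet names, so format the date explicitly.
    return pd.Timestamp(key).strftime("%Y-%m-%d")


def build_sheets(dict_1, columns):
    """Return {sheet name: DataFrame} with rows labelled by group name."""
    sheets = {}
    for date, groups in dict_1.items():
        # Passing the dict to pd.concat uses its keys (the group names)
        # as the outer level of the resulting row index.
        combined = pd.concat(groups, names=["group", "row"])
        sheets[sheet_name_for(date)] = combined[columns]
    return sheets


def write_workbook(sheets, path):
    """Write one sheet per entry; needs an Excel engine such as openpyxl."""
    # The context manager saves and closes the file when the block ends.
    with pd.ExcelWriter(path) as writer:
        for name, df in sheets.items():
            df.to_excel(writer, sheet_name=name)


# Small made-up data in the same shape as dict_1 (assumption, not real data).
demo = {
    pd.Timestamp("2014-03-01"): {
        "group_1": pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "other": [5, 6]}),
        "group_2": pd.DataFrame({"col1": [7], "col2": [8], "other": [9]}),
    },
}
sheets = build_sheets(demo, ["col1", "col2"])
```

Calling `write_workbook(sheets, 'Output.xlsx')` afterwards would write the file. Building the sheets separately from writing them makes it easy to inspect each DataFrame before anything is saved.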