How to get the string/key from a dictionary of data frames in pandas?-CodePudding

Sure it's simple but can't figure it out.

I've got a code that creates a Dataframe from each tab of a xlsx file, then adds the df to a dictionary of data frames with the sheet name as the key (e.g. df_dict['sheet 1'] etc. The code uses a loop where the string variable sheet_name is assigned to the actual sheet name.

I'm now at a point where I want to tell my code to tidy the df up in specific ways according to which df I'm currently referring to, but I can't find how to refer to the Dataframe key. My code uses a string variable 'sheet_name' in place of the actual key as it loops through sheets.

E.g I want to say 'if df_dict[sheet_name] == df_dict['sheet 1'] then do x'

I think I'm basically getting lost of how to refer to the 'string' or 'key' part of my dataframe in the dictionary in that context.

df_dict ={}
counter = 0
list_to_do = ['sheet1','sheet2']

# Defines directory as a path - not just a string
directory = os.fsencode(file_path)

# For each file in the folder
for file in os.listdir(directory):
    
    # Define the filename
    filename = os.fsdecode(file)
    
    # If it's a spreadsheet - work on it
    if filename.endswith(".xlsx") or filename.endswith(".xlsx") or filename.endswith(".xlsm"):
        print("Working on "   filename)
        
        # Add one to the counter to show how many files are worked through and which one you are on
        counter = counter 1
        print('Counter = '   str(counter))
        
        # Load the workbook from the given path and filename
        wb = load_workbook(filename = file_path   "/"   filename)
        
        # Unprotect each sheet and save with new name
        for sheet in wb:
            sheet.protection.disable()
            
            # Get the name of the current worksheet
            sheet_name = sheet.title
            print('Working on '   sheet_name)
            
            if sheet_name in list_to_do:
                # Loads the excel file from the path, specifying the sheet
                df_dict[sheet_name] = pd.read_excel(file_path   "/"   filename,sheet_name)

                # Get rid of unwanted rows at the top according to the specific sheet
                # for all the keys in the dictionary of dataframes with rows to remove at the top
                
                """if df_dict[sheet_name] == 'sheet1': Do clean up stuff specific to sheet1"""

CodePudding user response：

Based on my understanding of your question, this is how I would tackle the problem. Can add error handling etc shall you need it

sheet_1 = pd.DataFrame({"a":[1,2,3],"b":[3,4,5]})
sheet_2 = pd.DataFrame({"c":[1,2,3],"d":[3,4,5]})
dict_sheet = {"s1":sheet_1,"s2":sheet_2}

def callback_1(df):
    print(len(df))

def callback_2(df):
    print(df.shape)

dict_callback = {"s1":callback_1,"s2":callback_2}
for key,callback in dict_callback.items():
    if key in dict_sheet.keys():
        df = dict_sheet[key]
        output = callback(df)

CodePudding user response：

I've got a code that creates a Dataframe from each tab of a xlsx file, then adds the df to a dictionary of data frames with the sheet name as the key (e.g. df_dict['sheet 1'] etc. The code uses a loop where the string variable sheet_name is assigned to the actual sheet name.

You can use sheet_name=None or sheet_name=list_to_do to load your sheets into a dictionary where keys are the sheet names and values are the dataframes:

dfs = pd.read_excel('myfile.xlsx', sheet_name=list_to_do)
if 'sheet1' in dfs:
    # do stuff here
    dfs['sheet1'] = cleaned_df