Sure it's simple but can't figure it out.
I've got a code that creates a Dataframe from each tab of a xlsx file, then adds the df to a dictionary of data frames with the sheet name as the key (e.g. df_dict['sheet 1'] etc. The code uses a loop where the string variable sheet_name is assigned to the actual sheet name.
I'm now at a point where I want to tell my code to tidy the df up in specific ways according to which df I'm currently referring to, but I can't find how to refer to the Dataframe key. My code uses a string variable 'sheet_name' in place of the actual key as it loops through sheets.
E.g I want to say 'if df_dict[sheet_name] == df_dict['sheet 1'] then do x'
I think I'm basically getting lost of how to refer to the 'string' or 'key' part of my dataframe in the dictionary in that context.
df_dict ={}
counter = 0
list_to_do = ['sheet1','sheet2']
# Defines directory as a path - not just a string
directory = os.fsencode(file_path)
# For each file in the folder
for file in os.listdir(directory):
# Define the filename
filename = os.fsdecode(file)
# If it's a spreadsheet - work on it
if filename.endswith(".xlsx") or filename.endswith(".xlsx") or filename.endswith(".xlsm"):
print("Working on " filename)
# Add one to the counter to show how many files are worked through and which one you are on
counter = counter 1
print('Counter = ' str(counter))
# Load the workbook from the given path and filename
wb = load_workbook(filename = file_path "/" filename)
# Unprotect each sheet and save with new name
for sheet in wb:
sheet.protection.disable()
# Get the name of the current worksheet
sheet_name = sheet.title
print('Working on ' sheet_name)
if sheet_name in list_to_do:
# Loads the excel file from the path, specifying the sheet
df_dict[sheet_name] = pd.read_excel(file_path "/" filename,sheet_name)
# Get rid of unwanted rows at the top according to the specific sheet
# for all the keys in the dictionary of dataframes with rows to remove at the top
"""if df_dict[sheet_name] == 'sheet1': Do clean up stuff specific to sheet1"""
CodePudding user response:
Based on my understanding of your question, this is how I would tackle the problem. Can add error handling etc shall you need it
sheet_1 = pd.DataFrame({"a":[1,2,3],"b":[3,4,5]})
sheet_2 = pd.DataFrame({"c":[1,2,3],"d":[3,4,5]})
dict_sheet = {"s1":sheet_1,"s2":sheet_2}
def callback_1(df):
print(len(df))
def callback_2(df):
print(df.shape)
dict_callback = {"s1":callback_1,"s2":callback_2}
for key,callback in dict_callback.items():
if key in dict_sheet.keys():
df = dict_sheet[key]
output = callback(df)
CodePudding user response:
I've got a code that creates a Dataframe from each tab of a xlsx file, then adds the df to a dictionary of data frames with the sheet name as the key (e.g. df_dict['sheet 1'] etc. The code uses a loop where the string variable sheet_name is assigned to the actual sheet name.
You can use sheet_name=None
or sheet_name=list_to_do
to load your sheets into a dictionary where keys are the sheet names and values are the dataframes:
dfs = pd.read_excel('myfile.xlsx', sheet_name=list_to_do)
if 'sheet1' in dfs:
# do stuff here
dfs['sheet1'] = cleaned_df