My current script is able to loop through a directory and make multi level changes for sheets through pandas. I want to add columns in my dataframe which are based off the sheet
There are 2 sheets in my final dataframe with same columns but one sheet is named 'Voting Members' and the other is named 'Removed Members'. I need a column added to the dataframe called 'Status' where for the voting members sheet under the Status column all rows say 'Active' and for the 'Removed Members' sheet all rows say "Cancelled" example-
original dataframe for sheetname 'Voting Members'
Zip FirstName LastName
60618 Amur Shonan
62960 Karla Cunnungham
Desired output
Zip FirstName LastName Status
60618 Amur Shonan Active
62960 Karla Cunnungham Active
So basically, if there are 20 rows, there should be a status column having 20 'actives; for sheetname 'Voting Members; and 20 rows of 'cancelled' for the sheet 'Removed members'.
My code so far -
import pandas as pd
import os
column_name_update_map = {'LocationName': 'Company Name','StreetAddress':'Address','City':'City','State':'State'}
cols_to_drop = ['TypeDescription','LastName','FirstName','StreetAddress2']
for file in os.listdir("C:/Users/hh/Desktop/oo/python/Partner Matching"):
if file.startswith("TVC"):
dfs = pd.read_excel(file, sheet_name=None)
output = dict()
for ws, df in dfs.items():
if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
continue
#drop unneeded columns
temp = df.drop(cols_to_drop, errors="ignore", axis=1)
#rename columns
temp = temp.rename(columns=column_name_update_map)
#drop empty columns
temp = temp.dropna(how="all", axis=1)
output[ws] = temp
writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
for ws, df in output.items():
df.to_excel(writer, index=None, sheet_name=ws)
writer.save()
writer.close()
How can I implement this within my code so as to get the status column for sheets within the file?
CodePudding user response:
Within your inner loop, try something like this:
if ws in ["Voting Members", "Removed Members"]:
temp = df
temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
temp = temp[["Zip", "FirstName", "LastName", "Status"]]
output[ws] = temp