Home > Blockchain >  How do I add a column to my dataframe for sheet related attributes in Python?
How do I add a column to my dataframe for sheet related attributes in Python?

Time:05-04

My current script is able to loop through a directory and make multi level changes for sheets through pandas. I want to add columns in my dataframe which are based off the sheet

There are 2 sheets in my final dataframe with same columns but one sheet is named 'Voting Members' and the other is named 'Removed Members'. I need a column added to the dataframe called 'Status' where for the voting members sheet under the Status column all rows say 'Active' and for the 'Removed Members' sheet all rows say "Cancelled" example-

original dataframe for sheetname 'Voting Members'

Zip FirstName   LastName  
60618   Amur    Shonan    
62960   Karla   Cunnungham 

Desired output

Zip FirstName   LastName  Status
60618   Amur    Shonan    Active
62960   Karla   Cunnungham Active

So basically, if there are 20 rows, there should be a status column having 20 'actives; for sheetname 'Voting Members; and 20 rows of 'cancelled' for the sheet 'Removed members'.

My code so far -

import pandas as pd
import os
column_name_update_map = {'LocationName': 'Company Name','StreetAddress':'Address','City':'City','State':'State'}  
cols_to_drop =  ['TypeDescription','LastName','FirstName','StreetAddress2']     

for file in os.listdir("C:/Users/hh/Desktop/oo/python/Partner Matching"):
    if file.startswith("TVC"):
        dfs = pd.read_excel(file, sheet_name=None)
        output = dict()
        for ws, df in dfs.items():
            if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
                continue
            #drop unneeded columns
            temp = df.drop(cols_to_drop, errors="ignore", axis=1)
            #rename columns
            temp = temp.rename(columns=column_name_update_map)
            #drop empty columns
            temp = temp.dropna(how="all", axis=1)
            output[ws] = temp
        writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()

How can I implement this within my code so as to get the status column for sheets within the file?

CodePudding user response:

Within your inner loop, try something like this:

            if ws in ["Voting Members", "Removed Members"]:
                temp = df
                temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                temp = temp[["Zip", "FirstName", "LastName", "Status"]]
                output[ws] = temp
  • Related