How do i drop a specific column from a sheet in my xls file in python at index level?-CodePudding

I have multiple sheets in my xlsx file. The specific sheet I have is named Cancelled Members.

The dataframe of this sheet(Cancelled Members) looks like

Address State Zip Status Status Date  Partner
xx      NY    110  G      O      1     V

I want to drop the first Status column from this specific sheet.

I tried

import pandas as pd
from openpyxl import load_workbook 
temp = pd.read_excel(file, sheet_name=None,skiprows=5)
temp = if ws.startswith("Cancelled"): temp.drop(temp.columns[[3]], axis=1)

I tried to drop it at index level [[3]] but I receive an invalid syntax error though. How do I drop the col within that sheet?

CodePudding user response：

You can do what you're asking like this:

import pandas as pd
temp = pd.DataFrame(columns=['Address', 'State', 'Zip', 'Status', 'Status', 'Date', 'Partner'],
    data=[['xx','NY',110,'G','O',1,'V']])
print(temp)
temp.columns = [col   str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
print(temp)

Output:

  Address State  Zip Status Status  Date Partner
0      xx    NY  110      G      O     1       V
  Address State  Zip Status  Date Partner
0      xx    NY  110      O     1       V

The temp.columns assignment appends a unique number to each column named Status, thereby ensuring there are no duplicate columns with that name. We then drop the column at the position you have in mind, and then restore the name of any remaining columns that start with Status to again be Status.

The full test code looks like this:

file="PSI 001.xlsx"
import pandas as pd
from openpyxl import load_workbook 
dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
output = dict()
for ws, df in dfs.items():
    if ws.startswith("Cancelled"):
        temp = df
        temp.columns = [col   str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
        temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
        output[ws] = temp
writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
for ws, df in output.items():
    df.to_excel(writer, index=None, sheet_name=ws)
writer.save()
writer.close()

I tested this with a file named PSI 001.xlsx and a sheet named Cancelled containing the following:

skip                        
skip                        
skip                        
skip                        
skip                        
Address State   Zip Status  Status  Date    Partner
xx  NY  110 G   O   1   V

... and it produced a filed named PSI 001 (updated headers) with a sheet named as above with the following contents:

Address State   Zip Status  Date    Partner
xx  NY  110 O   1   V

CodePudding user response：

You can try doing this:

# loop through each in dictionary of dataframes
for sheet in temp:
    # check if sheet starts with 'Cancelled'
    if sheet.startswith('Cancelled'):
        # for each column in dataframe
        for column in list(temp[sheet]):
            # check if column is named 'Status'
            if column == 'Status':
                # drop column if True
                temp[sheet] = temp[sheet].drop(column, axis=1)