I have multiple sheets in my xlsx file. The specific sheet I have is named Cancelled Members.
The dataframe of this sheet(Cancelled Members) looks like
Address State Zip Status Status Date Partner
xx NY 110 G O 1 V
I want to drop the first Status column from this specific sheet.
I tried
import pandas as pd
from openpyxl import load_workbook
temp = pd.read_excel(file, sheet_name=None,skiprows=5)
temp = if ws.startswith("Cancelled"): temp.drop(temp.columns[[3]], axis=1)
I tried to drop it at index level [[3]] but I receive an invalid syntax error though. How do I drop the col within that sheet?
CodePudding user response:
You can do what you're asking like this:
import pandas as pd
temp = pd.DataFrame(columns=['Address', 'State', 'Zip', 'Status', 'Status', 'Date', 'Partner'],
data=[['xx','NY',110,'G','O',1,'V']])
print(temp)
temp.columns = [col str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
print(temp)
Output:
Address State Zip Status Status Date Partner
0 xx NY 110 G O 1 V
Address State Zip Status Date Partner
0 xx NY 110 O 1 V
The temp.columns
assignment appends a unique number to each column named Status
, thereby ensuring there are no duplicate columns with that name. We then drop the column at the position you have in mind, and then restore the name of any remaining columns that start with Status
to again be Status
.
The full test code looks like this:
file="PSI 001.xlsx"
import pandas as pd
from openpyxl import load_workbook
dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
output = dict()
for ws, df in dfs.items():
if ws.startswith("Cancelled"):
temp = df
temp.columns = [col str(i) if col == "Status" else col for i, col in enumerate(temp.columns)]
temp = temp.drop(temp.columns[3], axis=1).rename(columns={col:"Status" for col in temp.columns if col.startswith("Status")})
output[ws] = temp
writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
for ws, df in output.items():
df.to_excel(writer, index=None, sheet_name=ws)
writer.save()
writer.close()
I tested this with a file named PSI 001.xlsx
and a sheet named Cancelled
containing the following:
skip
skip
skip
skip
skip
Address State Zip Status Status Date Partner
xx NY 110 G O 1 V
... and it produced a filed named PSI 001 (updated headers)
with a sheet named as above with the following contents:
Address State Zip Status Date Partner
xx NY 110 O 1 V
CodePudding user response:
You can try doing this:
# loop through each in dictionary of dataframes
for sheet in temp:
# check if sheet starts with 'Cancelled'
if sheet.startswith('Cancelled'):
# for each column in dataframe
for column in list(temp[sheet]):
# check if column is named 'Status'
if column == 'Status':
# drop column if True
temp[sheet] = temp[sheet].drop(column, axis=1)