My final objective is to create a column called 'Status' that indicates if active or cancelled based on the name of a sheet. I need it to check if the sheetname start with the word 'Full Member List'. If so then Active, else the Status column should be Cancelled. How do I do this below? I only need help with the one line in this code where I have commented #need help with below line. I get an invalid syntax error for that line
My attempt-
import pandas as pd
import os
from openpyxl import load_workbook
cols_to_drop = ['PSI ID','PSIvet Region','PSIvet region num','Fax','County','Ship state']
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State','Billing state':'State'}
for file in os.listdir("C:/Users/hh/Desktop/autotranscribe/python/Matching"):
if file.startswith("PSI"):
dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
output = dict()
for ws, df in dfs.items():
if any(ws.startswith(x) for x in ["New Members", "PVCC"]):
continue
temp = df
#need help with below line
temp['Status'] = "Active" if any(ws.startwith(x) for x in == "Full Member List" else "Cancelled" )
#drop unneeded columns
temp = df.drop(cols_to_drop, errors="ignore", axis=1)
#rename columns
temp = temp.rename(columns=column_name_update_map)
#drop empty columns
temp = temp.dropna(how="all", axis=1)
temp['Partner'] = "PSI"
output[ws] = temp
writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
for ws, df in output.items():
df.to_excel(writer, index=None, sheet_name=ws)
writer.save()
writer.close()
CodePudding user response:
I think you need to:
- fix the indenting in your code on the line reading
temp = df
- fix the typo "startwith" to be
startswith
- consider adding logic to ignore files containing
(updated headers)
- change the line you are asking about to this:
temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"
An updated version of your code looks like this:
import pandas as pd
import os
from openpyxl import load_workbook
cols_to_drop = ['PSI ID','PSIvet Region','PSIvet region num','Fax','County','Ship state']
column_name_update_map = {'Account name': 'Company Name','Billing address':'Address','Billing city':'City','Billing State':'State','Billing state':'State'}
for file in os.listdir("."):
if file.startswith("PSI") and "(updated headers)" not in file:
dfs = pd.read_excel(file, sheet_name=None,skiprows=5)
output = dict()
for ws, df in dfs.items():
if any(ws.startswith(x) for x in ["New Members", "PVCC"]):
continue
temp = df
temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"
#drop unneeded columns
temp = df.drop(cols_to_drop, errors="ignore", axis=1)
#rename columns
temp = temp.rename(columns=column_name_update_map)
#drop empty columns
temp = temp.dropna(how="all", axis=1)
temp['Partner'] = "PSI"
output[ws] = temp
writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
for ws, df in output.items():
df.to_excel(writer, index=None, sheet_name=ws)
writer.save()
writer.close()
To test this, I created an xlsx file named PSI 001.xlsx
with a sheet named Full Member List 001
containing the following anchored at cell A1:
will skip
will skip
will skip
will skip
will skip
foo
1
2
3
The output was stored to a file named PSI 001 (updated headers).xlsx
with a sheet named as above with the following contents anchored at cell A1:
foo Status Partner
1 Active PSI
2 Active PSI
3 Active PSI
CodePudding user response:
In case you want to check for full string "Full Member List" in start of your sheetname.
temp['Status'] = "Active" if ws.startswith("Full Member List") else "Cancelled"
To check, if either of words "Full", "Member", "List" appears in the sheetname:
for x in "Full Member List".split(" "):
if ws.startswith(x):
temp["Status"] = "Active"
break
if temp["Status"] != "Active":
temp["Status"] = "Cancelled"