I am reading Excel files from a folder and loading each one into a DataFrame. I fetch values from certain columns, but some Excel files are missing the columns I am looking for. In that case, I want to populate the missing column with N/A for that file and continue processing, so that I can save all my results in one DataFrame. I am learning Python and need help here. Below is my code:
from pathlib import Path
import pandas as pd

p = Path(path to excel)
# skip workbooks whose file name contains "AC0"
filtered_files = [x for x in p.glob("**/*.xlsx") if "AC0" not in x.name]
result = {}  # one entry per file, keyed by file index
for i, file in enumerate(filtered_files):
    # sheet_name=[1] returns a dict keyed by sheet index
    full_df = pd.read_excel(file, sheet_name=[1], header=1)
    df = full_df[1]
    # get column
    col_1_name = df.columns[2]
    ded_ind_df = df[df[col_1_name] == 'DED Individual']
    ded_fmem_df = df[df[col_1_name] == 'DED Family Member']
    result[i] = {
        'IND DED INN': list(ded_ind_df['In-Network\nVALUE']),
        'DED FAM INN': list(ded_fmem_df['In-Network\nVALUE']),
        'IND DED OON': list(ded_ind_df['Out-of-Network\nVALUE']),
        'DED FAM OON': list(ded_fmem_df['Out-of-Network\nVALUE'])
    }
result = pd.DataFrame.from_dict(result)
When I run it, I get the error below:
'IND DED OON': list(ded_ind_df['Out-of-Network\nVALUE']),
indexer = self.columns.get_loc(key)
raise KeyError(key) from err
KeyError: 'Out-of-Network\nVALUE'
This happens because one of the Excel files does not have the Out-of-Network column. In this case, I want to skip it and continue processing the next file.
CodePudding user response:
You could try one of these options just before your result[i] = ... line. To create the missing columns with empty values:
for column in ["In-Network\nVALUE", "Out-of-Network\nVALUE"]:
    # both filtered frames are read later, so patch each of them
    for frame in (ded_ind_df, ded_fmem_df):
        if column not in frame.columns:
            frame[column] = None
# result[i] = {...
But, if you want to skip the iteration and jump to the next one:
if any(column not in ded_ind_df.columns for column in ["In-Network\nVALUE", "Out-of-Network\nVALUE"]):
    continue  # Skip current iteration
# result[i] = {...
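If you want the literal N/A from the question rather than None, here is a minimal sketch of the first option wrapped in a helper. The names column_values, summarize, and NA are made up for illustration, and the small in-memory DataFrame only stands in for a sheet that lacks the Out-of-Network column; it is one possible approach, not the only one.

```python
import pandas as pd

NA = "N/A"  # placeholder value requested in the question (hypothetical constant name)

def column_values(frame, column, fill=NA):
    """Return the column as a list, or one fill value per row if the column is absent."""
    if column in frame.columns:
        return frame[column].tolist()
    return [fill] * len(frame)

def summarize(df):
    """Build the per-file dict from the question, tolerating missing value columns."""
    label_col = df.columns[2]  # same positional lookup as in the question
    ind = df[df[label_col] == "DED Individual"]
    fam = df[df[label_col] == "DED Family Member"]
    return {
        "IND DED INN": column_values(ind, "In-Network\nVALUE"),
        "DED FAM INN": column_values(fam, "In-Network\nVALUE"),
        "IND DED OON": column_values(ind, "Out-of-Network\nVALUE"),
        "DED FAM OON": column_values(fam, "Out-of-Network\nVALUE"),
    }

# Stand-in for one sheet where the Out-of-Network column is missing (made-up data)
sample = pd.DataFrame({
    "a": [1, 2],
    "b": [3, 4],
    "type": ["DED Individual", "DED Family Member"],
    "In-Network\nVALUE": [500, 1000],
})
row = summarize(sample)
print(row)
```

Inside your loop, this would replace the dict literal with result[i] = summarize(df), so a file with a missing column still produces a row instead of raising KeyError.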