If in the input data first_name and last_name is null then remove those records from df and put into new dataframe error df with extra columns "rejeted_reason": ['first_name,'last_name] is empty.
Input Data:
customer_number|first_name|middle_name|last_name|gender
90617174||Aliari||Male
92154246|Roberto||Intriago Nunez|Male
07605348|E|A|Christodoulou|Male
80284242|Ritchie|O||Male
Error File :
customer_number|first_name|middle_name|last_name|gender|rejection_reason
90617174||Aliari||Male|["first_name","last_name] is empty
80284242|Ritchie|O||Male|["last_name"] is empty
Output File:
customer_number|first_name|middle_name|last_name|gender
92154246|Roberto||Intriago Nunez|Male
07605348|E|A|Christodoulou|Male
Code Tried:
newList = ['first_name','last_name']
for index,row in df.iterrows():
error_col = []
temp_dic = []
for col in newList:
if (row[col] == '' or pd.isna(row[col]) or pd.isnull(row[col])):
error_col.append(col)
row["rejection_reason"] = col ' is empty'
df.drop(index, inplace=True)
temp_dic.append(row)
print("temp dic:", temp_dic)
Error
raise KeyError(f"{labels[mask]} not found in axis")
KeyError: '[0] not found in axis'
CodePudding user response:
Since you drop the row each time you see empty column in that row, you are removing a row more than once. So after removing the row for the first time, you see KeyError
. This should work:
newList = ['first_name','last_name']
temp_dic = []
for index,row in df.iterrows():
error_col = []
for col in newList:
if (row[col] == '' or pd.isna(row[col]) or pd.isnull(row[col])):
error_col.append(col)
if len(error_col) > 0:
df.drop(index, inplace=True)
temp_dic.append(row)
row["rejection_reason"] = str(error_col) ' is empty'
print("temp dic:", temp_dic)
But I recommend doing this instead of processing each row:
newList = ['first_name','last_name']
def check_columns(row):
error_col = []
for col in newList:
if (row[col] == '' or pd.isna(row[col]) or pd.isnull(row[col])):
error_col.append(col)
if len(error_col) > 0:
return str(error_col) ' is empty'
else:
return ''
df['rejection_reason'] = df.apply(check_columns, axis=1)
df_error = df[df['rejection_reason'] != '']
df_output = df[df['rejection_reason'] == '']
CodePudding user response:
given the input datframe df, in order to filter out the rows where 'last_name' or 'first_name' are null the following will work:
filter_df = df[~(df['first_name'].isnull() | df['last_name'].isnull())]
In order to create the error dataframe containing rows that have 'last_name' or 'first_name' null and the corresponding error the following code will work:
error_df = df[(df['first_name'].isnull() | df['last_name'].isnull())].copy()
error_df.loc[error_df['first_name'].isnull(), "rejeted_reason"] = "['first_name'] is empty."
error_df.loc[error_df['last_name'].isnull(), "rejeted_reason"] = "['last_name'] is empty."
error_df.loc[(error_df['first_name'].isnull() & error_df['last_name'].isnull()), "rejeted_reason"] = "['first_name', 'last_name'] is empty."
Output of filter_df given the above input:
Output of error_df given the above input: