I have a dictionary of valid values that I want to cross-check against the values found in a data frame. I want this operation wrapped in a function so I can reuse it in other code later on.
import pandas as pd
d=[['Aland Islands','Cars','[email protected]']]
df=pd.DataFrame(d,columns=['country','industry','Email'])
errors={}
valid_dict={"country": ["Afghanistan", "Aland Islands"],"industry": ["Automotive", "Banking / Finance"]}
valid_dict={k:v for k, v in valid_dict.items() if k in df.columns.values}
This just keeps the keys of valid_dict that are also column names in the data frame. It works as expected and needs no changes; I am including it only for context.
Here's the problem child of the code. I have tried to write a function, but I am new to writing functions. I want to compare the keys and items of valid_dict to the column names and values in the data frame and, for every invalid value, print a simple statement.
def validate(df, valid_dict):
    {i:k for k, v in valid_dict.items() for i in v}
    for c in valid_dict:
        if df[c] in list(c):
            return
        else:
            for c in valid_dict:
                for i in df.index:
                    errors={ "row": i,
                             "column": c,
                             "message": "This is an invalid entry, fill in " c " accordingly" }
    return errors,df
print(validate(df, valid_dict))
I know this code is a mess. I have tried all kinds of things, but I can't get the results I want.
Desired output is:
errors={ "row": 0, "column": "industry", "message": "This is an invalid entry, fill in industry accordingly" }
How can I cross-check a dictionary against a data frame to identify values that are not in the dictionary's list of allowed items?
If a column has 10 values and 5 of them are errors, I want it to print all 5 errors.
CodePudding user response:
# invert the dictionary: each valid value becomes a key
d={i:k for k, v in valid_dict.items() for i in v}
# map industry through the inverted dictionary; where the result is null
# the value is invalid, so replace it with an error message,
# else keep the industry name
col='industry'
df['check']=df[col].mask(df[col].map(d).isna(), f"An invalid Value found in {col}")
df
         country industry              Email                               check
0  Aland Islands     Cars  [email protected]  An invalid Value found in industry
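To make the lookup step above easier to follow, here is a minimal sketch of what map does with the inverted dictionary. The two-row Series and the names inverted, mapped and is_invalid are made up for illustration; only valid_dict comes from the question.

```python
import pandas as pd

valid_dict = {"country": ["Afghanistan", "Aland Islands"],
              "industry": ["Automotive", "Banking / Finance"]}

# Invert the dictionary: each allowed value points back to its column name.
inverted = {value: column for column, values in valid_dict.items() for value in values}

# Hypothetical two-row column: one invalid entry, one valid entry.
industry = pd.Series(["Cars", "Automotive"], name="industry")

# Values that are not keys of the dictionary come back as NaN.
mapped = industry.map(inverted)
is_invalid = mapped.isna()
print(is_invalid.tolist())
```

One caveat with this approach: the inverted dictionary pools the allowed values of every column, so a country name such as "Afghanistan" appearing in the industry column would not be flagged. If that matters, checking each column against its own list (for example with isin) avoids it.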
FUNCTION:
def validate(col='industry', d=valid_dict):
    # col: column to validate
    # d: dictionary of valid values per column
    # invert the dictionary: each valid value becomes a key
    inv={i:k for k, v in d.items() for i in v}
    # replace every value missing from the dictionary with an error message
    s=df[col].mask(df[col].map(inv).isna(), f"An invalid Value found in {col}")
    # return the rows where the mapping had failed
    return s[s.map(inv).isna()]
validate('industry')
0 An invalid Value found in industry
2 An invalid Value found in industry
Name: industry, dtype: object
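If you want the errors in the dictionary shape shown in the question, something along these lines might work. This is only a sketch, not a drop-in solution: the function name find_invalid_entries and the three-row sample frame are hypothetical, and it checks each column against its own list with isin instead of the inverted dictionary used above.

```python
import pandas as pd

def find_invalid_entries(df, valid_dict):
    """Return one error dict per cell whose value is not allowed for its column."""
    errors = []
    for column, allowed in valid_dict.items():
        if column not in df.columns:
            continue  # skip keys that have no matching column
        # Index labels of the rows whose value is not in the allowed list.
        invalid_rows = df.index[~df[column].isin(allowed)].tolist()
        for row in invalid_rows:
            errors.append({
                "row": row,
                "column": column,
                "message": f"This is an invalid entry, fill in {column} accordingly",
            })
    return errors

valid_dict = {"country": ["Afghanistan", "Aland Islands"],
              "industry": ["Automotive", "Banking / Finance"]}

# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    [["Aland Islands", "Cars"], ["Afghanistan", "Automotive"], ["Narnia", "Trains"]],
    columns=["country", "industry"],
)

errors = find_invalid_entries(sample, valid_dict)
for error in errors:
    print(error)
```

Every invalid cell gets its own entry, so a column with 5 bad values out of 10 yields 5 errors. Note that missing values (NaN) are also reported as invalid, because isin returns False for them unless NaN is in the allowed list.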