Cross Referencing dictionary Items to values in a data frame

Time:11-03

I have a dictionary of valid values that I want to cross-check against the values in a data frame. I want this operation wrapped in a function so I can reuse it in other code later on.

import pandas as pd

d=[['Aland Islands','Cars','[email protected]']]
df=pd.DataFrame(d,columns=['country','industry','Email'])

errors={}

valid_dict={"country": ["Afghanistan", "Aland Islands"],"industry": ["Automotive", "Banking / Finance"]}
valid_dict={k:v for k, v in valid_dict.items() if k in df.columns.values}

This just makes sure that every key kept in valid_dict is a column name in the data frame. It works as expected and needs no changes; I'm only including it for context.

Here's the problem child of the code. I have tried to write a function, but I am new to writing functions. I want to compare the keys and values in valid_dict to the column names and values in the data frame and print a simple statement.

def validate(df, valid_dict):
    {i:k for k, v in valid_dict.items() for i in v}
    for c in valid_dict:
        if df[c] in list(c):
              return 
        else:
              for c in valid_dict:
                  for i in df.index:
                      errors={ "row": i,                                    
                     "column": c,                                                 
                     "message": "This is an invalid entry, fill in " c  " accordingly" }  
                  return errors,df


print(validate(df, valid_dict))

I know this code is a mess. I have tried all kinds of things, but I can't get the results I want.

Desired output: errors={ "row": 0, "column": "industry", "message": "This is an invalid entry, fill in industry accordingly" }

How can I cross-check a dictionary against a data frame to identify values that are not in the dictionary's list of allowed items?

For example, if a column has 10 values and 5 of them are errors, I want it to print all 5 errors.

CodePudding user response:

# invert the dictionary: allowed value -> column name
d={i:k for k, v in valid_dict.items() for i in v}

# column to check
col='industry'

# map the column through d; where the result is null, write an error message,
# otherwise keep the valid industry name
df['check']=df[col].mask(df[col].map(d).isna(), f"An invalid Value found in {col}")
df
         country industry             Email                               check
0  Aland Islands     Cars  [email protected]  An invalid Value found in industry

FUNCTION:

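def validate(col='industry', valid_dict=valid_dict):
    # col: column to validate
    # valid_dict: dictionary of allowed values per column
    # invert the dictionary: allowed value -> column name
    d={i:k for k, v in valid_dict.items() for i in v}
    
    # replace values that are missing from the dictionary with an error message
    # (df is read from the enclosing scope)
    s=df[col].mask(df[col].map(d).isna(), f"An invalid Value found in {col}")
    
    # return only the rows where the mapping failed
    return s[s.map(d).isna()]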

   
validate('industry') 
0    An invalid Value found in industry
2    An invalid Value found in industry
Name: industry, dtype: object
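
If the goal is the list of error dictionaries from the question (one per invalid cell, across every column in valid_dict), something along these lines might work. This is only a sketch: the function name find_invalid_entries and the three-row sample frame are made up for illustration, and it swaps the inverted-dictionary lookup for Series.isin so that a value is only checked against the allowed list of its own column.

```python
import pandas as pd


def find_invalid_entries(df, valid_dict):
    """Return one error dict per cell whose value is not allowed for its column.

    Hypothetical helper, not part of the original answer.
    """
    errors = []
    for col, allowed in valid_dict.items():
        if col not in df.columns:
            continue  # ignore keys that are not columns of the frame
        # True for rows whose value is NOT in this column's allowed list
        invalid = ~df[col].isin(allowed)
        for row in df.index[invalid.to_numpy()].tolist():
            errors.append({
                "row": row,
                "column": col,
                "message": f"This is an invalid entry, fill in {col} accordingly",
            })
    return errors


# Made-up sample data (assumption): three rows, three invalid cells in total
df = pd.DataFrame(
    [["Aland Islands", "Cars"],
     ["Narnia", "Banking / Finance"],
     ["Afghanistan", "Boats"]],
    columns=["country", "industry"],
)
valid_dict = {
    "country": ["Afghanistan", "Aland Islands"],
    "industry": ["Automotive", "Banking / Finance"],
}

errors = find_invalid_entries(df, valid_dict)
for e in errors:
    print(e)
```

Returning a list rather than a single dict is what lets all 5 errors show up when 5 of 10 values are wrong; the function also takes df as an argument instead of reading a global, so it can be reused with other frames.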