Home > Mobile >  How To Check If A Pandas Column Value Appears As A Key In A Dictionary
How To Check If A Pandas Column Value Appears As A Key In A Dictionary

Time:11-26

I have a column in a Pandas dataframe and I wanted to double check if these column values exist as a key in a dictionary. Currently the column values have the abbreviations of the states of the US. I've found a dictionary that contains all the abbreviations of the US states alongside their full name. What I wanted to do was to check the column for if the abbreviations match the keys in the dictionary without mapping it to their value. This will allow my to check if any code in the column doesn't exist.

Here is the dictionary:

{   
    "AL": "Alabama",
    "AK": "Alaska",
    "AS": "American Samoa",
    "AZ": "Arizona",
    "AR": "Arkansas",
    "CA": "California",
    "CO": "Colorado",
    "CT": "Connecticut",
    "DE": "Delaware",
    "DC": "District Of Columbia",
    "FM": "Federated States Of Micronesia",
    "FL": "Florida",
    "GA": "Georgia",
    "GU": "Guam",
    "HI": "Hawaii",
    "ID": "Idaho",
    "IL": "Illinois",
    "IN": "Indiana",
    "IA": "Iowa",
    "KS": "Kansas",
    "KY": "Kentucky",
    "LA": "Louisiana",
    "ME": "Maine",
    "MH": "Marshall Islands",
    "MD": "Maryland",
    "MA": "Massachusetts",
    "MI": "Michigan",
    "MN": "Minnesota",
    "MS": "Mississippi",
    "MO": "Missouri",
    "MT": "Montana",
    "NE": "Nebraska",
    "NV": "Nevada",
    "NH": "New Hampshire",
    "NJ": "New Jersey",
    "NM": "New Mexico",
    "NY": "New York",
    "NC": "North Carolina",
    "ND": "North Dakota",
    "MP": "Northern Mariana Islands",
    "OH": "Ohio",
    "OK": "Oklahoma",
    "OR": "Oregon",
    "PW": "Palau",
    "PA": "Pennsylvania",
    "PR": "Puerto Rico",
    "RI": "Rhode Island",
    "SC": "South Carolina",
    "SD": "South Dakota",
    "TN": "Tennessee",
    "TX": "Texas",
    "UT": "Utah",
    "VT": "Vermont",
    "VI": "Virgin Islands",
    "VA": "Virginia",
    "WA": "Washington",
    "WV": "West Virginia",
    "WI": "Wisconsin",
    "WY": "Wyoming"
}

The column just contains state abbreviations (CA, FL, AK, AL etc). Cheers! Some of the data is below

,LOSS STATE
0,AL
1,CA
2,CO
3,DC
4,AZ
5,Nonsense
6,CA
7,PA
8,GA
9,VA
10,VA
11,VA
12,VA
13,TN
14,VA
15,CA
16,TX
17,CO
18,MO
19,CA

I want all the columns that have "valid" state abbreviations to be left alone, but want the "Nonsense" column to be changed to NA as it doesn't appear in the dictionary.

CodePudding user response:

You can create a list of all the values of your column that contain the abbreviation you want to test using df[col_name].tolist() which will result in a list as below

col_values_list = ['CA', 'FL', 'AK', 'AL','AU']

And then check whether any of these values does not exist in your dictionary's keys:

for col in col_list:
    if col not in us_dict.keys():
        print(f"{col} state abbreviation missing from dictionary key")

A much simpler way, would be just to use:

us_dict.keys() - df[col_name] # col name is your actual column name

which gets you again what you need.

  • Related