Creating a new column based on matching values from a list

Time:03-02

I have a list of tuples. The first element of each tuple is a city and state name separated by a comma, and the second element is the name of the county:

print(county_lookup)

[('Normal,Alabama', 'Madison County'), ('Birmingham,Alabama', 'Jefferson County'), ('Montgomery,Alabama', 'Montgomery County'), ('Huntsville,Alabama', 'Madison County'), ('Tuscaloosa,Alabama', 'Tuscaloosa County'), ('Alexander City,Alabama', 'Tallapoosa County'), ('Athens,Alabama', 'Limestone County')]

I was hoping to be able to use the list to create a new column in a pre-existing dataframe for 'county' data using the values already present in the list of tuples.

df_schools['county'] = [x[1] for x in county_lookup]

However, I soon realized that df_schools already has a city_state column containing values similar to the first element of each tuple in county_lookup.

df_schools.city_state

0              Normal,Alabama
1          Birmingham,Alabama
2          Montgomery,Alabama
3          Huntsville,Alabama
4          Montgomery,Alabama
                ...          
7698     Overland Park,Kansas
7699    Highland Heights,Ohio
7700      San Jose,California
7701     Lancaster,California
7702        San Antonio,Texas

Is there a way to compare the first element of each tuple in the list against the city_state column of df_schools, and create a new 'county' column filled with the second element of the matching tuple from county_lookup?

CodePudding user response:

You can turn the list into a DataFrame and use the DataFrame.merge method:

df = pd.DataFrame(county_lookup, columns=['city_state', 'county'])

df_schools = df_schools.merge(df, how='left', on='city_state')

Now df_schools has a new 'county' column, which will contain NaN wherever a city_state value had no match in the lookup.
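
For illustration, here is a minimal runnable sketch of the merge approach on a few rows copied from the question. The names lookup_df, merged and unmatched are hypothetical, and the drop_duplicates step is an added precaution rather than part of the answer above: if the lookup list ever repeated a city_state key, a left merge would duplicate the matching rows of df_schools.

```python
import pandas as pd

# Sample rows copied from the question; the real df_schools has ~7,700 rows.
county_lookup = [
    ('Normal,Alabama', 'Madison County'),
    ('Birmingham,Alabama', 'Jefferson County'),
    ('Montgomery,Alabama', 'Montgomery County'),
    ('Huntsville,Alabama', 'Madison County'),
]
df_schools = pd.DataFrame({
    'city_state': ['Normal,Alabama', 'Birmingham,Alabama', 'Montgomery,Alabama',
                   'Huntsville,Alabama', 'Montgomery,Alabama', 'San Antonio,Texas'],
})

# Build the lookup table (lookup_df is a hypothetical name).
lookup_df = pd.DataFrame(county_lookup, columns=['city_state', 'county'])
# Precaution: keep one row per key so the left merge cannot add rows.
lookup_df = lookup_df.drop_duplicates(subset='city_state')

# indicator=True adds a '_merge' column saying where each row was found.
merged = df_schools.merge(lookup_df, how='left', on='city_state', indicator=True)

# Collect the city_state values that had no match, then drop the helper column.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'city_state'].tolist()
merged = merged.drop(columns='_merge')
```

Checking unmatched after the merge is a quick way to see which city_state values need cleaning (for example, stray whitespace or different capitalization) before the lookup can succeed.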

CodePudding user response:

You can turn the list into a dict and map it onto the city_state column (a Series):

df_schools['county'] = df_schools['city_state'].map(dict(county_lookup))
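
A minimal sketch of the dict-and-map approach using the question's names (df_schools, county_lookup) and a few sample rows from it. The names county_by_city and n_missing are hypothetical. Unlike a merge, map never changes the number of rows: if a key is repeated in the list, dict() simply keeps the last county given for it.

```python
import pandas as pd

# Sample rows copied from the question.
county_lookup = [
    ('Normal,Alabama', 'Madison County'),
    ('Birmingham,Alabama', 'Jefferson County'),
    ('Montgomery,Alabama', 'Montgomery County'),
    ('Huntsville,Alabama', 'Madison County'),
]
df_schools = pd.DataFrame({
    'city_state': ['Normal,Alabama', 'Birmingham,Alabama', 'Montgomery,Alabama',
                   'Huntsville,Alabama', 'Montgomery,Alabama', 'San Antonio,Texas'],
})

# A list of 2-tuples converts directly into a {city_state: county} dict.
county_by_city = dict(county_lookup)

# Series.map looks up each value in the dict; missing keys become NaN.
df_schools['county'] = df_schools['city_state'].map(county_by_city)

# Count how many rows found no county in the lookup.
n_missing = int(df_schools['county'].isna().sum())
```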