I am having trouble with pandas replace-function. Let's say we have an example dataframe like this:
df = pd.DataFrame({'State': ['Georgia', 'Alabama', 'Tennessee'],
'Cities': [['Atlanta', 'Albany'], ['Montgomery', 'Huntsville', 'Birmingham'], ['Nashville', 'Knoxville']]})
>>> df
State Cities
0 Georgia [Atlanta, Albany]
1 Alabama [Montgomery, Huntsville, Birmingham]
2 Tennessee [Nashville, Knoxville]
Now I want to replace the state names and city names all by abbreviations. I have two dictionaries that define the replacement values:
state_abbrv = {'Alabama': 'AL', 'Georgia': 'GA', 'Tennessee': 'TN'}
city_abbrv = {'Albany': 'Alb.', 'Atlanta': 'Atl.', 'Birmingham': 'Birm.',
'Huntsville': 'Htsv.', 'Knoxville': 'Kxv.',
'Montgomery': 'Mont.', 'Nashville': 'Nhv.'}
When using pd.DataFrame.replace()
on the "States" column (which only contains one value per row) it works as expected and replaces all state names:
>>> df.replace({'State': state_abbrv})
State Cities
0 GA [Atlanta, Albany]
1 AL [Montgomery, Huntsville, Birmingham]
2 TN [Nashville, Knoxville]
I was hoping that it would also individually replace all matching names within the lists of the "Cities" column, but unfortunately it does not seem to work as all cities remain unabbreviated:
>>> df.replace({'Cities': city_abbrv})
State Cities
0 Georgia [Atlanta, Albany]
1 Alabama [Montgomery, Huntsville, Birmingham]
2 Tennessee [Nashville, Knoxville]
How do I get the pd.DataFrame.replace()
function to individually circle through all list elements in the column per row and replace accordingly?
CodePudding user response:
Try:
explode
to split the list into individual rowsreplace
each column using the relevant dictionarygroupby
andagg
to get back the original structure
>>> output = df.explode("Cities").replace({"State": state_abbrv, "Cities": city_abbrv}).groupby("State", as_index=False)["Cities"].agg(list)
State Cities
0 AL [Mont., Htsv., Birm.]
1 GA [Atl., Alb.]
2 TN [Nhv., Kxv.]