Home > Mobile >  How to use pd.DataFrame.replace on a column containing lists
How to use pd.DataFrame.replace on a column containing lists

Time:05-12

I am having trouble with pandas replace-function. Let's say we have an example dataframe like this:

df = pd.DataFrame({'State': ['Georgia', 'Alabama', 'Tennessee'],
                   'Cities': [['Atlanta', 'Albany'], ['Montgomery', 'Huntsville', 'Birmingham'], ['Nashville', 'Knoxville']]})

>>> df
       State                                Cities
0    Georgia                     [Atlanta, Albany]
1    Alabama  [Montgomery, Huntsville, Birmingham]
2  Tennessee                [Nashville, Knoxville]

Now I want to replace the state names and city names all by abbreviations. I have two dictionaries that define the replacement values:

state_abbrv = {'Alabama': 'AL', 'Georgia': 'GA', 'Tennessee': 'TN'}
city_abbrv = {'Albany': 'Alb.', 'Atlanta': 'Atl.', 'Birmingham': 'Birm.',
              'Huntsville': 'Htsv.', 'Knoxville': 'Kxv.',
              'Montgomery': 'Mont.', 'Nashville': 'Nhv.'}

When using pd.DataFrame.replace() on the "States" column (which only contains one value per row) it works as expected and replaces all state names:

>>> df.replace({'State': state_abbrv})

  State                                Cities
0    GA                     [Atlanta, Albany]
1    AL  [Montgomery, Huntsville, Birmingham]
2    TN                [Nashville, Knoxville]

I was hoping that it would also individually replace all matching names within the lists of the "Cities" column, but unfortunately it does not seem to work as all cities remain unabbreviated:

>>> df.replace({'Cities': city_abbrv})
      State                                Cities
0    Georgia                     [Atlanta, Albany]
1    Alabama  [Montgomery, Huntsville, Birmingham]
2  Tennessee                [Nashville, Knoxville]

How do I get the pd.DataFrame.replace() function to individually circle through all list elements in the column per row and replace accordingly?

CodePudding user response:

Try:

  1. explode to split the list into individual rows
  2. replace each column using the relevant dictionary
  3. groupby and agg to get back the original structure
>>> output = df.explode("Cities").replace({"State": state_abbrv, "Cities": city_abbrv}).groupby("State", as_index=False)["Cities"].agg(list)
  State                 Cities
0    AL  [Mont., Htsv., Birm.]
1    GA           [Atl., Alb.]
2    TN           [Nhv., Kxv.]
  • Related