Creating New Column Based on JSON List

Time:09-16

I have the following dataset.

                     details
USA     [{'country': 'USA', 'city': 'NYC'}]
India   [{'country': 'India', 'city': 'Mumbai'}]
Canada  [{'country': 'Canada', 'city': 'VC'}]

I need to create a new column named city. I tried the following snippet, but it raises a TypeError.

df['details'] = df['details'].str.strip('[]')
df['city'] = df['details'].map(lambda x: x['city'])
df
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-3f4a312e7420> in <module>
      1 df['details'] = df['details'].str.strip('[]')
----> 2 df['city'] = df['details'].map(lambda x: x['city'])
      3 df

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3907         dtype: object
   3908         """
-> 3909         new_values = super()._map_values(arg, na_action=na_action)
   3910         return self._constructor(new_values, index=self.index).__finalize__(
   3911             self, method="map"

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
    935 
    936         # mapper is a function
--> 937         new_values = map_f(values, mapper)
    938 
    939         return new_values

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-38-3f4a312e7420> in <lambda>(x)
      1 df['details'] = df['details'].str.strip('[]')
----> 2 df['city'] = df['details'].map(lambda x: x['city'])
      3 df

TypeError: string indices must be integers

I suspect the problem is with data types. What would be the ideal way to do this?

Any suggestions would be appreciated. Thanks!

CodePudding user response:

The details column holds strings, not dicts. Each value first needs to be parsed with json.loads; after that you can look up the value under the city key.

JSON requires double-quoted strings, so you will need to replace the single quotes with double quotes for this to work.

In [5]: df["details"].apply(lambda x: json.loads(x.replace("'", '"'))["city"])
Out[5]:
0    NYC
Name: details, dtype: object
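
For reference, here is a minimal self-contained sketch of the same idea. It assumes the column holds the string form of the lists shown in the question and that the brackets have not been stripped, so it takes the first element of each parsed list. The helper names first_city and first_city_literal are hypothetical, and the ast.literal_eval variant is an alternative I'm adding, not part of the answer above.

```python
import ast
import json

import pandas as pd

# Sample frame rebuilt from the question; every value is a string, not a list.
df = pd.DataFrame(
    {
        "details": [
            "[{'country': 'USA', 'city': 'NYC'}]",
            "[{'country': 'India', 'city': 'Mumbai'}]",
            "[{'country': 'Canada', 'city': 'VC'}]",
        ]
    },
    index=["USA", "India", "Canada"],
)


def first_city(raw):
    """Hypothetical helper: parse the string as JSON and return the first city."""
    # JSON needs double quotes, so swap the single quotes before parsing.
    records = json.loads(raw.replace("'", '"'))
    return records[0]["city"]


def first_city_literal(raw):
    """Hypothetical alternative: parse the string as a Python literal instead."""
    records = ast.literal_eval(raw)
    return records[0]["city"]


df["city"] = df["details"].apply(first_city)
print(df["city"].tolist())
```

Note that the quote replacement breaks if a value itself contains an apostrophe; in that case the ast.literal_eval variant may be the safer choice, since the strings look like Python literals rather than strict JSON.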

CodePudding user response:

Try the code below.

If the details column holds actual lists (not their string form), explode the list and then access the city.

df['city'] = df['details'].explode().map(lambda x: x['city'])

Do not strip the brackets with df['details'] = df['details'].str.strip('[]').
Instead, use explode() as shown in the code above.
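
A small sketch of the explode approach, under the assumption that each cell is a real Python list containing exactly one dict (the sample data is rebuilt from the question, not taken from the asker's actual frame):

```python
import pandas as pd

# Assumption: each cell is an actual list with a single dict inside.
df = pd.DataFrame(
    {
        "details": [
            [{"country": "USA", "city": "NYC"}],
            [{"country": "India", "city": "Mumbai"}],
            [{"country": "Canada", "city": "VC"}],
        ]
    },
    index=["USA", "India", "Canada"],
)

# explode() turns each one-element list into its single dict, keeping the index.
exploded = df["details"].explode()

# Each element is now a dict, so key lookup works.
df["city"] = exploded.map(lambda record: record["city"])
print(df["city"].tolist())
```

If the cells are strings, as the traceback in the question suggests, explode() leaves them unchanged and the lookup fails with the same TypeError, so the strings would need to be parsed first. With more than one dict per list, explode() repeats index labels and the result can no longer be assigned back as a single column directly.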
