I have the following dataset.
details
USA [{'country': 'USA', 'city': 'NYC'}]
India [{'country': 'India', 'city': 'Mumbai'}]
Canada [{'country': 'Canada', 'city': 'VC'}]
I need to create a new column named city. I'm trying the following code snippet but getting a TypeError.
df['details'] = df['details'].str.strip('[]')
df['city'] = df['details'].map(lambda x: x['city'])
df
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-3f4a312e7420> in <module>
1 df['details'] = df['details'].str.strip('[]')
----> 2 df['city'] = df['details'].map(lambda x: x['city'])
3 df
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in map(self, arg, na_action)
3907 dtype: object
3908 """
-> 3909 new_values = super()._map_values(arg, na_action=na_action)
3910 return self._constructor(new_values, index=self.index).__finalize__(
3911 self, method="map"
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
935
936 # mapper is a function
--> 937 new_values = map_f(values, mapper)
938
939 return new_values
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-38-3f4a312e7420> in <lambda>(x)
1 df['details'] = df['details'].str.strip('[]')
----> 2 df['city'] = df['details'].map(lambda x: x['city'])
3 df
TypeError: string indices must be integers
I feel the problem I'm facing is with datatypes. What would be the ideal way of doing it?
Any suggestions would be appreciated. Thanks!
CodePudding user response:
The values in the details column are of type str, not dict. The column first needs to be parsed via json.loads, and then you can get the value with the city key.
You will need to replace single quotes with double quotes for it to work, because JSON only accepts double-quoted strings.
In [5]: df["details"].apply(lambda x: json.loads(x.replace("'", '"'))["city"])
Out[5]:
0 NYC
Name: details, dtype: object
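For reference, here is a minimal self-contained sketch of the parsing approach. The DataFrame is a hypothetical reconstruction of the data in the question, assuming each cell is a string that still has its square brackets (no str.strip beforehand), so the parsed value is a one-element list and element 0 is taken first. The ast.literal_eval variant is not part of the answer above; it is shown as an alternative that reads Python literal syntax directly and so does not need the quote replacement, which would break on values containing an apostrophe.

```python
import ast
import json

import pandas as pd

# Hypothetical reconstruction of the question's data: each cell is a string,
# not a list of dicts.
df = pd.DataFrame(
    {
        "details": [
            "[{'country': 'USA', 'city': 'NYC'}]",
            "[{'country': 'India', 'city': 'Mumbai'}]",
            "[{'country': 'Canada', 'city': 'VC'}]",
        ]
    },
    index=["USA", "India", "Canada"],
)

# Approach from the answer: swap quotes so the text is valid JSON, then parse.
# The brackets are still present here, so parsing yields a one-element list.
city_json = df["details"].map(
    lambda s: json.loads(s.replace("'", '"'))[0]["city"]
)

# Alternative (an assumption, not from the answer): ast.literal_eval parses
# Python literals as written, so no quote replacement is needed.
city_ast = df["details"].map(lambda s: ast.literal_eval(s)[0]["city"])

df["city"] = city_ast
print(df["city"].tolist())
```

If the brackets have already been stripped as in the question, each parsed value is a dict and the [0] step should be dropped.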
CodePudding user response:
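Try the code below.
Explode the list and then access the city of each element. Note that explode() only has an effect if the details column holds actual lists rather than strings.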
df['city'] = df['details'].explode().map(lambda x: x['city'])
Do not strip the brackets with df['details'] = df['details'].str.strip('[]').
Instead, use explode() as shown in the code above.
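A minimal sketch of the explode() approach follows, under the assumption that the cells are real Python lists of dicts. The DataFrame is hypothetical sample data modeled on the question. The last lines illustrate that explode() leaves a string cell untouched, which is why this approach on its own would not remove the TypeError if the column holds strings.

```python
import pandas as pd

# Assumption: the cells are actual Python lists of dicts, not strings.
df = pd.DataFrame(
    {
        "details": [
            [{"country": "USA", "city": "NYC"}],
            [{"country": "India", "city": "Mumbai"}],
            [{"country": "Canada", "city": "VC"}],
        ]
    },
    index=["USA", "India", "Canada"],
)

# explode() turns each list element into its own row, repeating the index label.
exploded = df["details"].explode()

# Each exploded value is now a dict, so key lookup works.
df["city"] = exploded.map(lambda d: d["city"])
print(df["city"].tolist())

# A string cell is not list-like, so explode() returns it unchanged.
unchanged = pd.Series(["[{'city': 'NYC'}]"]).explode()
print(type(unchanged.iloc[0]))
```

If a list held more than one dict, explode() would repeat the index label for that row, and the result would no longer line up one-to-one with df when assigned back.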