I read my data from Excel into a pandas DataFrame. One of the columns holds values that look like dictionaries but are stored as strings. I want to convert every row in that column (more than 40k rows) from a string to an actual dictionary. Printing the column gives:
df['fruit']
0 NaN
1 {'apple': [{'A': 1, 'B': 2, ...
2 {'apple': [{'A': 3, 'B': 4, ...
3 {'orange': [{'A': 5, 'B': 6...
4 {'apple': [{'A': 0, 'B': 9, ...
If I call to_dict() on the column, I get the following, and the values are still strings:
df['fruit'].to_dict()
{0: NaN,
1: "{'apple': [{'A': 1, 'end': b, ...}",
2: "{'apple': [{'A': 3, 'B': 4, ...}",
3: "{'orange': [{'A': 5, 'B': 6...}",
4: "{'apple': [{'A': 0, 'B': 9, ...}",
Then, when using to_dict('list'), I got the following error message.
df['fruit'].to_dict('list')
....
TypeError: unsupported type: <class 'str'>
I want real dictionaries because I only need the values of 'B' stored under 'orange'.
Any help would be greatly appreciated!
Answer:
Use:
import pandas as pd
df = pd.DataFrame({'string dict':["{'a': 1}", "{'b':2}"]})
df['string dict'].apply(eval)
which can be validated as follows:
type(df['string dict'].apply(eval)[0])
returns:
dict
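As a possible alternative to eval, the standard library's ast.literal_eval parses only Python literals (dicts, lists, strings, numbers and so on) and does not execute arbitrary code, which may matter when the strings come from a spreadsheet you do not fully control. A minimal sketch, reusing the same made-up 'string dict' column as above:

```python
import ast

import pandas as pd

# Same illustrative data as in the answer; the column name is arbitrary.
df = pd.DataFrame({'string dict': ["{'a': 1}", "{'b':2}"]})

# literal_eval turns each dict-like string into a real dict
# without running any code embedded in the string.
parsed = df['string dict'].apply(ast.literal_eval)
first = parsed[0]
print(type(first))
```

Like eval, literal_eval only accepts strings here, so NaN or numeric cells still need to be handled separately.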
Based on your comment, replace the missing values with an empty-dict string before evaluating:
df['string dict'].fillna('{}').apply(eval)
I reproduced your error using the following test data:
import numpy as np
df = pd.DataFrame({'string dict':["{'a': 1}", "{'b':2}", np.nan, 2]})
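To tie this back to the original goal (the 'B' values under 'orange'), here is a rough sketch that tolerates NaN and other non-string cells. The helper names parse_cell and orange_b_values, the output column names, and the sample rows are all hypothetical; the real rows are truncated in the question, so the assumed shape is a dict whose 'orange' key maps to a list of dicts.

```python
import ast

import numpy as np
import pandas as pd


def parse_cell(value):
    """Return a dict for dict-like strings, and an empty dict otherwise."""
    if not isinstance(value, str):
        # NaN, numbers and other non-string cells cannot be parsed.
        return {}
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def orange_b_values(record):
    """Collect every 'B' value found in the list stored under 'orange'."""
    return [
        item['B']
        for item in record.get('orange', [])
        if isinstance(item, dict) and 'B' in item
    ]


# Made-up rows that mimic the assumed structure of the 'fruit' column.
df = pd.DataFrame({'fruit': [
    np.nan,
    "{'apple': [{'A': 1, 'B': 2}]}",
    "{'orange': [{'A': 5, 'B': 6}]}",
    2,
]})

df['fruit_dict'] = df['fruit'].apply(parse_cell)
df['orange_B'] = df['fruit_dict'].apply(orange_b_values)
print(df['orange_B'].tolist())
```

Rows without an 'orange' key end up with an empty list, so they can be filtered out afterwards if only the orange rows are needed.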