I need to convert the ‘content’ column from a string dictionary to a dictionary in python. After that I will use the following line of code:
df[‘content’].apply(pd.Series).
To have the dictionary values as a column name and the dictionary value in a cell.
I can’t do this now because there are missing values in the dictionary string.
How can I handle missing values in the dictionary when I use the function eval(String dictionary) -> dictionary?
[I'm working on the 'content' column that I want to convert to the correct format first, I tried with the eval() function, but it doesn't work, because there are missing values. This is json data.
My goal is to have the content column data for the keys in the column titles and the values in the cells](https://i.stack.imgur.com/1CsIl.png)
CodePudding user response:
I cannot see what the missing values look like in your screenshot, but i tested the following code and got what seems to be a good result. The simple explanation in to use str.replace() to fix the null values before parsing the string to dict.
import pandas as pd
import numpy as np
import json
## setting up an example dataframe. note that row2 has a null value
json_example = [
'{"row1_key1":"row1_value1","row1_key2":"row1_value2"}',
'{"row2_key1":"row2_value1","row2_key2": null}'
]
df= pd.DataFrame()
df['Content'] = json_example
## using string replace on the string representation of the json to clean it up
df['Content'].apply(lambda x: x.replace('null','"0"'))
## using lambda x to first load the string into a dict, then applying pd.Series()
df['Content'].apply(lambda x: pd.Series(json.loads(x)))
CodePudding user response:
you can use json.loads in lambda function. if row value is nan, pass, if not, apply json.loads: :
import json
import numpy as np
df['content']=df['content'].apply(lambda x: json.loads(x) if pd.notna(x) else np.nan)
now you can use pd.Series.