df has the following columns:
Index(
['level_id', 'description', 'level_type', 'difficulty', 'solution', 'modifier', 'last_modified', 'time_limit', 'tags', 'level_options', 'extra_info', 'hint']
, dtype='object')
column that i am converting is ="level_options"
Down below is the data inside first row of level_options:
{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]}
Then in the below code: I manually pass the above first row, got the expected output is well.
> j2=[{"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}]
> pd.json_normalize(j2)
output:
marking max_score random_options correct_options options.1 options.2 options.3 options.4
normal 1 false [2] Carbon dioxide Oxygen Urine Sweat
But while iterating through all rows I don't get the output.
I have used this below code where I was supposed to get 6 lists but got empty lists.PFA of expected output- final output
Option1=[]
Option2=[]
Option3=[]
Option4=[]
Option5=[]
Correct_option=[]
try:
> for index, row in df.iterrows():
> l=row["level_options"]
> print(l) except:
> print(index,row)
> l = json.loads(l)
> print(l)
> for i in l:
> i = pd.normalize_json(i)
> print(i)
> if i['options'] == '1':
> Option1.append(i['value'])
> elif i['options'] == '2':
> Option2.append(i['value'])
> elif i['options'] == '3':
> Option3.append(i['value'])
> elif i['options'] == '4':
> Option4.append(i['value'])
> elif i['options'] == '5':
> Option5.append(i['value'])
> elif i['options'] == 'correct_options':
> Correct_option.append(i['value'])
i even used the json.loads instead of json_normalize but error occurs
below are 2 rows of dataframe= df["level_options"]
{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]} {"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}
The Output shown below is what I am expecting after merging each list as a single dataframe, the problem I am getting in is while retrieving the lists...
Thanks and regards
options.1 options.2 options.3 options.4 options.5 correct_options
Carbon dioxide Oxygen Urine Sweat NaN [2]
Yes No NaN NaN NaN [1]
CodePudding user response:
Just try pd.json_normalize
on column
out = (pd.json_normalize(df['level_options'])
.filter(like='option')
.drop('random_options', axis=1))
print(out)
correct_options options.1 options.2 options.3 options.4
0 [2] Carbon dioxide Oxygen Urine Sweat
1 [1] Yes No NaN NaN
CodePudding user response:
Thanks bro, but i got this error when i used your code: what data you placed inside df["level_options"]
KeyError Traceback (most recent call last)
Input In [107], in <cell line: 1>()
----> 1 out = (pd.json_normalize(df["level_options"])
2 .filter(like='option')
3 .drop('random_options', axis=1))
KeyError: "['random_options'] not found in axis"
i also used the same normalize fun but got the output for only those rows which i passed manually as i have already did it but i want directly from dataframe...not got the output for all rows.
z=[{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]} ,{"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}]
a=pd.json_normalize(z).filter(like='option').drop('random_options', axis=1)
print(a)
correct_options options.1 options.2 options.3 options.4
0 [2] Carbon dioxide Oxygen Urine Sweat
1 [1] Yes No NaN NaN