Transform single Json columns' all rows into multiple columns-CodePudding

df has the following columns:

Index(

['level_id', 'description', 'level_type', 'difficulty', 'solution', 'modifier', 'last_modified', 'time_limit', 'tags', 'level_options', 'extra_info', 'hint']

, dtype='object')

column that i am converting is ="level_options"

Down below is the data inside first row of level_options:

{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]}

Then in the below code: I manually pass the above first row, got the expected output is well.

>     j2=[{"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}]
>     pd.json_normalize(j2)

output:

marking   max_score     random_options   correct_options      options.1   options.2     options.3   options.4  
normal      1             false            [2]       Carbon dioxide Oxygen         Urine        Sweat

But while iterating through all rows I don't get the output.
I have used this below code where I was supposed to get 6 lists but got empty lists.PFA of expected output- final output

Option1=[] 
Option2=[] 
Option3=[] 
Option4=[] 
Option5=[]
Correct_option=[]
 try:
>     for index, row in df.iterrows():
>         l=row["level_options"]
>         print(l) except:
>     print(index,row)
>     l = json.loads(l) 
>     print(l)
>     for i in l:
>         i = pd.normalize_json(i) 
>         print(i)
>         if i['options'] == '1':
>             Option1.append(i['value'])
>         elif i['options'] == '2':
>             Option2.append(i['value'])
>         elif i['options'] == '3':
>             Option3.append(i['value'])
>         elif i['options'] == '4':
>             Option4.append(i['value'])
>         elif i['options'] == '5':
>             Option5.append(i['value'])
>         elif i['options'] == 'correct_options':
>             Correct_option.append(i['value'])

i even used the json.loads instead of json_normalize but error occurs

below are 2 rows of dataframe= df["level_options"]

{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]} {"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}

The Output shown below is what I am expecting after merging each list as a single dataframe, the problem I am getting in is while retrieving the lists...

Thanks and regards

options.1   options.2     options.3 options.4  options.5    correct_options    
Carbon dioxide  Oxygen      Urine      Sweat       NaN               [2]    
Yes              No           NaN        NaN        NaN               [1]

CodePudding user response：

Just try pd.json_normalize on column

out = (pd.json_normalize(df['level_options'])
       .filter(like='option')
       .drop('random_options', axis=1))

print(out)

  correct_options       options.1 options.2 options.3 options.4
0             [2]  Carbon dioxide    Oxygen     Urine     Sweat
1             [1]             Yes        No       NaN       NaN

CodePudding user response：

Thanks bro, but i got this error when i used your code: what data you placed inside df["level_options"]

KeyError                                  Traceback (most recent call last)
Input In [107], in <cell line: 1>()
----> 1 out = (pd.json_normalize(df["level_options"])
      2        .filter(like='option')
      3        .drop('random_options', axis=1))
KeyError: "['random_options'] not found in axis"

i also used the same normalize fun but got the output for only those rows which i passed manually as i have already did it but i want directly from dataframe...not got the output for all rows.

   z=[{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]} ,{"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}] 
a=pd.json_normalize(z).filter(like='option').drop('random_options', axis=1)


    print(a)
    correct_options  options.1 options.2 options.3 options.4 
0       [2]    Carbon dioxide  Oxygen   Urine   Sweat 
1       [1]         Yes          No     NaN       NaN