Reading JSON Column in Python Dataframe

Time:09-06

I have a CSV file with a column, df['questions'], that holds JSON data.

| Date | Agent Name | Questions |
| 8/5/2022 | Alaa M | (the column in question; see the example below) |
| 8/5/2022 | Othman M | (the column in question; see the example below) |

An example of the data in that column:

[ {'id': 'dee52266-c096-47f4-96d4-6346498039ee', 'name': '1.G – Did an issue been raised?', 'displayOrder': 13, 'type': 'choice', 'multiSelect': False, 'questionsResponseModel': [{'id': 'a3e0ac59-5cc1-4654-a6bc-fbc71d86ba25', 'name': 'No'}], 'parentGroup': 'f1654f7c-204f-48d0-b940-ee9bb98eafa0', 'score': '0', 'maxScore': '0', 'percentage': '0'}, {'id': '6b0a92b4-fad9-488d-8296-030799ee00eb', 'name': '1.G - Comment', 'displayOrder': 14, 'type': 'text', 'multiSelect': None, 'questionsResponseModel': 'NA', 'parentGroup': 'f1654f7c-204f-48d0-b940-ee9bb98eafa0', 'score': '0', 'maxScore': '0', 'percentage': '0'} ]


import json

import pandas as pd
import numpy as np


# Raw string so the backslash in the Windows path is not read as an escape.
df = pd.read_csv(r'Desktop\Data.csv')

# First attempt: replace ' with " so the text parses as JSON. This does not work.

def js(row):
    return row['questions'].lower().replace("'", '"')

df['new_questions'] = df.apply(js, axis=1)

df["new_questions_2"] = df["new_questions"].apply(json.loads)

# Second attempt: expand the column with pd.Series. This does not work either.

out = (df.drop(columns=['questions'])
         .join(df['questions'].apply(pd.Series).add_prefix('questions_'))
      )

Answer:

Try:

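import ast

# Parse each cell as a Python literal (single quotes, False and None are not valid JSON).
df["Questions"] = df["Questions"].apply(ast.literal_eval)
# One row per question dict, then spread each dict's keys into columns.
df = df.explode("Questions")
df = pd.concat([df, df.pop("Questions").apply(pd.Series)], axis=1)

# Do the same for the nested questionsResponseModel lists.
df = df.explode("questionsResponseModel")
df = pd.concat(
    [df, df.pop("questionsResponseModel").apply(pd.Series).add_prefix("qrm_")],
    axis=1,
)
# The plain 'NA' strings end up in a leftover column named qrm_0; drop it.
df = df.drop(columns="qrm_0")
print(df)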

Prints:

       Date Agent Name                                    id                             name  displayOrder    type multiSelect                           parentGroup score maxScore percentage                                qrm_id qrm_name
0  8/5/2022     Alaa M  dee52266-c096-47f4-96d4-6346498039ee  1.G – Did an issue been raised?            13  choice       False  f1654f7c-204f-48d0-b940-ee9bb98eafa0     0        0          0  a3e0ac59-5cc1-4654-a6bc-fbc71d86ba25       No
0  8/5/2022     Alaa M  6b0a92b4-fad9-488d-8296-030799ee00eb                    1.G - Comment            14    text        None  f1654f7c-204f-48d0-b940-ee9bb98eafa0     0        0          0                                   NaN      NaN
1  8/5/2022   Othman M  dee52266-c096-47f4-96d4-6346498039ee  1.G – Did an issue been raised?            13  choice       False  f1654f7c-204f-48d0-b940-ee9bb98eafa0     0        0          0  a3e0ac59-5cc1-4654-a6bc-fbc71d86ba25       No
1  8/5/2022   Othman M  6b0a92b4-fad9-488d-8296-030799ee00eb                    1.G - Comment            14    text        None  f1654f7c-204f-48d0-b940-ee9bb98eafa0     0        0          0                                   NaN      NaN
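As for why the quote replacement from the question fails: the cells are Python literals, not JSON. Even with double quotes the text still contains None (and False), which json.loads rejects. A minimal sketch using a shortened, made-up cell value in the same style as the Questions column:

```python
import ast
import json

# A shortened, illustrative cell value (not the full data from the question).
cell = "[{'name': '1.G - Comment', 'multiSelect': None, 'score': '0'}]"

# Swapping the quotes still leaves the Python literal None, which JSON does not allow.
try:
    json.loads(cell.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval parses Python literals: lists, dicts, strings, None, True, False.
parsed = ast.literal_eval(cell)

print(json_ok)                   # False
print(parsed[0]["multiSelect"])  # None
```

Lowercasing the text first, as in the question, turns None into none, which is not valid JSON either.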

Edit: the questionsResponseModel column is now "exploded" too.
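If some cells might be empty or malformed, ast.literal_eval will raise on them. One possible guard is sketched below; parse_cell is a hypothetical helper name, and returning an empty list for bad cells is an assumption about what you want, not something stated in the question.

```python
import ast


def parse_cell(value):
    """Hypothetical helper: parse one cell, returning [] when it cannot be parsed."""
    # Missing cells come back from read_csv as float NaN rather than str.
    if not isinstance(value, str):
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Truncated or otherwise malformed text.
        return []
    # Keep only list results so every cell has the same type.
    return parsed if isinstance(parsed, list) else []


good = parse_cell("[{'name': 'No', 'multiSelect': False}]")
missing = parse_cell(float("nan"))
broken = parse_cell("[{'name': 'No'")
```

It could then replace the first step, e.g. df["Questions"].apply(parse_cell). Note that explode turns an empty list into a single NaN row, so rows for bad cells are kept rather than dropped.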
