Convert Request Response to Dataframe where there is a separate key for columns and rows

Time:12-25

How do I turn the rows in the response into the rows of a DataFrame, and the cols in the response into its columns, with the correct data types?

r = requests.get(url) returns the JSON below:

{
  "data": {
    "rows": [
      [
        "2016-09-06T21:41:38-04:00",
        "The Zebra"
      ],
      [
        "2018-10-29T21:41:38-04:00",
        "The Dog"
      ]
    ],
    "cols": [
      {
        "display_name": "CreatedDate",
        "source": "native",
        "field_ref": [
          "field",
          "created_at",
          {
            "base-type": "type/DateTime"
          }
        ],
        "name": "created_at",
        "base_type": "type/DateTime",
        "effective_type": "type/DateTime"
      },
      {
        "display_name": "name",
        "source": "native",
        "field_ref": [
          "field",
          "created_at",
          {
            "base-type": "type/text"
          }
        ],
        "name": "created_at",
        "base_type": "type/text",
        "effective_type": "type/text"
      }
    ]
  }
}

So far I have tried the following; my main issue is iterating over the columns:

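import json
import pandas as pd
import requests

r = requests.get(url)
data = json.loads(r.text)  # json.loads(r) fails: r is a Response object, not a string
rows = data["data"]["rows"]
cols = data["data"]["cols"]

df = pd.DataFrame(data=rows, columns=cols)  # cols is a list of dicts, not a list of column names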

The expected output looks like this:

+---------------------------+-----------+
| CreatedDate               | Name      |
+---------------------------+-----------+
| 2016-09-06T21:41:38-04:00 | The Zebra |
| 2018-10-29T21:41:38-04:00 | The Dog   |
+---------------------------+-----------+

Answer:

You could take the column labels from each entry's display_name, apply pd.to_datetime to every column, and then call pandas.DataFrame.convert_dtypes to infer a suitable data type for each column:

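import json
import pandas as pd

data = json.loads(r.text)['data']  # r is the requests.Response from the question
df = pd.DataFrame(data=data['rows'], columns=[d['display_name'] for d in data['cols']])
df = df.apply(pd.to_datetime, errors='ignore').convert_dtypes()  # errors='ignore' is deprecated as of pandas 2.2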

print(df)

The output:

               CreatedDate       name
0 2016-09-06 21:41:38-04:00  The Zebra
1 2018-10-29 21:41:38-04:00    The Dog

Summary of the dataframe (including dtypes):

df.info()  # info() prints the summary itself and returns None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype                                 
---  ------       --------------  -----                                 
 0   CreatedDate  2 non-null      datetime64[ns, pytz.FixedOffset(-240)]
 1   name         2 non-null      string                                
memory usage: 160.0 bytes
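Since the cols metadata already declares each column's type, another option is to let base_type drive the conversion instead of attempting pd.to_datetime on every column. The following is a minimal sketch under a few assumptions: payload is a hand-written stand-in for r.json(), trimmed to the keys used; CONVERTERS and response_to_frame are hypothetical names; and only the two base_type strings that appear in this response are handled, matched exactly as spelled there.

```python
import pandas as pd

# Stand-in for r.json(); trimmed to the keys this sketch reads
payload = {
    "data": {
        "rows": [
            ["2016-09-06T21:41:38-04:00", "The Zebra"],
            ["2018-10-29T21:41:38-04:00", "The Dog"],
        ],
        "cols": [
            {"display_name": "CreatedDate", "base_type": "type/DateTime"},
            {"display_name": "name", "base_type": "type/text"},
        ],
    }
}

# Hypothetical mapping from base_type strings to converters (assumption:
# only these two types occur; add entries for any others your API returns).
# If rows can carry different UTC offsets, pd.to_datetime(s, utc=True) is safer.
CONVERTERS = {
    "type/DateTime": pd.to_datetime,
    "type/text": lambda s: s.astype("string"),
}


def response_to_frame(data: dict) -> pd.DataFrame:
    cols = data["cols"]
    # cols[i] describes position i of every row, so the labels are the display names
    names = [c["display_name"] for c in cols]
    df = pd.DataFrame(data["rows"], columns=names)
    # Convert each column according to its declared base_type;
    # columns with an unrecognised type are left unchanged (object dtype)
    for name, col in zip(names, cols):
        convert = CONVERTERS.get(col["base_type"])
        if convert is not None:
            df[name] = convert(df[name])
    return df


df = response_to_frame(payload["data"])
print(df.dtypes)
```

Keying the conversion on base_type means a text column that happens to look like a date is never silently turned into a datetime, and a new column type only needs a new mapping entry.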