Turning a list in a list into a Python Pandas Dataframe-CodePudding

I have the following list

test={'data': [{'name': 'john',
   'insights': {'data': [{'id': '123',
      'person_id': '456',
      'date_start': '2022-12-31',
      'date_stop': '2023-01-29',
      'impressions': '4070',
      'spend': '36.14'}],
    'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
   'id': '978'}]}

I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.

I tried doing this,

data = pd.DataFrame()
data = data.append(test['data'])

But the insight now becomes a column like such

name    insights                                             id
john    {'data': [{'id': '123', 'person_id': '456', 'd...   978

How do I get the impressions and the spend from the insight column? When I tried

test['data']['insights']

I got an error

list indices must be integers or slices, not str

CodePudding user response：

One option is to use pandas.json_normalize with pandas.Series.explode :

df = (
        pd.json_normalize(test["data"])
            ['insights.data']
                .explode()
                .pipe(lambda s: pd.DataFrame(s.tolist()))
      )

Output :

print(df)

    id person_id  date_start   date_stop impressions  spend
0  123       456  2022-12-31  2023-01-29        4070  36.14

CodePudding user response：

Try:

import pandas as pd

test = {
    "data": [
        {
            "name": "john",
            "insights": {
                "data": [
                    {
                        "id": "123",
                        "person_id": "456",
                        "date_start": "2022-12-31",
                        "date_stop": "2023-01-29",
                        "impressions": "4070",
                        "spend": "36.14",
                    }
                ],
                "paging": {"cursors": {"before": "MAZDZD", "after": "MAZDZD"}},
            },
            "id": "978",
        }
    ]
}

df = pd.DataFrame(
    [
        {
            "name": d["name"],
            "date_start": dd["date_start"],
            "date_stop": dd["date_stop"],
            "impressions": dd["impressions"],
        }
        for d in test["data"]
        for dd in d["insights"]["data"]
    ]
)
print(df)

Prints:

   name  date_start   date_stop impressions
0  john  2022-12-31  2023-01-29        4070

CodePudding user response：

Here is an alternative approach:

df = pd.DataFrame(test["data"][0]["insights"]["data"], 
        columns=["name", "date_start", "date_stop", "impressions", "spend"])
df["name"] = test["data"][0]["name"]
print(df)

   name  date_start   date_stop impressions  spend
0  john  2022-12-31  2023-01-29        4070  36.14

CodePudding user response：

seriously messy dataframe

data['insights'][0]['data'][0]['impressions']
data['insights'][0]['data'][0]['spend']

CodePudding user response：

Use pd.json_normalize:

>>> pd.json_normalize(test["data"], ['insights', 'data'])

    id person_id  date_start   date_stop impressions  spend
0  123       456  2022-12-31  2023-01-29        4070  36.14