I have the following list
test={'data': [{'name': 'john',
'insights': {'data': [{'id': '123',
'person_id': '456',
'date_start': '2022-12-31',
'date_stop': '2023-01-29',
'impressions': '4070',
'spend': '36.14'}],
'paging': {'cursors': {'before': 'MAZDZD', 'after': 'MAZDZD'}}},
'id': '978'}]}
I want to create a pandas dataframe where the columns are the name, date_start, date_stop, impressions, and spend.
I tried doing this,
data = pd.DataFrame()
data = data.append(test['data'])
But the insight now becomes a column like such
name insights id
john {'data': [{'id': '123', 'person_id': '456', 'd... 978
How do I get the impressions and the spend from the insight column? When I tried
test['data']['insights']
I got an error
list indices must be integers or slices, not str
CodePudding user response:
One option is to use pandas.json_normalize
with pandas.Series.explode
:
df = (
pd.json_normalize(test["data"])
['insights.data']
.explode()
.pipe(lambda s: pd.DataFrame(s.tolist()))
)
Output :
print(df)
id person_id date_start date_stop impressions spend
0 123 456 2022-12-31 2023-01-29 4070 36.14
CodePudding user response:
Try:
import pandas as pd
test = {
"data": [
{
"name": "john",
"insights": {
"data": [
{
"id": "123",
"person_id": "456",
"date_start": "2022-12-31",
"date_stop": "2023-01-29",
"impressions": "4070",
"spend": "36.14",
}
],
"paging": {"cursors": {"before": "MAZDZD", "after": "MAZDZD"}},
},
"id": "978",
}
]
}
df = pd.DataFrame(
[
{
"name": d["name"],
"date_start": dd["date_start"],
"date_stop": dd["date_stop"],
"impressions": dd["impressions"],
}
for d in test["data"]
for dd in d["insights"]["data"]
]
)
print(df)
Prints:
name date_start date_stop impressions
0 john 2022-12-31 2023-01-29 4070
CodePudding user response:
Here is an alternative approach:
df = pd.DataFrame(test["data"][0]["insights"]["data"],
columns=["name", "date_start", "date_stop", "impressions", "spend"])
df["name"] = test["data"][0]["name"]
print(df)
name date_start date_stop impressions spend
0 john 2022-12-31 2023-01-29 4070 36.14
CodePudding user response:
seriously messy dataframe
data['insights'][0]['data'][0]['impressions']
data['insights'][0]['data'][0]['spend']
CodePudding user response:
Use pd.json_normalize
:
>>> pd.json_normalize(test["data"], ['insights', 'data'])
id person_id date_start date_stop impressions spend
0 123 456 2022-12-31 2023-01-29 4070 36.14