So I have this JSON
{
"data": [
{
"id": 2674,
"timestamp": "2022-07-01",
"vendor_name": "test",
"npi_id": "1234567890",
"trigger_description": "test",
"echo_order": 0,
"duration": "0",
"image_url": "https://....",
"name": "Banner Ad Name 1",
"triggers": ["test "]
},
{
...
Which I am reading like so df = pd.read_json("./data/data.json", typ="frame")
The output is ...
> python main.py
data
0 {'id': 2674, 'timestamp': '2022-07-01', 'vendo...
1 {'id': 2675, 'timestamp': '2022-07-01', 'vendo...
2 {'id': 6789, 'timestamp': '2022-07-01', 'vendo...
3 {'id': 2321, 'timestamp': '2022-07-01', 'vendo...
4 {'id': 5678, 'timestamp': '2022-07-01', 'vendo...
5 {'id': 1234, 'timestamp': '2022-07-01', 'vendo...
I am looking to extract certain cols from the output. But for the life of me I can't figure out how. The data is within the key data
which is a list.
When you do df.columns
on the dataframe you only get Index(['data'], dtype='object')
How can i get inside data and extract what i need
CodePudding user response:
So I have managed to do it...but I am not sure if it's the true Pandas way od doing it.
df = pd.read_json("./data/data.json", typ="frame")
attribute_values = []
for item in df["data"]:
data = {
"id": item["id"],
"timestamp": item["timestamp"],
"npi_id": item["npi_id"],
}
row = [data["id"], data["timestamp"], data["npi_id"]]
attribute_values.append(row)
df = pd.read_json("./data/data.json", typ="frame")
Gives me the following:
data
0 {'id': 2674, 'timestamp': '2022-07-01', 'vendo...
1 {'id': 2675, 'timestamp': '2022-07-01', 'vendo...
2 {'id': 6789, 'timestamp': '2022-07-01', 'vendo...
3 {'id': 2321, 'timestamp': '2022-07-01', 'vendo...
4 {'id': 5678, 'timestamp': '2022-07-01', 'vendo...
5 {'id': 1234, 'timestamp': '2022-07-01', 'vendo...
Then using
for item in df["data"]:
data = {
"id": item["id"],
"timestamp": item["timestamp"],
"npi_id": item["npi_id"],
}
row = [data["id"], data["timestamp"], data["npi_id"]]
attribute_values.append(row)
gives me
[[2674, '2022-07-01', '1003883562'], [2675, '2022-07-01', '1043283849'], [6789, '2022-07-01', '1043283849'], [2321, '2022-07-01', '1043283849'], [5678, '2022-07-01', '1043283849'], [1234, '2022-07-01', '1043283849']]
Which is just id
timestamp
npi_id
from the dataset.
CodePudding user response:
You can use .str
out = df['data'].str['id']
print(out)
0 2674
Name: data, dtype: int64
Or use pd.json_normalize
import json
with open('data.json', 'r', encoding='utf-8') as f:
data = json.loads(f.read())
df = pd.json_normalize(data['data'])
print(df)
id timestamp vendor_name npi_id trigger_description echo_order duration image_url name triggers
0 2674 2022-07-01 test 1234567890 test 0 0 https://.... Banner Ad Name 1 [test ]