I have a nested JSON-file that looks like this:
[
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
}
],
"NumberOfPoints": 1,
}
]
As you can see, the list of this JSON-file consists of two dictionaries that each contains another list (= "Connections") that consists of at least one dictionary. In each dictionary of this JSON-file, I want to select all keys named "Quantity" to make a calculation with its value (so in the example code above, I want to calculate that there are 5 Quantities in total).
With the code below, I created a simple dataframe in Pandas to make this calculation :
import json
import pandas as pd
df = pd.read_json("chargingStations.json")
dfConnections = df["Connections"]
dfConnections = pd.json_normalize(dfConnections)
print(dfConnections)
Ideally, I want to get the "Quantity" key from each dictionary, so that I can make a dataframe like this (where each item has its own row):
However, I am not sure if this is the best way to make my calculation. I tried to get each value of the "Quantity" key by typing dfConnections = dfConnections.get("Quantity")
, but that results in None
. So: how can I get the value of each "Quantity" key in each dictionary to make my calculation?
CodePudding user response:
If data
is parsed Json data from your question, you can do:
df = pd.DataFrame(
[
{
i: sum(dd["Quantity"] for dd in d["Connections"])
for i, d in enumerate(data)
}
]
)
print(df)
Prints:
0 | 1 | |
---|---|---|
0 | 2 | 3 |
CodePudding user response:
you can use json_normalize():
import pandas as pd
true=True
a=[
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 2
}
],
"NumberOfPoints": 1,
},
{
"IsRecentlyVerified": true,
"AddressInfo": {
"Town": "Haarlem",
},
"Connections": [
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
},
{
"PowerKW": 17,
"Quantity": 1
}
],
"NumberOfPoints": 1,
}
]
After reading the data, we use a group by function to index numbers and get the sum of the quantity.
df=pd.json_normalize(a)
df=df.explode('Connections')
df=df.join(pd.json_normalize(df.pop('Connections')))
df=df.reset_index().groupby('index')['Quantity'].sum().to_frame()
print(df)
'''
index Quantity
0 0 2
1 1 3
'''
#or another format
df2=df.T
print(df2)
'''
0 1
Quantity 2 3
'''