I have a nested dictionary, whose first level keys are [0, 1, 2...] and the corresponding values of each key are of the form:
{
"geometry": {
"type": "Point",
"coordinates": [75.4516454, 27.2520587]
},
"type": "Feature",
"properties": {
"state": "Rajasthan",
"code": "BDHL",
"name": "Badhal",
"zone": "NWR",
"address": "Kishangarh Renwal, Rajasthan"
}
}
I want to make a pandas dataframe of the form:
Geometry Type Properties
Type Coordinates State Code Name Zone Address
0 Point [..., ...] Features Rajasthan BDHL ... ... ...
1
2
I am not able to understand the examples over the net about multi indexing/nested dataframe/pivoting. None of them seem to take the first level keys as the primary index in the required dataframe.
How do I get from the data I have, to making it into this formatted dataframe?
CodePudding user response:
I would suggest to create columns as "geometry_type"
, "geometry_coord"
, etc.. in order to differentiate theses columns from the column that you would name "type"
. In other words, using your first key as a prefix, and the subkey as the name, and hence creating a new name. And after, just parse and fill your Dataframe like that:
import json
j = json.loads("your_json.json")
df = pd.DataFrame(columns=["geometry_type", "geometry_coord", ... ])
for k, v in j.items():
if k == "geometry":
df = df.append({
"geometry_type": v.get("type"),
"geometry_coord": v.get("coordinates")
}, ignore_index=True)
...
Your output could then looks like this :
geometry_type geometry_coord ...
0 [75.4516454, 27.2520587] NaN ...
PS : And if you really want to go for your initial option, you could check here : Giving a column multiple indexes/headers
CodePudding user response:
I suppose you have a list of nested dictionaries.
Use json_normalize
to read json data and split current column index into 2 part using str.partition
:
import pandas as pd
import json
data = json.load(open('data.json'))
df = pd.json_normalize(data)
df.columns = df.columns.str.partition('.', expand=True).droplevel(level=1)
Output:
>>> df.columns
MultiIndex([( 'type', ''),
( 'geometry', 'type'),
( 'geometry', 'coordinates'),
('properties', 'state'),
('properties', 'code'),
('properties', 'name'),
('properties', 'zone'),
('properties', 'address')],
)
>>> df
type geometry properties
type coordinates state code name zone address
0 Feature Point [75.4516454, 27.2520587] Rajasthan BDHL Badhal NWR Kishangarh Renwal, Rajasthan
CodePudding user response:
You can use pd.json_normalize()
to normalize the nested dictionary into a dataframe df
.
Then, split the column names with dots into multi-index with Index.str.split
on df.columns
with parameter expand=True
, as follows:
Step 1: Normalize nested dict into a dataframe
j = {
"geometry": {
"type": "Point",
"coordinates": [75.4516454, 27.2520587]
},
"type": "Feature",
"properties": {
"state": "Rajasthan",
"code": "BDHL",
"name": "Badhal",
"zone": "NWR",
"address": "Kishangarh Renwal, Rajasthan"
}
}
df = pd.json_normalize(j)
Step 1 Result:
print(df)
type geometry.type geometry.coordinates properties.state properties.code properties.name properties.zone properties.address
0 Feature Point [75.4516454, 27.2520587] Rajasthan BDHL Badhal NWR Kishangarh Renwal, Rajasthan
Step 2: Create Multi-index column labels
df.columns = df.columns.str.split('.', expand=True)
Step 2 (Final) Result:
print(df)
type geometry properties
NaN type coordinates state code name zone address
0 Feature Point [75.4516454, 27.2520587] Rajasthan BDHL Badhal NWR Kishangarh Renwal, Rajasthan