I have the json in this format:
{
"fields": {
"tcidte": {
"mode": "required",
"type": "date",
"format": "%Y%m%d"
},
"tcmcid": {
"mode": "required",
"type": "string"
},
"tcacbr": {
"mode": "required",
"type": "string"
}
}
}
I want it to be in a dataframe format where each of the three field names are separate rows. Where one row has a column(e.g "format") where others are blank should be assumed to be NULL.
I have tried to use the flatten_json function which I found on here, but doesn't work as expected but will still include here:
def flatten_json(nested_json, exclude=['']):
"""Flatten json object with nested keys into a single level.
Args:
nested_json: A nested json object.
exclude: Keys to exclude from output.
Returns:
The flattened json object if successful, None otherwise.
"""
out = {}
def flatten(x, name='', exclude=exclude):
if type(x) is dict:
for a in x:
if a not in exclude: flatten(x[a], name a '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name str(i) '_')
i = 1
else:
out[name[:-1]] = x
flatten(nested_json)
return out
flatten_json_file = pd.DataFrame(flatten_json(nested_json))
pprint.pprint(flatten_json_file)
CodePudding user response:
With
data = {
"fields": {
"tcidte": {
"mode": "required",
"type": "date",
"format": "%Y%m%d"
},
"tcmcid": {
"mode": "required",
"type": "string"
},
"tcacbr": {
"mode": "required",
"type": "string"
}
}
}
this
df = pd.DataFrame(data["fields"].values())
results in
mode type format
0 required date %Y%m%d
1 required string NaN
2 required string NaN
Is that your goal?
CodePudding user response:
One option is the jmespath
library, which can be helpful in scenarios such as this:
# pip install jmespath
import jmespath
import pandas as pd
# think of it like a path
# fields is the first key
# there are sub keys with varying names
# we are only interested in mode, type, format
# hence the * to represent the intermediate key(s)
expression = jmespath.compile('fields.*[mode, type, format]')
pd.DataFrame(expression.search(data), columns = ['mode', 'type', 'format'])
mode type format
0 required date %Y%m%d
1 required string None
2 required string None
jmespath has a host of tools; this however should suffice, and covers scenarios where keys(mode, type, format) are missing in sub dictionaries.
CodePudding user response:
df= pd.read_json('test.json')
df_fields = pd.DataFrame(df['fields'].values.tolist(), index=df.index)
print(df_fields)
output:
mode type format
tcacbr required string NaN
tcidte required date %Y%m%d
tcmcid required string NaN