Sorry to bother u guys again. Below is my Json output (which I have put into a variable for convenience) from Azure form recognizer extract :
Service_Table = {
"valueArray": [{"type": "object",
"valueObject": {
"Minimum Connections": {
"type": "string",
"valueString": "290",
"content": "290"},
"Plan name": {
"type": "string",
"valueString": "Endless Plan",
"content": "Endless Plan"}}},
{"type": "object",
"valueObject": {
"Plan name": {
"type": "string",
"valueString": "Corporate Plan",
"content": "Corporate Plan"}}}]}
As you can see it is a nested list containing key value pairs. I need to get the dataframe out of this as follows:
I've tried many things such as pd.DataFrame(ServiceTable), pd.json_normalize(ServiceTable) other list comprehension methods etc but not able to get what I need, because the key value pairs are uneven (inside the first "valueOject" you can see two keys while in the second one there is only one key). Any help here is really appreciated and thank you for your time.
For more clarity the key "valueObject" could be accessed by calling Service['valueArray'][i] where i is the index (there are two keys with name valueObject)
CodePudding user response:
You can use pandas.json_normalize
:
df = pd.json_normalize(Service_Table['valueArray'])
df = df.loc[:,df.columns.str.endswith('content')]
df.columns = range(df.columns.size)
df.insert(0, 'S.no', df.index 1)
df = df.rename(columns= {0: 'Minimum connections', 1: 'Plan name'})
# Output :
print(df)
S.no Minimum connections Plan name
0 1 290 Endless Plan
1 2 NaN Corporate Plan
CodePudding user response:
I would prepare a dictionary first:
content = [
(values.get('Minimum Connections', dict()).get('content'),
values.get('Plan name', dict()).get('content'))
for xs in Service_Table.get('valueArray', [])
for key, values in xs.items()
if key == 'valueObject']
rows = {i: (i 1,) content[i] for i in range(len(content))}
Then create a DataFrame from with from_dict
:
(pd.DataFrame
.from_dict(rows,
orient='index',
columns=['S.no', 'Minimum Connections', 'Plan name'])
.astype({'Minimum Connections': 'Int32'}))
The result table:
S.no Minimum Connections Plan name
0 1 290 Endless Plan
1 2 <NA> Corporate Plan
Note: I would suggest to remove spaces from the column name.