Home > database >  Normalize JSON: map classes/array to columns
Normalize JSON: map classes/array to columns

Time:04-16

I would like to transform a JSON with the following input:


{
  "039.png" : [
    "finding1"
  ],
  "006.png" : [
    "finding1",
    "finding2"
  ],
  "012.png" : [
    "nofinding"
  ]}

into a data frame with the following layout:

|  image  | finding1 | finding2 | nofinding |

| ------- | -------- | -------- | --------- |
| 039.png |   true   |   false  |   false   |
| 006.png |   true   |   true   |   false   |
| 012.png |   false  |   false  |   true    |

By now I have tried:

import pandas as pd
import json
from pandas.io.json import json_normalize

with open('file.json') as data_file:    
  data = json.load(data_file)
df = pd.json_normalize(data)
df = df.transpose()
df.head()

However, this results in:

039.png     [finding1]
006.png     [finding1, finding2]
012.png     [nofinding]

In addition, I would also like to program the return path to be able to create the given JSON format from the desired table format

many thanks in advance

CodePudding user response:

It's straightforward with get_dummies:

data = {
  "039.png" : [
    "finding1"
  ],
  "006.png" : [
    "finding1",
    "finding2"
  ],
  "012.png" : [
    "nofinding"
  ]}

df = pd.DataFrame.from_dict(data, orient='index')
print(pd.get_dummies(df, prefix='', prefix_sep='', dtype=bool))

Output:

         finding1  nofinding  finding2
039.png      True      False     False
006.png      True      False      True
012.png     False       True     False

CodePudding user response:

Based on your pd.json_normalize, you can explode then pd.crosstab

df = pd.json_normalize(data).T.explode(0)
out = pd.crosstab(df.index, df[0]).eq(1).rename_axis('image').reset_index().rename_axis(None,axis=1)
     image  finding1  finding2  nofinding
0  006.png      True      True      False
1  012.png     False     False       True
2  039.png      True     False      False
  • Related