Home > Mobile >  Converting json dataset into pandas dataframe
Converting json dataset into pandas dataframe

Time:04-28

I have a large json dataset of the form:

r_json = 

[[{'gasto': 3.47},
  {'interacciones': 2.0},
  {'fecha': 'Tue, 15 Oct 2019 00:00:00 GMT'},
  {'moneda': 'USD'},
  {'id_campania': '6666070697'},
  {'id_conjunto_anuncios': '76484749901'},
  {'id_anuncio': '389396738140'},
  {'visitas_al_50': 862.6176},
  {'visitas_al_75': 651.1752},
  {'visitas_al100': 578.9394},
  {'visitas_al100': 'YouTube'},
  {'country': 'PE'},
  {'brand': 'PILSEN CALLAO'},
  {'objetivo': 'Video Views'},
  {'objetivo': 'Mon, 14 Oct 2019 00:00:00 GMT'},
  {'formato': 'Video'},
  {'nombre_campania': 'LITRAZO'},
  {'impresiones': 2106.0}],
 [{'gasto': 4.49},
  {'interacciones': 3.0},
  {'fecha': 'Fri, 15 Nov 2019 00:00:00 GMT'},
  {'moneda': 'USD'},
  {'id_campania': '213904'},
  {'id_conjunto_anuncios': '1905221'},
  {'id_anuncio': '4460317'},
  {'visitas_al_50': 791.0},
  {'visitas_al_75': 732.0}, (...)

I want to convert it to a pandas dataframe. I have tried:

df = pd.DataFrame (r_json, columns = ['gasto', 'interacciones', 'fecha', 'moneda', 'id_campania', 
                                      'id_conjunto_anuncios', 'id_anuncio', 'visitas_al_50', 'visitas_al_75',
                                      'visitas_al100', 'visitas_al100_2', 'country', 'brand', 'objetivo', 'objetivo_2',
                                      'formato', 'nombre_campania', 'impresiones'])

But I am getting a dataframe that seems to contain dictionaries:

    gasto   interacciones   fecha   moneda  id_campania     id_conjunto_anuncios    id_anuncio  visitas_al_50   visitas_al_75   visitas_al100   visitas_al100_2     country     brand   objetivo    objetivo_2  formato     nombre_campania     impresiones
0   {'gasto': 3.47}     {'interacciones': 2.0}  {'fecha': 'Tue, 15 Oct 2019 00:00:00 GMT'}  {'moneda': 'USD'}   {'id_campania': '6666070697'}   {'id_conjunto_anuncios': '76484749901'}     {'id_anuncio': '389396738140'}  {'visitas_al_50': 862.6176}     {'visitas_al_75': 651.1752}     {'visitas_al100': 578.9394}     {'visitas_al100': 'YouTube'}    {'country': 'PE'}   {'brand': 'PILSEN CALLAO'}  {'objetivo': 'Video Views'}     {'objetivo': 'Mon, 14 Oct 2019 00:00:00 GMT'}   {'formato': 'Video'}    {'nombre_campania': 'LITRAZO'}  {'impresiones': 2106.0}
1   {'gasto': 4.49}     {'interacciones': 3.0}  {'fecha': 'Fri, 15 Nov 2019 00:00:00 GMT'}  {'moneda': 'USD'}   {'id_campania': '213904'}   {'id_conjunto_anuncios': '1905221'}     {'id_anuncio': '4460317'}   {'visitas_al_50': 791.0}    {'visitas_al_75': 732.0}    {'visitas_al100': 699.0}    {'visitas_al100': 'Oath'}   {'country': 'PE'}   {'brand': 'PILSEN CALLAO'}  {'objetivo': 'Video Views'}     {'objetivo': 'Wed, 13 Nov 2019 00:00:00 GMT'}   {'formato': 'Video'}    {'nombre_campania': 'LITRAZO'}  {'impresiones': 832.0}

Screenshot of dataframe

I have tried many things but haven't succeeded. Could anyone give me some guidance? Thanks in advance.

CodePudding user response:

You need to manipulate your JSON a bit before calling the DataFrame constructor:

tmp = [
    dict(kv for dict_ in item for kv in dict_.items())
    for item in r_json
]
df = pd.DataFrame(tmp)

Result:

   gasto  interacciones                          fecha moneda id_campania id_conjunto_anuncios    id_anuncio  visitas_al_50  visitas_al_75 visitas_al100 country          brand                       objetivo formato nombre_campania  impresiones
0   3.47            2.0  Tue, 15 Oct 2019 00:00:00 GMT    USD  6666070697          76484749901  389396738140       862.6176       651.1752       YouTube      PE  PILSEN CALLAO  Mon, 14 Oct 2019 00:00:00 GMT   Video         LITRAZO       2106.0
1   4.49            3.0  Fri, 15 Nov 2019 00:00:00 GMT    USD      213904              1905221       4460317       791.0000       732.0000           NaN     NaN            NaN                            NaN     NaN             NaN          NaN

How it works

If you pass a list of lists (i.e. 2D list) to the DataFrame constructor, it assumes that the elements are stored in row-by-column manner:

r_json = [
    [
        {'gasto': 3.47},        # first row, first column = {'gasto': 3.47}
        {'interacciones': 2.0}  # first row, second column = {'interacciones': 2.0}
    ],
    [
        {'gasto': 4.49},        # second row, first column = {'gasto': 4.49}
        {'interacciones': 3.0}  # second row, second column = {'interacciones': 3.0}
    ]
]
df = pd.DataFrame(r_json)

If you pass in a list of dictionaries instead, it assumes each item represents a row and each key-value pair represents a column:

tmp = [
    {
        'gasto': 3.47,          # first row, column gasto = 3.47
        'interacciones': 2.0    # first row, column interacciones = 2.0
    },
    {
        'gasto': 4.49,
        'interacciones': 3.0
    }
]
df = pd.DataFrame(tmp)

So we need to convert the original JSON to the second form by merging the key-value pairs:

dict1 = dict([('gasto', 3.47), ('interacciones', 2.0)])
dict2 = dict([('gasto', 4.49), ('interacciones', 3.0)])
tmp = [dict1, dict2]

The list comprehension is just to apply that transformation on the whole r_json list.

  • Related