Export multiple pandas DataFrames in a single JSON object

Time:03-29

I have multiple pandas.DataFrame objects that I would like to dump into a single JSON string.

Let's say that I have the following two DataFrames:

import pandas as pd
import json

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
    )

df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
    )

I want to export them as a single JSON string:

{"df1":
    {"columns":
        ["col 1", "col 2"],
    "index":
        ["row 1", "row 2"],
    "data":
        [["a", "b"], ["c", "d"]]
    },
"df2":
    {"columns":
        ["Col 1", "Col 2", "Col3"],
    "index":
        ["Row 1", "Row 2"],
    "data":
        [["A", "B", "C"], ["D", "E", "F"]]
    }
}

My attempts

Try 1

If I create a single Python dictionary containing both DataFrames and pass it to json.dumps, I get a TypeError, because the json module does not know how to serialize a pandas.DataFrame:

out = {'df1': df1,
       'df2': df2
       }
out = json.dumps(out) #<-- Raises TypeError: Object of type DataFrame is not JSON serializable

Try 2

If I serialize each DataFrame individually with the pandas.DataFrame.to_json method, as follows:

df1_jsonstr = df1.to_json(orient='split')
df2_jsonstr = df2.to_json(orient='split')

out = {'df1': df1_jsonstr,
       'df2': df2_jsonstr
       }
out = json.dumps(out)

The output looks like:

{"df1": "{\"columns\":[\"col 1\",\"col 2\"],\"index\":[\"row 1\",\"row 2\"],\"data\":[[\"a\",\"b\"],[\"c\",\"d\"]]}", "df2": "{\"columns\":[\"Col 1\",\"Col 2\",\"Col3\"],\"index\":[\"Row 1\",\"Row 2\"],\"data\":[[\"A\",\"B\",\"C\"],[\"D\",\"E\",\"F\"]]}"}

Both strings generated by pandas.DataFrame.to_json have been escaped and quoted. When I try to load them back with data = json.loads(out), the two DataFrames are (correctly) treated as strings and loaded as such.
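To make the escaping concrete, here is a minimal sketch (using df1 only) showing that the embedded to_json output survives the round trip as a plain string, so reaching the nested structure takes a second json.loads:

```python
import json

import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)

# to_json returns a Python str, so json.dumps encodes it as a JSON string value
out = json.dumps({"df1": df1.to_json(orient="split")})
data = json.loads(out)

# The value comes back as text, not as a nested object...
print(type(data["df1"]).__name__)  # str

# ...so it has to be parsed a second time to get the dict
inner = json.loads(data["df1"])
print(inner["columns"])  # ['col 1', 'col 2']
```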

Try 3

The only way I found to generate the JSON I want is to dump each DataFrame to JSON with pandas.DataFrame.to_json, load the results back into dictionaries with json.loads, and then dump them again together. This looks like:

df1_json = df1.to_json(orient='split')
df2_json = df2.to_json(orient='split')

out = {'df1': json.loads(df1_json),
       'df2': json.loads(df2_json)
       }
out = json.dumps(out)
data = json.loads(out)

This works, but if df1 and df2 have hundreds of thousands or millions of rows, it becomes inefficient, because the data goes through three conversions (pd.DataFrame -> str -> dict -> str).
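Whichever way the combined string is produced, the parsed "split" dictionaries can be turned back into DataFrames. A minimal sketch, assuming the layout shown above: its keys (columns, index, data) happen to match the keyword arguments of the pandas.DataFrame constructor, so each dict can be unpacked directly:

```python
import json

import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

# Same combined document as Try 3
out = json.dumps({
    "df1": json.loads(df1.to_json(orient="split")),
    "df2": json.loads(df2.to_json(orient="split")),
})
data = json.loads(out)

# Unpack each {"columns": ..., "index": ..., "data": ...} dict
# as DataFrame(columns=..., index=..., data=...)
frames = {name: pd.DataFrame(**parts) for name, parts in data.items()}

print(frames["df2"])
```

This only restores values and labels; dtypes beyond what JSON can represent (dates, categories, and so on) are not preserved by this layout.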

Question

Is there a way to achieve the same result as my last example, but performing a single conversion?

Answer:

You can build your own JSON string from the JSON representations of the two DataFrames:

out = '{"df1": ' + df1.to_json(orient='split') + ', "df2": ' + df2.to_json(orient='split') + '}'

Check that it is valid JSON:

json.loads(out)

Output:

{'df1': {'columns': ['col 1', 'col 2'], 'index': ['row 1', 'row 2'], 'data': [['a', 'b'], ['c', 'd']]}, 'df2': {'columns': ['Col 1', 'Col 2', 'Col3'], 'index': ['Row 1', 'Row 2'], 'data': [['A', 'B', 'C'], ['D', 'E', 'F']]}}
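The concatenation above hard-codes two frames. A minimal sketch that generalizes it to any mapping of names to DataFrames; dataframes_to_json is a hypothetical helper, and the sketch assumes the names are strings. Each DataFrame is still converted exactly once, by to_json:

```python
import json

import pandas as pd


def dataframes_to_json(frames):
    """Hypothetical helper: serialize {name: DataFrame} into one JSON object string."""
    parts = (
        # json.dumps on the name quotes it and escapes any special characters
        json.dumps(name) + ": " + df.to_json(orient="split")
        for name, df in frames.items()
    )
    return "{" + ", ".join(parts) + "}"


df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

out = dataframes_to_json({"df1": df1, "df2": df2})
data = json.loads(out)  # raises json.JSONDecodeError if the result were not valid JSON
```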

Answer:

I think you could do something like:

out = """
       {
          "df1": """ + df1.to_json(orient='split') + """,
          "df2": """ + df2.to_json(orient='split') + """
       }
"""

or:

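# orient='split' gives the same 'index', 'columns' and 'data' keys as to_json(orient='split');
# the default orient ('dict') would nest the values by column instead
df1_dict = df1.to_dict(orient='split')
df2_dict = df2.to_dict(orient='split')

out = {'df1': df1_dict,
       'df2': df2_dict
       }
out = json.dumps(out)
data = json.loads(out)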
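Building on the to_dict idea, json.dumps also accepts a default callable that it invokes for any object it cannot serialize, which lets the DataFrames be placed in the dict directly, as in Try 1. A minimal sketch with a hypothetical hook named encode_dataframe; each frame still passes through an intermediate dict, but no JSON string is parsed back. It assumes the cells hold JSON-compatible values such as the strings here; other dtypes (datetimes, for example) may need extra handling inside the hook:

```python
import json

import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)


def encode_dataframe(obj):
    """Hypothetical `default` hook: json.dumps calls it for objects it cannot serialize."""
    if isinstance(obj, pd.DataFrame):
        # A plain dict with 'index', 'columns' and 'data' keys
        return obj.to_dict(orient="split")
    # Anything else: raise the same kind of error json itself would raise
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# The DataFrames go straight into the dict passed to json.dumps
out = json.dumps({"df1": df1, "df2": df2}, default=encode_dataframe)
data = json.loads(out)
```

The keys come out in the order index, columns, data rather than to_json's columns, index, data; JSON object keys are unordered, so this does not change the meaning of the document.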