I have multiple pandas.DataFrame objects that I would like to dump into a single JSON string.
Let's say that I have the two following dfs:
import pandas as pd
import json
df1 = pd.DataFrame(
[["a", "b"], ["c", "d"]],
index=["row 1", "row 2"],
columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
[["A", "B", "C"], ["D", "E", "F"]],
index=["Row 1", "Row 2"],
columns=["Col 1", "Col 2", "Col3"],
)
I want to export them in a single json string as:
{"df1":
{"columns":
["col 1", "col 2"],
"index":
["row 1", "row 2"],
"data":
[["a", "b"], ["c", "d"]]
},
"df2":
{"columns":
["Col 1", "Col 2", "Col3"],
"index":
["Row 1", "Row 2"],
"data":
[["A", "B", "C"], ["D", "E", "F"]]
}
}
My tries
Try 1
If I create a single Python dictionary containing both dataframes and pass it to json.dumps, I get a TypeError, since json does not know how to serialize a pandas.DataFrame:
out = {'df1': df1,
'df2': df2
}
out = json.dumps(out) #<-- Raises TypeError: Object of type DataFrame is not JSON serializable
Try 2
If I serialize each df individually using the pandas.DataFrame.to_json method as
df1_jsonstr = df1.to_json(orient='split')
df2_jsonstr = df2.to_json(orient='split')
out = {'df1': df1_jsonstr,
'df2': df2_jsonstr
}
out = json.dumps(out)
The output looks like:
{"df1": "{\"columns\":[\"col 1\",\"col 2\"],\"index\":[\"row 1\",\"row 2\"],\"data\":[[\"a\",\"b\"],[\"c\",\"d\"]]}", "df2": "{\"columns\":[\"Col 1\",\"Col 2\",\"Col3\"],\"index\":[\"Row 1\",\"Row 2\"],\"data\":[[\"A\",\"B\",\"C\"],[\"D\",\"E\",\"F\"]]}"}
Both strings generated by pandas.DataFrame.to_json have been escaped and quoted. When I load the result back with data = json.loads(out), the two dataframes are (correctly) treated as strings and loaded as such, not as nested objects.
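To make the double encoding concrete, here is a minimal sketch that uses only the json module. The literal string is a stand-in I am assuming for the output of df1.to_json(orient='split'); any already-serialized JSON text would behave the same way.

```python
import json

# Stand-in for df1.to_json(orient='split') (assumed literal, copied from the output above).
df1_jsonstr = '{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}'

# json.dumps sees a plain str, so it quotes and escapes it a second time.
out = json.dumps({"df1": df1_jsonstr})
data = json.loads(out)

# The value is still a str after loading, not a dict.
inner_is_str = isinstance(data["df1"], str)

# A second decoding pass is needed to reach the actual structure.
decoded = json.loads(data["df1"])
```

So reading this format back always costs one extra json.loads per dataframe.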
Try 3
The only way I found to generate the JSON I want is to dump each dataframe to JSON using pandas.DataFrame.to_json, load the results back into dictionaries with json.loads, and then dump them again together. This looks like:
df1_json = df1.to_json(orient='split')
df2_json = df2.to_json(orient='split')
out = {'df1': json.loads(df1_json),
'df2': json.loads(df2_json)
}
out = json.dumps(out)
data = json.loads(out)
This works, but if df1 and df2 have hundreds of thousands or millions of rows, it performs three conversions (pd.DataFrame -> str -> dict -> str), which is inefficient.
Question
Is there a way to achieve the same result as my last example, but performing a single conversion?
CodePudding user response:
You can build your own JSON string using the JSON equivalents of the two dataframes:
out = '{"df1": ' + df1.to_json(orient='split') + ', "df2": ' + df2.to_json(orient='split') + '}'
Check that it is valid JSON:
json.loads(out)
Output:
{'df1': {'columns': ['col 1', 'col 2'], 'index': ['row 1', 'row 2'], 'data': [['a', 'b'], ['c', 'd']]}, 'df2': {'columns': ['Col 1', 'Col 2', 'Col3'], 'index': ['Row 1', 'Row 2'], 'data': [['A', 'B', 'C'], ['D', 'E', 'F']]}}
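The same idea can be generalized to any number of dataframes. The sketch below is only an illustration: join_json_fragments is a hypothetical helper name, and the two literal strings are stand-ins for what df1.to_json(orient='split') and df2.to_json(orient='split') return.

```python
import json

def join_json_fragments(fragments):
    """Join already-serialized JSON values into one JSON object string.

    `fragments` maps each key to a string that is already valid JSON,
    for example the return value of df.to_json(orient='split').
    """
    # json.dumps(name) quotes and escapes the key; the value is inserted verbatim.
    body = ", ".join(
        json.dumps(name) + ": " + fragment for name, fragment in fragments.items()
    )
    return "{" + body + "}"

# Stand-ins for the to_json(orient='split') output of df1 and df2.
fragments = {
    "df1": '{"columns":["col 1","col 2"],"index":["row 1","row 2"],"data":[["a","b"],["c","d"]]}',
    "df2": '{"columns":["Col 1","Col 2","Col3"],"index":["Row 1","Row 2"],"data":[["A","B","C"],["D","E","F"]]}',
}

out = join_json_fragments(fragments)
data = json.loads(out)  # parses as nested dicts, not as escaped strings
```

With real dataframes, the dictionary could be built as {name: df.to_json(orient='split') for name, df in frames.items()}, so each dataframe is serialized exactly once.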
CodePudding user response:
I think you could do something like:
out = """
{
"df1": """ + df1.to_json(orient='split') + """,
"df2": """ + df2.to_json(orient='split') + """
}
"""
or:
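# orient='split' gives the columns/index/data layout asked for in the question
df1_json = df1.to_dict(orient='split')
df2_json = df2.to_dict(orient='split')
out = {'df1': df1_json,
'df2': df2_json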
}
out = json.dumps(out)
data = json.loads(out)
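A related variant, shown here only as a sketch, is to hand the dataframes directly to json.dumps and convert them through its default hook. encode_frame is a hypothetical helper name, not something from pandas or from the answers above.

```python
import json
import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

def encode_frame(obj):
    """Fallback for json.dumps: turn a DataFrame into a split-oriented dict."""
    if isinstance(obj, pd.DataFrame):
        return obj.to_dict(orient="split")
    # Mirror the json module's own error for anything else.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# json.dumps calls encode_frame for each object it cannot serialize itself.
out = json.dumps({"df1": df1, "df2": df2}, default=encode_frame)
data = json.loads(out)
```

This avoids the str -> dict round trip of Try 3, but it still builds an intermediate dict per dataframe. As far as I know, cell values that the json module cannot encode (timestamps, for example) would still raise a TypeError here, whereas to_json handles those itself.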