I've got a JSON file that I want to convert to a useful pd.DataFrame
so that I can use this for further modelling. The JSON file looks like this:
json_file = {
"x1": [
{
"a": "XZ12ABC1834",
"b": "J. Doe",
"c": [
{
"Amount": -50,
"Date": "2021-08-15T10:00:00.000Z",
"CategoryId": "abc123",
"CounterParty": "The Farm",
"Description": "some description",
"Counter": "XYZ456AZ",
"Type": "bc"
},{
"Amount": -1,
"Date": "2020-08-15T10:00:00.000Z",
"CategoryId": "cde123",
"CounterParty": "The pool",
"Description": "some other description",
"Counter": "WYZ12",
"Type": "X"
}
]
"a": "XX34XX872",
"b": "J. Doe",
"c": [
{
"Amount": -1,50,
"Date": "2019-05-15T10:00:00.000Z",
"CategoryId": "QWR627",
"CounterParty": "The City",
"Description": "last other description",
"Counter": "QWE123",
"Type": "S"
}
]
}
]
}
And I want to convert this JSON file to a dataframe that somewhat looks like this:
var1 | a | b | amount | date | CategoryID | Counterparty | Description | Counter | Type |
---|---|---|---|---|---|---|---|---|---|
x1 | XZ12ABC1834 | J. Doe | -50 | 2021-08-15T10:00:00.000Z | abc123 | The Farm | some description | XYZ456AZ | bv |
x1 | XZ12ABC1834 | J. Doe | -1 | 2020-08-15T10:00:00.000Z | cde123 | The pool | some other description | WYZZ12 | X |
x1 | XX34XX872 | J. Doe | -1.50 | 2019-05-15T10:00:00.000Z | cde123 | The city | last other description | QWE123 | S |
Hopefully this is enough info for someone to help me with this matter.
CodePudding user response:
I think that something like this should work:
import pandas as pd
result = []
for key in json_file:
df_nested_list = pd.json_normalize(
json_file[key],
record_path =['c'],
meta=['a', 'b']
)
df_nested_list['var1'] = key
result.append(df_nested_list)
pd.concat(result)
For more information look at: https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8