For example the JSON is :
{
"samples": [
{
"sample_id": "A2434",
"start": "1664729482",
"end": "1664729482",
"parts": [
{
"name": "123",
"start": "1664736682",
"end": "1618688700",
"fail": ""
}
]
}
]
}
I want the df and columns like below :
sample_id,start,end,parts.name,parts.start,parts.end,parts.fail
CodePudding user response:
Using json.normalize
df = pd.json_normalize(
data=data["samples"],
record_path="parts",
record_prefix="parts.",
meta=["sample_id", "start", "end"]
).drop(columns="parts.name")
print(df)
parts.start parts.end parts.fail sample_id start end
0 1664736682 1618688700 A2434 1664729482 1664729482
CodePudding user response:
Use pd.read_json()
to unpack the JSON to a Dataframe. You can then use pd.json_normalize()
as required on the generated columns to get the more nested data out.
CodePudding user response:
You can use df.explode
to get the separate item in a list on a row, and use agg(pd.Series)
to convert key, value pairs in a dictionary to columns and rows, respectively.
df = pd.DataFrame(data['samples'])
df[['parts.start','parts.end','parts.fail']] = df['parts'].explode().agg(pd.Series)
df.drop('parts', axis=1, inplace=True)
Output:
sample_id start end parts.name parts.start parts.end parts.fail
0 A2434 1664729482 1664729482 123 1664736682 1618688700