I have the following csv file with a json column data
. The json data is a little bit complicated and I can't unwrap it the way I want. My goal is to get some data from configuration
into columns and the corrosponding values from readings
into the row as value. I found something here but I can't go deeper in the nested data to extract the id and the value with the solutions customized to my data.
The data I have:
system;datetime;data;
ACD;19.10.2021 11:24:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:23:00.000","end":"2021-10-19 11:24:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":8}}],"id":1},{"readings":[{"reading":{"count":2}}],"id":2}]}};
ACD;19.10.2021 11:25:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:24:00.000","end":"2021-10-19 11:25:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":7}}],"id":1},{"readings":[{"reading":{"count":4}}],"id":2}]}};
What I want:
system;datetime;data;DD51;VV22;
ACD;19.10.2021 11:24:00,000;8;2;
ACD;19.10.2021 11:25:00,000;7;4;
CodePudding user response:
If your_file.csv
contains:
system;datetime;data;
ACD;19.10.2021 11:24:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:23:00.000","end":"2021-10-19 11:24:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":8}}],"id":1},{"readings":[{"reading":{"count":2}}],"id":2}]}};
ACD;19.10.2021 11:25:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:24:00.000","end":"2021-10-19 11:25:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":7}}],"id":1},{"readings":[{"reading":{"count":4}}],"id":2}]}};
Then:
from ast import literal_eval
df = pd.read_csv("your_file.csv", sep=";")[["system", "datetime", "data"]]
df = pd.concat(
[
df,
df.pop("data")
.apply(literal_eval)
.apply(
lambda x: pd.Series(
{
c["name"]: d["readings"][0]["reading"]["count"]
for c, d in zip(
x["payload"]["configuration"], x["payload"]["detectors"]
)
}
)
),
],
axis=1,
)
Prints:
system datetime DD51 VV22
0 ACD 19.10.2021 11:24:00,000 8 2
1 ACD 19.10.2021 11:25:00,000 7 4