Home > Enterprise >  Read csv with json column which is nested multiple times into dataframe
Read csv with json column which is nested multiple times into dataframe

Time:10-26

I have the following csv file with a json column data. The json data is a little bit complicated and I can't unwrap it the way I want. My goal is to get some data from configuration into columns and the corrosponding values from readings into the row as value. I found something here but I can't go deeper in the nested data to extract the id and the value with the solutions customized to my data.

The data I have:

system;datetime;data;
ACD;19.10.2021 11:24:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:23:00.000","end":"2021-10-19 11:24:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":8}}],"id":1},{"readings":[{"reading":{"count":2}}],"id":2}]}};
ACD;19.10.2021 11:25:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:24:00.000","end":"2021-10-19 11:25:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":7}}],"id":1},{"readings":[{"reading":{"count":4}}],"id":2}]}};

What I want:

system;datetime;data;DD51;VV22;
ACD;19.10.2021 11:24:00,000;8;2;
ACD;19.10.2021 11:25:00,000;7;4;

CodePudding user response:

If your_file.csv contains:

system;datetime;data;
ACD;19.10.2021 11:24:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:23:00.000","end":"2021-10-19 11:24:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":8}}],"id":1},{"readings":[{"reading":{"count":2}}],"id":2}]}};
ACD;19.10.2021 11:25:00,000;{"id":"device1","type":"dataValues","payload":{"deviceId":"device1","start":"2021-10-19 11:24:00.000","end":"2021-10-19 11:25:00.000","configuration":[{"name":"DD51","id":1},{"name":"VV22","id":2}],"detectors":[{"readings":[{"reading":{"count":7}}],"id":1},{"readings":[{"reading":{"count":4}}],"id":2}]}};

Then:

from ast import literal_eval

df = pd.read_csv("your_file.csv", sep=";")[["system", "datetime", "data"]]
df = pd.concat(
    [
        df,
        df.pop("data")
        .apply(literal_eval)
        .apply(
            lambda x: pd.Series(
                {
                    c["name"]: d["readings"][0]["reading"]["count"]
                    for c, d in zip(
                        x["payload"]["configuration"], x["payload"]["detectors"]
                    )
                }
            )
        ),
    ],
    axis=1,
)

Prints:

  system                 datetime  DD51  VV22
0    ACD  19.10.2021 11:24:00,000     8     2
1    ACD  19.10.2021 11:25:00,000     7     4
  • Related