Convert dataframe to nested jsonl file

Time:11-12

I need to convert a dataframe to a nested JSONL file with a specific structure. I have the dataframe below. I built the "quantity details" column myself; it was originally two separate columns.

      id    price     quantity details
0     12     11.00    "quantity" : 4.0, "locationId" : 1234567
1     34     22.00    "quantity" : 7.0, "locationId" : 1234567
2     56     33.00    "quantity" : 13.0, "locationId" : 1234567
3     78     44.00    "quantity" : 2.0, "locationId" : 1234567
4     90     55.00    "quantity" : 3.0, "locationId" : 1234567
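Since "quantity details" was built from two separate columns, it may be simpler to build a column of real dicts directly instead of a text fragment that later has to be parsed. A minimal sketch, assuming the original columns were named "quantity" and "locationId" (hypothetical names inferred from the fragment text):

```python
import pandas as pd

# Hypothetical starting frame: "quantity" and "locationId" are assumed names
# for the two columns that existed before "quantity details" was built.
df = pd.DataFrame({
    "id": [12, 34, 56],
    "price": [11.0, 22.0, 33.0],
    "quantity": [4.0, 7.0, 13.0],
    "locationId": [1234567, 1234567, 1234567],
})

# Build a column holding one dict per row (not a text fragment).
# .tolist() yields plain Python floats/ints, which json.dumps accepts.
df["quantity details"] = [
    {"quantity": q, "locationId": loc}
    for q, loc in zip(df["quantity"].tolist(), df["locationId"].tolist())
]
df = df.drop(columns=["quantity", "locationId"])

print(df.loc[0, "quantity details"])
# {'quantity': 4.0, 'locationId': 1234567}
```

With the column already holding dicts, any writer that serializes each row with `json.dump` produces the nested object without a separate parsing step.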

I used the code below to wrap each row in an "input" key while converting it to JSONL, thanks to the thread "How to turn a dataframe to jsonl with similar index for every line?".

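import json

# orient="index" gives {"0": {row 0}, "1": {row 1}, ...}
json_as_str = df.to_json(orient="index")
json_value = json.loads(json_as_str)
string_formatted = []
for key, val in json_value.items():
    # %s inserts Python's repr of the row dict (single-quoted keys and strings)
    string_formatted.append("{'input':%s}" % val)
with open("file_name_here.jsonl", "a") as fh:
    for i in string_formatted:
        # swap single quotes for double quotes so the line resembles JSON
        i = i.replace("'", '"')
        fh.write(f"{i}\n")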

The JSONL file I get:

{"input":{"id": "12", "price": 11, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "34", "price": 22, "quantity details": ""availableQuantity": 15.0, "locationId": 1234567"}}
{"input":{"id": "56", "price": 33, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "78", "price": 44, "quantity details": ""availableQuantity": 14.0, "locationId": 1234567"}}
{"input":{"id": "90", "price": 55, "quantity details": ""availableQuantity": 10.0, "locationId": 1234567"}}
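The broken lines come from the string-formatting step: `%s` inserts Python's repr of each row dict, and "quantity details" is still a plain string at that point, so after the quote swap its own double quotes collide with the new ones. A small stdlib-only reproduction on one row (the row values are taken from the table above; `is_valid_json` is a hypothetical helper):

```python
import json

def is_valid_json(text):
    """Return True if text parses as a single JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# One row as returned by json.loads(df.to_json(orient="index")):
# "quantity details" is still a plain string, not an object.
val = {"id": 12, "price": 11.0,
       "quantity details": '"quantity" : 4.0, "locationId" : 1234567'}

# The question's approach: Python's repr of the dict, then ' -> "
line = ("{'input':%s}" % val).replace("'", '"')
print(line)
# {"input":{"id": 12, "price": 11.0, "quantity details": ""quantity" : 4.0, "locationId" : 1234567"}}
print(is_valid_json(line))  # False

# json.dumps escapes the inner quotes, so the line becomes valid JSON,
# but "quantity details" is still a string, not a nested object.
dumped = json.dumps({"input": val})
print(is_valid_json(dumped))  # True
print(type(json.loads(dumped)["input"]["quantity details"]).__name__)  # str
```

So switching to `json.dumps` alone is not enough; the column values also have to become dicts before serialization.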

This is the desired output for the jsonl file:

{"input":{"id": "12", "price": 11, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "34", "price": 22, "quantity details": {"availableQuantity": 15.0, "locationId": 1234567}}}
{"input":{"id": "56", "price": 33, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "78", "price": 44, "quantity details": {"availableQuantity": 14.0, "locationId": 1234567}}}
{"input":{"id": "90", "price": 55, "quantity details": {"availableQuantity": 10.0, "locationId": 1234567}}}
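Comparing the two outputs, the desired nested object is exactly the "quantity details" text wrapped in curly braces. A minimal stdlib sketch of that conversion, assuming every fragment uses double-quoted keys as in the table above; `parse_fragment` is a hypothetical helper name:

```python
import json

def parse_fragment(fragment):
    """Turn a '"key" : value, ...' fragment into a dict.

    In str.format, "{{" and "}}" are literal braces and "{}" is the
    placeholder, so "{{{}}}".format(s) == "{" + s + "}".
    """
    wrapped = "{{{}}}".format(fragment)
    try:
        return json.loads(wrapped)
    except json.JSONDecodeError as exc:
        # e.g. single-quoted keys are valid Python but not valid JSON
        raise ValueError(f"not a JSON object fragment: {fragment!r}") from exc

print(parse_fragment('"quantity" : 4.0, "locationId" : 1234567'))
# {'quantity': 4.0, 'locationId': 1234567}
```

Applied to a dataframe column as `df["quantity details"].apply(parse_fragment)`, this does the same wrap-and-parse in one step and reports which fragment failed if one is malformed.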

Any help is greatly appreciated. Thank you for reading this.

Answer:

Convert each value in the "quantity details" column to a dictionary, then write each row to the file as a separate line:

import pandas as pd
import json

# toy data
df = pd.DataFrame.from_dict(
    {'id': {0: 12, 1: 34, 2: 56, 3: 78, 4: 90}, 'price': {0: 11.0, 1: 22.0, 2: 33.0, 3: 44.0, 4: 55.0},
     'quantity details': {0: '"quantity" : 4.0, "locationId" : 1234567', 1: '"quantity" : 7.0, "locationId" : 1234567',
                          2: '"quantity" : 13.0, "locationId" : 1234567', 3: '"quantity" : 2.0, "locationId" : 1234567',
                          4: '"quantity" : 3.0, "locationId" : 1234567'}})

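# wrap each fragment string in braces and parse it as JSON, giving a dict per row
df["quantity details"] = df["quantity details"].apply("{{{}}}".format).apply(json.loads)

# "a" appends to an existing file; use "w" to overwrite it on each run
with open("file_name_here.jsonl", "a") as fh:
    for value in df.to_dict(orient="index").values():
        json.dump({"input": value}, fh)  # one JSON object per line
        fh.write("\n")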

Output (file_name_here.jsonl):

{"input": {"id": 12, "price": 11.0, "quantity details": {"quantity": 4.0, "locationId": 1234567}}}
{"input": {"id": 34, "price": 22.0, "quantity details": {"quantity": 7.0, "locationId": 1234567}}}
{"input": {"id": 56, "price": 33.0, "quantity details": {"quantity": 13.0, "locationId": 1234567}}}
{"input": {"id": 78, "price": 44.0, "quantity details": {"quantity": 2.0, "locationId": 1234567}}}
{"input": {"id": 90, "price": 55.0, "quantity details": {"quantity": 3.0, "locationId": 1234567}}}
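As an alternative to the manual loop above (not part of the original answer), pandas can produce the JSONL text itself: put each row dict into a one-column frame named "input" and call `to_json(orient="records", lines=True)`. A sketch, assuming "quantity details" already holds dicts (i.e. after the parsing step); the result is parsed back line by line to check that the nesting survived:

```python
import json
import pandas as pd

# Rows as they look after the parsing step: "quantity details" holds dicts.
df = pd.DataFrame({
    "id": [12, 34],
    "price": [11.0, 22.0],
    "quantity details": [
        {"quantity": 4.0, "locationId": 1234567},
        {"quantity": 7.0, "locationId": 1234567},
    ],
})

# One-column frame whose cells are whole row dicts; with orient="records"
# and lines=True, each row is emitted as one {"input": {...}} line.
wrapped = pd.DataFrame({"input": df.to_dict(orient="records")})
jsonl_text = wrapped.to_json(orient="records", lines=True)

# Parse it back line by line (skipping any blank trailing line).
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
print(records[0])
```

Passing a file path as the first argument, for example `wrapped.to_json("file_name_here.jsonl", orient="records", lines=True)`, writes the same text to disk, and `pd.read_json(path, lines=True)` reads a JSONL file back into a dataframe.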