How to turn a dataframe to jsonl with similar index for every line?

Time:11-07

I have a DataFrame that has "input" as the index label of every row, and there are thousands of rows.

  df1 = 
  index   item name   item quantity
  input       apple         4
  input       orange        3
  input       lemon         6  

I need to turn it into a JSONL file that looks like this, because that is the format the Shopify GraphQL Admin API bulk import requires:

{ "input": { "item name": "apple", "item quantity": "4"}}
{ "input": { "item name": "orange", "item quantity": "3"}}
{ "input": { "item name": "lemon", "item quantity": "6"}}

I can use df1.to_json(orient='records', lines=True) to convert it into JSONL without the index, which looks like this:

{ "item name": "apple", "item quantity": "4"}
{ "item name": "orange", "item quantity": "3"}
{ "item name": "lemon", "item quantity": "6"}

But I need to add the key "input" in front of each record, and I don't know how to do that. I am new to JSON.

I also tried df1.to_json(orient="index"), but it raises ValueError: DataFrame index must be unique for orient='index', meaning every index value must be unique.

Any help is appreciated. Cheers

Answer:

Your required output is:

{ "input": { "item name": "apple", "item quantity": "4"}}
{ "input": { "item name": "orange", "item quantity": "3"}}
{ "input": { "item name": "lemon", "item quantity": "6"}}

orient='index' won't work in this case because the index values are not unique. The other answer looks great too. Your expected output is a series of separate JSON objects, one per line.
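
To see the failure concretely, here is a minimal sketch that rebuilds a frame shaped like the question's (the three rows are the question's sample data; building it with an explicit index of repeated "input" labels is an assumption about how the real frame looks). It shows the ValueError, and that dropping the repeated labels with reset_index(drop=True) gives orient="index" the unique keys it needs:

```python
import pandas as pd

# Frame shaped like the question's: "input" repeated as the index label
df1 = pd.DataFrame(
    {"item name": ["apple", "orange", "lemon"], "item quantity": [4, 3, 6]},
    index=["input"] * 3,
)

# With duplicate labels, orient="index" raises ValueError
error_message = None
try:
    df1.to_json(orient="index")
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# reset_index(drop=True) replaces the labels with a unique 0..n-1 index,
# so orient="index" now succeeds
fixed = df1.reset_index(drop=True).to_json(orient="index")
print(fixed)
```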

Here is a workaround that produces exactly the format you asked for, in case you need that format as-is:

import json
import pandas as pd

data = [['apple', 4], ['orange', 3], ['lemon', 6]]
df = pd.DataFrame(data, columns=['item name', 'item quantity'])

# orient="index" works here because the default index (0, 1, 2) is unique
json_value = json.loads(df.to_json(orient="index"))

# Nest each row under the "input" key; json.dumps emits valid JSON
# even when a value contains a quote character
for val in json_value.values():
    print(json.dumps({'input': val}))

Output:

{"input": {"item name": "apple", "item quantity": 4}}
{"input": {"item name": "orange", "item quantity": 3}}
{"input": {"item name": "lemon", "item quantity": 6}}
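
A variation that builds directly on the question's own to_json(orient='records', lines=True) attempt: parse each line, nest it under "input", and write the result to a file for upload. This is a sketch, not a Shopify-specific tool; the helper name to_shopify_jsonl and the file name bulk_input.jsonl are hypothetical, and the file goes to the system temp directory only for illustration.

```python
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    [["apple", 4], ["orange", 3], ["lemon", 6]],
    columns=["item name", "item quantity"],
)


def to_shopify_jsonl(frame: pd.DataFrame) -> str:
    """Hypothetical helper: one {"input": {...}} object per line."""
    # records + lines=True gives one JSON object per row, without the index
    records_jsonl = frame.to_json(orient="records", lines=True)
    # Parse each line and nest the row under the "input" key
    wrapped = [json.dumps({"input": json.loads(line)}) for line in records_jsonl.splitlines()]
    return "\n".join(wrapped) + "\n"


jsonl_text = to_shopify_jsonl(df)
print(jsonl_text, end="")

# Write the JSONL to a file; the location and name are illustrative assumptions
path = os.path.join(tempfile.gettempdir(), "bulk_input.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

If the import expects the quantities as strings, as in the question's sample ("4" rather than 4), converting first with df.astype(str) should make to_json emit them quoted.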

Answer:

Assuming your DataFrame has a unique label for each row in the index column, like this:

In [93]: df
Out[93]: 
    index item_name  item_quantity
0  input1     apple              4
1  input2    orange              3
2  input3     lemon              6

You can combine df.set_index, df.T (transpose), and to_dict:

In [95]: df.set_index('index').T.to_dict()
Out[95]: 
{'input1': {'item_name': 'apple', 'item_quantity': 4},
 'input2': {'item_name': 'orange', 'item_quantity': 3},
 'input3': {'item_name': 'lemon', 'item_quantity': 6}}

Or, if you want a JSON string, use to_json instead:

In [96]: df.set_index('index').T.to_json()
Out[96]: '{"input1":{"item_name":"apple","item_quantity":4},"input2":{"item_name":"orange","item_quantity":3},"input3":{"item_name":"lemon","item_quantity":6}}'
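
Both calls above produce a single object keyed by the index labels, so they rely on those labels being unique. If the labels repeat, as with the question's "input", one option is orient="split", which does not require a unique index and returns the labels, column names, and row values as separate lists. A minimal sketch, assuming a frame built from the question's sample data:

```python
import json

import pandas as pd

# Frame shaped like the question's: "input" repeated as the index label
df1 = pd.DataFrame(
    {"item name": ["apple", "orange", "lemon"], "item quantity": [4, 3, 6]},
    index=["input", "input", "input"],
)

# orient="split" accepts a non-unique index; it returns the index labels,
# column names and row values as separate lists
parts = json.loads(df1.to_json(orient="split"))

# Pair each index label with its row to build one JSON object per line
lines = [
    json.dumps({label: dict(zip(parts["columns"], row))})
    for label, row in zip(parts["index"], parts["data"])
]
print("\n".join(lines))
```

Each printed line has the form {"input": {"item name": ..., "item quantity": ...}}, which matches the per-line layout in the question.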