Home > Back-end >  Custom JSON format from dataframe
Custom JSON format from dataframe

Time:10-19

I have a following test code

import numpy as np
import pandas as pd
  
x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'], ['103', 'title3', 'body3', 'answer3']])
   
df = pd.DataFrame(x, columns = ['col1', 'col2', 'col3', 'col4'])
df.head()

 | col1 |col2   |col3   |col4
---------------------------------
0|  101 |title1 |body1  |answer1
1|  102 |title2 |body2  |answer2
2|  103 |title3 |body3  |answer3

I want a json output of this format

{"index": {"_id": "col1"}}
{"title": "col2", "body": "col3", "answer": "col4"}

Like

{"index": {"_id": "101"}}
{"title": "title1", "body": "body1", "answer": "answer1"}

For every rows in the dataframe.

Could anyone shine some light?

CodePudding user response:

It's not dificult, I recommend you read at documentations of library.

import numpy as np
import pandas as pd

x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'],
              ['103', 'title3', 'body3', 'answer3']])

df = pd.DataFrame(x, columns=['col1', 'col2', 'col3', 'col4'])
djson=df.to_json()
print(djson)

CodePudding user response:

Raphael's answer is probably the best approach to your problem (In case you want it row-wise, you can go ahead and do df.transpose before the df.to_json()); however, if what you really want is to define the json string exactly as you have there (That would be different from column names), you can do something like this:

jsons = []
json_string = '{{"index": {{"_id": "{index}"}}}}{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'

for row in range(len(df)):
    data = [df.iloc[row, x] for x in range(len(df.columns))]
    jsons.append(json_string.format(index=data[0],title=data[1],body=data[2], answer=data[3]))

And then get the list of json strings in the jsons variable. In case you need to convert into object, you can use the json library, and use json.loads()

CodePudding user response:

Create a json string of how you want your output to look and format it on the dataframe.

For curl operations (while using ElasticSearch and Kibana), make sure to split the json string below

json_string = '{{"index": {{"_id": "{index}"}}}}{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'

will give output in json as

{"index": {"_id": "101"}}{"title": "title1", "body": "body1", "answer": "answer1"}

For output to be like this

{"index": {"_id": "101"}}
{"title": "title1", "body": "body1", "answer": "answer1"}

use

json_string1 = '{{"index": {{"_id": "{index}"}}}}'
json_string2 = '{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'

Here is the complete code (Convert json list to json object by json.dumps() followed by json.loads()

import numpy as np
import pandas as pd
import json
  
x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'], ['103', 'title3', 'body3', 'answer3']])
   
df = pd.DataFrame(x, columns = ['col1', 'col2', 'col3', 'col4'])

jsons = []
json_string1 = '{{"index": {{"_id": "{index}"}}}}'
json_string2 = '{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'

for row in range(len(df)):
    data = [df.iloc[row, x] for x in range(len(df.columns))]
    jsons.append(json_string1.format(index=data[0]))
    jsons.append(json_string2.format(title=data[1],body=data[2], answer=data[3]))

with open("filename.json", "w") as fp:
    for i in range(len(jsons)):
        x = json.dumps(jsons[i])
        y = json.loads(x)
        fp.write("%s\n" % y)
  • Related