I have a following test code
import numpy as np
import pandas as pd
x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'], ['103', 'title3', 'body3', 'answer3']])
df = pd.DataFrame(x, columns = ['col1', 'col2', 'col3', 'col4'])
df.head()
| col1 |col2 |col3 |col4
---------------------------------
0| 101 |title1 |body1 |answer1
1| 102 |title2 |body2 |answer2
2| 103 |title3 |body3 |answer3
I want a json output of this format
{"index": {"_id": "col1"}}
{"title": "col2", "body": "col3", "answer": "col4"}
Like
{"index": {"_id": "101"}}
{"title": "title1", "body": "body1", "answer": "answer1"}
For every rows in the dataframe.
Could anyone shine some light?
CodePudding user response:
It's not dificult, I recommend you read at documentations of library.
import numpy as np
import pandas as pd
x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'],
['103', 'title3', 'body3', 'answer3']])
df = pd.DataFrame(x, columns=['col1', 'col2', 'col3', 'col4'])
djson=df.to_json()
print(djson)
CodePudding user response:
Raphael's answer is probably the best approach to your problem (In case you want it row-wise, you can go ahead and do df.transpose
before the df.to_json()
); however, if what you really want is to define the json string exactly as you have there (That would be different from column names), you can do something like this:
jsons = []
json_string = '{{"index": {{"_id": "{index}"}}}}{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'
for row in range(len(df)):
data = [df.iloc[row, x] for x in range(len(df.columns))]
jsons.append(json_string.format(index=data[0],title=data[1],body=data[2], answer=data[3]))
And then get the list of json strings in the jsons
variable. In case you need to convert into object, you can use the json library, and use json.loads()
CodePudding user response:
Create a json string of how you want your output to look and format it on the dataframe.
For curl operations (while using ElasticSearch and Kibana), make sure to split the json string below
json_string = '{{"index": {{"_id": "{index}"}}}}{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'
will give output in json as
{"index": {"_id": "101"}}{"title": "title1", "body": "body1", "answer": "answer1"}
For output to be like this
{"index": {"_id": "101"}}
{"title": "title1", "body": "body1", "answer": "answer1"}
use
json_string1 = '{{"index": {{"_id": "{index}"}}}}'
json_string2 = '{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'
Here is the complete code (Convert json list to json object by json.dumps() followed by json.loads()
import numpy as np
import pandas as pd
import json
x = np.array([['101', 'title1', 'body1', 'answer1'], ['102', 'title2', 'body2', 'answer2'], ['103', 'title3', 'body3', 'answer3']])
df = pd.DataFrame(x, columns = ['col1', 'col2', 'col3', 'col4'])
jsons = []
json_string1 = '{{"index": {{"_id": "{index}"}}}}'
json_string2 = '{{"title": "{title}", "body": "{body}", "answer": "{answer}"}}'
for row in range(len(df)):
data = [df.iloc[row, x] for x in range(len(df.columns))]
jsons.append(json_string1.format(index=data[0]))
jsons.append(json_string2.format(title=data[1],body=data[2], answer=data[3]))
with open("filename.json", "w") as fp:
for i in range(len(jsons)):
x = json.dumps(jsons[i])
y = json.loads(x)
fp.write("%s\n" % y)