I am trying to write CSV files into an Elasticsearch database, but first I want to convert them to dicts/JSON, and I keep getting an error that I don't know how to fix.
Here is the code:
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

import pandas as pd

df = pd.read_csv('Data/FINAL_CORD_DATA_0.csv')
dicts = df.to_dict('records')

final_dicts = []
for each in dicts:
    tmp = {}
    tmp['text'] = each.pop('body_text')
    tmp['meta'] = each
    final_dicts.append(tmp)
Here is the error message I receive when I run the last cell:
KeyError                                  Traceback (most recent call last)
<ipython-input-13-e5e7b4b7ff5a> in <module>
      2 for each in dicts:
      3     tmp = {}
----> 4     tmp['text'] = each.pop('body_text')
      5     tmp['meta'] = each
      6     final_dicts.append(tmp)

KeyError: 'body_text'
CodePudding user response:
A quirk of to_dict() in pandas is that NaN values in the original DataFrame can result in the corresponding key-value pair not being created in the dict at all. I could imagine that parts of your DataFrame contain NaNs (or empty strings that were auto-converted to NaN), so some of the dicts might not have a key-value pair for 'body_text'.
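Before transforming the records, you can check whether missing values (or a misspelled column header) are the cause by inspecting the DataFrame directly. A quick diagnostic sketch, using a hypothetical DataFrame standing in for FINAL_CORD_DATA_0.csv:

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV
df = pd.DataFrame({"body_text": ["some text", None], "title": ["a", "b"]})

# Does the column exist at all? A typo or stray whitespace in the
# CSV header would also produce a KeyError later on.
print("body_text" in df.columns)     # → True

# How many rows have no body_text?
print(df["body_text"].isna().sum())  # → 1
```

If the column name checks out but the NaN count is non-zero, the missing-key explanation fits.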
You can catch that case, e.g. by filling in an empty string for those dicts, like this:
final_dicts = []
for each in dicts:
    tmp = {}
    if 'body_text' in each:
        tmp['text'] = each.pop('body_text')
    else:
        # fall back to an empty string when the key is missing
        tmp['text'] = ""
    tmp['meta'] = each
    final_dicts.append(tmp)
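Since dict.pop() accepts a default value, the same guard can also be written in one line. A self-contained sketch with hypothetical sample records, one of which is missing 'body_text':

```python
# Hypothetical records; the second one has no 'body_text' key,
# mimicking what a NaN cell could produce
dicts = [
    {'body_text': 'some text', 'title': 'a'},
    {'title': 'b'},
]

final_dicts = []
for each in dicts:
    tmp = {}
    # pop() with a default avoids the KeyError when the key is absent
    tmp['text'] = each.pop('body_text', "")
    tmp['meta'] = each
    final_dicts.append(tmp)

print(final_dicts[1]['text'])  # → "" (empty-string fallback)
```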