I am trying to write CSV files into an Elasticsearch database, but first I want to convert them to dicts/JSON, and I keep getting an error that I don't know how to fix.
Here is the code:
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

import pandas as pd

df = pd.read_csv('Data/FINAL_CORD_DATA_0.csv')
dicts = df.to_dict('records')

final_dicts = []
for each in dicts:
    tmp = {}
    tmp['text'] = each.pop('body_text')
    tmp['meta'] = each
    final_dicts.append(tmp)
Here is the error message I receive when I run the last cell:
KeyError                                  Traceback (most recent call last)
<ipython-input-13-e5e7b4b7ff5a> in <module>
      2 for each in dicts:
      3     tmp = {}
----> 4     tmp['text'] = each.pop('body_text')
      5     tmp['meta'] = each
      6     final_dicts.append(tmp)

KeyError: 'body_text'
CodePudding user response:
A quirk of to_dict() in pandas is that NaN values in the original DataFrame can result in the corresponding key-value pair not being created in the dict at all. I could imagine that parts of your DataFrame contain NaNs (or empty strings that were auto-converted to NaN), so some of the dicts might not have a key-value pair for 'body_text'.
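Before transforming the records, you can check whether missing values (or a misspelled column header) are the cause by inspecting the DataFrame directly. A quick diagnostic sketch, using a hypothetical DataFrame standing in for FINAL_CORD_DATA_0.csv:

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV
df = pd.DataFrame({"body_text": ["some text", None], "title": ["a", "b"]})

# Does the column exist at all? A typo or stray whitespace in the
# CSV header would also produce a KeyError later on.
print("body_text" in df.columns)     # → True

# How many rows have no body_text?
print(df["body_text"].isna().sum())  # → 1
```

If the column name checks out but the NaN count is non-zero, the missing-key explanation fits.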
You can catch that case, e.g. by filling in an empty string for those dicts, like this:
final_dicts = []
for each in dicts:
    tmp = {}
    if 'body_text' in each:
        tmp['text'] = each.pop('body_text')
    else:
        # fall back to an empty string when the key is missing
        tmp['text'] = ""
    tmp['meta'] = each
    final_dicts.append(tmp)
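Since dict.pop() accepts a default value, the same guard can also be written in one line. A self-contained sketch with hypothetical sample records, one of which is missing 'body_text':

```python
# Hypothetical records; the second one has no 'body_text' key,
# mimicking what a NaN cell could produce
dicts = [
    {'body_text': 'some text', 'title': 'a'},
    {'title': 'b'},
]

final_dicts = []
for each in dicts:
    tmp = {}
    # pop() with a default avoids the KeyError when the key is absent
    tmp['text'] = each.pop('body_text', "")
    tmp['meta'] = each
    final_dicts.append(tmp)

print(final_dicts[1]['text'])  # → "" (empty-string fallback)
```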