I have this code to iterate through a JSON file. The user specifies which tiers to extract; their names are stored in inputLabels, and this for loop extracts the data from those tiers:
with open(inputfilename, 'r', encoding='utf8', newline='\r\n') as f:
    data = json.load(f)
for line in data:
    if line['label'] in inputLabels:
        elements = [(e['body']['value']).replace(" ", "_") + "\t" for e in line['first']['items']]
        outputData.append(elements)
I wrote this code a year ago and have run it multiple times since then with no issues, but running it today I received a TypeError:
    if line['label'] in inputLabels:
TypeError: string indices must be integers
I don't understand why my code was able to work before if this is a true TypeError. Why is this only a problem in the code now, and how can I fix it?
EDIT: Here is part of the JSON file:
{
  "contains": [
    {
      "total": 118,
      "generated": "ELAN Multimedia Annotator 6.2",
      "id": "xxx",
      "label": "BAR001_TEXT",
      "type": "AnnotationCollection",
      "@context": "http://www.w3.org/ns/ldp.jsonld",
      "first": {
        "startIndex": "0",
        "id": "xxx",
        "type": "AnnotationPage",
        "items": [
          {
            "id": "xxx",
            "type": "Annotation",
            "body": {
              "purpose": "transcribing",
              "format": "text/plain",
              "language": "",
              "type": "TextualBody",
              "value": ""
            },
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "target": {
              "format": "audio/x-wav",
              "id": "xxx",
              "type": "Audio"
            }
          },
          {
            "id": "xxx",
            "type": "Annotation",
            "body": {
              "purpose": "transcribing",
              "format": "text/plain",
              "language": "",
              "type": "TextualBody",
              "value": "Dobar vam"
            },
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "target": {
              "format": "audio/x-wav",
              "id": "xxx",
              "type": "Audio"
            }
          },
          {
            "id": "xxx",
            "type": "Annotation",
            "body": {
              "purpose": "transcribing",
              "format": "text/plain",
              "language": "",
              "type": "TextualBody",
              "value": "Je"
            },
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "target": {
              "format": "audio/x-wav",
              "id": "xxx",
              "type": "Audio"
            }
          },
CodePudding user response:
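Your code would probably work if you replaced
    for line in data:
with
    for line in data['contains']:
The top level of the posted JSON is an object (a dict in Python), and iterating over a dict yields its keys. So line is the string "contains", and line['label'] tries to index a string with a string, which raises the TypeError. Maybe the JSON schema didn't have the "contains" level previously.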
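To illustrate why the error appears, here is a minimal sketch using a tiny stand-in for the posted file. The raw string and the variable names are made up for the example; only the "contains", "label", "first", "items", "body" and "value" keys come from the question.

```python
import json

# Hypothetical miniature of the posted file: one tier wrapped in "contains".
raw = ('{"contains": [{"label": "BAR001_TEXT", '
       '"first": {"items": [{"body": {"value": "Dobar vam"}}]}}]}')
data = json.loads(raw)

# Iterating over a dict yields its keys, so each "line" is a string here.
keys = [line for line in data]

# Indexing a string with a string raises the TypeError from the question.
try:
    keys[0]['label']
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Iterating over the list stored under "contains" yields the tier dicts.
labels = [line['label'] for line in data['contains']]

print(keys, error_name, labels)
```

If the file used to be a bare list of tiers, the original loop would have iterated over dicts directly, which would explain why it worked before.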
CodePudding user response:
A fairly Pythonic way to diagnose this is to wrap the loop body in try/except:
with open(inputfilename, 'r', encoding='utf8', newline='\r\n') as f:
    data = json.load(f)
for line in data:
    try:
        if line['label'] in inputLabels:
            elements = [(e['body']['value']).replace(" ", "_") + "\t" for e in line['first']['items']]
            outputData.append(elements)
    except Exception as e:
        # Report the exception type, its message, and the item that triggered it.
        print(f"{type(e)} : {e} when trying to use {line}")
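The loop will then run to completion and print, for each item that fails, the exception and the value of line that caused it, which gives you a hint about what went wrong.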
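A variation on the same idea, as a sketch only: catch just the exceptions that a malformed item is likely to raise and collect them instead of printing. The function name extract_with_report, its parameters, and the sample data are hypothetical, not from the question.

```python
def extract_with_report(data, input_labels):
    """Return (extracted rows, list of (exception name, offending item))."""
    output, failures = [], []
    for line in data:
        try:
            if line['label'] in input_labels:
                output.append([e['body']['value'].replace(" ", "_") + "\t"
                               for e in line['first']['items']])
        except (TypeError, KeyError) as exc:
            # TypeError: item is not a dict; KeyError: an expected key is missing.
            failures.append((type(exc).__name__, line))
    return output, failures


# Hypothetical data shaped like the posted file (wrapped in "contains").
wrapped = {"contains": [{"label": "BAR001_TEXT",
                         "first": {"items": [{"body": {"value": "Je"}}]}}]}
rows, problems = extract_with_report(wrapped, {"BAR001_TEXT"})
print(rows, problems)
```

With the wrapped file, the only item the loop sees is the key "contains", so nothing is extracted and the failure list points straight at the unexpected top level.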
CodePudding user response:
Turns out it was a pretty simple fix. All of the JSON file was in a container (look at the portion I posted in the question, it's the second line, "contains":). I was able to just remove that container and its open/closing brackets and the code ran successfully after that. Thanks all for your help.
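As an alternative to editing the file by hand, the script could accept both layouts. This is a rough sketch under the assumption that the file is either a bare list of tiers or an object with the tiers under "contains"; the helper names get_tiers and extract, and the sample data, are made up for illustration.

```python
def get_tiers(data):
    """Return the list of tiers, unwrapping a top-level "contains" if present."""
    if isinstance(data, dict) and 'contains' in data:
        return data['contains']
    return data


def extract(data, input_labels):
    """Collect the annotation values of the requested tiers."""
    output = []
    for tier in get_tiers(data):
        if tier['label'] in input_labels:
            output.append([e['body']['value'].replace(" ", "_") + "\t"
                           for e in tier['first']['items']])
    return output


# Hypothetical sample in both layouts.
tiers = [{"label": "BAR001_TEXT",
          "first": {"items": [{"body": {"value": "Dobar vam"}},
                              {"body": {"value": "Je"}}]}}]
from_bare = extract(tiers, {"BAR001_TEXT"})
from_wrapped = extract({"contains": tiers}, {"BAR001_TEXT"})
print(from_bare == from_wrapped)
```

This way a change in the export format between ELAN versions would not require touching the JSON itself.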