Convert text to JSON with this format

Time:10-07

I need to convert a text file to JSON with this format:

    {'annotations': [{u'image_id': 0, u'caption': u'the man is playing a guitar'},
                     {u'image_id': 0, u'caption': u'a man is playing a guitar'},
                     {u'image_id': 1, u'caption': u'a woman is slicing cucumbers'},
                     {u'image_id': 1, u'caption': u'the woman is slicing cucumbers'},
                     {u'image_id': 1, u'caption': u'a woman is cutting cucumbers'}]
    }

The text file looks like this:

   image_id 42 caption man is sitting on bench with his head
   image_id 73 caption man is riding motorcycle on the street
   image_id 74 caption cat laying on top of bed next to window

The code is:

import json
images = []
with open('1.txt') as f:
    for line in f:
        _, image_id, _, caption = line.split(maxsplit=3)
        images.append({"image_id": int(image_id), "caption": caption})

with open('r.json', "w") as f:
    json.dump(images, f)

But this is what I got in the result file:

[{"image_id": 42, "caption": "man is holding an umbrella in the rain\n"}, {"image_id": 73, "caption": "black and white cat sitting on top of car\n"},....] 

The problem shows up when I try to read the result file:

imgToAnnsRES = {ann['image_id']: [] for ann in datasetRES['annotations']}
TypeError: list indices must be integers or slices, not str

CodePudding user response:

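The error occurs because the result file contains a bare list, and a list cannot be indexed with the string 'annotations'. Assuming you already have the list of dicts: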

images = [{u'image_id': 0, u'caption': u'the man is playing a guitar'},
                    {u'image_id': 1, u'caption': u'a man is playing a guitar'},
                    {u'image_id': 2, u'caption': u'a woman is slicing cucumbers'},
                    {u'image_id': 3, u'caption': u'the woman is slicing cucumbers'},
                    {u'image_id': 4, u'caption': u'a woman is cutting cucumbers'}]

We can simply define the datasetRES object as:

datasetRES = {'annotations': images}
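
Here is a minimal sketch of building that structure straight from the text file. It assumes every non-blank line follows the layout shown in the question (`image_id <number> caption <text>`). The function name `text_to_annotations` and the in-memory sample lines are illustrative and not from the original post. Stripping each line also removes the trailing "\n" seen in the captions.

```python
import json


def text_to_annotations(lines):
    """Parse lines shaped like 'image_id 42 caption some text' into the target dict."""
    annotations = []
    for line in lines:
        line = line.strip()  # drops leading spaces and the trailing newline
        if not line:
            continue  # skip blank lines
        # Split into at most 4 pieces so the caption keeps its internal spaces.
        _, image_id, _, caption = line.split(maxsplit=3)
        annotations.append({"image_id": int(image_id), "caption": caption})
    # Wrap the list so that result['annotations'] works after loading.
    return {"annotations": annotations}


# Hypothetical sample standing in for the contents of the text file.
sample = [
    "   image_id 42 caption man is sitting on bench with his head\n",
    "   image_id 73 caption man is riding motorcycle on the street\n",
]
datasetRES = text_to_annotations(sample)
print(json.dumps(datasetRES, indent=2))
```

With a real file, the open file object can be passed in place of `sample`, and the returned dict written out with `json.dump`.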

Now you can use the following code:

imgToAnnsRES = {ann['image_id']: [] for ann in datasetRES['annotations']}
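
Note that the comprehension above only creates an empty list for each image_id. If the goal is to group the annotations by image, the lists still need to be filled. The sketch below shows one way to do that with `collections.defaultdict`. The grouping step is an assumption about what comes next, not something stated in the original post, and the sample data is taken from the format shown in the question.

```python
from collections import defaultdict

# Sample data in the target format from the question.
datasetRES = {"annotations": [
    {"image_id": 0, "caption": "the man is playing a guitar"},
    {"image_id": 0, "caption": "a man is playing a guitar"},
    {"image_id": 1, "caption": "a woman is slicing cucumbers"},
]}

# Map each image_id to the list of annotations that belong to it.
imgToAnnsRES = defaultdict(list)
for ann in datasetRES["annotations"]:
    imgToAnnsRES[ann["image_id"]].append(ann)

# Show how many captions each image received.
print({image_id: len(anns) for image_id, anns in imgToAnnsRES.items()})
```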