I need to create a JSON file with this structure
[{"image_id": 0873, "caption": "clock tower with a clock on top of it"}, {"image_id": 1083, "caption": "two zebras are standing in the grass in the grass"} , .....
from a text file that contains:
image_id 0873 caption clock tower with a clock on top of it
image_id 1083 caption two zebras are standing in the grass in the grass
image_id 1270 caption baseball player is swinging a bat at the ball
image_id 1436 caption man is sitting on the bed with laptop
How can I get started?
Answer:
Assuming every line looks like:
image_id {image_id} caption {caption}
you can use the str method split(maxsplit=3) to split each line into exactly four parts. Splitting stops after the third separator, so the caption keeps its internal spaces:
line = "image_id 0873 caption clock tower with a clock on top of it"
_, image_id, _, caption = line.split(maxsplit=3)
# Now image_id = "0873", caption = "clock tower with a clock on top of it"
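A quick check of how split(maxsplit=3) behaves on a line as it comes out of a file, i.e. still ending in a newline. According to the str.split documentation, trailing whitespace on the final, unsplit part is kept, so it is safer to strip the line first. The sample line is the first one from the question.

```python
line = "image_id 0873 caption clock tower with a clock on top of it\n"

# Without strip(): the last part keeps the trailing newline
_, image_id, _, caption = line.split(maxsplit=3)
print(repr(caption))  # 'clock tower with a clock on top of it\n'

# With strip(): surrounding whitespace is removed before splitting
_, image_id, _, caption = line.strip().split(maxsplit=3)
print(repr(image_id), repr(caption))  # '0873' 'clock tower with a clock on top of it'
```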
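To iterate over all the lines in the file:
images = []
with open(path) as f:  # path: your input text file
    for line in f:
        line = line.strip()  # drop the trailing newline
        if not line:
            continue  # skip blank lines
        _, image_id, _, caption = line.split(maxsplit=3)
        # int() drops the leading zero: "0873" -> 873
        images.append({"image_id": int(image_id), "caption": caption})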
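A slightly more defensive version of the loop, as a sketch: it strips each line so captions don't end in a newline, and skips blank lines, which would otherwise make the four-way unpacking raise ValueError. The function name parse_lines is hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def parse_lines(lines):
    """Turn 'image_id <id> caption <text>' lines into a list of dicts."""
    images = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty last line
        _, image_id, _, caption = line.split(maxsplit=3)
        images.append({"image_id": int(image_id), "caption": caption})
    return images

# io.StringIO behaves like an open text file: iterating it yields lines
sample = io.StringIO(
    "image_id 0873 caption clock tower with a clock on top of it\n"
    "image_id 1083 caption two zebras are standing in the grass in the grass\n"
    "\n"
)
images = parse_lines(sample)
print(images)
```

With a real file you would call parse_lines(f) inside the same with open(path) as f: block.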
To save the list to a JSON file, use the json module:
import json
with open(path_to_save, "w") as f:
    json.dump(images, f)
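One detail about the target structure: 0873 as written in the question is not a valid JSON number, because JSON does not allow leading zeros. Converting with int() therefore writes 873; if the zero-padded ID matters, keep it as a string instead. A sketch comparing the two, using json.dumps to show the text that json.dump would write:

```python
import json

image_id, caption = "0873", "clock tower with a clock on top of it"

# Option 1: numeric ID, the leading zero is dropped
as_int = json.dumps([{"image_id": int(image_id), "caption": caption}])
# Option 2: string ID, "0873" is preserved but quoted
as_str = json.dumps([{"image_id": image_id, "caption": caption}])

print(as_int)  # [{"image_id": 873, "caption": "clock tower with a clock on top of it"}]
print(as_str)  # [{"image_id": "0873", "caption": "clock tower with a clock on top of it"}]

# A number with a leading zero, as in the question, is rejected by the parser
try:
    json.loads('[{"image_id": 0873}]')
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```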
Answer:
This should do the trick:
import json

json_data = []
# read your data; "with" closes the file afterwards
with open("file_with_data.txt") as f:
    for line in f:
        # remove the trailing newline and surrounding whitespace
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split the line into words on any run of whitespace
        splt_line = line.split()
        # build a single dict from the line data:
        # [0] = "image_id", [1] = the ID (kept as a string), [2] = "caption", [3:] = caption words
        small_json = {splt_line[0]: splt_line[1], splt_line[2]: " ".join(splt_line[3:])}
        # add it to your list
        json_data.append(small_json)

# now dump the List[Dict] to a .json file
with open("json_dump.json", "w") as f:
    json.dump(json_data, f)
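Why split() with no argument is preferable to split(" ") for this kind of line: splitting on a single space produces empty strings wherever two spaces appear in a row, which shifts every index after it. The line below is hypothetical, with a doubled space added on purpose.

```python
line = "image_id  0873 caption two zebras"  # hypothetical line with a double space

print(line.split(" "))  # ['image_id', '', '0873', 'caption', 'two', 'zebras']
print(line.split())     # ['image_id', '0873', 'caption', 'two', 'zebras']

# With split(), the indices used in the answer stay correct
words = line.split()
record = {words[0]: words[1], words[2]: " ".join(words[3:])}
print(record)  # {'image_id': '0873', 'caption': 'two zebras'}
```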
Answer:
Try using a regular expression; it makes it easy to handle more complicated patterns. Below is an extended version of @Kozubi's answer:
import json
import re

json_data = []
# re.X (verbose mode) ignores whitespace inside the pattern, so it can span lines
pattern = re.compile(r"""image_id\s+(?P<image_id>[0-9]+)\s+
                         caption\s+(?P<caption>.*)$
                      """, re.X)
with open("test.txt") as f:
    for line in f:
        m = pattern.match(line.strip())
        if m:  # lines that don't match the pattern are skipped
            json_data.append({
                "image_id": int(m.group("image_id")),
                "caption": m.group("caption"),
            })

print(json.dumps(json_data, indent=4))
with open("json_dump.json", "w") as f:
    json.dump(json_data, f, indent=4)
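A self-contained check of the same pattern (written on one line, without re.X) against in-memory lines, as a sketch. Lines that don't fit the "image_id <digits> caption <text>" shape are skipped instead of raising an error, which is the practical advantage of the regex version over tuple unpacking. The first sample line is from the question; the other two are hypothetical bad inputs.

```python
import re

pattern = re.compile(r"image_id\s+(?P<image_id>[0-9]+)\s+caption\s+(?P<caption>.*)$")

lines = [
    "image_id 1270 caption baseball player is swinging a bat at the ball",
    "",                             # blank line: no match, skipped
    "image_id caption missing id",  # malformed, no numeric ID: skipped
]

records = []
for line in lines:
    m = pattern.match(line.strip())
    if m:
        records.append({"image_id": int(m.group("image_id")), "caption": m.group("caption")})

print(records)  # [{'image_id': 1270, 'caption': 'baseball player is swinging a bat at the ball'}]
```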
Answer:
If you would rather not write code, you can use an online converter: open https://anyconv.com/txt-to-json-converter/ in any web browser, then click Choose File (centered on the page) to open your file manager and select the TXT file.