Extracting text from a specific field in a JSON file in Python

Time:11-29

My JSON looks like this (but with many lines like these):

{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter.", "timestamp": "2018-01-20T18:56:35Z", "url": "http://proarslausitz.de/1.html"}
{"text": "Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}

I want to create a .txt file containing just the value of the "text" field from each line. So it would be just:

Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter. Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.

No keys, no quotes, nothing else. I think the encoding (because of the umlauts) is not hard to solve afterwards. But regarding text extraction, I know I can do:

json_object = json.loads(json_object_string)
print(json_object["text"])

But that only handles a single line. Do I need to iterate over the lines? How can I merge the texts into a single .txt file?

CodePudding user response:

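# js_file is assumed to hold the already-parsed JSON (e.g. from json.load)
with open("file.txt", 'w') as txt_file:
    for entry in js_file['...']:
        txt_file.write(entry['text'])

Replace '...' with the name of the main key of the JSON file. The with block closes the file on exit, so no explicit close() is needed.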

CodePudding user response:

I'm not entirely sure there is a way to "vectorize" copying values from JSON, and even if there were, iterating gets the job done just fine in my opinion. If I were to iterate over every entry of that long JSON and put each "text" into a text file, I would do it like this:

import json

# removed escape sequences, that is not focus of problem
test = '[{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.Kunst. Und so weiter.", "timestamp": "2018-01-20T18:56:35Z", "url": "http://proarslausitz.de/1.html"}, {"text": "Bildnummer: 79800031VektorgrafikSkalieren Sie ohne Aufl sungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}]'

# as you said, parse the JSON string; here it yields a list of dicts
test_json = json.loads(test)

# open a new text file to put the extracted text into
with open("json_output.txt", 'w') as file:
    for line in test_json:
        # write() adds no separator: each text lands on its own line
        # only if the text itself already ends with \n
        file.write(line.get("text", ""))
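
The sample in the question is not one JSON array but one JSON object per line, so the same loop can run over the lines of the file directly. Below is a minimal sketch under that assumption; the function name extract_texts, its parameters, and the choice of a newline between texts are mine, not from the question.

```python
import json

def extract_texts(jsonl_path, txt_path, field="text"):
    """Copy one field from each JSON line of jsonl_path into txt_path.

    Assumes every non-blank line is a complete JSON object that
    contains `field`. Returns the number of records written.
    """
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)  # one dict per line
            dst.write(record[field] + "\n")  # newline separator is a choice
            count += 1
    return count
```

Opening both files with encoding="utf-8" should also keep the umlauts intact, since json.loads already turns escapes such as \u00f6 into the real character.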

CodePudding user response:

json.loads returns the parsed data as Python objects, with each JSON object becoming a dict of key/value pairs. Assuming json_object_string holds a JSON array of objects, parse it and loop over the result:

data = json.loads(json_object_string)

Create a .txt file for output:

output = open("newfile.txt", "a")

for e in data:
    output.write(e['text'])

Close your file:

output.close()
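
If the goal is a single merged string, as in the question, the texts can also be joined in memory before writing. This is only a sketch: merge_texts and the shortened sample lines are made up for illustration, and it assumes each line is a JSON object with a "text" key.

```python
import json

def merge_texts(json_lines, sep=" "):
    """Parse each JSON line and join the "text" values with sep."""
    return sep.join(json.loads(line)["text"] for line in json_lines)

# raw strings keep the JSON escapes exactly as they would appear in the file
sample = [
    r'{"text": "Kunst.\nUnd so weiter.", "url": "http://proarslausitz.de/1.html"}',
    r'{"text": "ohne Aufl\u00f6sungsverlust. Ende."}',
]

merged = merge_texts(sample)
print(merged)
```

Note that json.loads decodes the escapes, so \n becomes a real line break and \u00f6 becomes ö in the merged string.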
