Issues with creating json and/or xml

Time:12-05

I need help writing Python code that produces either JSON or XML describing each word of a sentence I am given: the word's position (index) in the sentence, whether all of the word's characters are alphabetic, and the word itself. My first idea was to store the keys and values in a plain dictionary and then convert the dictionary to JSON:

import json
data = {}
liste = [] # it's for storing all the words after splitting them by space
sentence="As its price tag has been slashed to $1.7trn over a decade, half as much as first pitched, the hunger—or squid—games between progressives and moderates have turned fiercer."

liste = sentence.split(" ")
for word,index in zip(liste,range(0,len(liste))):
    data[word.lower()] = {"alpha":word.lower().isalpha()}
    data[word.lower()]['Word'] = word.lower()
    data[word.lower()]['Number'] = index
json_data = json.dumps(data,ensure_ascii=False)
print(json_data)

which prints this JSON:

{"as": {"alpha": true, "Word": "as", "Number": 15}, "its": {"alpha": true, "Word": "its", "Number": 1}, "price": {"alpha": true, "Word": "price", "Number": 2}, "tag": {"alpha": true, "Word": "tag", "Number": 3}, "has": {"alpha": true, "Word": "has", "Number": 4}, "been": {"alpha": true, "Word": "been", "Number": 5}, "slashed": {"alpha": true, "Word": "slashed", "Number": 6}, "to": {"alpha": true, "Word": "to", "Number": 7}, "$1.7trn": {"alpha": false, "Word": "$1.7trn", "Number": 8}, "over": {"alpha": true, "Word": "over", "Number": 9}, "a": {"alpha": true, "Word": "a", "Number": 10}, "decade,": {"alpha": false, "Word": "decade,", "Number": 11}, "half": {"alpha": true, "Word": "half", "Number": 12}, "much": {"alpha": true, "Word": "much", "Number":14}, "first": {"alpha": true, "Word": "first", "Number": 16}, "pitched,": {"alpha": false, "Word": "pitched,", "Number": 17}, "the": {"alpha": true, "Word": "the", "Number": 18}, "hunger—or": {"alpha": false, "Word": "hunger—or", "Number": 19}, "squid—games": {"alpha": false, "Word": "squid—games", "Number": 20}, "between": {"alpha": true, "Word": "between", "Number": 21}, "progressives": {"alpha": true, "Word": "progressives", "Number": 22}, "and": {"alpha": true, "Word": "and", "Number": 23}, "moderates": {"alpha": true, "Word": "moderates", "Number": 24}, "have": {"alpha": true, "Word": "have", "Number": 25}, "turned": {"alpha": true, "Word": "turned", "Number": 26}, "fiercer.": {"alpha": false, "Word": "fiercer.", "Number": 27}}

As you can see, this JSON is not correct: some words are missing (the two other occurrences of "as"). After some research on Stack Overflow, I think I am starting to understand why. If my understanding is correct, a dictionary and a JSON object cannot contain the same key more than once. The problem is that in most English sentences some words appear more than once.

Example of an English sentence: As its price tag has been slashed to $1.7trn over a decade, half as much as first pitched, the hunger—or squid—games between progressives and moderates have turned fiercer.

In this sentence the word "as" appears 3 times, so I think the key in my dictionary got overwritten twice. Is my thinking correct? If so, what can I do to solve this problem? Can I somehow get around the unique-key restriction of dictionaries and JSON? Which data structure should I use, and how do I get either JSON or XML as output?

CodePudding user response:

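You cannot get around this: a Python dictionary keeps only one value per key, and the keys of a JSON object are expected to be unique as well. However, you could add an attribute to each word that counts its occurrences. Read the old count before the entry is overwritten in your loop, then store it once the entry has been recreated:

count = data[word.lower()]["occurrences"] + 1 if word.lower() in data else 1  # before data[word.lower()] = {...}
data[word.lower()]["occurrences"] = count  # after data[word.lower()] = {...}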

As a side note, I would strongly advise you to assign frequently repeated expressions to a variable (here, at the very least, word.lower()).
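To make the counting idea concrete, here is a minimal sketch of the whole loop, assuming the sentence from the question. The "positions" list is an extra field added for illustration only (it is not part of the original code); it keeps every index at which the word was seen, so no occurrence is lost.

```python
import json

sentence = ("As its price tag has been slashed to $1.7trn over a decade, "
            "half as much as first pitched, the hunger—or squid—games "
            "between progressives and moderates have turned fiercer.")

data = {}
for index, word in enumerate(sentence.split(" ")):
    key = word.lower()  # computed once instead of repeating word.lower()
    if key not in data:
        # First time this word is seen: create its entry.
        data[key] = {
            "alpha": key.isalpha(),
            "Word": key,
            "occurrences": 0,
            "positions": [],  # hypothetical extra field, not in the question
        }
    data[key]["occurrences"] += 1
    data[key]["positions"].append(index)

json_data = json.dumps(data, ensure_ascii=False)
print(json_data)
```

With this layout each word still appears only once as a key, but the entry for "as" records that it occurred three times and where.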

CodePudding user response:

I think you should really consider why you need it to be that way: what will you actually do with that JSON? What would you expect a program reading it to return for my_json["as"]? If the output is meant for a human to read, there are many human-readable formats you could use instead, such as TSV.

The key needs to be something unique about the entry; if nothing about it is unique, add a count or use a list. In this case, I think the unique element is the index, so you might want to consider using it as your key. Or you could use a list of dictionaries instead and put the word inside each dictionary: [{"word": "as", "index": 0}, {"word": "as", "index": 13}, ...]. Or you could have a dictionary of lists of dictionaries: {"as": [{"index": 0}, {"index": 13}, {"index": 15}]}. All of these can be turned into JSON with the json module.
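The three layouts described above could be sketched roughly as follows. This is only an illustration on a shortened fragment of the sentence; the variable names (as_list, by_index, by_word) and the exact key names are assumptions, not something taken from the question.

```python
import json

# Shortened fragment of the sentence from the question, for readability.
sentence = "half as much as first pitched,"
words = sentence.split(" ")

# Layout 1: a list of dictionaries, one per word, in sentence order.
as_list = [
    {"Index": i, "Word": w.lower(), "alpha": w.isalpha()}
    for i, w in enumerate(words)
]

# Layout 2: a dictionary keyed by the index, which is unique per word.
# Note: json.dumps turns the integer keys into strings ("0", "1", ...).
by_index = {
    i: {"Word": w.lower(), "alpha": w.isalpha()}
    for i, w in enumerate(words)
}

# Layout 3: a dictionary of lists of dictionaries, grouped by word.
by_word = {}
for i, w in enumerate(words):
    by_word.setdefault(w.lower(), []).append({"Index": i})

print(json.dumps(as_list, ensure_ascii=False, indent=2))
print(json.dumps(by_index, ensure_ascii=False))
print(json.dumps(by_word, ensure_ascii=False))
```

The list of dictionaries is probably the closest match to what the question asks for, since it keeps every word, in order, with its index and alphabetic flag.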

If you want JSON that only humans will read, your only option is probably to build the string yourself.
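Since the question also accepts XML as output, it may be worth noting that XML has no unique-key restriction: sibling elements can share the same tag name. A minimal sketch using the standard library's xml.etree.ElementTree, again on a shortened fragment of the sentence; the tag name "word" and the attribute names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the sentence from the question.
sentence = "half as much as first pitched,"

root = ET.Element("sentence")
for i, w in enumerate(sentence.split(" ")):
    # One <word> element per token; repeated words are simply repeated elements.
    element = ET.SubElement(
        root,
        "word",
        index=str(i),                      # attribute values must be strings
        alpha=str(w.isalpha()).lower(),    # "true" or "false"
    )
    element.text = w.lower()

xml_data = ET.tostring(root, encoding="unicode")
print(xml_data)
```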
