Here's a basic example of what I mean by parsing JSON:
import json
kitab = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'
indo_dict = json.loads(kitab)
comma = ","
#Imagine here that the user wanted to know the word "kitab" in English, and also selected to see the etymology
print(indo_dict['word'], comma, indo_dict['english'], comma, indo_dict['etymology'])
This works fine for one word like "kitab", but what about the thousands of other words I'd need to add? The issue is that a single Indonesian word has to be passed to json.loads for this to run; I could not make it work with more than one word in the dictionary, which sort of defeats the point. Is there a better way of doing this that I'm not aware of?
Sorry if this is a weird question, but I have a learning disability and I'm not entirely sure what I'm doing.
CodePudding user response:
A dictionary containing the info of a single word isn't the correct data structure to hold the info for all words. Instead, you want the dict you already have (indo_dict) to be one entry in a dict that contains other words too.
kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'
khaleesi_json = '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}'
# Deserialize the jsons
kitab_dict = json.loads(kitab_json)
khaleesi_dict = json.loads(khaleesi_json)
# Create a big dict
all_words = dict()
# Add your words to this dict
all_words["kitab"] = kitab_dict
all_words["khaleesi"] = khaleesi_dict
More generally, to add any given info_dict to all_words:
word = info_dict["word"]
all_words[word] = info_dict
And now you have all_words as:
{
'kitab': {
'word': 'kitab',
'english': 'book',
'partofspeech': 'noun',
'honorifics': 'n/a',
'etymology': 'from Arabic'
},
'khaleesi': {
'word': 'khaleesi',
'english': 'queen',
'partofspeech': 'noun',
'honorifics': 'n/a',
'etymology': 'from Doth Raki'
}
}
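With many words, the same two steps can go in a loop. This is only a minimal sketch: it assumes the JSON strings are collected in a list, and the name word_jsons is made up for this example.

```python
import json

# Hypothetical list of JSON strings, one per word
word_jsons = [
    '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}',
    '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}',
]

all_words = {}
for word_json in word_jsons:
    info_dict = json.loads(word_json)          # deserialize one entry
    all_words[info_dict["word"]] = info_dict   # key the entry by its word

print(sorted(all_words))
```

Adding another word then only means appending one more string to the list; the loop itself does not change.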
To access e.g. the part of speech of any word w, you'd look up the info dict for that word using all_words[w], and then get the part of speech from that info dict:
w = "khaleesi"
print(all_words[w]["partofspeech"]) # noun
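For the case in the question, where the user picks a word and chooses which fields to see, the lookup could be wrapped in a small function. This is a sketch, not the only way to do it: the function name describe and its fields parameter are made up here, and all_words is a one-word stand-in for the dict built above.

```python
# Small stand-in for the all_words dict built above
all_words = {
    "kitab": {"word": "kitab", "english": "book", "partofspeech": "noun",
              "honorifics": "n/a", "etymology": "from Arabic"},
}

def describe(word, fields=("english",)):
    """Return the requested fields for a word as one comma-separated string."""
    info = all_words.get(word)  # None if the word is not in the dictionary
    if info is None:
        return f"{word}: not found"
    parts = [word] + [info[field] for field in fields]
    return ", ".join(parts)

print(describe("kitab", fields=("english", "etymology")))
print(describe("buku"))
```

Using .get instead of all_words[word] avoids a KeyError when the user asks for a word that has not been added yet.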
You might want to look into replacing the info_dicts with a dataclass, or using a dataframe or database.
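As a rough idea of what the dataclass version might look like: the class name WordEntry is an assumption for illustration, and the sketch relies on the JSON keys matching the field names exactly.

```python
import json
from dataclasses import dataclass

@dataclass
class WordEntry:
    # Field names mirror the keys used in the JSON strings
    word: str
    english: str
    partofspeech: str
    honorifics: str
    etymology: str

kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'

# Unpack the parsed dict into the dataclass; fails if a key is missing or unknown
entry = WordEntry(**json.loads(kitab_json))

all_words = {entry.word: entry}
print(all_words["kitab"].english, all_words["kitab"].etymology)
```

The benefit over a plain dict is that a typo such as entry.englsh raises an error immediately, and an entry with a missing field is rejected when it is created.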
CodePudding user response:
Both of the options below are reasonable choices. Getting the data from JSON works the same way as in your example and isn't complicated, so I'll mainly show how to format the data.
- If your words will not be repeated, you can use the word directly as a dictionary key to store the information:
kitab_json = {
"kitab": {
"english": "book",
"partofspeech": "noun",
"honorifics": "n/a",
"etymology": "from Arabic"
},
"word2": {
"english": "book",
"partofspeech": "noun",
"honorifics": "n/a",
"etymology": "from Arabic"
}
}
- If the same word may appear more than once, try using JSON in list form:
kitab_json = [
{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
{"word": "word2", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}
]
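Either layout can be parsed with a single json.loads call on the whole text. For the list form, here is a minimal sketch that groups repeated words together; the names words_json and by_word are made up for this example, and the entries are the ones from the list above.

```python
import json
from collections import defaultdict

# The list form as one JSON string; "kitab" appears twice
words_json = """[
  {"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
  {"word": "word2", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
  {"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}
]"""

entries = json.loads(words_json)  # a Python list of dicts

# Map each word to the list of all entries that share it
by_word = defaultdict(list)
for entry in entries:
    by_word[entry["word"]].append(entry)

print(len(by_word["kitab"]), len(by_word["word2"]))
```

If the JSON lives in a file rather than a string, json.load on the open file object returns the same list.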
CodePudding user response:
If you have json for multiple words stored like this,
kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'
khaleesi_json = '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}'
Then you can first convert it into a dict like the one below, using the approach @Pranav described.
all_words = {
'kitab': {
'word': 'kitab',
'english': 'book',
'partofspeech': 'noun',
'honorifics': 'n/a',
'etymology': 'from Arabic'
},
'khaleesi': {
'word': 'khaleesi',
'english': 'queen',
'partofspeech': 'noun',
'honorifics': 'n/a',
'etymology': 'from Doth Raki'
}
}
From here on, you can use pandas to convert this into a dataframe with the code below:
import pandas as pd
df = pd.DataFrame(all_words).T.drop(columns = "word")
This would give you a table that looks like this:
| | english | partofspeech | honorifics | etymology |
|---|---|---|---|---|
| kitab | book | noun | n/a | from Arabic |
| khaleesi | queen | noun | n/a | from Doth Raki |
From here you can use the row and column labels to get the info for a particular word; for example, df.loc["kitab", "english"] gives "book".