What would be the best way to structure a two-way language dictionary in Python 3 parsing in JSON?

Time:04-26

Here's a basic example of what I mean by parsing JSON:

import json

kitab = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'

indo_dict = json.loads(kitab)
comma = ","

# Imagine here that the user wanted to know the word "kitab" in English, and also selected to see the etymology
print(indo_dict['word'], comma, indo_dict['english'], comma, indo_dict['etymology'])

This works fine for one word like "kitab", but what about the thousands of other words I'd need to add? The issue is that the single Indonesian word has to be passed to json.loads for this to run. I could not make it work with more than one word in the dictionary, which sort of defeats the point. Is there a better way of doing this that I'm not aware of?

Sorry if this is a weird question, but I have a learning disability and I'm not entirely sure what I'm doing.

CodePudding user response:

A dictionary containing the info of a single word isn't the correct data structure to hold the info for all words. Instead, you want the dict you already have (indo_dict) to be one entry in a dict that contains other words too.

import json

kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'

khaleesi_json = '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}'

# Deserialize the JSON strings into dicts
kitab_dict = json.loads(kitab_json)
khaleesi_dict = json.loads(khaleesi_json)

# Create one big dict that will hold every word
all_words = {}

# Add your words to this dict
all_words["kitab"] = kitab_dict
all_words["khaleesi"] = khaleesi_dict

More generally, to add any given info_dict to all_words:

word = info_dict["word"]
all_words[word] = info_dict
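
With many entries, those two lines can go inside a loop. This is only a minimal sketch, assuming the entries arrive as a list of JSON strings; the name raw_entries is made up for illustration:

```python
import json

# Hypothetical input: one JSON string per word, in the same shape as above.
raw_entries = [
    '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}',
    '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}',
]

all_words = {}
for raw in raw_entries:
    info_dict = json.loads(raw)                # parse one entry into a dict
    all_words[info_dict["word"]] = info_dict   # key the entry by its own "word" field

print(sorted(all_words))  # ['khaleesi', 'kitab']
```

If two entries share the same "word", the later one overwrites the earlier one, so this only suits data where each word appears once.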

And now you have all_words as:

{
 'kitab': {
            'word': 'kitab',
            'english': 'book',
            'partofspeech': 'noun',
            'honorifics': 'n/a',
            'etymology': 'from Arabic'
          },
 'khaleesi': {
            'word': 'khaleesi',
            'english': 'queen',
            'partofspeech': 'noun',
            'honorifics': 'n/a',
            'etymology': 'from Doth Raki'
          }
}

To access e.g. the part of speech of any word w, you'd look up the info dict for that word using all_words[w], and then get the part of speech from that info dict:

w = "khaleesi"
print(all_words[w]["partofspeech"]) # noun
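
Since the question asks for a two-way dictionary, one possible way to go from English back to the original word is a second index built from all_words. The sketch below assumes that several words may share one English translation, so each English key maps to a list. The names english_index and lookup are illustrative only and not part of the answer above:

```python
from collections import defaultdict

all_words = {
    "kitab": {"word": "kitab", "english": "book", "partofspeech": "noun"},
    "khaleesi": {"word": "khaleesi", "english": "queen", "partofspeech": "noun"},
}

# Reverse index: English translation -> list of source-language words.
english_index = defaultdict(list)
for word, info in all_words.items():
    english_index[info["english"]].append(word)


def lookup(word, field):
    """Return one field for a word, or None if the word or field is missing."""
    return all_words.get(word, {}).get(field)


print(english_index["book"])               # ['kitab']
print(lookup("khaleesi", "partofspeech"))  # noun
print(lookup("missing", "english"))        # None
```

Using .get avoids a KeyError when the user types a word that is not in the dictionary.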

You might also want to look into replacing the info dicts with a dataclass, or storing the data in a dataframe or a database.
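
As a rough idea of what the dataclass route could look like, here is a sketch that assumes every entry has exactly the five keys used in the question. The class name Entry is an invented name:

```python
import json
from dataclasses import dataclass


@dataclass
class Entry:
    # Field names mirror the JSON keys used in the question.
    word: str
    english: str
    partofspeech: str
    honorifics: str
    etymology: str


kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'

# Unpack the parsed dict into the dataclass; a missing or extra key raises TypeError.
entry = Entry(**json.loads(kitab_json))

print(entry.english)    # book
print(entry.etymology)  # from Arabic
```

Attribute access such as entry.english lets an editor or type checker catch misspelled field names, which plain dict keys do not.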

CodePudding user response:

Either of the two options below should work for you. Loading the data from JSON works the same way as in your example, so I'll mainly show how to format the data.

  1. If each word appears only once, you can use the word itself as the dictionary key and store its information under it:
kitab_json = {
    "kitab": {
        "english": "book",
        "partofspeech": "noun",
        "honorifics": "n/a",
        "etymology": "from Arabic"
    },
    "word2": {
        "english": "book",
        "partofspeech": "noun",
        "honorifics": "n/a",
        "etymology": "from Arabic"
    }
}
  2. If words can be repeated, store the entries as a JSON list instead:
kitab_json = [
    {"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
    {"word": "word2", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"},
    {"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}
]
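
Both layouts can be parsed with a single json.loads call, which addresses the "more than one word" problem from the question. The sketch below uses shortened entries and invented variable names (keyed_json, list_json, grouped) purely for illustration:

```python
import json
from collections import defaultdict

# Layout 1: a JSON object keyed by word.
keyed_json = '''
{
    "kitab": {"english": "book", "partofspeech": "noun"},
    "word2": {"english": "book", "partofspeech": "noun"}
}
'''
by_word = json.loads(keyed_json)
print(by_word["kitab"]["english"])  # book

# Layout 2: a JSON list, where the same word may appear more than once.
list_json = '''
[
    {"word": "kitab", "english": "book"},
    {"word": "word2", "english": "book"},
    {"word": "kitab", "english": "book"}
]
'''
# Group the list entries by word so repeated words are all kept.
grouped = defaultdict(list)
for entry in json.loads(list_json):
    grouped[entry["word"]].append(entry)

print(len(grouped["kitab"]))  # 2
```

If the JSON lives in a file rather than a string, json.load on the open file object returns the same structures.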

CodePudding user response:

If you have JSON for multiple words stored like this:

kitab_json = '{"word": "kitab", "english": "book", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Arabic"}'
khaleesi_json = '{"word": "khaleesi", "english": "queen", "partofspeech": "noun", "honorifics": "n/a", "etymology": "from Doth Raki"}'

Then you can first convert it into a nested dict like the one below, using the approach from @Pranav's answer.

all_words = {
 'kitab': {
            'word': 'kitab',
            'english': 'book',
            'partofspeech': 'noun',
            'honorifics': 'n/a',
            'etymology': 'from Arabic'
          },
 'khaleesi': {
            'word': 'khaleesi',
            'english': 'queen',
            'partofspeech': 'noun',
            'honorifics': 'n/a',
            'etymology': 'from Doth Raki'
          }
}

From here on, you can use pandas to convert this into a DataFrame with the code below:

import pandas as pd

# Transpose so that each word becomes a row, then drop the redundant "word" column
df = pd.DataFrame(all_words).T.drop(columns="word")

This would give you a table that looks like this:

          english partofspeech honorifics       etymology
kitab        book         noun        n/a     from Arabic
khaleesi    queen         noun        n/a  from Doth Raki

From here you can use the row labels (the words) and the column names (the fields) to get the info for a particular word.
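
A short sketch of such lookups, assuming pandas is installed and the DataFrame is built as above. The variable matches is an invented name:

```python
import pandas as pd

all_words = {
    "kitab": {"word": "kitab", "english": "book", "partofspeech": "noun",
              "honorifics": "n/a", "etymology": "from Arabic"},
    "khaleesi": {"word": "khaleesi", "english": "queen", "partofspeech": "noun",
                 "honorifics": "n/a", "etymology": "from Doth Raki"},
}

df = pd.DataFrame(all_words).T.drop(columns="word")

# Single cell: row label is the word, column name is the field.
print(df.loc["khaleesi", "partofspeech"])  # noun

# Whole row for one word, as a plain dict.
print(df.loc["kitab"].to_dict())

# Reverse direction: which words translate to "book"?
matches = df[df["english"] == "book"].index.tolist()
print(matches)  # ['kitab']
```

The last lookup gives the English-to-word direction without a separate index, at the cost of scanning the column on every query.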
