How can I create a dictionary from an unordered list, where the list contains the keys which are the

I have multiple lists that are structured like the following one:

['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

I need to transform this list into a dictionary whose keys are the words ending in ":". The lists vary, so new words ending in ":" are sometimes added. The corresponding values always follow directly after the word ending in ":" in the list.

When I start iterating over the list, it gets frustrating very quickly because there are too many possible cases for me to handle at the moment. So I would like to ask if anyone knows a fast way to transform such a list into a dictionary.

I tried several loops like the one here to collect the words containing ':':

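import math

# df_new: DataFrame whose char_num column holds a list of strings per row (assumed)
checkwords = []
for charnum_list in df_new.char_num:
    try:
        # Rows that contain only NaN values pass this check and are skipped
        for charnum in charnum_list:
            math.isnan(charnum)
    except TypeError:
        # math.isnan raises TypeError for strings, so this row holds text
        for charnum in charnum_list:
            charnum_new = charnum.replace('HP:', 'HP')
            charnum_new = charnum_new.replace('<', '').replace('>', '').split(' ')
            for word in charnum_new:
                checkwords.append(word)
# Unique words containing ':' are the candidate dictionary keys
diagnosis_dictionaries = list(set([word for word in checkwords if ':' in word]))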

output:

diagnosis_dictionaries:

['HPO:', 'ICD9CM:', 'SNOMEDCT:', 'UMLS:', 'ICD10CM:']

Then I tried to iterate again, comparing each list of keys and values against the list of keys above, but at this point I am really desperate because none of my ideas worked out well.

I would appreciate it if someone had a good idea or a better solution than mine.

CodePudding user response：

If I interpret your question correctly then I think you're looking to do this:

lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

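dct = {}
k = None  # most recently seen key; None until the first 'KEY:' token
for e in lst:
    if e.endswith(':'):
        # A token ending in ':' starts a new key (stored without the colon)
        k = e[:-1]
    elif k is not None:
        # Any other token is a value of the current key
        dct.setdefault(k, []).append(e)

print(dct)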

Output:

{'SNOMEDCT': ['263681008,', '771269000'], 'UMLS': ['C0443147,', 'C1867440', 'C0443147'], 'HPO': ['HP0000006', 'HP0000006']}
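
Note that the values in the output above keep their trailing commas. If that is not wanted, the same loop can strip them while appending. The following is only a sketch: the function name list_to_dict is made up for illustration, and it assumes commas only ever appear as trailing separators in the values.

```python
def list_to_dict(tokens):
    """Hypothetical helper: group values under the preceding 'KEY:' token."""
    result = {}
    key = None  # no key seen yet
    for token in tokens:
        if token.endswith(":"):
            key = token[:-1]  # drop the trailing colon
        elif key is not None:
            # Assumption: a trailing comma is just a separator, so remove it
            result.setdefault(key, []).append(token.rstrip(","))
    return result


lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440',
       'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']

print(list_to_dict(lst))
```

Wrapping the loop in a function also makes it easier to apply to each of your lists in turn.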

CodePudding user response：

You can use itertools.groupby to create the dictionary. For example:

from itertools import groupby


lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']


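out = {}
for k, g in groupby(lst, lambda i: i.endswith(":")):
    if k:
        # Run of key tokens: register the key (the := operator needs Python 3.8+)
        out.setdefault(key := next(g).strip(":"), [])
    else:
        # Run of value tokens: drop the commas and add them to the current key
        out[key].extend(map(lambda s: s.strip(","), g))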

print(out)

Prints:

{
    "SNOMEDCT": ["263681008", "771269000"],
    "UMLS": ["C0443147", "C1867440", "C0443147"],
    "HPO": ["HP0000006", "HP0000006"],
}
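
One caveat with groupby: it puts consecutive items with the same predicate result into a single group. If two key tokens ever appear back to back, only the first one is read by next(g), and a list that starts with a value fails because key is not defined yet. Your sample data has neither case, but if your real lists might, a more defensive variant could look like the sketch below. The function name group_tokens and the test list are invented for illustration.

```python
from itertools import groupby


def group_tokens(tokens):
    """Hypothetical helper: map each 'KEY:' token to the values that follow it."""
    out = {}
    key = None  # no key seen yet
    for is_key, group in groupby(tokens, lambda t: t.endswith(":")):
        if is_key:
            # Register every key in a run of consecutive keys;
            # the last one collects the values that follow.
            for token in group:
                key = token.rstrip(":")
                out.setdefault(key, [])
        elif key is not None:
            out[key].extend(t.rstrip(",") for t in group)
        # Values that appear before any key are skipped.
    return out


# Made-up input with a leading value and two keys in a row
print(group_tokens(["x", "ICD9CM:", "HPO:", "HP0000006,", "HP0000007"]))
# {'ICD9CM': [], 'HPO': ['HP0000006', 'HP0000007']}
```

Whether a key without values should map to an empty list or be left out entirely is a design choice; this sketch keeps it with an empty list.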