I have multiple lists that are ordered like the following one:
['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']
I need to transform this list into a dictionary that uses the words ending in ":" as keys. The lists vary, so new words ending in ":" are sometimes added. The corresponding values always come directly after their ":" word in the list.
When I start iterating over the list, it gets frustrating very quickly because there are too many possible cases for me to handle at the moment. So I would like to ask whether anyone knows a fast way to transform such a list into a dictionary.
I tried several loops, like the one here, to collect the words containing ':':
import math

checkwords = []
for charnum_list in df_new.char_num:
    try:
        # Probe the entries; math.isnan raises TypeError for strings
        for charnum in charnum_list:
            math.isnan(charnum)
    except TypeError:
        new_charnum_list = []
        for charnum in charnum_list:
            charnum_new = charnum.replace('HP:', 'HP')
            charnum_new = charnum_new.replace('<', '').replace('>', '').split(' ')
            for word in charnum_new:
                checkwords.append(word)
# Unique words that contain ':' (the candidate keys)
diagnosis_dictionaries = list(set([word for word in checkwords if ':' in word]))
output:
diagnosis_dictionaries:
['HPO:', 'ICD9CM:', 'SNOMEDCT:', 'UMLS:', 'ICD10CM:']
Then I tried to iterate again, comparing each list of keys and values against the list of keys above, but at this point I am really desperate, because none of my ideas worked out well.
It would be very nice if someone had a good idea or a better solution than mine.
CodePudding user response:
If I interpret your question correctly then I think you're looking to do this:
lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']
dct = dict()
k = None
for e in lst:
    if e[-1] == ':':
        k = e[:-1]
    else:
        if k is not None:
            dct.setdefault(k, []).append(e)
print(dct)
Output:
{'SNOMEDCT': ['263681008,', '771269000'], 'UMLS': ['C0443147,', 'C1867440', 'C0443147'], 'HPO': ['HP0000006', 'HP0000006']}
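The values in the output above still carry the trailing commas from the input (for example '263681008,'). If those are not wanted, a minimal sketch along the same lines could strip them while collecting. The function name list_to_dict is just an illustrative choice, and the sketch assumes every key token ends with ':' and every entry is a string:

```python
from collections import defaultdict

def list_to_dict(items):
    """Group values under the most recent token that ends with ':'."""
    result = defaultdict(list)
    key = None
    for item in items:
        if item.endswith(':'):
            # A new key starts here; drop the trailing colon
            key = item[:-1]
        elif key is not None:
            # Values before the first key are skipped; trailing commas removed
            result[key].append(item.rstrip(','))
    return dict(result)

lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440',
       'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']
dct = list_to_dict(lst)
print(dct)
```

Because the logic lives in a function, it can be applied to each of the lists in turn, for example with a list comprehension over all of them.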
CodePudding user response:
You can use itertools.groupby to create the dictionary. For example:
from itertools import groupby

lst = ['SNOMEDCT:', '263681008,', '771269000', 'UMLS:', 'C0443147,', 'C1867440', 'HPO:', 'HP0000006', 'HPO:', 'HP0000006', 'UMLS:', 'C0443147']
out = {}
for k, g in groupby(lst, lambda i: i.endswith(":")):
    if k:
        out.setdefault(key := next(g).strip(":"), [])
    else:
        out[key].extend(map(lambda s: s.strip(","), g))
print(out)
Prints:
{
"SNOMEDCT": ["263681008", "771269000"],
"UMLS": ["C0443147", "C1867440", "C0443147"],
"HPO": ["HP0000006", "HP0000006"],
}
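The groupby version above relies on the list starting with a key and on keys never appearing back to back (only the first key of a run is read with next). If the real data might break either assumption, a hedged variant could look like the sketch below. The function name group_by_key and the sample input are made up for illustration:

```python
from itertools import groupby

def group_by_key(items):
    """Collect values under the preceding ':' token, tolerating odd input."""
    out = {}
    key = None
    for is_key, group in groupby(items, lambda s: s.endswith(":")):
        if is_key:
            # Register every key in a run of consecutive keys; the last one
            # becomes the current key for the values that follow
            for token in group:
                key = token.rstrip(":")
                out.setdefault(key, [])
        elif key is not None:
            # Values seen before any key are ignored
            out[key].extend(s.rstrip(",") for s in group)
    return out

sample = ['orphan', 'A:', 'B:', '1,', '2', 'A:', '3']
result = group_by_key(sample)
print(result)
```

Here a key that has no values of its own still appears in the result with an empty list, which may or may not be what you want.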