I have a dictionary, disease_dict, whose values are lists. I would like to fetch the key and value for specific keys, then check whether any of those values occurs as a substring of the values stored under other keys, and fetch all matching key --> value pairs.
For example, given the dictionary below, I would like to check whether 'Stroke' or 'stroke' exists as a key, and then find the other entries in which a value of this key appears as a substring (for instance, 'C10.228.140.300.775' is contained in 'C10.228.140.300.775.600').
'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'], 'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']
I have the following lines of code for fetching the key and value for a specific term.
# extract all child terms
for k, v in dis_dict.items():
    if (k in ['Glaucoma', 'Stroke']) or (k in ['glaucoma', 'stroke']):
        disease = k
        tree_id = v
        print(disease, tree_id)
    else:
        disease = ''
        tree_id = ''
        continue
Any help is highly appreciated!
CodePudding user response:
You have a good starting point. As you probably already know, you need to split the key into words before comparing it. Here is how you could do it:
disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flue': ['C10.228.140.300.780'],
}
for k, v in disease_dict.items():
    # Keep only letters, spaces, and dashes (drops the comma in 'Stroke, Lacunar')
    tmp = ''.join(x for x in k if x.isalpha() or x == ' ' or x == '-')
    tmpKey = tmp.split(' ')
    for tk in tmpKey:
        if tk.capitalize() in ['Stroke', 'Glaucoma']:
            print(k, v, end=' ')  # End with a space instead of a newline
First, we remove unnecessary characters with this line:
tmp = ''.join(x for x in k if x.isalpha() or x == ' ' or x == '-')
It only keeps the alpha characters, spaces, and dashes. Since I don't know what your diseases look like, I only kept those characters (space is needed on the next line). After creating this new formatted key, we split it by spaces to then compare substrings.
tmpKey = tmp.split(' ')
Once tmpKey is made, we loop over it to check whether one of your wanted diseases appears in the original key.
for tk in tmpKey:
    if tk.capitalize() in ['Stroke', 'Glaucoma']:
        print(k, v, end=' ')  # End with a space instead of a newline
tk.capitalize() capitalizes the first letter (and lowercases the rest), so you don't have to check both forms of a word.
Finally, after running the above script, here is what we got:
Stroke ['C10.228.140.300.775', 'C14.907.253.855'] Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']
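The key-matching steps above can also be wrapped in a small helper that returns the matching entries instead of printing them. This is only a sketch: the names key_words and matching_entries are made up for illustration, and it assumes that comparing lowercased words is acceptable for your keys.

```python
def key_words(key):
    """Split a key into lowercase words, keeping only letters, spaces, and dashes."""
    cleaned = ''.join(ch for ch in key if ch.isalpha() or ch in ' -')
    return [word.lower() for word in cleaned.split()]


def matching_entries(disease_dict, wanted):
    """Return the entries whose key contains one of the wanted terms as a whole word."""
    wanted = {term.lower() for term in wanted}
    return {k: v for k, v in disease_dict.items() if wanted & set(key_words(k))}


disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flue': ['C10.228.140.300.780'],
}

result = matching_entries(disease_dict, ['Stroke', 'Glaucoma'])
for name, codes in result.items():
    print(name, codes)
```

Returning a dictionary rather than printing makes it easier to reuse the matches, for example to compare their codes against the other entries afterwards.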
CodePudding user response:
The code below should do what you want to achieve:
dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
}
dict_already_printed = {}
for k, v in dis_dict.items():
    if k.lower() in ['glaucoma', 'stroke']:
        disease = k
        tree_id = v
        output = None
        for c_code_1 in tree_id:
            for key, value in dis_dict.items():
                for c_code_2 in value:
                    if c_code_1 in c_code_2:
                        if f'{disease} {tree_id}' != f'{key} {value}':
                            tmp_output = f'{disease} {tree_id}, other: {key} {value}'
                            if tmp_output not in dict_already_printed:
                                output = tmp_output
                                print(output)
                                dict_already_printed[output] = None
        if output is None:
            output = f'{disease} {tree_id}'
            print(output)
    else:
        disease = ''
        tree_id = ''
        continue
Test it with other data in the dictionary to see whether it works as expected. When a code of the selected disease is found inside a code of another entry, it prints that pair once:
Stroke ['C10.228.140.300.775', 'C14.907.253.855'], other: Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
or if no other disease was found (with dictionary values changed to avoid a match) only the found one:
Stroke ['C10.228.140.300.775', 'C14.907.253.855']
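If you would rather collect the related entries than print them, the same substring check can be written as a function. This is a minimal sketch, not a drop-in replacement: find_related is a hypothetical name, and it skips an entry by comparing keys rather than the formatted strings used above.

```python
def find_related(dis_dict, terms):
    """For each key matching a term (case-insensitively), collect the other
    entries that have at least one code containing one of its codes."""
    wanted = {term.lower() for term in terms}
    related = {}
    for disease, codes in dis_dict.items():
        if disease.lower() not in wanted:
            continue
        related[disease] = {
            other: other_codes
            for other, other_codes in dis_dict.items()
            if other != disease
            and any(c1 in c2 for c1 in codes for c2 in other_codes)
        }
    return related


dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flue': ['C10.228.140.300.780'],
}

related = find_related(dis_dict, ['glaucoma', 'stroke'])
for disease, others in related.items():
    print(disease, dis_dict[disease], 'other:', others)
```

Note that a plain substring test is loose: 'C10.1' is also found inside 'C10.10'. If the codes are hierarchical (an assumption about your data), replacing `c1 in c2` with `c2.startswith(c1 + '.')` would restrict the matches to true child codes.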