Check if a value in a dictionary is a substring of another key-value pair in Python

I have a dictionary disease_dict whose values are lists. I would like to fetch the key and value for specific keys, then check whether each of those values occurs (as a substring) in the values of other keys, and fetch all matching key-value pairs.

For example, take the dictionary below. I would like to see if 'Stroke' or 'stroke' exists in the dictionary, and then check whether the values of this key are substrings of other values (e.g. 'C10.228.140.300.775' occurs in 'C10.228.140.300.775.600', and 'C14.907.253.855' occurs in 'C14.907.253.855.600').

{
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']
}
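For reference, Python's `in` operator on two strings tests whether the first occurs anywhere inside the second. A minimal check on the codes from this example (the names `stroke_codes` and `lacunar_codes` are chosen here for illustration):

```python
stroke_codes = ['C10.228.140.300.775', 'C14.907.253.855']
lacunar_codes = ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                 'C14.907.253.329.800', 'C14.907.253.855.600']

# `a in b` is True when string a occurs anywhere inside string b
for code in stroke_codes:
    contained_in = [other for other in lacunar_codes if code in other]
    print(code, '->', contained_in)
```

This prints `C10.228.140.300.775 -> ['C10.228.140.300.775.600']` and `C14.907.253.855 -> ['C14.907.253.855.600']`. Note that 'C10.228.140.300.275.800' does not contain the first Stroke code.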

I have the following lines of code for fetching the key and value for a specific term.

# Extract all child terms
for k, v in dis_dict.items():
    if k in ['Glaucoma', 'Stroke'] or k in ['glaucoma', 'stroke']:
        disease = k
        tree_id = v
        print(disease, tree_id)
    else:
        disease = ''
        tree_id = ''
        continue

Any help is highly appreciated!
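As an aside on the key filter in the snippet above: lowercasing each key once gives the same matches as the two list tests (it also accepts other capitalizations such as 'STROKE'), and the else branch is not needed because non-matching keys can simply be skipped. A minimal sketch; `wanted` and `selected` are illustrative names, not from the post:

```python
dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
}

wanted = {'glaucoma', 'stroke'}  # target names, stored in lower case

# Keep only entries whose whole key matches a target name, ignoring case
selected = {k: v for k, v in dis_dict.items() if k.lower() in wanted}
for disease, tree_id in selected.items():
    print(disease, tree_id)
```

This matches whole key names only, so 'Stroke, Lacunar' is not selected; finding it is what the substring check on the codes is for.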

Answer:

You have a good starting point. As you probably already know, you need to process each key and split it into words. Here is one way to do it:

disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flu': ['C10.228.140.300.780']
}

for k, v in disease_dict.items():
    # Keep only letters, dashes and spaces, then split the key into words
    tmp = ''.join(x for x in k if x.isalpha() or x == '-' or x == ' ')
    tmpKey = tmp.split(' ')
    for tk in tmpKey:
        if tk.capitalize() in ['Stroke', 'Glaucoma']:
            print(k, v, end=' ')  # end=' ' keeps all matches on one line

print()  # finish the output line

First, we remove unnecessary characters with this line:

tmp = ''.join(x for x in k if x.isalpha() or x == '-' or x == ' ')

It keeps only alphabetic characters, spaces, and dashes. Since I don't know what all your disease names look like, I kept only those characters (the space is needed for the split on the next line). After building this cleaned key, we split it on spaces so we can compare its individual words.

tmpKey = tmp.split(' ')
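To make the two steps concrete, here is what they produce for the key 'Stroke, Lacunar' (a small worked example using the same expressions as the answer):

```python
k = 'Stroke, Lacunar'

# Step 1: drop every character that is not a letter, a dash or a space
tmp = ''.join(x for x in k if x.isalpha() or x == '-' or x == ' ')
print(repr(tmp))   # 'Stroke Lacunar'  (the comma is gone)

# Step 2: split the cleaned key into words
tmpKey = tmp.split(' ')
print(tmpKey)      # ['Stroke', 'Lacunar']
```

If a key could contain repeated spaces, `tmp.split()` with no argument would avoid empty strings in the result; with `split(' ')` they are harmless here, since an empty string never matches a disease name.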

Once tmpKey is built, we loop over it to check whether one of the wanted diseases appears as a word in the original key.

for tk in tmpKey:
    if tk.capitalize() in ['Stroke', 'Glaucoma']:
        print(k, v, end=' ')  # end=' ' replaces the default newline

tk.capitalize() returns the word with its first letter in upper case and the rest in lower case, so you don't have to check 'stroke' and 'Stroke' separately.
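A quick illustration of that normalization: because `str.capitalize()` also lowercases everything after the first character, any capitalization of the word ends up in the same form.

```python
for word in ['stroke', 'Stroke', 'STROKE', 'sTrOkE']:
    print(word, '->', word.capitalize())   # every variant becomes 'Stroke'

print('glaucoma'.capitalize() in ['Stroke', 'Glaucoma'])  # True
```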

Running the script above prints the following (the third entry is skipped because its key contains neither name):

Stroke ['C10.228.140.300.775', 'C14.907.253.855'] Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']
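The script above only prints the matches. If you need the matching entries afterwards, one option (a sketch, not part of the original answer; `wanted` and `matches` are illustrative names) is to collect them in a dict. `any()` stops at the first matching word, so each key is stored once even if it contains both names:

```python
disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flu': ['C10.228.140.300.780'],
}

wanted = {'Stroke', 'Glaucoma'}
matches = {}
for k, v in disease_dict.items():
    # Same cleaning as above; split() without an argument drops empty strings
    words = ''.join(x for x in k if x.isalpha() or x in ' -').split()
    if any(w.capitalize() in wanted for w in words):
        matches[k] = v

print(list(matches))  # ['Stroke', 'Stroke, Lacunar']
```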

Answer:

The code below should do what you want to achieve:

dis_dict = {
    'Stroke':          ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
}

already_printed = set()  # lines printed so far, to avoid duplicates
for k, v in dis_dict.items():
    if k.lower() not in ['glaucoma', 'stroke']:
        continue  # only look at the diseases of interest
    disease = k
    tree_id = v
    output = None
    for c_code_1 in tree_id:
        # Compare each code of the disease with every code of every other entry
        for key, value in dis_dict.items():
            if key == disease:
                continue  # skip the disease itself
            for c_code_2 in value:
                if c_code_1 in c_code_2:
                    tmp_output = f'{disease} {tree_id}, other: {key} {value}'
                    if tmp_output not in already_printed:
                        output = tmp_output
                        print(output)
                        already_printed.add(output)
    if output is None:
        # No other entry contains any of this disease's codes
        output = f'{disease} {tree_id}'
        print(output)

Test it with other dictionary data to check that it works as expected. When a code of the selected disease occurs inside a code of another entry, it prints both entries:

Stroke ['C10.228.140.300.775', 'C14.907.253.855'], other: Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']

If no other entry contains any of its codes (for example, after changing the dictionary values to avoid a match), it prints only the disease that was found:

Stroke ['C10.228.140.300.775', 'C14.907.253.855']
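The code above uses plain substring matching (`c_code_1 in c_code_2`), which also accepts an identical code and a code that merely contains the other one somewhere in the middle. If the codes are hierarchical tree numbers in which a child code is its parent code followed by '.' and further segments (an assumption based on the sample data, which appear to be MeSH tree numbers), a prefix test is stricter. The helper `find_children` below is hypothetical, a sketch under that assumption rather than part of either answer:

```python
def find_children(d, names):
    # Hypothetical helper: for each key whose name matches one of `names`
    # (case-insensitive), collect codes of OTHER keys that are descendants.
    # Assumes a child code = parent code + '.' + more segments.
    wanted = {n.lower() for n in names}
    result = {}
    for key, codes in d.items():
        if key.lower() not in wanted:
            continue
        found = {}
        for other_key, other_codes in d.items():
            if other_key == key:
                continue  # don't compare an entry with itself
            # Keep every code that starts with one of this key's codes plus '.'
            hits = [c2 for c2 in other_codes
                    if any(c2.startswith(c1 + '.') for c1 in codes)]
            if hits:
                found[other_key] = hits
        result[key] = found
    return result


disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flu': ['C10.228.140.300.780'],
}

result = find_children(disease_dict, ['Stroke', 'Glaucoma'])
for disease, children in result.items():
    for other_key, child_codes in children.items():
        print(f'{disease} -> {other_key}: {child_codes}')
```

With the question's data this prints `Stroke -> Stroke, Lacunar: ['C10.228.140.300.775.600', 'C14.907.253.855.600']`. With the dictionary from this answer, where 'Stroke, Lacunar' contains 'C14.907.253.855' itself, only 'C10.228.140.300.775.600' would be reported; if identical codes should count as well, add `c2 == c1` to the condition.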