converting lists to strings and integers in nested dictionary

Time:10-16

I have a nested dictionary in which I want to convert the 'protein accession' values from single-element lists into plain strings. I also want to convert lists of strings into lists of integers, for example in 'sequence length', 'start location', and 'stop location'.

'protein accession': ['A0A0H3LJT0_e'] into 'protein accession': 'A0A0H3LJT0_e'

'sequence length': ['102'] into 'sequence length': [102]

and so on

Here is a sample of my dictionary:

{
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "analysis": ["SMART"],
        "signature accession": ["SM00886"],
        "signature description": ["Dabb_2"],
        "start location": ["4"],
        "stop location": ["98"],
        "e-value": ["1.5E-22"],
        "interpro accession": ["IPR013097"],
        "interpro description": ["Stress responsive alpha-beta barrel"],
        "nunique": [2],
        "domain_count": [1],
    }
}

Could someone help me, please?

CodePudding user response:

You need to iterate through the dictionary and replace the values accordingly.

Here, d is the input dictionary.

In [1]: data = d['C4QY10_e']

In [2]: result = {}

In [3]: for k,v in data.items():
  ...:     if str(v[0]).isdigit():
  ...:         result[k] = [int(v[0])]
  ...:     else:
  ...:         result[k] = v[0]
  ...: 

In [4]: result
Out[4]: 
{'protein accession': 'C4QY10_e',
 'sequence length': [1879],
 'analysis': 'Pfam',
 'signature accession': 'PF18314',
 'signature description': 'Fatty acid synthase type I helical domain',
 'start location': [328],
 'stop location': [528],
 'e-value': '4.7E-73',
 'interpro accession': 'IPR041550',
 'interpro description': 'Fatty acid synthase type I',
 'nunique': [1],
 'domain_count': [5]}

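To process the entire dictionary this way, keep a separate inner dictionary per top-level key; otherwise each entry overwrites the previous one in result:

result = {}
for main_k, val in d.items():
    result[main_k] = {}
    for k, v in val.items():
        if str(v[0]).isdigit():
            result[main_k][k] = [int(v[0])]
        else:
            result[main_k][k] = v[0]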

If you want to change the dictionary itself, you can do this:

for main_k, main_val in d.items():
    for k,v in main_val.items():
        if str(v[0]).isdigit():
            d[main_k][k] = [int(v[0])]
        else:
            d[main_k][k] = v[0]
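
The loops above convert every field. If you only want to touch the fields named in the question, a key-based variant is one option. This is a minimal sketch, not part of the original answer: the names TO_STRING, TO_INT_LIST and convert_record are made up for illustration, and the sample is trimmed to a few fields.

```python
# Hypothetical key sets; adjust them to match your own data.
TO_STRING = {"protein accession"}
TO_INT_LIST = {"sequence length", "start location", "stop location"}


def convert_record(record):
    """Return a new dict with only the selected fields converted."""
    converted = {}
    for key, values in record.items():
        if key in TO_STRING:
            # Assumes exactly one element; raises ValueError otherwise.
            (value,) = values
            converted[key] = value
        elif key in TO_INT_LIST:
            converted[key] = [int(v) for v in values]
        else:
            # Leave every other field untouched.
            converted[key] = values
    return converted


d = {
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "start location": ["4"],
        "stop location": ["98"],
        "e-value": ["1.5E-22"],
    }
}

result = {name: convert_record(record) for name, record in d.items()}
print(result["A0A0H3LJT0_e"])
```

Because convert_record builds a new dictionary, d itself is left unchanged; assign the result back to d if you want to replace it.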

CodePudding user response:

When creating a new mapping-type object with nested container objects (e.g. list, dict, set), a defaultdict from the standard-library collections module may be called for.
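
As a rough sketch of that idea (the variable name converted and the trimmed sample are assumptions for illustration), defaultdict(dict) creates each inner dictionary on first access, so the new mapping can be filled without initialising the inner dicts by hand:

```python
from collections import defaultdict

# Trimmed sample in the same shape as the question's dictionary.
data = {
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "e-value": ["1.5E-22"],
        "nunique": [2],
    }
}

# Missing top-level keys get an empty dict automatically.
converted = defaultdict(dict)

for name, details in data.items():
    for attribute, values in details.items():
        (value,) = values  # expects exactly one element per list
        if isinstance(value, str) and value.isdecimal():
            converted[name][attribute] = int(value)
        else:
            converted[name][attribute] = value

print(dict(converted))
```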

However, let's assume you are modifying the existing dictionary in place, thus preserving the dict type. We can use two explicit for loops over dict.items():

# assume input is stored to the variable, data
for name, details in data.items():  # Also possible to use for details in data.values():
    for attribute, values in details.items():
        # Use tuple unpacking to raise a ValueError
        # when no values or more than one value unexpectedly appears
        (value,) = values

        # Only consider strings with decimal values
        if isinstance(value, str) and value.isdecimal():
            # details is a reference to data[name]
            details[attribute] = int(value)
        else:
            details[attribute] = value
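
A side note on why isdecimal() is used here rather than isdigit(): the two methods disagree on a few characters, and int() only accepts the stricter set. The snippet below is a small illustration with made-up sample strings, not part of the original data:

```python
# "\u00b2" is the superscript-two character.
samples = ["102", "1.5E-22", "-4", "\u00b2"]

# Map each string to (isdigit, isdecimal) so the results are easy to compare.
checks = {text: (text.isdigit(), text.isdecimal()) for text in samples}

for text, (is_digit, is_decimal) in checks.items():
    print(repr(text), "isdigit:", is_digit, "isdecimal:", is_decimal)

# isdigit() accepts the superscript, but int() cannot parse it.
try:
    int("\u00b2")
except ValueError as exc:
    print("int() rejected it:", exc)
```

Neither method accepts a sign or an exponent, so strings such as "-4" or "1.5E-22" are left unconverted by both checks.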

CodePudding user response:

Assuming you also want to convert string representations of floating-point numbers, you could do this (which also allows for list values with more than one element):

sample = {
    "A0A0H3LJT0_e": {
        "protein accession": ["A0A0H3LJT0_e"],
        "sequence length": ["102"],
        "analysis": ["SMART"],
        "signature accession": ["SM00886"],
        "signature description": ["Dabb_2"],
        "start location": ["4"],
        "stop location": ["98"],
        "e-value": ["1.5E-22"],
        "interpro accession": ["IPR013097"],
        "interpro description": ["Stress responsive alpha-beta barrel"],
        "nunique": [2],
        "domain_count": [1]
    }
}

for sd in sample.values():
    if isinstance(sd, dict):
        for k, v in sd.items():
            if isinstance(v, list):
                try:
                    sd[k] = list(map(int, v))
                except ValueError:
                    try:
                        sd[k] = list(map(float, v))
                    except ValueError:
                        sd[k] = ', '.join(map(str, v))

print(sd)

Output:

{'protein accession': 'A0A0H3LJT0_e', 'sequence length': [102], 'analysis': 'SMART', 'signature accession': 'SM00886', 'signature description': 'Dabb_2', 'start location': [4], 'stop location': [98], 'e-value': [1.5e-22], 'interpro accession': 'IPR013097', 'interpro description': 'Stress responsive alpha-beta barrel', 'nunique': [2], 'domain_count': [1]}

Note:

Unless every value in a list can be converted to int or to float, the list is replaced by a single string in which the elements are joined by ', '.
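
To make the fallback order easier to see, the same try/except chain can be pulled into a small helper. This is only a sketch; the name convert_list and the sample lists are invented for illustration:

```python
def convert_list(values):
    """Try int, then float; fall back to one joined string."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            # At least one element did not parse; try the next type.
            continue
    return ", ".join(map(str, values))


all_ints = convert_list(["4", "98"])
needs_float = convert_list(["1.5E-22", "3"])
mixed = convert_list(["4", "SMART"])

print(all_ints)
print(needs_float)
print(mixed)
```

One caveat that applies to this approach in general: int() truncates real float objects without raising, so a list such as [1.5] would become [1]. If your lists can contain floats that are not strings, check the element type before converting.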
