I have a nested dictionary with a long list of top-level (main) keys. For one of the sub-keys, I want to turn its value into a list. Here is one record from the nested dictionary; all records are structured the same way.
{'C4QY10_e':
{'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
'start location': [328, 139, 1761],
'stop location': [528, 300, 1861],
'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
'interpro accession': 'IPR041550, IPR040899, IPR008278',
'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
'nunique': [1]
}}
The sub-key whose value I want to turn into a list is 'interpro description'. The string should be split on ',', so that element [0] of the list is "Fatty acid synthase type I" and element [1] is "Fatty acid synthase subunit alpha". It is very important that the values preserve their input order.
CodePudding user response:
Using split():
yourdict = {'C4QY10_e':
{'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
'start location': [328, 139, 1761],
'stop location': [528, 300, 1861],
'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
'interpro accession': 'IPR041550, IPR040899, IPR008278',
'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
'nunique': [1]
}}
yourdict['C4QY10_e']['interpro description'] = yourdict['C4QY10_e']['interpro description'].split(', ')
print(yourdict)
Output:
{'C4QY10_e': {'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
'start location': [328, 139, 1761],
'stop location': [528, 300, 1861],
'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
'interpro accession': 'IPR041550, IPR040899, IPR008278',
'interpro description': ['Fatty acid synthase type I',
'Fatty acid synthase subunit alpha',
"4'-phosphopantetheinyl transferase domain"],
'nunique': [1]}}
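Since the question mentions many main keys, the same split can be applied to every record in a loop. The following is only a minimal sketch, assuming every record is a dict and that items are separated by a comma with optional surrounding whitespace. The names `records` and `split_field`, and the second record, are hypothetical and not from the original post:

```python
def split_field(records, field="interpro description"):
    """Split a comma-separated string field into a list, in place, for every record."""
    for record in records.values():
        value = record.get(field)
        # Only split plain strings; skip records where the field is missing
        # or has already been converted to a list.
        if isinstance(value, str):
            # str.split returns the parts in their original left-to-right order.
            record[field] = [part.strip() for part in value.split(",")]
    return records


records = {
    "C4QY10_e": {
        "interpro description": "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
    },
    "EXAMPLE_2": {  # hypothetical second record, for illustration only
        "interpro description": "Domain A, Domain B",
    },
}

split_field(records)
print(records["C4QY10_e"]["interpro description"][0])
```

Because `str.split` walks the string from left to right, the resulting list keeps the input order, which is the requirement stated in the question.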
CodePudding user response:
Here's another solution:
def to_list(_dict: dict, _key: str = 'interpro description') -> dict:
    """Convert a comma-separated string into a list of strings.

    Parameters
    ----------
    _dict : dict
        The dictionary in which `_key` is converted into a list.
    _key : str
        The key whose string value is split.

    Returns
    -------
    dict
        The original dictionary, modified in place, with `_key` turned into a list of strings.

    Notes
    -----
    The function accepts dictionaries with multiple levels of nesting.
    """
    for key, value in _dict.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries.
            _dict[key] = to_list(value, _key)
        elif key == _key and isinstance(value, str):
            # Split on commas and strip surrounding whitespace; order is preserved.
            _dict[key] = [part.strip() for part in value.split(',')]
    return _dict
# == Example ==========
my_dict = {
'C4QY10_e': {
'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
'start location': [328, 139, 1761],
'stop location': [528, 300, 1861],
'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
'interpro accession': 'IPR041550, IPR040899, IPR008278',
'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
'nunique': [1],
}
}
_my_dict = to_list(my_dict)
print(_my_dict['C4QY10_e']['interpro description'])
# Prints:
# ['Fatty acid synthase type I', 'Fatty acid synthase subunit alpha', "4'-phosphopantetheinyl transferase domain"]
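Note that `to_list` modifies the dictionary it receives, so `my_dict` and `_my_dict` refer to the same object afterwards. If the original data has to stay untouched, one option is to work on a deep copy. This is only a sketch of that idea; `to_list_copy` and the shortened sample record are hypothetical and not part of the answer above:

```python
import copy


def to_list_copy(data, key="interpro description"):
    """Return a deep copy of `data` with every string stored under `key` split into a list."""
    result = copy.deepcopy(data)

    def _convert(node):
        for k, v in node.items():
            if isinstance(v, dict):
                # Recurse into nested dictionaries.
                _convert(v)
            elif k == key and isinstance(v, str):
                # Replacing the value of an existing key is safe while iterating.
                node[k] = [part.strip() for part in v.split(",")]

    _convert(result)
    return result


# Shortened sample record, for illustration only.
original = {
    "C4QY10_e": {
        "interpro description": "Fatty acid synthase type I, Fatty acid synthase subunit alpha",
    }
}
converted = to_list_copy(original)
print(converted["C4QY10_e"]["interpro description"])  # a list of two strings
print(original["C4QY10_e"]["interpro description"])   # still the original string
```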