My latest use of PyPDF2 extracted all fields as a nested dictionary. I am looking at roughly 70 keys that I want to turn into a simple df (and then a .csv file). Here is a sample of my data, with 2 keys:
{'Proposed Arrangement': {'/FT': '/Ch',
'/T': 'Proposed Arrangement',
'/TU': 'Please select an arrangement from the drop-down list.',
'/Ff': 4325376,
'/V': 'Remote – Within the Local Commuting Area',
'/DV': 'Remote – Within the Local Commuting Area'},
'Proposed Length of Arrangement': {'/FT': '/Ch',
'/T': 'Proposed Length of Arrangement',
'/TU': 'Please select the proposed length of the arrangement from the drop-down list.',
'/Ff': 71434240,
'/V': '6 Months',
'/DV': 'Please select length'}}
I am trying to write a for loop that cleans the dict by pulling out the '/V' values. Ideally, the new dict would look like this:
{'Proposed Arrangement': 'Remote – Within the Local Commuting Area',
'Proposed Length of Arrangement':'6 Months'}
Does anyone have an idea of where to start with this loop? I'm a bit of a beginner, and most of the resources I found extract ONLY the values (e.g. ['Remote – Within the Local Commuting Area', '6 Months']), which isn't what I need. I want to keep the top-level keys so they eventually become my column headers when I turn the cleaned dict into a df. Thanks!
Answer:
It's a bit convoluted, but this should help:
input_dict = {
    'Proposed Arrangement': {
        '/FT': '/Ch',
        '/T': 'Proposed Arrangement',
        '/TU': 'Please select an arrangement from the drop-down list.',
        '/Ff': 4325376,
        '/V': 'Remote - Within the Local Commuting Area',
        '/DV': 'Remote - Within the Local Commuting Area'},
    'Proposed Length of Arrangement': {
        '/FT': '/Ch',
        '/T': 'Proposed Length of Arrangement',
        '/TU': 'Please select the proposed length of the arrangement from the drop-down list.',
        '/Ff': 71434240,
        '/V': '6 Months',
        '/DV': 'Please select length'}
}
output_dict = {}
# Walk each top-level field and keep only its '/V' (current value) entry,
# so the field name stays as the key.
for key, nested_dict in input_dict.items():
    output_dict[key] = nested_dict['/V']
print(output_dict)
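The question's end goal is a .csv with the field names as column headers. Below is a minimal stdlib sketch, assuming the cleaned dict produced by the loop above (values copied from the sample). It writes to an in-memory buffer so it runs anywhere; the file name 'fields.csv' in the comment is only a placeholder.

```python
import csv
import io

# Cleaned result from the loop above (values copied from the question's sample).
output_dict = {
    'Proposed Arrangement': 'Remote - Within the Local Commuting Area',
    'Proposed Length of Arrangement': '6 Months',
}

# One cleaned dict per filled-in form becomes one row; if several PDFs are
# processed, append one cleaned dict per form to this list.
rows = [output_dict]

# Stand-in for open('fields.csv', 'w', newline='') (placeholder file name).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(output_dict.keys()))
writer.writeheader()   # the top-level keys become the column headers
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is preferred, `pd.DataFrame([output_dict])` builds the equivalent one-row DataFrame (keys as columns), and `.to_csv('fields.csv', index=False)` writes the same layout.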
Answer:
Steven Rumbalski gave this answer in a comment above. Here is a slight improvement that includes a field only if it has a '/V' value, rather than storing None for fields that lack one.
new_dict = {k: v.get('/V') for k, v in old_dict.items() if v.get('/V') is not None}
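To see the filter in action, here is a small sketch where one field has no '/V' entry (a hypothetical unfilled field; the trimmed sample dict is assumed for illustration). It also uses an assignment expression (Python 3.8+) so that `.get('/V')` runs only once per field instead of twice; the behavior matches the one-liner above.

```python
# Trimmed sample input: the second field has no '/V' entry (hypothetical, e.g. left unfilled).
old_dict = {
    'Proposed Arrangement': {
        '/FT': '/Ch',
        '/T': 'Proposed Arrangement',
        '/V': 'Remote - Within the Local Commuting Area'},
    'Proposed Length of Arrangement': {
        '/FT': '/Ch',
        '/T': 'Proposed Length of Arrangement',
        '/DV': 'Please select length'},
}

# The if clause is evaluated before the key/value expression, so `value`
# is bound by the walrus operator before it is used.
new_dict = {k: value for k, v in old_dict.items()
            if (value := v.get('/V')) is not None}

print(new_dict)
```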