I have a nested dictionary and want to assess its content, e.g. find all dates where the numerical value of the second element of the contained tuple is bigger than absolute 1.
my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
'2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
'position_1': ('info_0', -0.020718000014312565)},
'2020_02_06': {},
'2020_02_10': {'position_0': ('info_0', 0.010201999975834042),
'position_1': ('info_1', 0.016721999971196055),
'position_2': ('info_2', -0.05537799999001436),
'position_3': ('info_3', 0.00985999999102205),
'position_4': ('info_4', 0.03294800000730902),
'position_5': ('info_4', 0.03294800000730902)}}
For single values I could access it by something like:
minimal_differences['2020_02_04']['position_1'][1]
df = pd.DataFrame.from_dict(minimal_differences)
df = df.T.sort_index(axis=1) # transpose with .T and sort by column
Here the output would be "2020_02_04", since the absolute numerical value of the tuple is 338.9..., so > 1. Also I would like to have sorted list of the absolute values, so
- '2002_02_06': nan
- '2020_02_05': 0.02
- '2020_02_10': 0.03 <-- here the biggest value from the date was chosen
- '20_02_04': 339
(values rounded).
But I struggle to automate this over all elements of the dictionary. I tried to import it into pandas but still didn't manage to make a conditional selection and sort based on the second element of the tuple. Especially since not all dates have the same number of information.
I was also thinking about unpacking the dictionary more to have easier access in pandas but am at a loss how to do that.
Your help well be much appreciated. Best regards!
CodePudding user response:
This:
...find all dates where the numerical value of the second element of the contained tuple is bigger than absolute 1
my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
'2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
'position_1': ('info_0', -0.020718000014312565)},
'2020_02_06': {},
'2020_02_10': {'position_0': ('info_0', 5.010201999975834042),
'position_1': ('info_1', 4.016721999971196055),
'position_2': ('info_2', -3.05537799999001436),
'position_3': ('info_3', 2.00985999999102205),
'position_4': ('info_4', 1.03294800000730902),
'position_5': ('info_4', 0.03294800000730902)}}
for k_date, inner_dict in my_dict.items():
for position, val_tuple in inner_dict.items():
if abs(val_tuple[1]) > 1:
print(f"Date: {k_date}, position: {position}, info: {val_tuple[0]}, value: {val_tuple[1]}")
Outputs:
Date: 2020_02_04, position: position_1, info: info_3, value: -338.99962600006256
Date: 2020_02_10, position: position_0, info: info_0, value: 5.010201999975834
Date: 2020_02_10, position: position_1, info: info_1, value: 4.016721999971196
Date: 2020_02_10, position: position_2, info: info_2, value: -3.0553779999900144
Date: 2020_02_10, position: position_3, info: info_3, value: 2.009859999991022
Date: 2020_02_10, position: position_4, info: info_4, value: 1.032948000007309
Do you have an idea, how I could make a boolean list to show all correpsonding dates in a pandas dataframe?
Yes:) sum, boolean... what do you need :)
import pandas as pd
import numpy as np
my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
'2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
'position_1': ('info_0', -0.020718000014312565)},
'2020_02_06': {},
'2020_02_10': {'position_0': ('info_0', 5.010201999975834042),
'position_1': ('info_1', 4.016721999971196055),
'position_2': ('info_2', -3.05537799999001436),
'position_3': ('info_3', 2.00985999999102205),
'position_4': ('info_4', 1.03294800000730902),
'position_5': ('info_4', 0.03294800000730902)}}
df = pd.DataFrame(my_dict)
def sum_series_tuple(x):
v_sum = 0
for val in x:
if type(val) == tuple:
v_sum =val[1]
return v_sum , abs(v_sum) > 1
df.apply(sum_series_tuple)
Outputs:
| 2020_02_04| 2020_02_05| 2020_02_06| 2020_02_10 |
----|-----------|-------------|-----------|------------|
0 |-338.999626| -0.041436 | 0 | 9.047302 |
1 |True |False | False | True |