Access and sort by tuple element in pandas dataframe-CodePudding

I have a nested dictionary and want to assess its content, e.g. find all dates where the numerical value of the second element of the contained tuple is bigger than absolute 1.

my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
 '2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
                'position_1': ('info_0', -0.020718000014312565)},
 '2020_02_06': {},
 '2020_02_10': {'position_0': ('info_0', 0.010201999975834042),
                'position_1': ('info_1', 0.016721999971196055),
                'position_2': ('info_2', -0.05537799999001436),
                'position_3': ('info_3', 0.00985999999102205),
                'position_4': ('info_4', 0.03294800000730902),
                'position_5': ('info_4', 0.03294800000730902)}}

For single values I could access it by something like:

minimal_differences['2020_02_04']['position_1'][1]

df = pd.DataFrame.from_dict(minimal_differences)
df = df.T.sort_index(axis=1)  # transpose with .T and sort by column

Here the output would be "2020_02_04", since the absolute numerical value of the tuple is 338.9..., so > 1. Also I would like to have sorted list of the absolute values, so

'2002_02_06': nan
'2020_02_05': 0.02
'2020_02_10': 0.03 <-- here the biggest value from the date was chosen
'20_02_04': 339

(values rounded).

But I struggle to automate this over all elements of the dictionary. I tried to import it into pandas but still didn't manage to make a conditional selection and sort based on the second element of the tuple. Especially since not all dates have the same number of information.

I was also thinking about unpacking the dictionary more to have easier access in pandas but am at a loss how to do that.

Your help well be much appreciated. Best regards!

CodePudding user response：

This:

...find all dates where the numerical value of the second element of the contained tuple is bigger than absolute 1

my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
 '2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
                'position_1': ('info_0', -0.020718000014312565)},
 '2020_02_06': {},
 '2020_02_10': {'position_0': ('info_0', 5.010201999975834042),
                'position_1': ('info_1', 4.016721999971196055),
                'position_2': ('info_2', -3.05537799999001436),
                'position_3': ('info_3', 2.00985999999102205),
                'position_4': ('info_4', 1.03294800000730902),
                'position_5': ('info_4', 0.03294800000730902)}}

for k_date, inner_dict in my_dict.items():
    for position, val_tuple in inner_dict.items():
        if abs(val_tuple[1]) > 1:
            print(f"Date: {k_date}, position: {position}, info: {val_tuple[0]}, value: {val_tuple[1]}")

Outputs:

Date: 2020_02_04, position: position_1, info: info_3, value: -338.99962600006256
Date: 2020_02_10, position: position_0, info: info_0, value: 5.010201999975834
Date: 2020_02_10, position: position_1, info: info_1, value: 4.016721999971196
Date: 2020_02_10, position: position_2, info: info_2, value: -3.0553779999900144
Date: 2020_02_10, position: position_3, info: info_3, value: 2.009859999991022
Date: 2020_02_10, position: position_4, info: info_4, value: 1.032948000007309

Do you have an idea, how I could make a boolean list to show all correpsonding dates in a pandas dataframe?

Yes:) sum, boolean... what do you need :)

import pandas as pd
import numpy as np

my_dict = {'2020_02_04': {'position_1': ('info_3', -338.99962600006256)},
 '2020_02_05': {'position_0': ('info_0', -0.020718000014312565),
                'position_1': ('info_0', -0.020718000014312565)},
 '2020_02_06': {},
 '2020_02_10': {'position_0': ('info_0', 5.010201999975834042),
                'position_1': ('info_1', 4.016721999971196055),
                'position_2': ('info_2', -3.05537799999001436),
                'position_3': ('info_3', 2.00985999999102205),
                'position_4': ('info_4', 1.03294800000730902),
                'position_5': ('info_4', 0.03294800000730902)}}

df = pd.DataFrame(my_dict)

def sum_series_tuple(x):
    v_sum = 0
    for val in x:
        if type(val) == tuple:
            v_sum  =val[1]
    return v_sum , abs(v_sum) > 1        

df.apply(sum_series_tuple)

Outputs:

    | 2020_02_04|   2020_02_05| 2020_02_06| 2020_02_10 |
----|-----------|-------------|-----------|------------|
0   |-338.999626|   -0.041436 | 0         |  9.047302  |
1   |True       |False        | False     | True       |