How to deal with multiple levels of nested dicts and lists in Python

Time:10-05

I have a list of dicts, each of which contains a list of dicts.

Object Example:

import datetime

my_obj = [
    {
        "weight": 3000,
        "data": [
            {
                "date": datetime.datetime(2020, 11, 3, 0, 0),
                "value": 8.5
            },
            {
                "date": datetime.datetime(2020, 11, 4, 0, 0),
                "value": 9.3
            },
            {...}
        ]
    },
    {
        "weight": 2000,
        "data": [
            {
                "date": datetime.datetime(2020, 11, 3, 0, 0),
                "value": 8.2
            },
            {
                "date": datetime.datetime(2020, 11, 4, 0, 0),
                "value": 8
            },
            {...}
        ]
    },
    {...}
]

I need to do some math with those values and weights: for each date, compute the weighted average of the values across all the nested lists, and return a single combined list of data.

Expected Result:

"data": [
    {
        "date": datetime.datetime(2020, 11, 3, 0, 0),
        "value": '(
        ( first_nested_data_list(value[0]) * first_nested_data_list(weight) ) +
        ( second_nested_data_list(value[0]) * second_nested_data_list(weight) ) +
        ( third_nested_data_list(value[0]) * third_nested_data_list(weight) )
        ) / sum(all_weight)'
    },
    {
        "date": datetime.datetime(2020, 11, 4, 0, 0),
        "value": '(
        ( first_nested_data_list(value[1]) * first_nested_data_list(weight) ) +
        ( second_nested_data_list(value[1]) * second_nested_data_list(weight) ) +
        ( third_nested_data_list(value[1]) * third_nested_data_list(weight) )
        ) / sum(all_weight)'
    },
    {...}
]

# or

"data": [
    {
        "date": datetime.datetime(2020, 11, 3, 0, 0),
        "value": ( (8.5 * 3000) + (8.2 * 2000) ) / 5000
    },
    {...}
]
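As a quick check of the arithmetic in the second form, here is the weighted average for 2020-11-03 computed from only the two sample entries shown (assuming those are the only entries, so the total weight is 3000 + 2000 = 5000):

```python
# Weighted average for 2020-11-03, using only the two sample entries shown above
pairs = [(3000, 8.5), (2000, 8.2)]  # (weight, value) taken from each nested list

weighted_sum = sum(weight * value for weight, value in pairs)  # 25500 + 16400
total_weight = sum(weight for weight, _ in pairs)              # 5000
average = weighted_sum / total_weight

print(round(average, 2))  # 8.38
```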

I tried to use zip, but since I don't know the length of my_obj, I couldn't solve this.

Any help will be appreciated!
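For what it's worth, zip itself does not need to know how many lists there are: unpacking with `*` passes however many `data` lists `my_obj` contains. A minimal sketch, assuming every `data` list holds the same dates in the same order (zip pairs entries by position only and stops at the shortest list); the sample data is trimmed to two dates per entry:

```python
import datetime

# Sample data in the question's shape (two entries, two dates each)
my_obj = [
    {"weight": 3000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.5},
        {"date": datetime.datetime(2020, 11, 4), "value": 9.3},
    ]},
    {"weight": 2000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.2},
        {"date": datetime.datetime(2020, 11, 4), "value": 8},
    ]},
]

weights = [obj["weight"] for obj in my_obj]
total_weight = sum(weights)

result = []
# zip(*...) yields one tuple per position, holding that position's entry from
# every data list, however many entries my_obj has.
# ASSUMPTION: all data lists are aligned (same dates in the same order).
for entries in zip(*(obj["data"] for obj in my_obj)):
    weighted_sum = sum(e["value"] * w for e, w in zip(entries, weights))
    result.append({"date": entries[0]["date"], "value": weighted_sum / total_weight})

print(result)
```

If the lists can differ in length or order, grouping by date (as in the answers) is the safer choice.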

Answer:

You can use a nested defaultdict as temporary storage for the data grouped by date, then iterate over that storage, computing the weighted average for each date and saving it in the required form.

Code:

from collections import defaultdict

my_obj = [ ... ]

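# {date: {"sum": sum of value * weight, "weight": sum of weights}}
temp = defaultdict(lambda: defaultdict(int))
for obj in my_obj:
    for i in obj["data"]:
        # accumulate the weighted value and the total weight for this date
        temp[i["date"]]["sum"] += i["value"] * obj["weight"]
        temp[i["date"]]["weight"] += obj["weight"]
result = [{"date": k, "value": v["sum"] / v["weight"]} for k, v in temp.items()]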
# add {"data": [ ... ]} if you need result in form provided in question
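To see the defaultdict approach end to end, here it is wrapped in a function and run on the two sample entries from the question (the function name `weighted_by_date` is hypothetical, chosen just for this illustration):

```python
import datetime
from collections import defaultdict


def weighted_by_date(my_obj):
    """Weighted average of "value" per date, weighted by each entry's "weight"."""
    temp = defaultdict(lambda: defaultdict(int))
    for obj in my_obj:
        for i in obj["data"]:
            temp[i["date"]]["sum"] += i["value"] * obj["weight"]
            temp[i["date"]]["weight"] += obj["weight"]
    return [{"date": k, "value": v["sum"] / v["weight"]} for k, v in temp.items()]


my_obj = [
    {"weight": 3000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.5},
        {"date": datetime.datetime(2020, 11, 4), "value": 9.3},
    ]},
    {"weight": 2000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.2},
        {"date": datetime.datetime(2020, 11, 4), "value": 8},
    ]},
]

data = {"data": weighted_by_date(my_obj)}
for row in data["data"]:
    print(row["date"].date(), round(row["value"], 2))
# 2020-11-03 8.38
# 2020-11-04 8.78
```

Because each date only accumulates the weights of the entries that actually contain it, this version also copes with dates that are missing from some of the nested lists.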

Answer:

I'd handle this problem in two steps. First transpose your data so that the items are grouped by date, rather than by weight as they are now. Then in another step find the weighted averages for each date.

Since you need to be able to look up the weights and values by date, I'd use the dates as keys in an intermediate mapping (before turning it back into a JSON-like mapping with labels as keys at the end):

transposed = {}         # build a dict in this format: {date: [(weight, value), ...]}
for wdict in my_obj:
    weight = wdict["weight"]
    for ddict in wdict["data"]:
        date = ddict["date"]
        value = ddict["value"]
        transposed.setdefault(date, []).append((weight, value))

results = []           # format is [{"date": date, "value": weighted_average}, ...]
for date, weighted_values_list in transposed.items():
    weighted_average = (sum(weight * value for weight, value in weighted_values_list) /
                        sum(weight for weight, value in weighted_values_list))
    results.append({"date": date, "value": weighted_average})

# optionally wrap the results list in another dictionary
# final_results = {"data": results}

You might be able to change the transposition step to sum the weighted values and the weights directly, rather than building a list of them to sum up later. Then the second step would just be the division that gives the weighted average. But I like the list-based approach better, though I'm not really sure why.
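The single-pass variant mentioned above might look roughly like this. It is only a sketch: the `totals` name and its `[weighted_sum, total_weight]` layout are my own choices, not part of the answer, and the sample data is trimmed to two dates per entry:

```python
import datetime

my_obj = [
    {"weight": 3000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.5},
        {"date": datetime.datetime(2020, 11, 4), "value": 9.3},
    ]},
    {"weight": 2000, "data": [
        {"date": datetime.datetime(2020, 11, 3), "value": 8.2},
        {"date": datetime.datetime(2020, 11, 4), "value": 8},
    ]},
]

# {date: [sum of weight * value, sum of weight]}, filled in a single pass
totals = {}
for wdict in my_obj:
    weight = wdict["weight"]
    for ddict in wdict["data"]:
        running = totals.setdefault(ddict["date"], [0, 0])
        running[0] += weight * ddict["value"]
        running[1] += weight

# The second step is now just one division per date
results = [{"date": date, "value": weighted_sum / total_weight}
           for date, (weighted_sum, total_weight) in totals.items()]
final_results = {"data": results}
print(final_results)
```

The trade-off is memory versus inspectability: this keeps only two numbers per date, while the list-based version keeps every (weight, value) pair around in case you want to look at them later.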
