Problem with processing data from a list of tuples / groupby / lambda

Time:05-20

I have a list of tuples that looks like this:

import datetime

tuplelist = [
    (datetime.date(2020, 4, 20), 4.23, 'EUR'),
    (datetime.date(2020, 4, 20), 3.76, 'USD'),
    (datetime.date(2020, 4, 20), 4.21, 'EUR'),
    (datetime.date(2020, 4, 20), 5.20, 'GPB'),
    (datetime.date(2020, 4, 20), 3.77, 'USD'),
    (datetime.date(2020, 4, 20), 4.27, 'EUR'),
    (datetime.date(2020, 4, 20), 3.79, 'USD'),
    (datetime.date(2020, 4, 20), 4.30, 'EUR'),
    (datetime.date(2020, 4, 20), 5.14, 'GPB'),
    (datetime.date(2020, 4, 20), 3.77, 'USD'),
    (datetime.date(2020, 4, 25), 4.23, 'EUR'),
    (datetime.date(2020, 4, 25), 3.76, 'USD'),
    (datetime.date(2020, 4, 25), 4.21, 'EUR'),
    (datetime.date(2020, 4, 25), 5.20, 'GPB'),
    (datetime.date(2020, 4, 25), 3.77, 'USD'),
    (datetime.date(2020, 4, 27), 4.27, 'EUR'),
    (datetime.date(2020, 4, 27), 3.79, 'USD'),
    (datetime.date(2020, 4, 27), 4.30, 'EUR'),
    (datetime.date(2020, 4, 27), 5.14, 'GPB'),
    (datetime.date(2020, 4, 28), 3.77, 'USD'),
    (datetime.date(2020, 4, 28), 4.23, 'EUR'),
    (datetime.date(2020, 5, 2), 3.76, 'USD'),
    (datetime.date(2020, 5, 2), 4.21, 'EUR'),
    (datetime.date(2020, 5, 2), 5.20, 'GPB'),
    (datetime.date(2020, 5, 2), 3.77, 'USD'),
    (datetime.date(2020, 5, 2), 4.27, 'EUR'),
    (datetime.date(2020, 5, 5), 3.79, 'USD'),
    (datetime.date(2020, 5, 5), 4.30, 'EUR'),
    (datetime.date(2020, 5, 5), 5.14, 'GPB'),
    (datetime.date(2020, 5, 5), 3.77, 'USD')
]

and I'd like to group it by date and then by currency symbol. For each day, the result should look like this:

(datetime.date(2020, 4, 20), [{'EUR': [4.23, 4.21, 4.27, 4.3]}, {'USD': [3.76, 3.77, 3.79, 3.77]}, {'GPB': [5.2, 5.14]}])

I managed to group it by date using this line of code:

from itertools import groupby

tuplelist2dict = {key: [*map(lambda v: {v[2]: v[1]}, values)] for key, values in groupby(tuplelist, lambda x: x[0])}

and I get this output:

(datetime.date(2020, 4, 20), [{'EUR': 4.23}, {'USD': 3.76}, {'EUR': 4.21}, {'GPB': 5.2}, {'USD': 3.77}, {'EUR': 4.27}, {'USD': 3.79}, {'EUR': 4.3}, {'GPB': 5.14}, {'USD': 3.77}])
(datetime.date(2020, 4, 25), [{'EUR': 4.23}, {'USD': 3.76}, {'EUR': 4.21}, {'GPB': 5.2}, {'USD': 3.77}])
(datetime.date(2020, 4, 27), [{'EUR': 4.27}, {'USD': 3.79}, {'EUR': 4.3}, {'GPB': 5.14}])
(datetime.date(2020, 4, 28), [{'USD': 3.77}, {'EUR': 4.23}])
(datetime.date(2020, 5, 2), [{'USD': 3.76}, {'EUR': 4.21}, {'GPB': 5.2}, {'USD': 3.77}, {'EUR': 4.27}])
(datetime.date(2020, 5, 5), [{'USD': 3.79}, {'EUR': 4.3}, {'GPB': 5.14}, {'USD': 3.77}])
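Note that the comprehension above only works because tuplelist is already ordered by date: itertools.groupby() merges only adjacent elements with equal keys, and starts a new group whenever the key changes. A minimal sketch with three hypothetical rows (reordered on purpose, not the original list) shows what happens otherwise: the same date produces two groups, and in a dict comprehension the second one silently overwrites the first. Sorting by the same key before grouping avoids this.

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: the two 2020-04-20 entries are NOT adjacent.
rows = [
    (datetime.date(2020, 4, 20), 4.23, 'EUR'),
    (datetime.date(2020, 4, 25), 3.76, 'USD'),
    (datetime.date(2020, 4, 20), 4.21, 'EUR'),
]

by_date = itemgetter(0)  # key function: the date in position 0

# groupby() yields 2020-04-20 twice; the second group replaces the first dict entry.
from_unsorted = {key: [row[1] for row in group] for key, group in groupby(rows, key=by_date)}
print(from_unsorted)  # the 4.23 value for 2020-04-20 is lost

# Sorting by the same key first makes each date one complete group.
from_sorted = {key: [row[1] for row in group] for key, group in groupby(sorted(rows, key=by_date), key=by_date)}
print(from_sorted)
```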

However, I'm struggling to merge the values for each currency into the format shown above.

I'd appreciate any hints.

Answer:

I wouldn't try to cram all of these operations into a single comprehension. Instead, I'd use two nested for loops that fill in a nested dictionary (date, then currency, then a list of values), using dict.setdefault() for conciseness:

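from itertools import groupby

result = {}
# groupby() only merges consecutive items, so tuplelist must be sorted by date
for date, currency_entries in groupby(tuplelist, lambda x: x[0]):
    for _, currency_val, currency_name in currency_entries:
        # create the per-date dict and per-currency list on first use, then append
        result.setdefault(date, {}).setdefault(currency_name, []).append(currency_val)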

After the loops, result contains:

{datetime.date(2020, 4, 20): {'EUR': [4.23, 4.21, 4.27, 4.3], 'USD': [3.76, 3.77, 3.79, 3.77], 'GPB': [5.2, 5.14]},
 datetime.date(2020, 4, 25): {'EUR': [4.23, 4.21], 'USD': [3.76, 3.77], 'GPB': [5.2]},
 datetime.date(2020, 4, 27): {'EUR': [4.27, 4.3], 'USD': [3.79], 'GPB': [5.14]},
 datetime.date(2020, 4, 28): {'USD': [3.77], 'EUR': [4.23]},
 datetime.date(2020, 5, 2): {'USD': [3.76, 3.77], 'EUR': [4.21, 4.27], 'GPB': [5.2]},
 datetime.date(2020, 5, 5): {'USD': [3.79, 3.77], 'EUR': [4.3], 'GPB': [5.14]}}
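Because setdefault() already does the grouping, the loop above does not strictly need groupby(): iterating over tuplelist directly gives the same result and also works when the rows are not sorted. Below is a minimal sketch of that idea, swapping in collections.defaultdict (a stylistic alternative, not what the answer above uses) and then reshaping the nested dict into the list-of-single-key-dicts format asked for in the question. The data is a hypothetical, shortened and shuffled sample of the question's list.

```python
import datetime
from collections import defaultdict

# Hypothetical sample: a few rows from the question, deliberately out of date order.
tuplelist = [
    (datetime.date(2020, 4, 28), 3.77, 'USD'),
    (datetime.date(2020, 4, 20), 4.23, 'EUR'),
    (datetime.date(2020, 4, 20), 3.76, 'USD'),
    (datetime.date(2020, 4, 28), 4.23, 'EUR'),
    (datetime.date(2020, 4, 20), 4.21, 'EUR'),
    (datetime.date(2020, 4, 20), 5.20, 'GPB'),
]

# date -> currency -> list of values; missing levels are created on first access.
by_date = defaultdict(lambda: defaultdict(list))
for date, value, currency in tuplelist:
    by_date[date][currency].append(value)

# Reshape into the question's format: (date, [{currency: values}, ...]), ordered by date.
reshaped = [
    (date, [{currency: values} for currency, values in currencies.items()])
    for date, currencies in sorted(by_date.items())
]

for row in reshaped:
    print(row)
```

For this sample the first printed row is (datetime.date(2020, 4, 20), [{'EUR': [4.23, 4.21]}, {'USD': [3.76]}, {'GPB': [5.2]}]). The sorted() call is what restores chronological order, since a dict keeps insertion order rather than sorting its keys; within each date, currencies appear in the order they were first seen.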