Optimising two for loops using list comprehension in python(without pandas)-CodePudding

I am a newbie into python and I am trying to optimize a snippet of my program from using two for loops to using the list comprehension in Python3. While doing this, I am unable to calculate the sum of more than one column. For Ex, if I have columns 1,2, and 3 of types float int, and string in a dictionary, I am able to calculate the sum only for column one and not for column 2.

The double for loop looks something like this

final_dict = []
    for k, g in itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id'])):
        txn_amount = 0
        distinct_user_count = 0
        for v in g:
            # print(k)
            txn_amount  = float(v['transaction_amount'])
            distinct_user_count  = v['user_id_count']
            # print(v)

        final_dict.append({'transaction_category_id': k, 'aggregated_transaction_amount': txn_amount,
                           'distinct_user_count': distinct_user_count})

The code I want to optimise to should ideally look something like this :

final_result = [[k, sum(float(v['transaction_amount']) for v in g),sum(s['user_id_count'] for s in g)] for k, g in
                    itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id']))]

But the code does not add up values for the user_id_count column and return sum as 0.

The sample data looks something like this :

user_id,transaction_amount,transaction_category_id
b2d30a62-36bd-41c6-8221-987d5c4cd707,63.05,3
b2d30a62-36bd-41c6-8221-987d5c4cd707,13.97,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,97.15,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,23.54,5

and the ideal output would look like :

['4', 111.12, 2],
['3', 63.05, 1],
['5', 23.54, 1],

but it prints out and the ideal output would look like :

['4', 111.12, 0],
['3', 63.05, 0],
['5', 23.54, 0],

I tried the below sample code out but the output is not what I expected :

final_result = [[k, sum(float(v['transaction_amount']) for v in g),sum(s['user_id_count'] for s in g)] for k, g in
                    itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id']))]

CodePudding user response：

You seem to have a very simple comma-delimited CSV file. To get output similar to that shown in the question you could do this:

from collections import defaultdict

FILENAME = '/Volumes/G-Drive/sample.csv'

def gendict():
    return {'amount': 0.0, 'count': 0}

summary = defaultdict(gendict)

with open(FILENAME) as csv:
    next(csv, None) # skip column headers
    for line in map(str.rstrip, csv):
        *_, _amount, _id = line.split(',')
        summary[_id]['amount']  = float(_amount)
        summary[_id]['count']  = 1

slist = [[k, *v.values()] for k, v in summary.items()]

print(slist)

Output:

[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]

CodePudding user response：

Alternatively, you can use csv.DictReader with collections.defaultdict to simulate the groupby.agg of pandas:

from csv import DictReader
from collections import defaultdict

out = defaultdict(lambda: [0,0])

with open("tmp/Ashish Shetty.csv", "r") as csvfile:
    reader = DictReader(csvfile)
    for row in reader:
        tra_id = str(row["transaction_category_id"])
        out[tra_id][0]  = float(row["transaction_amount"])
        out[tra_id][1]  = 1

out = [[k] v for k,v in out.items()]

Output :

print(out)
#[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]