Home > Back-end >  Python CSV sum value if they have same ID/name
Python CSV sum value if they have same ID/name

Time:03-05

I want to sum all values that have the same name / ID in a csv file

Right now I am only looking for ID with the name 'company'

csv file format:

company A, 100 
company B, 200
company A, 300 

The end result I am looking for is:

company A, 400
company B, 200
total: 600

My code so far:

import csv

name = ''
num = ''
total = 0

with open('xx.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(csvfile)

    for a in reader:
        if a[0].__contain__('company'):
            name = (a[0])
            num = (a[1])
            total  = float(a[1])
            print(str(name)   ', '   str(num))

            print('total: '   str(total))

CodePudding user response:

First, CSV typically have commas, and the delimiter for csv.reader must be a single character, so I suggest updating your file to properly use commas.

Secondly, to aggregate the companies, you need to store them as you iterate the file. Easiest way is to use a dictionary type.

Then only after you've aggregated everything, should you create a second loop to go over the aggregated values, then print the final total.

import csv
from collections import defaultdict

totals = defaultdict(int)
total = 0

with open('companies.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # next (csvfile)  # shown file has no header
   
    for row in reader:
      if not row[0].startswith('company'):
        continue
      name, value = row
      totals[name]  = float(value)

total = 0
for name, value in totals.items():
    print(f'{name},{value}')
    total  = value

print(f'total: {total}')

CodePudding user response:

You don't necessarily need to use csv module here. Just read every single line split them from right (rsplit) and fill a dictionary like below:

d = {}

with open('your_file.csv') as f:
    # next(f) -  If header needs to be skipped
    for line in f:
        name, value = line.rsplit(',', maxsplit=1)
        d[name] = d.get(name, 0)   int(value)

for k, v in d.items():
    print(f"{k}, {v}")

print(f"total: {sum(d.values())}")

output:

company A, 400
company B, 200
total: 600

In order not to iterate again through the dictionary's values to calculate the total(I mean in sum(d.values()) expression), you can do add to total while you are printing the items like:

d = {}

with open('new.csv') as f:
    for line in f:
        name, value = line.rsplit(',', maxsplit=1)
        d[name] = d.get(name, 0)   int(value)

total = 0
for k, v in d.items():
    total  = v
    print(f"{k}, {v}")

print(f"total: {total}")
  • Related