Home > Software engineering >  Python iterate over csv rows with multiple factors
Python iterate over csv rows with multiple factors

Time:09-30

I am looking to summarize data by Unit, Period and SupplierID. For each Unique Unit and Period Combo return the SupplierID with the largerst Amount. This would sum the amounts for a SupplierID in a given Unit/period. So in the case below it should return:

5400,6 - 7957.33

6300,7 - 9801.10

I would like to accomplish this by iterating over the data WITHOUT using Numpy or Pandas as I am trying to understand the logic. I am stuck on the logic part, how to iterate over this and store the data. Below I was trying dictionaries but not sure that is the best way... Here is a short data sample:

cv = "Unit,SupplierID,ReceivedPeriod,Amount/n5400,0213123,6,3450.87/n5400,0521332,6,5902.21/n5400,0213123,6,4506.46/n6300,0293899,7,9801.10/n6300,0381923,7,6203.76"
line = cv.split("/n")
d = {}
for l in line:
    unit,id,period,amount = l.split(",")
    k1 = unit   ","   period
    k2 = id
    if k1 in d:
        if k2 in d[k1]:
        up = d[k1][id]
        nup = float(amount)   float(up)
        d[k1][id] = round(nup, 2)
    else:
        d[k1] = {[id]:amount}

The way I am going about this seems to complicated, is there a better way to go about this when iterating rows?

CodePudding user response:

I changed your string to use real newlines instead of your fake "/n", so the code can easily move to reading a file. I also added code to skip over the header line. This would be modestly easier using the "csv" module to parse the string. It would also be a bit shorter by using DefaultDict instead of a straight dict.

cv = "Unit,SupplierID,ReceivedPeriod,Amount\n5400,0213123,6,3450.87\n5400,0521332,6,5902.21\n5400,0213123,6,4506.46\n6300,0293899,7,9801.10\n6300,0381923,7,6203.76"
d = {}
for l in cv.splitlines():
    unit,id,period,amount = l.split(",")
    if unit == 'Unit':
        continue
    amount = float(amount)
    k1 = unit   ","   period
    if k1 in d:
        if id in d[k1]:
            d[k1][id]  = amount
        else:
            d[k1][id] = amount
    else:
        d[k1] = {id:amount}
print(d)

for k,v in d.items():
    largest = max(v.values())
    print(k,largest)

Output:

{'5400,6': {'0213123': 7957.33, '0521332': 5902.21}, '6300,7': {'0293899': 9801.1, '0381923': 6203.76}}
5400,6 7957.33
6300,7 9801.1
  • Related