I am looking to summarize data by Unit, Period and SupplierID. For each Unique Unit and Period Combo return the SupplierID with the largerst Amount. This would sum the amounts for a SupplierID in a given Unit/period. So in the case below it should return:
5400,6 - 7957.33
6300,7 - 9801.10
I would like to accomplish this by iterating over the data WITHOUT using Numpy or Pandas as I am trying to understand the logic. I am stuck on the logic part, how to iterate over this and store the data. Below I was trying dictionaries but not sure that is the best way... Here is a short data sample:
cv = "Unit,SupplierID,ReceivedPeriod,Amount/n5400,0213123,6,3450.87/n5400,0521332,6,5902.21/n5400,0213123,6,4506.46/n6300,0293899,7,9801.10/n6300,0381923,7,6203.76"
line = cv.split("/n")
d = {}
for l in line:
unit,id,period,amount = l.split(",")
k1 = unit "," period
k2 = id
if k1 in d:
if k2 in d[k1]:
up = d[k1][id]
nup = float(amount) float(up)
d[k1][id] = round(nup, 2)
else:
d[k1] = {[id]:amount}
The way I am going about this seems to complicated, is there a better way to go about this when iterating rows?
CodePudding user response:
I changed your string to use real newlines instead of your fake "/n", so the code can easily move to reading a file. I also added code to skip over the header line. This would be modestly easier using the "csv" module to parse the string. It would also be a bit shorter by using DefaultDict
instead of a straight dict.
cv = "Unit,SupplierID,ReceivedPeriod,Amount\n5400,0213123,6,3450.87\n5400,0521332,6,5902.21\n5400,0213123,6,4506.46\n6300,0293899,7,9801.10\n6300,0381923,7,6203.76"
d = {}
for l in cv.splitlines():
unit,id,period,amount = l.split(",")
if unit == 'Unit':
continue
amount = float(amount)
k1 = unit "," period
if k1 in d:
if id in d[k1]:
d[k1][id] = amount
else:
d[k1][id] = amount
else:
d[k1] = {id:amount}
print(d)
for k,v in d.items():
largest = max(v.values())
print(k,largest)
Output:
{'5400,6': {'0213123': 7957.33, '0521332': 5902.21}, '6300,7': {'0293899': 9801.1, '0381923': 6203.76}}
5400,6 7957.33
6300,7 9801.1