How to sum the highest/lowest total cost from a specific row and column in a excel file?-CodePudding

I want to write a program that evaluates data about a construction site operation from an ASCII table in CSV format file. The template file is an Excel file.

name          Qualification           costs
Max Mustermann  Seller              6.155,39
Max Mustermann  Seller              5.069,15
Max Mustermann  Seller              362,08
Klee klumper    Seller              4.637,65
Klee klumper    Seller              1.159,41
Koch Schnerider Project Engineer    1.358,28
Koch Schnerider Project Engineer    679,14
Müller Manim    Distribution        15.149,28
Müller Manim    Distribution        16.743,94
Schach Matt     Site Manager        14.399,79
Schach Matt     Site Manager        1.371,41
Zeimetz Kinder  Project Engineer    11.376,50
Zeimetz Kinder  Project Engineer    2.133,09

The following data should be evaluated:

Total costs from all operations
All qualifications with the respective sum of costs

I managed to calculate these two above, but how do I manage the other two?

The qualification that has the highest total cost
The qualification that has the lowest total cost

This is my first coding:

import pandas as pd
import os

filename = "site_operation.csv"
path = "."
file = os.path.join(filename, path)
tscv1 = pd.read_csv(file, sep=";", thousands=".", decimal=",", encoding="ansi")

total_cost = tscv1['costs'].sum()
print("Total costs from all operations: ", total_cost)

CodePudding user response：

You can use the groupby function from pandas

for the relative costs, grouped by Qual.
tscv1.groupby('Qualification').sum()

                         costs
Qualification                 
 Distribution         31893.22
 Project Engineer     15547.01
 Seller               17383.68
 Site Manager         15771.20


# For the min e max values 
# an easy way can be sort the results:

sorted_by_qual_value = tscv1.groupby('Qualification').sum().sort_values('costs')
min_qual = sorted_by_qual_value.head(1)

#                         costs
# Qualification                 
#  Project Engineer     15547.01

max_qual = sorted_by_qual_value.tail(1)

#                         costs
# Qualification                 
#  Distribution         31893.22