I want to write a program that evaluates data about a construction site operation from an ASCII table in CSV format file. The template file is an Excel file.
name Qualification costs
Max Mustermann Seller 6.155,39
Max Mustermann Seller 5.069,15
Max Mustermann Seller 362,08
Klee klumper Seller 4.637,65
Klee klumper Seller 1.159,41
Koch Schnerider Project Engineer 1.358,28
Koch Schnerider Project Engineer 679,14
Müller Manim Distribution 15.149,28
Müller Manim Distribution 16.743,94
Schach Matt Site Manager 14.399,79
Schach Matt Site Manager 1.371,41
Zeimetz Kinder Project Engineer 11.376,50
Zeimetz Kinder Project Engineer 2.133,09
The following data should be evaluated:
- Total costs from all operations
- All qualifications with the respective sum of costs
I managed to calculate these two above, but how do I manage the other two?
- The qualification that has the highest total cost
- The qualification that has the lowest total cost
This is my first coding:
import pandas as pd
import os
filename = "site_operation.csv"
path = "."
file = os.path.join(filename, path)
tscv1 = pd.read_csv(file, sep=";", thousands=".", decimal=",", encoding="ansi")
total_cost = tscv1['costs'].sum()
print("Total costs from all operations: ", total_cost)
CodePudding user response:
You can use the groupby function from pandas
for the relative costs, grouped by Qual.
tscv1.groupby('Qualification').sum()
costs
Qualification
Distribution 31893.22
Project Engineer 15547.01
Seller 17383.68
Site Manager 15771.20
# For the min e max values
# an easy way can be sort the results:
sorted_by_qual_value = tscv1.groupby('Qualification').sum().sort_values('costs')
min_qual = sorted_by_qual_value.head(1)
# costs
# Qualification
# Project Engineer 15547.01
max_qual = sorted_by_qual_value.tail(1)
# costs
# Qualification
# Distribution 31893.22