Home > Software engineering >  Calculate aggregated values from Excel CSV file
Calculate aggregated values from Excel CSV file

Time:09-22

I have a finished Excel sheet with 20 rows of name, date of birth, class year (fresh, soph, junior, senior), GPA.

I need to write a program that calculates the following:

  • Number of students per year
  • Average age per year
  • Average GPA per year

Do you have any suggestions on how to achieve these results?

CodePudding user response:

Stack Overflow is generally used to get answers for specific questions, rather than general programming advice.

My high level advice is to read the data from the excel sheet, into a list of dictionaries. From there you can iterate over the list of dictionaries to find the information you're looking for.

Alternatively you can use Pandas but that is generally used for large data sets and computationally expensive operations.

CodePudding user response:

There are a few different questions here that are worth answering because this may be applicable to more people.

Specifically, the questions are:

  1. Read excel data in python
  2. Calculate count based on a column
  3. Calculate averages based on a column
  4. Calculate age based on date of birth (this will depend a bit on how you've formatted DOB)

1- Use the pandas module

2-4* See below:

import pandas
import numpy

data = pandas.read_excel(r'D:\User Files\Downloads\73802797-file.xlsx')
print('\nExcel data')
print(data)

print('\nNumber of students per year')
#https://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column
print(data['Class Year'].value_counts())

print('\nAverage age per year')
#https://stackoverflow.com/questions/31490816/calculate-datetime-difference-in-years-months-etc-in-a-new-pandas-dataframe-c
data['Age'] = (pandas.Timestamp.now() - data['DOB']) / numpy.timedelta64(1, 'Y')
print(data['Age'])
print(data.groupby('Class Year')['Age'].mean())

print('\nAverage GPA per year')
#https://stackoverflow.com/questions/30482071/how-to-calculate-mean-values-grouped-on-another-column-in-pandas
print(data.groupby('Class Year')['GPA'].mean())

Running this produces the following output:


Excel data
        Name        DOB Class Year  GPA
0  Redeye438 2008-09-22      Fresh    1
1  Redeye439 2009-09-20       Soph    2
2  Redeye440 2010-09-22     Junior    3
3  Redeye441 2011-09-20     Senior    4
4  Redeye442 2008-09-20      Fresh    4
5  Redeye443 2009-09-22       Soph    3

Number of students per year
Soph      2
Fresh     2
Junior    1
Senior    1
Name: Class Year, dtype: int64

Average age per year
Class Year
Fresh     14.000320
Junior    11.998910
Senior    11.005050
Soph      13.000984
Name: Age, dtype: float64

Average GPA per year
Class Year
Fresh     2.5
Junior    3.0
Senior    4.0
Soph      2.5
Name: GPA, dtype: float64
  • Related