Home > database >  Computing daily averages of multiple columns in Pandas Dataframe, then inserting a row with them
Computing daily averages of multiple columns in Pandas Dataframe, then inserting a row with them

Time:06-18

I've inputted a CSV file using Pandas to a data frame and want to calculate the daily averages of the columns within it. There are 2 columns that won't be averaged, time and date (with the date being what the data is grouped together from). I then want to insert a row below the last row of that date, with 'Average' in the date column (nothing in the time column) and then the corresponding averages in the correct columns. I've only been using Pandas for a little while and so I'm unsure what the easiest way to go about this is. Any help would be greatly appreciated.

CodePudding user response:

I don't understand quite what you want (specifically with "insert a row below the last row of that date, with 'Average' in the date column"). However, you can create a column of averages using .mean(). Ex below:

student1 = {
    'name':'Calvin',
    'math': 87,
    'literature': 100,
    'chemistry': 84
}
student2 = {
    'name':'Garfield',
    'math': 40,
    'literature': 75,
    'chemistry': 93
}


grades = pd.DataFrame([student1, student2], index=['Calvin', 'Garfield'])
print(grades)

#The code below makes the new column with averages
grades['Avg grade'] = grades[['math', 'literature', 'chemistry']].mean(axis=1)
print(grades)

CodePudding user response:

What you are trying to do is slightly inconsistent with the idea of having each column in a pandas dataframe have a homogeneous data type, and seems to be mixing the data capabilities of dataframes with the formatting needs of your use case.

With that caveat, here is a way to do what your question asks:

import pandas as pd
df = pd.DataFrame({
'Date':[1,1,1,2,2,2,3,3,3,3],
'Time':[9,12,15,9,12,15,9,12,15,18],
'Col1':[1,2,3,4,5,6,7,8,9,10],
'Col2':[9,8,7,6,5,4,3,2,1,0]
})
print(df)

df2 = df.drop(columns=['Time']).groupby('Date').mean().join(df[['Date','Time']].groupby('Date').count(), on='Date')
mx = df2.Time.max()
stepup = 10
while stepup <= mx:
    stepup *= 10

foo = []
for i, count in zip(df2.index, df2.Time):
    foo  = [(i - 1) * stepup   y   1 for y in range(count)]
df.index = foo
df = df.reindex(foo)
dataCols = [x for x in df.columns if x not in ('Date', 'Time')]
for i in df2.index:
    df.loc[i*stepup, ['Date']   dataCols] = ['Average']   list(df2.loc[i, dataCols])

df = df.sort_index().reset_index(drop=True).fillna('')
print(df)

Input:

   Date  Time  Col1  Col2
0     1     9     1     9
1     1    12     2     8
2     1    15     3     7
3     2     9     4     6
4     2    12     5     5
5     2    15     6     4
6     3     9     7     3
7     3    12     8     2
8     3    15     9     1
9     3    18    10     0

Output:

       Date  Time  Col1  Col2
0       1.0   9.0   1.0   9.0
1       1.0  12.0   2.0   8.0
2       1.0  15.0   3.0   7.0
3   Average         2.0   8.0
4       2.0   9.0   4.0   6.0
5       2.0  12.0   5.0   5.0
6       2.0  15.0   6.0   4.0
7   Average         5.0   5.0
8       3.0   9.0   7.0   3.0
9       3.0  12.0   8.0   2.0
10      3.0  15.0   9.0   1.0
11      3.0  18.0  10.0   0.0
12  Average         8.5   1.5
  • Related