The DataFrame eventually gets exported to Excel. I'm trying to add an extra row with the average and the max above each column.
- I do not want to disturb the original headers for the actual data.
I don't want to hard-code column names because they will change, so the solution needs to be generic. I attempted to compute the max but failed. I need the max above the column headers.
CodePudding user response:
Try this. I don't know how to place the rows above the DataFrame, but I believe this might be a good solution in the end:
import pandas as pd
df = {
'date and time':['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04'],
'<PowerAC--->':[40, 20, 9, 12]
}
df = pd.DataFrame(df)
cols = ['<PowerAC--->']
# Use the string names; passing the builtin `max` triggers a FutureWarning in recent pandas
agg = df[cols].agg(['mean', 'max'])
out = pd.concat([df, agg])
print(out)
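The same idea can be sketched without naming any column, which is what the question asks for. This is a minimal sketch, assuming "every numeric column" is the right selection rule; the names `num_cols` and `stats` are illustrative. It also puts the summary rows first, so they end up above the data once exported.

```python
import pandas as pd

df = pd.DataFrame({
    'date and time': ['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04'],
    '<PowerAC--->': [40, 20, 9, 12],
})

# Pick the numeric columns by dtype instead of by name (assumption:
# every numeric column should get a mean and a max).
num_cols = df.select_dtypes(include='number').columns

# One row per statistic; reindex so the column order matches df.
# Non-numeric columns are left empty (NaN) in the summary rows.
stats = df[num_cols].agg(['mean', 'max']).reindex(columns=df.columns)

# Summary rows first, original data underneath.
out = pd.concat([stats, df])
print(out)
```

The original column headers stay untouched; only two labelled rows (`mean`, `max`) are added ahead of the data.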
CodePudding user response:
A one-liner which also removes the "NaN" values to make it visually better (I'm a bit OCD ;)). Note that df.append was removed in pandas 2.0, so pd.concat is used here:
pd.concat([df, df.agg({'<PowerAC--->': ['mean', 'max']})]).fillna('')
CodePudding user response:
I would say it's a good idea to keep your data separated from the reporting on it - I don't really see the logic for an "additional row above the column".
I would generate statistics for the overall data as a separate dataframe.
import pandas as pd
import numpy as np
np.random.seed(1)
t = pd.date_range(start='2022-05-31', end='2022-06-07')
x = np.random.rand(len(t))
df = pd.DataFrame({'date': t, 'data': x})
print(df)
# 'numeric_only=False' is the default from pandas 2.0 onwards; it is passed explicitly so the date column is included
d_mean = df.mean(numeric_only=False)
d_max = df.max()
# We need to transpose this, as the `d_mean` and `d_max` are Series (columns), and we want them as rows
df_stats = pd.DataFrame({'mean': d_mean, 'max':d_max}).transpose()
print(df_stats)
df
output:
date data
0 2022-05-31 0.417022
1 2022-06-01 0.720324
2 2022-06-02 0.000114
3 2022-06-03 0.302333
4 2022-06-04 0.146756
5 2022-06-05 0.092339
6 2022-06-06 0.186260
7 2022-06-07 0.345561
df_stats
output:
date data
mean 2022-06-03 12:00:00 0.276339
max 2022-06-07 00:00:00 0.720324
You could add this and the dataframe together with
pd.concat([df_stats, df])
which looks like
date data
mean 2022-06-03 12:00:00 0.276339
max 2022-06-07 00:00:00 0.720324
0 2022-05-31 00:00:00 0.417022
1 2022-06-01 00:00:00 0.720324
2 2022-06-02 00:00:00 0.000114
3 2022-06-03 00:00:00 0.302333
4 2022-06-04 00:00:00 0.146756
5 2022-06-05 00:00:00 0.092339
6 2022-06-06 00:00:00 0.18626
7 2022-06-07 00:00:00 0.345561
but I would keep them separate unless you've got a very good reason to.
There may be some way which makes sense using a multi-index, but that's a bit beyond me, and probably beyond the scope of this question.
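For what it's worth, here is a rough sketch of what the multi-index idea could look like: the statistics become extra levels of the column header, so they sit above the original column names. The helper `header` and the choice to format the values as strings are assumptions made for illustration, and I have not checked how every Excel writer lays out such a header.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-05-31', periods=3),
    'data': [1.0, 2.0, 6.0],
})

# Numeric columns are detected by dtype, not by name.
num_cols = df.select_dtypes(include='number').columns

def header(col):
    """Hypothetical helper: build a (max, mean, name) header tuple for one column."""
    if col in num_cols:
        return (f"{df[col].max():g}", f"{df[col].mean():g}", col)
    # Non-numeric columns get blank statistic cells.
    return ("", "", col)

# Work on a copy so the original headers of df are left alone.
stacked = df.copy()
stacked.columns = pd.MultiIndex.from_tuples(
    [header(c) for c in df.columns],
    names=['max', 'mean', 'column'],
)
print(stacked)
```

The original names survive as the lowest header level, so `stacked.columns.get_level_values('column')` gives them back.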
Edit: If you don't infer any meaning from the max and mean of the date column but still want something compatible with that column (i.e. still a datetime but effectively null), you could replace it with np.datetime64('NaT') (NaT is similar to NaN, but means "not a time"):
df_stats['date'] = np.datetime64('NaT')
print(pd.concat([df_stats, df]).head())
output:
date data
mean NaT 0.276339
max NaT 0.720324
0 2022-05-31 0.417022
1 2022-06-01 0.720324
2 2022-06-02 0.000114
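If the column names are not known in advance, the NaT replacement can be applied to every datetime column by dtype. This is only a sketch under that assumption, with made-up data and illustrative variable names (`stats`, `combined`):

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range('2022-05-31', periods=4),
    'data': [0.4, 0.7, 0.1, 0.3],
})

# Statistics for numeric columns only, laid out in df's column order.
stats = (
    df.select_dtypes(include='number')
      .agg(['mean', 'max'])
      .reindex(columns=df.columns)
)

# Fill every datetime column with NaT of the matching dtype, so the
# combined column stays a datetime column instead of turning into object.
for col in df.select_dtypes(include='datetime').columns:
    stats[col] = pd.Series(pd.NaT, index=stats.index, dtype=df[col].dtype)

combined = pd.concat([stats, df])
print(combined.head())
```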