I have a dataframe of 100 random numbers and I would like to find the mean as follows:
mean0 should be the mean of rows 0, 5, 10, ...
mean1 should be the mean of rows 1, 6, 11, 16, ...
...
mean4 should be the mean of rows 4, 9, 14, ...
So far I am able to compute mean0, but I cannot figure out how to iterate the process to obtain the remaining means.
My code is as follows:
import numpy as np
import pandas as pd

# 100 random integers in [1, 100), stored in a single column named 0
data = np.random.randint(1, 100, size=100)
df = pd.DataFrame(data)
print(df)
df.to_csv('example.csv', index=False)

# Rows 0, 5, 10, ...
df1 = df[::5]
print("Every 5th row is:\n", df1)
df2 = df1.mean()
print(df2)
CodePudding user response:
Since df[::5] is equivalent to df[0::5], you could use df[1::5], df[2::5], df[3::5], and df[4::5] for the remaining dataframes, and then take the mean of each with df[i::5].mean().
This is not explicitly showcased in the pandas documentation examples, but it works the same way as Python list slicing with [start:stop:step].
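As a minimal sketch of how the slicing idea could be put in a loop, the snippet below collects all five means in a dictionary. The seeded generator, the `step` variable, and the `means` dictionary are assumptions added for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption, not in the question)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=100))

step = 5  # distance between the rows that belong to the same group

# df.iloc[i::step, 0] selects rows i, i+step, i+2*step, ... of column 0
means = {f"mean{i}": df.iloc[i::step, 0].mean() for i in range(step)}

for name, value in means.items():
    print(name, value)
```

Here `means["mean0"]` corresponds to `df[::5].mean()` from the question, and the other entries follow the same pattern with a different start offset.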
CodePudding user response:
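I would use the underlying numpy array, reshaped to 5 columns so that column i holds rows i, i+5, i+10, ... (this requires the number of rows to be a multiple of 5):
df[0].to_numpy().reshape(-1, 5).mean(0)
output (the values depend on the random data): array([40.8 , 52.75, 43.2 , 55.05, 47.45])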
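If the number of rows is not guaranteed to be a multiple of 5, one possible alternative is to group by the row position modulo 5. This is a sketch under that assumption, not the answerer's method; the seeded generator and the names `step`, `grouped`, and `reshaped` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption, not in the question)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=100))

step = 5

# Group label for each row: 0, 1, 2, 3, 4, 0, 1, ...
labels = np.arange(len(df)) % step
grouped = df[0].groupby(labels).mean()

# The reshape approach gives the same numbers when len(df) is a multiple of step
reshaped = df[0].to_numpy().reshape(-1, step).mean(axis=0)

print(grouped)
print(np.allclose(grouped.to_numpy(), reshaped))
```

The groupby version returns a Series indexed 0 to 4, so `grouped[1]` plays the role of mean1.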