I have to write a function, column_means, that calculates the mean of each column of a DataFrame and returns a list of the means. I'm not allowed to use the .mean() method, so I'm implementing the general formula for the mean: sum(x_i) / number of elements.
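For reference, the per-column formula can be written as follows, where x_{ij} is the value in row i of column j and N is the number of rows:

$$\bar{x}_j = \frac{1}{N} \sum_{i=1}^{N} x_{ij}$$

For a column holding 1, 2, 3 this gives (1 + 2 + 3) / 3 = 2.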
This is my code:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
def column_means(df):
    means = []
    for i, n in zip(df.columns, df.shape[0]):
        means[n] = sum(df[i]) / df.shape[0]
    return means
It doesn't work as intended. Could you please tell me what my mistakes are?
Thank you in advance.
Answer:
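In your zip call, the second argument df.shape[0] is a single integer (the number of rows), not an iterable, so zip cannot iterate over it and raises a TypeError. The line means[n] = ... would also fail, because means is an empty list and assigning to an index past its end raises an IndexError; use append() instead.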
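A small sketch that reproduces both problems on the question's example DataFrame; the variable names n_rows, zip_failed and index_failed are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# df.shape is a (rows, columns) tuple, so df.shape[0] is the plain int 3.
n_rows = df.shape[0]
print(df.shape, type(n_rows))

# zip() calls iter() on every argument, so passing an int fails immediately.
zip_failed = False
try:
    zip(df.columns, n_rows)
except TypeError as exc:
    zip_failed = True
    print("TypeError:", exc)

# Assigning to an index of an empty list fails; append() grows the list instead.
means = []
index_failed = False
try:
    means[0] = 2.0
except IndexError as exc:
    index_failed = True
    print("IndexError:", exc)
```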
Instead, you can simply iterate over the columns directly:
def column_means(df):
    means = []
    for i in df.columns:
        means.append(sum(df[i]) / df.shape[0])
    return means
If you want each mean as an integer instead of a float, you can use floor division: sum(df[i]) // df.shape[0]. Note that this rounds down (toward negative infinity), not to the nearest integer.
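To see the difference between true division, floor division and rounding, here is a sketch using a hypothetical column whose mean is not a whole number, (1 + 2 + 5) / 3:

```python
import pandas as pd

# Hypothetical column chosen so that the mean is not an integer.
df = pd.DataFrame({'a': [1, 2, 5]})

total = sum(df['a'])
n = df.shape[0]

true_mean = total / n            # float division: 2.666...
floor_mean = total // n          # floor division rounds down: 2
rounded_mean = round(total / n)  # nearest integer instead: 3

print(true_mean, floor_mean, rounded_mean)
```

If "integer mean" should mean the nearest integer, round() is probably the better fit than //.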
I hope this answers your question.
Answer:
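Do you want the mean of each column? Note that all columns of a DataFrame have the same length, so len(df[n]) below always equals df.shape[0]. If your data had columns of different lengths, pandas pads the shorter ones with NaN, and a plain sum() then returns NaN for those columns: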
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
def column_means(df):
    means = []
    for n in df.columns:
        means.append(sum(df[n]) / len(df[n]))
    return means

print(column_means(df))
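If missing values should be ignored rather than propagated, one option is to drop them before summing and counting. This is a sketch under that assumption; column_means_skipna is a hypothetical name:

```python
import pandas as pd

# Hypothetical frame: the "shorter" column 'a' is padded with NaN by pandas.
df = pd.DataFrame({'a': [1, 2, None], 'b': [4, 5, 6]})

def column_means_skipna(df):
    """Mean of each column, ignoring missing values (assumed desired behavior)."""
    means = []
    for col in df.columns:
        values = df[col].dropna()  # drop NaN before summing and counting
        means.append(sum(values) / len(values))
    return means

print(column_means_skipna(df))
```

A column that is entirely NaN would leave values empty and raise ZeroDivisionError, so you may want to guard against that case.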
Outside of this exercise, you can also just use the built-in DataFrame mean method:
df.mean()
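Even when .mean() is off limits for the solution, it can still serve as a reference check while testing. A minimal sketch, comparing with a tolerance because float sums can differ slightly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

def column_means(df):
    # Manual approach: sum each column and divide by its length.
    return [sum(df[col]) / len(df[col]) for col in df.columns]

manual = column_means(df)
builtin = df.mean().to_list()  # used only to check the manual result

print(manual, builtin, np.allclose(manual, builtin))
```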
Answer:
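Change the first df.shape[0] (the one inside zip) to df.index, and replace the indexed assignment means[n] = ... with means.append(...). Be aware that zip stops at its shortest input, so this only covers every column when the DataFrame has at least as many rows as columns: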
def column_means(df):
    means = []
    for i, n in zip(df.columns, df.index):
        means.append(sum(df[i]) / df.shape[0])
    return means
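A quick sketch of the zip truncation on a hypothetical "wide" DataFrame with three columns but only two rows (column_means_zip is just a renamed copy of the function above):

```python
import pandas as pd

# Hypothetical frame: 3 columns, 2 rows.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

def column_means_zip(df):
    means = []
    # zip yields only len(df.index) == 2 pairs, so column 'c' is never visited.
    for i, n in zip(df.columns, df.index):
        means.append(sum(df[i]) / df.shape[0])
    return means

print(len(df.columns), len(column_means_zip(df)))
```

Looping over df.columns alone avoids this.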
Answer:
If the only thing you're not allowed to use is the df.mean() method, you could do:
def column_means(df):
    return df.sum(axis=0).div(df.shape[0]).to_list()
Sum each column (axis=0), divide the result by the number of rows, and convert it to a list.
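A variation on the same vectorized idea, assuming missing values should be skipped: df.sum() ignores NaN by default and df.count() counts only non-missing values per column, so dividing one by the other ignores missing entries instead of counting them as rows:

```python
import pandas as pd

# Hypothetical frame with one missing value in column 'a'.
df = pd.DataFrame({'a': [1.0, 2.0, None], 'b': [4, 5, 6]})

def column_means(df):
    # Per-column sum of non-missing values divided by the per-column
    # number of non-missing values.
    return df.sum(axis=0).div(df.count(axis=0)).to_list()

print(column_means(df))
```

Dividing by df.shape[0] instead would give 1.0 for column 'a', because the NaN row still counts toward the row total.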