x y
1.2 3.1
1.4 3.5
1.5 3.2
2.2 3.6
2.2 2.8
2.3 3.3
2.4 3.5
2.5 3.8
2.7 3.4
2.8 3.3
Say I have the dataframe above, and I want to write a function
def ave(pd,minx,maxx):
which calculates the average of the y values whose corresponding x values lie between minx and maxx, i.e. in the following example:
ave(file, 2, 3) #where file is wherever I import these x and y values from
it would return 3.3857...
I have tried the following:
def ave(pd,minx,maxx):
    x = list(data.iloc[:, 0].values)
    y = list(data.iloc[:, 1].values)
    lst=[]
    for i in x:
        if x[i]>xmin and x[i]<xmax:
            lst =y[i]
    return (sum(lst)/len(list))
but this gives the error: list indices must be integers or slices, not numpy.float64
CodePudding user response:
Why not just select rows where those conditions are true? You really should avoid looping as much as possible when working with dataframes.
def y_average(df, min_x, max_x):
    return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()
Usage:
In [3]: y_average(df, 2, 3)
Out[3]: 3.3857142857142857
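As for the original error: `for i in x` iterates over the x values themselves (floats such as 1.2), not over positions, so `x[i]` tries to index a list with a float. Below is a minimal, self-contained sketch that rebuilds the sample data and shows both a corrected loop and the mask-based approach. The names `ave_loop` and `ave_mask` are made up for illustration, the column labels "x" and "y" are assumed, and returning NaN when no row matches is an assumption rather than something the question specifies.

```python
import pandas as pd

# Sample data copied from the question; column names "x" and "y" are assumed.
df = pd.DataFrame({
    "x": [1.2, 1.4, 1.5, 2.2, 2.2, 2.3, 2.4, 2.5, 2.7, 2.8],
    "y": [3.1, 3.5, 3.2, 3.6, 2.8, 3.3, 3.5, 3.8, 3.4, 3.3],
})


def ave_loop(data, minx, maxx):
    """Loop version: pair each x with its y instead of indexing by the x value."""
    selected = []
    for xi, yi in zip(data["x"], data["y"]):
        if minx < xi < maxx:
            selected.append(yi)  # append to the list rather than overwrite it
    # Assumption: return NaN when nothing falls inside the range.
    return sum(selected) / len(selected) if selected else float("nan")


def ave_mask(data, minx, maxx):
    """Vectorized version: a boolean mask picks the rows, mean() averages y."""
    mask = (data["x"] > minx) & (data["x"] < maxx)
    return data.loc[mask, "y"].mean()


print(ave_loop(df, 2, 3))
print(ave_mask(df, 2, 3))
```

Both calls should print roughly 3.3857 for this data. Note that the comparisons are strict, so rows with x exactly equal to minx or maxx are excluded, which matches the `>` and `<` used in the question.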