I'm working with CSV files.
I'd like to create a continuously updated (running) average of a sequence.
That is, for each position in a list, I'd like to output the average of all values up to and including that position. For example:
list; [a, b, c, d, e, f]
formula:
(a)/1 = ?
(a + b)/2 = ?
(a + b + c)/3 = ?
(a + b + c + d)/4 = ?
(a + b + c + d + e)/5 = ?
(a + b + c + d + e + f)/6 = ?
To demonstrate:
if i have a list; [1, 4, 7, 4, 19]
my output should be; [1, 2.5, 4, 4, 7]
explained;
(1)/1 = 1
(1 + 4)/2 = 2.5
(1 + 4 + 7)/3 = 4
(1 + 4 + 7 + 4)/4 = 4
(1 + 4 + 7 + 4 + 19)/5 = 7
My Python file is simple:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('somecsvfile.csv')
x = [] #has to be a list of 1 to however many rows are in the "numbers" column, will be a simple [1, 2, 3, 4, 5] etc...
#x will be used to divide the numbers selected in y to give us z
y = df['numbers']
z = #new dataframe derived from the continuous average of y
plt.plot(x, z)
plt.show()
If numpy is needed that is no problem.
CodePudding user response:
You can use np.cumsum to get the cumulative sum, then divide each entry by the number of elements summed so far to get the running average.
import numpy as np
x = np.array([1, 4, 7, 4, 19])
# cumulative sums divided by the counts 1, 2, ..., len(x)
z = np.cumsum(x) / np.arange(1, len(x) + 1)
print(z)
output:
[1.  2.5 4.  4.  7. ]
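The same cumulative-sum idea can also be sketched without NumPy. This is only an illustration using the standard library's itertools.accumulate; the function name running_average is made up for this example and is not part of any library.

```python
from itertools import accumulate


def running_average(values):
    """Return the mean of values[:k] for k = 1 .. len(values)."""
    # accumulate() yields the running totals: v0, v0 + v1, v0 + v1 + v2, ...
    # enumerate(..., start=1) pairs each total with how many values it covers.
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]


print(running_average([1, 4, 7, 4, 19]))  # [1.0, 2.5, 4.0, 4.0, 7.0]
```

For a large column the NumPy version will usually be faster; this one just makes the arithmetic explicit.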
CodePudding user response:
pandas.DataFrame.expanding is what you need.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.expanding.html
With it, you can simply call df.expanding().mean() to get the result you want.
mean = df.expanding().mean()
print(mean)
Out[10]:
0    1.0
1    2.5
2    4.0
3    4.0
4    7.0
If you want the running average of a single column only, select that column instead of using the whole df, e.g. df['column_name'].expanding().mean()
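Applied to the script in the question, this could look roughly like the sketch below. It assumes the column is literally named 'numbers' and uses a small in-memory DataFrame as a stand-in for the CSV file, so the data here is only the example list from the question.

```python
import pandas as pd

# Stand-in for pd.read_csv('somecsvfile.csv'); the 'numbers' column name
# and the values are assumptions taken from the question's example.
df = pd.DataFrame({'numbers': [1, 4, 7, 4, 19]})

y = df['numbers']
z = y.expanding().mean()   # running average of the column
x = range(1, len(y) + 1)   # 1-based row positions for the x axis

print(z.tolist())
# plt.plot(x, z) followed by plt.show() would then draw the curve,
# as in the question's script.
```

Here z is a pandas Series with the same index as y, so it can be passed straight to plt.plot.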