I would like to do calculations on my x column and store the result in a new column. For example, let's determine the rolling standard deviation. I know how to calculate that for the full column:
df['std'] = df.x.rolling(2).std()
Example of original dataframe:
id x
1 10
1 20
1 5
1 5
2 30
2 50
2 20
2 30
3 10
3 5
3 6
3 6
3 6
3 8
But I want the calculation to be done separately for each value of the id column.
Right now I first have to cut the data frame into separate data frames and then paste them back together afterwards, and I think there should be a better way. But I do not know how to phrase this as a question so I can Google it.
So I would like to do the following:
df['std'] = df.x[id = 1].rolling(2).std()
= df.x[id = 2].rolling(2).std()
= df.x[id = 3].rolling(2).std()
I know this is not the correct way, but I am trying to show what I want to achieve.
CodePudding user response:
As you are filtering for each "id", you can use GroupBy:
df.groupby("id")["x"].rolling(2).std()
#Out[7]:
#id
#1 0 NaN
# 1 7.071068
# 2 10.606602
# 3 0.000000
#2 4 NaN
# 5 14.142136
# 6 21.213203
# 7 7.071068
#3 8 NaN
# 9 3.535534
# 10 0.707107
# 11 0.000000
# 12 0.000000
# 13 1.414214
#Name: x, dtype: float64
To append it as another column, you need to first drop the "id" groups from the index:
df["std"] = df.groupby("id")["x"].rolling(2).std().reset_index(0, drop=True)
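For reference, here is a minimal self-contained sketch that rebuilds the example frame from the question and assigns the per-id rolling standard deviation. The second column, "std_transform", is my own addition for comparison: it uses groupby(...).transform, which returns a result indexed like the original frame, so no reset_index is needed. Treat it as an alternative under the assumption that the rows keep their default integer index, not as the method from the answer above.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "x": [10, 20, 5, 5, 30, 50, 20, 30, 10, 5, 6, 6, 6, 8],
})

# Approach from the answer: drop the "id" index level so the
# result lines up with df's own index before assigning
df["std"] = (
    df.groupby("id")["x"].rolling(2).std().reset_index(level=0, drop=True)
)

# Alternative (assumed variant, column name is illustrative):
# transform keeps df's index, so the result can be assigned directly
df["std_transform"] = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2).std()
)

print(df)
```

Both columns should hold NaN on the first row of each id, because a window of 2 has no earlier value there.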
CodePudding user response:
Just loop over the dataframe rows, perform your calculation, and store the calculated value in the dataframe:
for index, row in df.iterrows():
    print(row['id'], row['x'])
    id = row['id']
    x = row['x']
    # perform your calculation and join with the previous df
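One way the placeholder comment could be filled in, as a rough sketch only. It assumes a window of 2 and that the rows of each id appear in the order they should be processed. The names prev_x and stds are hypothetical helpers, not part of the answer. For two values a and b, the sample standard deviation is |a - b| / sqrt(2), which is what the loop computes. A row-by-row loop like this is generally much slower than groupby on large frames.

```python
import math
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "x": [10, 20, 5, 5, 30, 50, 20, 30, 10, 5, 6, 6, 6, 8],
})

prev_x = {}  # hypothetical helper: last x value seen for each id
stds = []    # hypothetical helper: one result per row, in row order

for index, row in df.iterrows():
    group_id, x = row["id"], row["x"]
    if group_id in prev_x:
        # Sample standard deviation of exactly two values
        stds.append(abs(x - prev_x[group_id]) / math.sqrt(2))
    else:
        # First row of this id: the window is incomplete
        stds.append(float("nan"))
    prev_x[group_id] = x

df["std"] = stds
print(df)
```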