How can I make a new column that does calculations but first selects them by my id column?

Time:06-19

I would like to do calculations on my x column and store the result in a new column. As an example, take the rolling standard deviation. I know how to calculate that for the full column:

df['std'] = df.x.rolling(2).std()

Example of original dataframe:

id x 
1  10
1  20
1  5
1  5
2  30
2  50
2  20
2  30
3  10
3  5
3  6
3  6
3  6
3  8

But I want the calculation to be done separately for each value of the id column, so that the rolling window never spans two different ids.

Right now I first have to split the data frame into separate data frames and then paste them back together afterwards, and I think there should be a better way. But I do not know how to phrase this as a question so I can Google it.

So I would like to do the following:

df['std'] = df.x[id = 1].rolling(2).std() 
          = df.x[id = 2].rolling(2).std()
          = df.x[id = 3].rolling(2).std() 

I know this is not valid code, but I am trying to show what I want to achieve.

CodePudding user response:

As you are filtering for each "id", you can use GroupBy:

df.groupby("id")["x"].rolling(2).std()
#Out[7]: 
#id    
#1   0           NaN
#    1      7.071068
#    2     10.606602
#    3      0.000000
#2   4           NaN
#    5     14.142136
#    6     21.213203
#    7      7.071068
#3   8           NaN
#    9      3.535534
#    10     0.707107
#    11     0.000000
#    12     0.000000
#    13     1.414214
#Name: x, dtype: float64

To append it as another column, you need to first drop the "id" groups from the index:

df["std"] = df.groupby("id")["x"].rolling(2).std().reset_index(0, drop=True)
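
For reference, here is a minimal self-contained sketch of the above, with the sample data from the question rebuilt by hand. The `std_transform` column and the `transform` variant are additions for comparison and are not part of the original answer. `transform` returns a result that is already aligned to the original index, so no `reset_index` should be needed there.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "x": [10, 20, 5, 5, 30, 50, 20, 30, 10, 5, 6, 6, 6, 8],
})

# Approach from the answer: rolling std per id, then drop the "id"
# index level so the result aligns with the original row labels
df["std"] = (
    df.groupby("id")["x"]
    .rolling(2)
    .std()
    .reset_index(level=0, drop=True)
)

# Alternative (an assumption, not from the answer): transform keeps
# the original index, so the result can be assigned directly
df["std_transform"] = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2).std()
)

print(df)
```

The first row of each id is NaN in both columns, because a window of 2 has no earlier row to draw on within that group.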

CodePudding user response:

Just loop over the DataFrame rows, perform your calculation, and store the calculated value in the DataFrame:

for index, row in df.iterrows():
    print(row['id'], row['x'])
    row_id = row['id']
    x = row['x']
    # perform your calculation and join with the previous df
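
One possible way to fill in that loop for the rolling standard deviation with a window of 2 is sketched below. The `previous_x` dictionary and the `stds` list are hypothetical helper names, and the shortcut formula only holds for a window of exactly two values. Row-by-row loops like this are usually much slower than a grouped operation on larger frames.

```python
import math

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    "x": [10, 20, 5, 5, 30, 50, 20, 30, 10, 5, 6, 6, 6, 8],
})

previous_x = {}  # last x value seen for each id (hypothetical helper)
stds = []        # one result per row, in row order

for index, row in df.iterrows():
    row_id, x = row["id"], row["x"]
    if row_id in previous_x:
        # Sample std (ddof=1) of two values a and b is |a - b| / sqrt(2)
        stds.append(abs(x - previous_x[row_id]) / math.sqrt(2))
    else:
        # First row of a group: no earlier value, so the window is incomplete
        stds.append(float("nan"))
    previous_x[row_id] = x

df["std"] = stds
print(df)
```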