i have data frame like this,
id | value |
---|---|
a | 2 |
a | 4 |
a | 3 |
a | 5 |
b | 1 |
b | 4 |
b | 3 |
c | 1 |
c | nan |
c | 5 |
the resulted data frame contain new column ['average'] and to get its values will be:
- make group-by(id)
- first row in 'average' column per each group is equal to its corresponding value in 'value'
- other rows in ' average' in group is equal to mean for all previous rows in 'value'(except current value)
- for more explain in group of (a) the "average' of index
the resulted data frame must be :
id | value | average |
---|---|---|
a | 2 | 2 |
a | 4 | 2 |
a | 3 | 3 |
a | 5 | 3 |
b | 1 | 1 |
b | 4 | 1 |
b | 3 | 2.5 |
c | 1 | 1 |
c | nan | 1 |
c | 5 | 1 |
CodePudding user response:
You can group the dataframe by id
, then calculate the expanding mean for value
column for each groups, then shift the expanding mean and get it back to the original dataframe, once you have it, you just need to ffill
on axis=1
on for the value
and average
columns to get the first value for the categories:
out = (df
.assign(average=df
.groupby(['id'])['value']
.transform(lambda x: x.expanding().mean().shift(1))
)
)
out[['value', 'average']] = out[['value', 'average']].ffill(axis=1)
OUTPUT:
id value average
0 a 2.0 2.0
1 a 4.0 2.0
2 a 3.0 3.0
3 a 5.0 3.0
4 b 1.0 1.0
5 b 4.0 1.0
6 b 3.0 2.5
7 c 1.0 1.0
8 c NaN 1.0
9 c 5.0 1.0
CodePudding user response:
Here is a solution which, I think, satisfies the requirements. Here, the first row in a group of id
s is simply passing its value to the average
column. For every other row, we take the average where the index is smaller than the current index.
You may want to specify how you want to handle the NaN values. In the below, I set them to None
so that they are ignored.
import numpy as np
from numpy import average
import pandas as pd
df = pd.DataFrame([
['a', 2],
['a', 4],
['a', 3],
['a', 5],
['b', 1],
['b', 4],
['b', 3],
['c', 1],
['c', np.NAN],
['c', 5]
], columns=['id', 'value'])
# Replace the NaN value with None
df['value'] = df['value'].replace(np.nan, None)
id_groups = df.groupby(['id'])
id_level_frames = []
for group, frame in id_groups:
print(group)
# Resets the index for each id-level frame
frame = frame.reset_index()
for index, row in frame.iterrows():
# If this is the first row:
if index== 0:
frame.at[index, 'average'] = row['value']
else:
current_index = index
earlier_rows = frame[frame.index < index]
frame.at[index, 'average'] = average(earlier_rows['value'])
id_level_frames.append(frame)
final_df = pd.concat(id_level_frames)