I want to create the mean of the values in two rows, based on values in a third row. In one row I have ID's A, B and C and means have to be created for the values in two rows with the ID A & B, B & C and A & C. Is there a simple way to do this?
My dataset is as the example below:
station group groupA groupB groupC value time
1 A A nan nan 4 30
1 B nan B nan 7 30
1 C nan nan C 6 30
2 A A nan nan 5 30
2 B nan B nan 3 30
2 C nan nan C 1 30
3 A A nan nan 2 30
3 B nan B nan 3 30
3 C nan nan C 4 30
....
#creating the mean of station 1 in time 30 works with this code.
df['mean_allstations'] = df.groupby(['station','time'])['value'].transform('mean')
#But if I only want to create the mean of station 1 in time 30 for group A and B I tried this, which #gives me a column with nan values
df['mean_AB'] = df.groupby(['station','time', 'group_A', 'group_B'])['value'].transform('mean')
expected outcome:
station group groupA groupB groupC value time meanAB meanAC meanBC ALLme
1 A A nan nan 4 30 5.5 5 6.5 5.6
1 B nan B nan 7 30 5.5 5 6.5 5.6
1 C nan nan C 6 30 5.5 5 6.5 5.6
2 A A nan nan 5 30 4 3 2 3
2 B nan B nan 3 30 4 3 2 3
2 C nan nan C 1 30 4 3 2 3
3 A A nan nan 2 30 2.5 3 3.5 3
3 B nan B nan 3 30 2.5 3 3.5 3
3 C nan nan C 4 30 2.5 3 3.5 3
CodePudding user response:
Try:
from itertools import combinations
columns = ["A", "B", "C"]
g = df.groupby("station")
for c in combinations(columns, 2):
for _, d in g:
x = d.loc[d["group"].isin(c), "value"].mean()
df.loc[d.index, f"mean{c[0]}{c[1]}"] = x
df["ALLme"] = g["value"].transform("mean")
print(df)
Prints:
station group groupA groupB groupC value time meanAB meanAC meanBC ALLme
0 1 A A NaN NaN 4 30 5.5 5.0 6.5 5.666667
1 1 B NaN B NaN 7 30 5.5 5.0 6.5 5.666667
2 1 C NaN NaN C 6 30 5.5 5.0 6.5 5.666667
3 2 A A NaN NaN 5 30 4.0 3.0 2.0 3.000000
4 2 B NaN B NaN 3 30 4.0 3.0 2.0 3.000000
5 2 C NaN NaN C 1 30 4.0 3.0 2.0 3.000000
6 3 A A NaN NaN 2 30 2.5 3.0 3.5 3.000000
7 3 B NaN B NaN 3 30 2.5 3.0 3.5 3.000000
8 3 C NaN NaN C 4 30 2.5 3.0 3.5 3.000000
CodePudding user response:
Another possible solution:
from itertools import combinations
newcols = ['mean' x[0] x[1] for x in combinations(['A','B','C'], 2)] ['ALLme']
# this function calculates the means and returns a dataframe
# with as many rows as x
def f(x):
d = pd.DataFrame([np.apply_along_axis(np.mean, 1, list(
combinations(x, 2))).tolist() [np.mean(x)]], columns=newcols)
return pd.concat([d] * len(x))
df[newcols] = df.groupby('station')['value'].apply(f).reset_index(drop=True)
Output:
station group groupA groupB ... meanAB meanAC meanBC ALLme
0 1 A A NaN ... 5.5 5.0 6.5 5.666667
1 1 B NaN B ... 5.5 5.0 6.5 5.666667
2 1 C NaN NaN ... 5.5 5.0 6.5 5.666667
3 2 A A NaN ... 4.0 3.0 2.0 3.000000
4 2 B NaN B ... 4.0 3.0 2.0 3.000000
5 2 C NaN NaN ... 4.0 3.0 2.0 3.000000
6 3 A A NaN ... 2.5 3.0 3.5 3.000000
7 3 B NaN B ... 2.5 3.0 3.5 3.000000
8 3 C NaN NaN ... 2.5 3.0 3.5 3.000000