Average of two rows based on grouped columns-CodePudding

I want to create the mean of the values in two rows, based on values in a third row. In one row I have ID's A, B and C and means have to be created for the values in two rows with the ID A & B, B & C and A & C. Is there a simple way to do this?

My dataset is as the example below:

station group groupA groupB groupC value time
1       A     A       nan    nan   4     30
1       B     nan     B      nan   7     30
1       C     nan     nan    C     6     30
2       A     A       nan    nan   5     30
2       B     nan     B      nan   3     30
2       C     nan     nan    C     1     30
3       A     A       nan    nan   2     30
3       B     nan     B      nan   3     30
3       C     nan     nan    C     4     30
....

#creating the mean of station 1 in time 30 works with this code.

df['mean_allstations'] = df.groupby(['station','time'])['value'].transform('mean')

#But if I only want to create the mean of station 1 in time 30 for group A and B I tried this, which #gives me a column with nan values

df['mean_AB'] = df.groupby(['station','time', 'group_A', 'group_B'])['value'].transform('mean')

expected outcome:

station group groupA groupB groupC value time meanAB meanAC meanBC ALLme
1       A     A       nan    nan   4     30   5.5     5     6.5     5.6
1       B     nan     B      nan   7     30   5.5     5     6.5     5.6
1       C     nan     nan    C     6     30   5.5     5     6.5     5.6
2       A     A       nan    nan   5     30   4       3     2       3
2       B     nan     B      nan   3     30   4       3     2       3
2       C     nan     nan    C     1     30   4       3     2       3
3       A     A       nan    nan   2     30   2.5     3     3.5     3
3       B     nan     B      nan   3     30   2.5     3     3.5     3
3       C     nan     nan    C     4     30   2.5     3     3.5     3

CodePudding user response：

Try:

from itertools import combinations

columns = ["A", "B", "C"]

g = df.groupby("station")
for c in combinations(columns, 2):
    for _, d in g:
        x = d.loc[d["group"].isin(c), "value"].mean()
        df.loc[d.index, f"mean{c[0]}{c[1]}"] = x
df["ALLme"] = g["value"].transform("mean")

print(df)

Prints:

   station group groupA groupB groupC  value  time  meanAB  meanAC  meanBC     ALLme
0        1     A      A    NaN    NaN      4    30     5.5     5.0     6.5  5.666667
1        1     B    NaN      B    NaN      7    30     5.5     5.0     6.5  5.666667
2        1     C    NaN    NaN      C      6    30     5.5     5.0     6.5  5.666667
3        2     A      A    NaN    NaN      5    30     4.0     3.0     2.0  3.000000
4        2     B    NaN      B    NaN      3    30     4.0     3.0     2.0  3.000000
5        2     C    NaN    NaN      C      1    30     4.0     3.0     2.0  3.000000
6        3     A      A    NaN    NaN      2    30     2.5     3.0     3.5  3.000000
7        3     B    NaN      B    NaN      3    30     2.5     3.0     3.5  3.000000
8        3     C    NaN    NaN      C      4    30     2.5     3.0     3.5  3.000000

CodePudding user response：

Another possible solution:

from itertools import combinations

newcols = ['mean'   x[0]  x[1] for x in combinations(['A','B','C'], 2)]   ['ALLme']

# this function calculates the means and returns a dataframe
# with as many rows as x
def f(x):
    d = pd.DataFrame([np.apply_along_axis(np.mean, 1, list(
        combinations(x, 2))).tolist()   [np.mean(x)]], columns=newcols)

    return pd.concat([d] * len(x))

df[newcols] = df.groupby('station')['value'].apply(f).reset_index(drop=True)

Output:

   station group groupA groupB  ... meanAB  meanAC  meanBC     ALLme
0        1     A      A    NaN  ...    5.5     5.0     6.5  5.666667
1        1     B    NaN      B  ...    5.5     5.0     6.5  5.666667
2        1     C    NaN    NaN  ...    5.5     5.0     6.5  5.666667
3        2     A      A    NaN  ...    4.0     3.0     2.0  3.000000
4        2     B    NaN      B  ...    4.0     3.0     2.0  3.000000
5        2     C    NaN    NaN  ...    4.0     3.0     2.0  3.000000
6        3     A      A    NaN  ...    2.5     3.0     3.5  3.000000
7        3     B    NaN      B  ...    2.5     3.0     3.5  3.000000
8        3     C    NaN    NaN  ...    2.5     3.0     3.5  3.000000