I am trying to calculate the mean of other values by excluding the focal company. I know it is a little bit complicated but let me explain:
Such as, assume the below code is my data:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c","c", "d", "d", "d", "e", "e", "e"],
'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20]}
df = pd.DataFrame(d)
I would like to get (B C)'s mean for 2015 and add it into a new column in A.2016, by considering df.group. So, we need to take mean by df.group for the previous year, by excluding the focal item.
The result should correspond to this:
d = {'col1': ["A", "A", "A", "B", "B", "B", "c", "c", "c", "d", "d", "d", "e", "e", "e"],
'col2': [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017, 2015, 2016, 2017],
'col3': [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
'group':[10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 20, 20,20],
'operation':['0', '(B2015 C2015)/2', '(B2016 C2016)/2', '0', '(A2015 C2015)/2', '(A2016 C2016)/2', '0', '(A2015 B2015)/2', '(A2016 B2016)/2',"0", "E2015", "E2016", "0","D2015", "D2016" ],
'mean': [nan, 9, 10.5, nan, 9, 14.5, nan, 10, 16, nan, 40, 50, nan, 50, 60]}
output = pd.DataFrame(d)
>>> output
col1 col2 col3 group operation mean
0 A 2015 10 10 nan 0.0
1 A 2016 20 10 (B2015 C2015)/2 9.0
2 A 2017 25 10 (B2016 C2016)/2 10.5
3 B 2015 10 10 0 0.0
4 B 2016 12 10 (A2015 C2015)/2 9.0
5 B 2017 14 10 (A2016 C2016)/2 14.5
6 c 2015 8 10 0 0.0
7 c 2016 9 10 (A2015 B2015)/2 10.0
8 c 2017 10 10 (A2016 B2016)/2 16.0
9 d 2015 50 20 0 0.0
10 d 2016 60 20 E2015 40.0
11 d 2017 70 20 E2016 50.0
12 e 2015 40 20 0 0.0
13 e 2016 50 20 D2015 50.0
14 e 2017 60 20 D2016 60.0
CodePudding user response:
- Compute the means of all other values within each group using a double
groupby
:
sum
all the values within the group- subtract the current (focal) value
- divide by one less than the number of items in the group
- Assign the
shift
-ed means to a new column:
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)
>>> df
col1 col2 col3 group mean
0 A 2015 10 10 0.0
1 A 2016 20 10 9.0
2 A 2017 25 10 10.5
3 B 2015 10 10 0.0
4 B 2016 12 10 9.0
5 B 2017 14 10 14.5
6 c 2015 8 10 0.0
7 c 2016 9 10 10.0
8 c 2017 10 10 16.0
9 d 2015 50 20 0.0
10 d 2016 60 20 40.0
11 d 2017 70 20 50.0
12 e 2015 40 20 0.0
13 e 2016 50 20 50.0
14 e 2017 60 20 60.0