Pandas - Apply convex combinations to rows with exact same categorical features

I have 2 pandas DataFrames, each with its own dtypes. I want to apply a convex combination to the non-categorical features between all rows that share the same categorical features, never combining a row with itself. (Given two values x1 and x2, a convex combination is L*x1 + (1-L)*x2 with L in [0,1].) There also shouldn't be any duplicates, i.e. no pair of rows should be combined more than once. For example:

Is taco?  Count  
yes        2  
yes        5
yes        1

Where Is taco? is dtype category and Count is dtype Int. x1 and x2 can be a vector of numerical features, but in the above case it's just 2 different rows of Count. There is only one categorical feature above which is Is taco? and they're all the same so we do the convex combination between all rows. If L=0.5 it should return

idx Is taco?  Count  
0   yes        3.5  
1   yes        1.5
2   yes        3

idx=0 was calculated from the 1st and 2nd rows: 0.5 * 2 + 0.5 * 5 = 3.5. idx=1 was calculated from the 1st and 3rd rows: 0.5 * (1 + 2) = 1.5. idx=2 comes from the 2nd and 3rd rows: 0.5 * (5 + 1) = 3. So the non-categorical features are combined via a convex combination. How can I do this with Pandas?
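
For reference, the arithmetic above can be reproduced with a minimal sketch. The names `convex_combination` and `lam` are made up here for illustration (they are not part of pandas), and the rows are paired by hand rather than derived from a DataFrame:

```python
import numpy as np


def convex_combination(x1, x2, lam=0.5):
    """Return lam * x1 + (1 - lam) * x2 for scalars or equal-length vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return lam * x1 + (1.0 - lam) * x2


# Count values from the example table, paired by row position.
counts = [2, 5, 1]
pairs = [(0, 1), (0, 2), (1, 2)]
results = [float(convex_combination(counts[i], counts[j])) for i, j in pairs]
print(results)  # [3.5, 1.5, 3.0]
```

The same function accepts vectors, so `x1` and `x2` can hold several numerical features at once.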

CodePudding user response:

Use itertools.combinations:

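from itertools import combinations

import numpy as np

# For each group, weight by L = 0.5 and sum every combination of len(x) - 1 values;
# in a 3-row group these are exactly the distinct pairs of rows.
func = lambda x: np.sum(np.array(list(combinations(x, r=len(x)-1))) * 0.5, axis=1)

out = df.groupby('Is taco?')['Count'] \
        .apply(func).explode().reset_index()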

Output:

>>> out
  Is taco? Count
0      yes   3.5
1      yes   1.5
2      yes   3.0
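
If L should be adjustable, or there are several categorical and numerical columns, something along these lines might work. This is only a sketch under assumptions: `pairwise_convex` and `lam` are hypothetical names, the frame is assumed to have at least one `category` column, and every numeric column is treated as a feature to combine.

```python
from itertools import combinations

import pandas as pd


def pairwise_convex(df, lam=0.5):
    """Combine every unordered pair of rows that share all categorical values."""
    cat_cols = list(df.select_dtypes(include="category").columns)
    num_cols = list(df.select_dtypes(include="number").columns)
    records = []
    # observed=True skips category combinations that never occur in the data.
    for _, group in df.groupby(cat_cols, observed=True):
        cats = group[cat_cols].iloc[0].to_dict()
        values = group[num_cols].to_numpy(dtype=float)
        # combinations(..., 2) yields each pair once and never (i, i).
        for i, j in combinations(range(len(values)), 2):
            combined = lam * values[i] + (1 - lam) * values[j]
            records.append({**cats, **dict(zip(num_cols, combined))})
    return pd.DataFrame(records, columns=cat_cols + num_cols)


df = pd.DataFrame({"Is taco?": pd.Categorical(["yes", "yes", "yes"]),
                   "Count": [2, 5, 1]})
out = pairwise_convex(df, lam=0.5)
print(out["Count"].tolist())  # [3.5, 1.5, 3.0]
```

Note that for L other than 0.5 the result depends on which row of the pair gets weight L; here the earlier row in the group does.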

Another example:

df = pd.DataFrame({'Is taco?': ['no', 'no', 'no', 'yes', 'yes', 'yes', 'yes'],
                   'Count': [1, 3, 5, 3, 6, 9, 12]})
print(df)

# Output:
  Is taco?  Count
0       no      1
1       no      3
2       no      5
3      yes      3
4      yes      6
5      yes      9
6      yes     12

# After combinations
>>> out
  Is taco? Count
0       no   2.0
1       no   3.0
2       no   4.0
3      yes   9.0
4      yes  10.5
5      yes  12.0
6      yes  13.5
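
One thing that may be worth checking in the output above (my reading of the code, not something the answer states): with r=len(x)-1, the four-row 'yes' group is combined three rows at a time, so the weights add up to 1.5 rather than 1. If strictly pairwise combinations are wanted, r=2 looks like the safer choice. A small plain-Python comparison on the 'yes' counts:

```python
from itertools import combinations

yes_counts = [3, 6, 9, 12]

# r = len(x) - 1 picks triples for a four-row group, matching the output above.
triples = [0.5 * sum(c) for c in combinations(yes_counts, len(yes_counts) - 1)]
print(triples)  # [9.0, 10.5, 12.0, 13.5]

# r = 2 picks each distinct pair of rows exactly once.
pairs = [0.5 * sum(c) for c in combinations(yes_counts, 2)]
print(pairs)  # [4.5, 6.0, 7.5, 7.5, 9.0, 10.5]
```

For three-row groups such as 'no', both choices give the same pairs, which is why the first example and the 'no' rows look right either way.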