Applying function to groups in dataframe, excluding current row value-CodePudding

Let's say I have the following data:

day	query	num_searches
1	abc	2
1	def	3
2	abc	6
3	abc	5
4	def	1
4	abc	3
5	abc	7
6	abc	8
7	abc	10
8	abc	1

I'd like to generate z-score (excluding the current row's value) such that for query 'abc':

Day 1: [6, 5, 3, 7, 8, 10, 1] (exclude the value 2) zscore = -1.32
Day 2: [2, 5, 3, 7, 8, 10, 1] (exclude the value 6) zscore = 0.28
...
Day 7: [2, 6, 5, 3, 7, 8, 1] (exclude the value 10) zscore = 2.22
Day 8: [2, 6, 5, 3, 7, 8, 10] (exclude the value 1) zscore = -1.88

I have the following function to calculate this 'exclusive' zscore.

def zscore_exclusive(arr):
    newl = []
    for index, val in enumerate(x):
        l = list(x)
        val = l.pop(index)
        arr_popped = np.array(l)
        avg = np.mean(arr_popped)
        stdev = np.std(arr_popped)        
        newl.append((val - avg) / stdev)
    return np.array(newl)

How can I apply this custom function to each grouping (by query string)? Remember, I'd like to pop the currently evaluated element from the series.

CodePudding user response：

Given:

   day query  num_searches
0    1   abc             2
1    1   def             3
2    2   abc             6
3    3   abc             5
4    4   def             1
5    4   abc             3
6    5   abc             7
7    6   abc             8
8    7   abc            10
9    8   abc             1

Doing:

Note!
- For np.std, ddof = 0 by default.
- But for pd.Series.std, ddof = 1 by default.
  - You should be sure which one you want to use.

z_score = lambda x: [(x[i]-x.drop(i).mean())/x.drop(i).std(ddof=0) for i in x.index]
df['z-score'] = df.groupby('query')['num_searches'].transform(z_score)
print(df)

Output:

   day query  num_searches   z-score
0    1   abc             2 -1.319950
1    1   def             3       inf
2    2   abc             6  0.277350
3    3   abc             5 -0.092057
4    4   def             1      -inf
5    4   abc             3 -0.866025
6    5   abc             7  0.661438
7    6   abc             8  1.083862
8    7   abc            10  2.223782
9    8   abc             1 -1.877336