I have a dataframe with the below specs
| ID | Name| count |
| -- |---- | ---- |
| 1 | A | 75 |
| 2 | B | 10 |
| 3 | A | 15 |
| 4 | A | 10 |
| 5 | A | 5 |
| 6 | A | 3 |
If I set the threshold for the count to be 15, I want the below rows to get added up uniformly. So the output should be
| ID | Name | count |
| -- |---- | ---- |
| 1 | A | 15 |
| 2 | B | 10 |
| 3 | A | 30 |
| 4 | A | 25 |
| 5 | A | 20 |
| 6 | A | 18 |
75 from ID 1 gets added up based on group "Name" and it is always based on threshold value. Please advice
CodePudding user response:
IIUC you can do something like:
def distribute(sr, thres=15):
idxmax = sr.idxmax()
remain = max((sr[idxmax] - thres) / max(len(sr) - 1, 1), 0)
return np.where(sr.index == idxmax, min(sr[idxmax], thres), sr remain)
df['newcount'] = df.groupby('Name')['count'].transform(distribute, thres=15)
Output:
ID | Name | count | newcount |
---|---|---|---|
1 | A | 75 | 15 |
2 | B | 10 | 10 |
3 | A | 15 | 30 |
4 | A | 10 | 25 |
5 | A | 5 | 20 |
6 | A | 3 | 18 |