Identify the Max Value in a New Column and Loop-CodePudding

I am beginner to Python and am having some trouble trying to solve a problem. I am able to get to this point. What I need to do from here is create top_team column and store a 1 there if they have the highest score. Every other team would be assigned a 0 I would also like to be able to loop through this x number of times and store all previous results/summarize.

  import pandas as pd
    
    data = {'team': ['ATL', 'ATL', 'BOS', 'BOS', 'NYY', 'NYY'], 'player': ['A', 'B', 'C', 'D', 'E', 'F']}
    data = pd.DataFrame(data)
    
    import scipy.stats as stats
    
    a, b = 0, 9999
    mu = 5
    sigma = 1
    dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
    
    data['score'] = dist.rvs(6).round(2)
    
    grouped = data.groupby("team", as_index=True)["score"].sum()
    grouped = pd.DataFrame(grouped)
    grouped = grouped.reset_index(level=0)
    
    print(grouped)

The above code yields a result similar to this:

   team  score
0  ATL  10.73
1  BOS  12.20
2  NYY   9.75

In this example, BOS would be the top_team and be assigned a value of 1 in the new column.

I would like to be able to turn this into a function and run this x = 10 times and end up with a result similar to the below. Obviously, results would vary dependent upon the random number generation.

    team  top_team
0   ATL    4
1   BOS    1
2   NYY    5

CodePudding user response：

In this particular case one of the easiest way to keep and update information about best team is using dict()

I would solve it the following way:

import scipy.stats as stats

def get_top_team(data):
    
    data = data.copy()
    a, b = 0, 9999
    mu = 5
    sigma = 1
    dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)

    data['score'] = dist.rvs(6).round(2)
    
    grouped = data.groupby("team", as_index=True)["score"].sum()
    grouped = pd.DataFrame(grouped)
    grouped = grouped.reset_index(level=0)
    
    return grouped.loc[grouped.score.idxmax, 'team']


res = {}

for i in range(10):
    best_team = get_top_team(data)
    res[best_team] = res.get(best_team, 0)   1

And in case you need to get a result as a DataFrame:

df = pd.DataFrame(res.items(), columns = ['team', 'top_team'])

CodePudding user response：

Create a matrix (not array) from the beginning.

import pandas as pd

data = {'team': ['ATL', 'ATL', 'BOS', 'BOS', 'NYY', 'NYY'], 'player': ['A', 'B', 'C', 'D', 'E', 'F']}
data = pd.DataFrame(data)

import scipy.stats as stats

a, b = 0, 9999
mu = 5
sigma = 1
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)

# create a random matrix instead of an array (6x10 matrix) and make it into a df
scores = pd.DataFrame(dist.rvs((6,10)).round(2), index=data['team'])
# find total score per team for each of the 10 scenarios/columns
grouped = scores.groupby(level=0).sum()
# find the max of each column, locate where they are and sum by columns
grouped = grouped.eq(grouped.max(0)).sum(1).to_frame('score').reset_index()
grouped