I am beginner to Python and am having some trouble trying to solve a problem. I am able to get to this point. What I need to do from here is create top_team
column and store a 1
there if they have the highest score
. Every other team would be assigned a 0
I would also like to be able to loop through this x
number of times and store all previous results/summarize.
import pandas as pd
data = {'team': ['ATL', 'ATL', 'BOS', 'BOS', 'NYY', 'NYY'], 'player': ['A', 'B', 'C', 'D', 'E', 'F']}
data = pd.DataFrame(data)
import scipy.stats as stats
a, b = 0, 9999
mu = 5
sigma = 1
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
data['score'] = dist.rvs(6).round(2)
grouped = data.groupby("team", as_index=True)["score"].sum()
grouped = pd.DataFrame(grouped)
grouped = grouped.reset_index(level=0)
print(grouped)
The above code yields a result similar to this:
team score
0 ATL 10.73
1 BOS 12.20
2 NYY 9.75
In this example, BOS
would be the top_team
and be assigned a value of 1
in the new column.
I would like to be able to turn this into a function and run this x = 10
times and end up with a result similar to the below. Obviously, results would vary dependent upon the random number generation.
team top_team
0 ATL 4
1 BOS 1
2 NYY 5
CodePudding user response:
In this particular case one of the easiest way to keep and update information about best team is using dict()
I would solve it the following way:
import scipy.stats as stats
def get_top_team(data):
data = data.copy()
a, b = 0, 9999
mu = 5
sigma = 1
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
data['score'] = dist.rvs(6).round(2)
grouped = data.groupby("team", as_index=True)["score"].sum()
grouped = pd.DataFrame(grouped)
grouped = grouped.reset_index(level=0)
return grouped.loc[grouped.score.idxmax, 'team']
res = {}
for i in range(10):
best_team = get_top_team(data)
res[best_team] = res.get(best_team, 0) 1
And in case you need to get a result as a DataFrame:
df = pd.DataFrame(res.items(), columns = ['team', 'top_team'])
CodePudding user response:
Create a matrix (not array) from the beginning.
import pandas as pd
data = {'team': ['ATL', 'ATL', 'BOS', 'BOS', 'NYY', 'NYY'], 'player': ['A', 'B', 'C', 'D', 'E', 'F']}
data = pd.DataFrame(data)
import scipy.stats as stats
a, b = 0, 9999
mu = 5
sigma = 1
dist = stats.truncnorm((a - mu) / sigma, (b - mu) / sigma, loc=mu, scale=sigma)
# create a random matrix instead of an array (6x10 matrix) and make it into a df
scores = pd.DataFrame(dist.rvs((6,10)).round(2), index=data['team'])
# find total score per team for each of the 10 scenarios/columns
grouped = scores.groupby(level=0).sum()
# find the max of each column, locate where they are and sum by columns
grouped = grouped.eq(grouped.max(0)).sum(1).to_frame('score').reset_index()
grouped