I have a large data set of over 1,000,000 rows. I want to count how often each ('xCordAdjusted', 'yCordAdjusted') pair occurs with each 'event' type: 'SHOT', 'MISS', and 'GOAL'.
'xCordAdjusted' has a minimum value of 0 and a maximum of 100; 'yCordAdjusted' has a minimum value of -44 and a maximum of 44.
dff.head()
season event xCordAdjusted yCordAdjusted
2020 SHOT 74 -29
2020 SHOT 49 -25
2020 SHOT 52 31
2020 SHOT 43 39
2020 MISS 46 -33
I want to see how frequently each coordinate results in each of the three possible 'event' values: 'SHOT', 'MISS', 'GOAL'. It doesn't have to be exact - I just want to be able to perform further analysis on the totals for each 'event' given the frequency of its x,y coordinates.
Desired output:
xCordAdjusted yCordAdjusted event total
100 -44 SHOT 500,xxx
MISS 500,xxx
GOAL 500,xxx
99 -44 SHOT 500,xxx
MISS 500,xxx
GOAL 500,xxx
CodePudding user response:
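Since you want to count the rows for each type of event at each x and y coordinate, you can use groupby and size. (Calling sum here would add up the values of the remaining numeric column, 'season', instead of counting rows.)
dff.groupby(['xCordAdjusted','yCordAdjusted','event']).size()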
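As a minimal sketch of how the per-coordinate totals could be built: the small frame below is made-up sample data standing in for the real dff, and the names counts, totals, and wide are only illustrative. It uses size, which counts the rows in each group, and then names the result column 'total' to match the desired output.

```python
import pandas as pd

# Made-up sample data standing in for the real 1,000,000-row frame (assumption).
dff = pd.DataFrame({
    "season": [2020] * 6,
    "event": ["SHOT", "SHOT", "MISS", "GOAL", "SHOT", "MISS"],
    "xCordAdjusted": [74, 74, 74, 49, 49, 46],
    "yCordAdjusted": [-29, -29, -29, -25, -25, -33],
})

# Count the rows for every (x, y, event) combination that occurs in the data.
counts = dff.groupby(["xCordAdjusted", "yCordAdjusted", "event"]).size()

# Long format: one row per (x, y, event) with its count in a 'total' column.
totals = counts.reset_index(name="total")

# Wide format: one row per (x, y), one column per event, 0 where none occurred.
wide = counts.unstack(fill_value=0)

print(totals)
print(wide)
```

The long format matches the layout in the question. The wide format may be easier for further analysis, for example comparing the GOAL and SHOT columns at each coordinate.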