I want to loop through my data and populate my dictionaries with each 'event' value and its corresponding 'xCordAdjusted' and 'yCordAdjusted'.
Dataframe:
season period teamCode event goal xCord xCordAdjusted yCord yCordAdjusted shotType playerPositionThatDidEvent playerNumThatDidEvent shooterPlayerId shooterName shooterLeftRight
2014 1 MTL MISS 0 61 61 29 29 WRIST C 51 8471976.0 David Desharnais L
2014 1 TOR SHOT 0 -54 54 29 -29 BACK C 42 8475098.0 Tyler Bozak R
2014 1 TOR SHOT 0 -40 40 32 -32 WRIST D 46 8471392.0 Roman Polak R
My work:
league_data = {};
league_data['SHOT'] = {};
league_data['SHOT']['x'] = [];
league_data['SHOT']['y'] = [];
league_data['GOAL'] = {};
league_data['GOAL']['x'] = [];
league_data['GOAL']['y'] = [];
league_data['MISS'] = {};
league_data['MISS']['x'] = [];
league_data['MISS']['y'] = [];
event_types = ['SHOT','GOAL','MISS']
for data in season_df:
    for event in event_types:
        if data in event_types:
            if 'x' in range(0,100):
                league_data[event]['x'].append(['xCordAdjusted'])
                league_data[event]['y'].append(['yCordAdjusted'])
league_data
Output:
{'SHOT': {'x': [], 'y': []},
'GOAL': {'x': [], 'y': []},
'MISS': {'x': [], 'y': []}}
CodePudding user response:
You can extract the desired information directly from the DataFrame in a vectorized fashion, instead of looping over it repeatedly:
league_data = {
    'SHOT': {},
    'GOAL': {},
    'MISS': {},
}
for event in event_types:
    mask = (season_df['event'] == event) & season_df['xCord'].between(0, 100)
    x_adjusted = season_df.loc[mask, 'xCordAdjusted'].tolist()
    y_adjusted = season_df.loc[mask, 'yCordAdjusted'].tolist()
    league_data[event]['x'] = x_adjusted
    league_data[event]['y'] = y_adjusted
gives
{'GOAL': {'x': [], 'y': []},
'MISS': {'x': [61], 'y': [29]},
'SHOT': {'x': [], 'y': []}
}
Note that I adjusted the range condition. Your original check, if 'x' in range(0,100), doesn't do what you intend: it tests whether the string 'x' is one of the integers 0 to 99, which is always False, and it never references your DataFrame at all.
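To make the difference concrete, here is a minimal sketch that rebuilds the three sample rows from the question (only the columns needed here) and compares the string check with a column-based mask. The names string_check, in_range, miss_mask, and miss_x are only for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
season_df = pd.DataFrame({
    "event": ["MISS", "SHOT", "SHOT"],
    "xCord": [61, -54, -40],
    "xCordAdjusted": [61, 54, 40],
    "yCordAdjusted": [29, -29, -32],
})

# The string 'x' is never an element of range(0, 100), so this is always False.
string_check = "x" in range(0, 100)

# Series.between is inclusive on both ends by default and returns one
# boolean per row, which is what the filter actually needs.
in_range = season_df["xCord"].between(0, 100)

# Combine conditions with & to select rows, then pull out a column as a list.
miss_mask = (season_df["event"] == "MISS") & in_range
miss_x = season_df.loc[miss_mask, "xCordAdjusted"].tolist()

print(string_check, in_range.tolist(), miss_x)
```

Keep in mind that between(0, 100) includes 100, whereas range(0, 100) stops at 99.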
CodePudding user response:
for data in season_df: iterates over the column names, not the rows.
To iterate over rows, use for index, row in season_df.iterrows() instead.
However, iterating over rows is slow, so if your data is large, use vectorized operations.
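As a rough illustration of the two iteration styles, the sketch below rebuilds the sample rows from the question and fills the dictionary row by row. Filtering on 0 <= xCordAdjusted <= 100 is my guess at what the original range check was meant to do, and column_labels is just an illustrative name.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
season_df = pd.DataFrame({
    "event": ["MISS", "SHOT", "SHOT"],
    "xCord": [61, -54, -40],
    "xCordAdjusted": [61, 54, 40],
    "yCordAdjusted": [29, -29, -32],
})

# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = [label for label in season_df]

league_data = {event: {"x": [], "y": []} for event in ["SHOT", "GOAL", "MISS"]}

# iterrows() yields (index, row) pairs, where row is a Series keyed by column.
for _, row in season_df.iterrows():
    event = row["event"]
    # Assumed intent of the original range check: keep x values in [0, 100].
    if event in league_data and 0 <= row["xCordAdjusted"] <= 100:
        league_data[event]["x"].append(int(row["xCordAdjusted"]))
        league_data[event]["y"].append(int(row["yCordAdjusted"]))

print(column_labels)
print(league_data)
```

This is fine for a few rows, but it gets slow on a full season of data.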
Also, parts of your code don't work as you expect, such as if 'x' in range(0, 100), which never looks at the DataFrame. I rewrote it based on my assumption of what you intended; try this:
for event in event_types:
    matched_df = season_df[season_df['event'] == event]
    x_matched_list = matched_df[(0 <= matched_df['xCordAdjusted']) & (matched_df['xCordAdjusted'] <= 100)]['xCordAdjusted'].tolist()
    league_data[event]['x'] = x_matched_list # or extend
    y_matched_list = matched_df[(0 <= matched_df['yCordAdjusted']) & (matched_df['yCordAdjusted'] <= 100)]['yCordAdjusted'].tolist()
    league_data[event]['y'] = y_matched_list # or extend
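But be careful: x and y are filtered separately here, so the 'x' and 'y' lists can end up with different lengths. For example, a row with a negative 'yCordAdjusted' is kept in 'x' but dropped from 'y'.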
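If you need the two lists to stay aligned, one option is to build a single mask and apply it to both columns. The sketch below assumes the filter should only look at 'xCordAdjusted' (that is my assumption, adjust it to your actual rule) and uses the sample rows from the question.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
season_df = pd.DataFrame({
    "event": ["MISS", "SHOT", "SHOT"],
    "xCordAdjusted": [61, 54, 40],
    "yCordAdjusted": [29, -29, -32],
})

event_types = ["SHOT", "GOAL", "MISS"]
league_data = {}

for event in event_types:
    # One mask for both columns, so every kept row contributes one x and one y.
    mask = (season_df["event"] == event) & season_df["xCordAdjusted"].between(0, 100)
    selected = season_df.loc[mask, ["xCordAdjusted", "yCordAdjusted"]]
    league_data[event] = {
        "x": selected["xCordAdjusted"].tolist(),
        "y": selected["yCordAdjusted"].tolist(),
    }

print(league_data)
```

Because both lists come from the same selected rows, the i-th x always belongs with the i-th y.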