I am working on a dataframe that has a newly created column named season (filled with np.nan) and a column named match_id, whose values number the matches: match 1 has match_id 1, match 2 has match_id 2, ..., match n has match_id n. It's a cricket (similar to baseball) dataset, so it's ball by ball. A match has two innings of at most 20 overs each (each over has 6 balls), so match_id 1 runs approximately from index 0 to 240, match_id 2 approximately from index 241 to 480, and so on. The data is organized ball by ball (1 row per ball), match by match (approx. 240 rows per match), and season by season (approx. 14160 rows per season).
My condition is: if match_id is between 1 and 59, put 2017 in the season column for those rows.
In my dataset, match_id and the other columns already existed. I created the season column filled with np.nan, and now I want to fill it.
My data looks like this:
In[]: df_raw.head(6)
out[]:
   season  match_id  inning         batting_team                 bowling_team  over  ball
0     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     1
1     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     2
2     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     3
3     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     4
4     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     5
5     NaN         1       1  Sunrisers Hyderabad  Royal Challengers Bangalore     1     6
CodePudding user response:
Alternatively, use the loc indexer:
df.loc[(df['match_id']<=59) & (df['match_id']>=1), 'season'] = 2017
Note that since the season column contains NaNs, it is stored as floating-point numbers. Once you have finished filling in all the season values, you can convert them to integers (the conversion raises an error if any NaN values remain):
df['season'] = df['season'].astype('int')
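As a rough end-to-end sketch of the loc approach, the snippet below builds a tiny stand-in for the ball-by-ball data (the frame and its six rows are made up for illustration, not taken from the real dataset). It writes the condition with between, which is inclusive on both ends by default, and converts with the nullable "Int64" dtype, which is one way to get integers while some rows are still unfilled.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: three matches, two balls each.
df = pd.DataFrame({
    "season": np.nan,
    "match_id": [1, 1, 59, 59, 60, 60],
})

# Rows whose match_id lies in 1..59 (both ends included) get season 2017.
in_2017 = df["match_id"].between(1, 59)
df.loc[in_2017, "season"] = 2017

# match_id 60 is still NaN, so astype('int') would raise here;
# the nullable "Int64" dtype keeps those gaps as <NA> instead.
df["season"] = df["season"].astype("Int64")
print(df)
```

Once every season has been filled in, the plain astype('int') conversion from the answer above works as well.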
CodePudding user response:
I split the process into two steps, but you can just as well merge the two into one line.
First check if the match_id is in the specified range, then overwrite with the desired value based on the condition.
import numpy as np

df['season'] = df['match_id'].isin(range(1, 60))
df['season'] = df['season'].apply(lambda x: 2017 if x else np.nan)
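One possible way to merge the two steps into a single line is numpy's where, sketched below on a made-up four-row frame. This swaps in np.where and between for the isin/apply pair used above; the result should be the same for this condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three match_ids inside the 1..59 range, one outside.
df = pd.DataFrame({"match_id": [1, 30, 59, 60]})

# Condition and assignment in one step: 2017 where the condition holds, NaN elsewhere.
df["season"] = np.where(df["match_id"].between(1, 59), 2017, np.nan)
print(df)
```

Like the two-step version, this overwrites the whole season column, so any values filled in earlier for other seasons would be reset to NaN. To keep existing values, something like df['season'].mask(condition, 2017) or the loc assignment from the other answer is a safer fit.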