How to insert a value into a specific range of rows in a column, according to a condition (pandas)

Time:11-07

I am working on a dataframe that has a column named season (newly created, filled with np.nan). Another column is match_id, whose values look like this: match 1 has match_id 1, match 2 has match_id 2, ..., match n has match_id n. It's a cricket (close to baseball) dataset, so it's ball by ball. One match has at most 20 + 20 overs (each over has 6 balls), so match_id 1 runs approximately from index 0 to 240, and match_id 2 approximately from index 241 to 480. The data is organized ball by ball (1 row per ball), match by match (approx. 240 rows per match), and season by season (approx. 14160 rows per season).

My condition is: if match_id is from 1 to 59, put 2017 in the season column for those rows.

In my dataset, match_id and the other columns already existed. I created the season column filled with np.nan, and now I want to fill it.

My data looks like this:

In[]: df_raw.head(6)
out[]:
    season  match_id    inning  batting_team         bowling_team                  over ball
0   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    1
1   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    2
2   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    3
3   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    4
4   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    5
5   NaN     1           1       Sunrisers Hyderabad  Royal Challengers Bangalore   1    6

CodePudding user response:

Alternatively, use the loc indexer with a boolean mask:

df.loc[(df['match_id']<=59) & (df['match_id']>=1), 'season'] = 2017

Note that because the season column contains NaNs, it is stored as floating-point numbers. Once you have filled in every season value, you can convert the column to integers (astype('int') raises an error if any NaN is still present):

df['season'] = df['season'].astype('int')
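As a minimal sketch of the loc approach, the snippet below runs it end to end on a tiny made-up frame (the rows are illustrative, not taken from the real dataset). It uses Series.between, which is inclusive on both ends by default, in place of the two explicit comparisons, and it assumes that some rows may still be unfilled, so it converts to pandas' nullable "Int64" dtype rather than plain int.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_raw: a few balls from matches 1, 59 and 60 (made-up rows).
df = pd.DataFrame({"match_id": [1, 1, 59, 60, 60]})
df["season"] = np.nan  # the newly created, NaN-filled column

# Rows whose match_id lies in 1..59 (both ends inclusive) get season 2017.
mask = df["match_id"].between(1, 59)
df.loc[mask, "season"] = 2017

# Plain astype('int') would raise here because match 60 is still NaN;
# the nullable "Int64" dtype stores the missing entries as <NA> instead.
df["season"] = df["season"].astype("Int64")
print(df)
```

If every row has been assigned a season, astype('int') works just as well and gives an ordinary int64 column.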

CodePudding user response:

I split the process into two steps, but you can just as well merge the two into one line.

First check if the match_id is in the specified range, then overwrite with the desired value based on the condition.

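# True where match_id is in 1..59 (range excludes its upper bound, hence 60)
df['season'] = df['match_id'].isin(range(1, 60))
# Map True to 2017 and False to NaN (needs: import numpy as np)
df['season'] = df['season'].apply(lambda x: 2017 if x else np.nan)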
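The two steps can be merged into one line, for example with numpy.where. This is a sketch on made-up rows, not necessarily the form the answer had in mind. One assumption worth flagging: the two-step version overwrites the whole season column, so any season filled in earlier would be reset to NaN. The variant below passes the existing column as the fallback value so that earlier values survive; the 2018 entry is a hypothetical pre-filled value used only to show this.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up rows); the 2018.0 entry is a hypothetical earlier fill.
df = pd.DataFrame({
    "match_id": [1, 30, 59, 60],
    "season": [np.nan, np.nan, np.nan, 2018.0],
})

# One line: 2017 where match_id is in 1..59, otherwise keep the current value.
df["season"] = np.where(df["match_id"].isin(range(1, 60)), 2017, df["season"])
print(df)
```

Replacing the third argument with np.nan reproduces the behaviour of the two-step version exactly.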