Home > Software design >  How can I copy the value of a previous row, on the same column in a DataFrame?
How can I copy the value of a previous row, on the same column in a DataFrame?

Time:11-30

I have the following DataFrame:

Date Track Surface FGrating HorseId Last FGrating at Sha Tin, grass
2017-09-03 Sha Tin Grass 110 1736
2017-09-16 Sha Tin Grass 124 1736 110
2017-10-14 Sha Tin Grass 118 1736 124
2017-11-11 Sha Tin Grass 107 1736 118
2018-03-28 Happy Valley Grass 117 1736
2018-04-11 Happy Valley Grass 114 1736
2018-09-22 Sha Tin Grass 124 1736 107

As you can see, there are spaces for the Last FGrating at Sha Tin, grass. These spaces have two causes:

  1. There is no last FGrating (the first record);
  2. There is a last FGrating for this track, but it was not written when the race for the same horse was on another track (Happy Valley grass, in this table).

So far, I managed to do only the filling part, according to this code:

def return_mask_and_text_from_tracks(data, track_no, metric):
    if track_no == 0:  # Sha Tin - grass
        mask = (data.Track == 'Sha Tin') & (data.Surface == 'Grass')
        text = str(metric)   ' at Sha-Tin Grass'
    if track_no == 1:  # Sha Tin - dirt
        mask = (data.Track == 'Sha Tin') & (data.Surface == 'Dirt')
        text = str(metric)   ' at Sha-Tin Dirt'
    if track_no == 2:  # Happy Valley - grass
        mask = (data.Track == 'Happy Valley') & (data.Surface == 'Grass')
        text = str(metric)   ' at Happy Valley Grass'
    return mask, text

def compute_last_fgrating(data, mask=''):
    if len(mask) == 0:
        return data.groupby('HorseId')['FGrating'].apply(lambda x: x.shift(1))
    else:
        return data.loc[mask][['HorseId', 'FGrating']].groupby('HorseId')['FGrating'].apply(lambda x: x.shift(1))

for i in range(3):
    mask, text = return_mask_and_text_from_tracks(featured_data, i, 'Last FGrating')
    featured_data[text] = compute_last_fgrating(featured_data, mask=mask)

I am looking for ways to fill the rightmost column with the following values:

  1. 0, when no last FGrating exist (the first record, in this case)
  2. The last FGrating on Sha Tin grass on the records where the race was on other tracks, for the same horse.

The final table should look like this:

Date Track Surface FGrating HorseId Last FGrating at Sha Tin, grass
2017-09-03 Sha Tin Grass 110 1736 0 (no previous FGrating exists)
2017-09-16 Sha Tin Grass 124 1736 110
2017-10-14 Sha Tin Grass 118 1736 124
2017-11-11 Sha Tin Grass 107 1736 118
2018-03-28 Happy Valley Grass 117 1736 118 (the lastFGrating on Sha Tin grass at this point)
2018-04-11 Happy Valley Grass 114 1736 118
2018-09-22 Sha Tin Grass 124 1736 107

How can I do it?

CodePudding user response:

Here you only need

df["Last FGrating at Sha Tin, grass"] = df["Last FGrating at Sha Tin, grass"].fillna(method="ffill").fillna(0)

Explanation

The first fillna fill NaN with the latest value available (method="ffill") then you have left only one NaN on your first row and you can fill it with 0.

  • Related