A simple way of selecting the previous row in a column and performing an operation?

Time:12-01

I'm trying to create a forecast that takes the previous day's 'Forecast' total and adds the current day's 'Appt' to it. This is straightforward in Excel, but I'm struggling to do it in pandas. At the moment, all I can get in pandas using .loc is this:

import pandas as pd

df = pd.DataFrame({'Date': ['2022-12-01', '2022-12-02','2022-12-03','2022-12-04','2022-12-05'],
                   'Appt': [12,10,5,4,13],
                   'Forecast': [37,0,0,0,0]
                  })

What I'm looking for it to do is this:

pd.DataFrame({'Date': ['2022-12-01', '2022-12-02','2022-12-03','2022-12-04','2022-12-05'],
                      'Appt': [12,10,5,4,13], 
                      'Forecast': [37,47,52,56,69]
                     })

E.g. the 'Forecast' total on 1st December is 37. On 2nd December the value in the 'Appt' column is 10. I want it to take 37 and 10, add them, and put the result (47) in the 'Forecast' column for 2nd December, then repeat this down the rest of the column.

I've tried using .loc with the index and experimented with .shift(), but neither seems to work for what I'd like. I also looked into .rolling(), but I don't think that's appropriate either.

I'm sure there must be a simple way to do this?

Apologies, the original df has 'Date' as a datetime column.

CodePudding user response:

You can use mask and cumsum:

df['Forecast'] = df['Forecast'].mask(df['Forecast'].eq(0), df['Appt']).cumsum()

# or, with NumPy (needs `import numpy as np`)
df['Forecast'] = np.where(df['Forecast'].eq(0), df['Appt'], df['Forecast']).cumsum()

Output:

         Date  Appt  Forecast
0  2022-12-01    12        37
1  2022-12-02    10        47
2  2022-12-03     5        52
3  2022-12-04     4        56
4  2022-12-05    13        69
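
The mask step works because the rows still to be filled hold 0 as a placeholder. If that can't be relied on, the same recurrence (each forecast is the previous forecast plus the current 'Appt') can be written as the first forecast plus a running total of the later 'Appt' values. A minimal sketch, assuming the first row of 'Forecast' holds the starting total and the frame is already sorted by date; the names `start` and `later_appts` are mine, not from the question:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2022-12-01', '2022-12-02', '2022-12-03',
                                           '2022-12-04', '2022-12-05']),
                   'Appt': [12, 10, 5, 4, 13],
                   'Forecast': [37, 0, 0, 0, 0]})

# Starting total: the only 'Forecast' value that is actually known.
start = df['Forecast'].iloc[0]

# Copy of 'Appt' with the first row zeroed, so day one adds nothing to the seed.
later_appts = df['Appt'].copy()
later_appts.iloc[0] = 0

# Forecast[i] = start + Appt[1] + ... + Appt[i]
df['Forecast'] = start + later_appts.cumsum()
print(df)
```

This gives the same 37, 47, 52, 56, 69 as above without depending on what the placeholder rows contain.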

CodePudding user response:

You have to make sure that your column has a datetime/date type; then you can filter the df like this:

# previous code & imports
from datetime import datetime, timedelta
yesterday = datetime.now().date() - timedelta(days=1)
df[df["date"] == yesterday]["your_column"].sum()
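
Applied to the question's frame, that filter might look like the sketch below. It assumes 'Date' is a datetime64 column and uses a fixed, made-up `today` instead of the current time so the result is reproducible. Note that it sums a single day's rows and does not build the running forecast the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2022-12-01', '2022-12-02', '2022-12-03',
                                           '2022-12-04', '2022-12-05']),
                   'Appt': [12, 10, 5, 4, 13]})

# Fixed reference date (an assumption for this example);
# in real code this could be pd.Timestamp.now().normalize().
today = pd.Timestamp('2022-12-03')
yesterday = today - pd.Timedelta(days=1)

# Compare Timestamp with Timestamp so the dtypes match, then sum that day's rows.
appt_yesterday = df.loc[df['Date'] == yesterday, 'Appt'].sum()
print(appt_yesterday)  # 10
```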