There are several questions on Stack Overflow about how to fill empty cells with the previous row's values. However, in my case I have a dataframe in this format:
Date | Val1 | Val2 | Sum |
---|---|---|---|
7.1 | 3 | 12 | 15 |
8.1 | 3 | | 3 |
9.1 | 3 | | 3 |
10.1 | 5 | 10 | 15 |
11.1 | 6 | 12 | 18 |
12.1 | 5 | 11 | 16 |
13.1 | 4 | 10 | 14 |
14.1 | 6 | 13 | 13 |
15.1 | 6 | | 6 |
16.1 | 6 | | 6 |
On the left I have a date. For Val1, the values over the weekend (6th and 7th in this case) stay the same as Friday's; Val2, however, is empty on those days. In this case I would like to copy the 13 into the empty cells. However, in my data Val2 is only available from a certain date, so I want to carry the previous row's value forward only when a certain condition holds (e.g. start filling from date > 10.1, or from row > 4, or something like that). Is this possible? Thanks.
CodePudding user response:
You can use ffill with boolean masks for boolean indexing. This gives you a lot of flexibility to decide which rows should be filled:
m1 = df['Date'].gt(10.1) # you can do the same with datetime
m2 = df['Val2'].isna()
df.loc[m1&m2, 'Val2'] = df['Val2'].ffill()
output:
Date Val1 Val2 Sum
0 7.1 3 12.0 15
1 8.1 3 NaN 3
2 9.1 3 NaN 3
3 10.1 5 10.0 15
4 11.1 6 12.0 18
5 12.1 5 11.0 16
6 13.1 4 10.0 14
7 14.1 6 13.0 13
8 15.1 6 13.0 6
9 16.1 6 13.0 6
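For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt from the table in the question, with Date kept as plain floats as in the snippet above. The second variant swaps the date cutoff for a row-position cutoff, which the question also mentions; the names by_date and by_row and the cutoff of 4 are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; NaN marks the empty Val2 cells.
df = pd.DataFrame({
    "Date": [7.1, 8.1, 9.1, 10.1, 11.1, 12.1, 13.1, 14.1, 15.1, 16.1],
    "Val1": [3, 3, 3, 5, 6, 5, 4, 6, 6, 6],
    "Val2": [12, np.nan, np.nan, 10, 12, 11, 10, 13, np.nan, np.nan],
    "Sum": [15, 3, 3, 15, 18, 16, 14, 13, 6, 6],
})

# Variant 1: fill gaps only where Date > 10.1 (same condition as above).
by_date = df.copy()
mask = by_date["Date"].gt(10.1) & by_date["Val2"].isna()
by_date.loc[mask, "Val2"] = by_date["Val2"].ffill()

# Variant 2: fill gaps only from row position 4 onward (assumed cutoff).
by_row = df.copy()
position = pd.Series(np.arange(len(by_row)), index=by_row.index)
mask = position.ge(4) & by_row["Val2"].isna()
by_row.loc[mask, "Val2"] = by_row["Val2"].ffill()

print(by_date)
```

Because the assignment goes through .loc with a mask, the forward-filled series is only written where the mask is True, so the early gaps (8.1 and 9.1) stay empty in both variants.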
CodePudding user response:
If you have many rows of dates, it would be best to work out whether each date falls on a weekend and, if so, shift() the Friday value into val2:
import numpy as np
import pandas as pd

# assumes df['Date'] holds parseable dates and the rows are consecutive days
# convert each date to a weekday number (Monday=0 ... Sunday=6)
df['day_of_week'] = pd.to_datetime(df['Date']).dt.dayofweek
# Saturday: take the value from the previous row (Friday)
df['sat_value'] = df.loc[df['day_of_week'].shift(-1) == 5, 'Val2']
df['sat_value'] = df['sat_value'].shift(1)
# Sunday: take the value from two rows back (Friday)
df['sun_value'] = df.loc[df['day_of_week'].shift(-2) == 6, 'Val2']
df['sun_value'] = df['sun_value'].shift(2)
# fill in Val2 from sat_value or sun_value where one is available
conds = [
    df['sat_value'].notnull(),
    df['sun_value'].notnull(),
]
choices = [df['sat_value'], df['sun_value']]
df['Val2'] = np.select(conds, choices, default=df['Val2'])  # otherwise keep Val2 as is
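If the rows really are one per calendar day, a shorter variant may be enough: build a weekend mask from dayofweek and combine it with ffill and a date cutoff. This is a sketch of a different technique from the shift-based code above, not a drop-in replacement. It assumes the question's 7.1 to 16.1 mean 7 to 16 January 2022 (where the 7th is a Friday); the year, the cutoff date, and the names is_weekend, after_cutoff and needs_fill are all guesses for illustration.

```python
import numpy as np
import pandas as pd

# Assumed dates: 7-16 January 2022, one row per day (7 January is a Friday).
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-07", periods=10, freq="D"),
    "Val2": [12, np.nan, np.nan, 10, 12, 11, 10, 13, np.nan, np.nan],
})

is_weekend = df["Date"].dt.dayofweek.ge(5)             # Saturday=5, Sunday=6
after_cutoff = df["Date"].gt(pd.Timestamp("2022-01-10"))  # assumed start date
needs_fill = is_weekend & after_cutoff & df["Val2"].isna()

# ffill() carries the last known value forward; the mask limits where
# that value is actually written back.
df.loc[needs_fill, "Val2"] = df["Val2"].ffill()
print(df)
```

Unlike the shift-based version, this carries forward whatever the last non-empty value was, so it only matches "copy Friday's value" when Friday itself is filled in.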