Pandas: how to fill empty cells with the previous row's value given a line number condition

There are several questions on Stack Overflow about how to fill empty cells with previous row values. In my case, however, I have a dataframe in this format:

Date	Val1	Val2	Sum
7.1	3	12	15
8.1	3		3
9.1	3		3
10.1	5	10	15
11.1	6	12	18
12.1	5	11	16
13.1	4	10	14
14.1	6	13	13
15.1	6		6
16.1	6		6

On the left side I have a date. For Val1, the values over the weekend stay the same as Friday's (6th and 7th in this case); for Val2, however, the weekend cells are empty. In this case I would like to copy the 13 into the empty cells. However, in my data Val2 is only available from a certain date, so I want to carry the previous row's value forward only when a condition holds (e.g. start filling from date > 10.1, or from row > 4, or something like this). Is this possible? Thanks.

CodePudding user response:

You can combine ffill with boolean masks and boolean indexing. This gives you a lot of flexibility in deciding which rows get filled:

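# m1: rows after the cut-off date (the same comparison works with datetime values)
m1 = df['Date'].gt(10.1)
# m2: rows where Val2 is missing
m2 = df['Val2'].isna()

# assign the forward-filled values only where both masks are True
df.loc[m1 & m2, 'Val2'] = df['Val2'].ffill()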

output:

   Date  Val1  Val2  Sum
0   7.1     3  12.0   15
1   8.1     3   NaN    3
2   9.1     3   NaN    3
3  10.1     5  10.0   15
4  11.1     6  12.0   18
5  12.1     5  11.0   16
6  13.1     4  10.0   14
7  14.1     6  13.0   13
8  15.1     6  13.0    6
9  16.1     6  13.0    6
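
The question also asks about a row-number condition, which the snippet above does not show. Here is a minimal, self-contained sketch of both variants, using the data from the question. The helper name ffill_where is hypothetical (it is not a pandas function), and the row threshold of 4 is just the example value mentioned in the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks the empty Val2 cells.
df = pd.DataFrame({
    "Date": [7.1, 8.1, 9.1, 10.1, 11.1, 12.1, 13.1, 14.1, 15.1, 16.1],
    "Val1": [3, 3, 3, 5, 6, 5, 4, 6, 6, 6],
    "Val2": [12, np.nan, np.nan, 10, 12, 11, 10, 13, np.nan, np.nan],
    "Sum": [15, 3, 3, 15, 18, 16, 14, 13, 6, 6],
})


def ffill_where(frame, column, allowed):
    """Forward-fill `column`, but only in rows where `allowed` is True.

    Hypothetical helper for illustration; returns a copy and leaves
    the input frame unchanged.
    """
    out = frame.copy()
    to_fill = allowed & out[column].isna()
    # The right-hand side is aligned on the index, so only the masked
    # rows receive their forward-filled value.
    out.loc[to_fill, column] = out[column].ffill()
    return out


# Variant 1: condition on the Date column, as in the answer above.
by_date = ffill_where(df, "Val2", df["Date"].gt(10.1))

# Variant 2: condition on row position (positions greater than 4),
# built with np.arange so it does not depend on the index labels.
row_mask = pd.Series(np.arange(len(df)) > 4, index=df.index)
by_row = ffill_where(df, "Val2", row_mask)

print(by_date)
print(by_row)
```

With this sample data both variants fill only the last two rows, and the early gaps at 8.1 and 9.1 stay empty. Building the row mask from positions rather than from df.index keeps it working if the index is not a plain 0..n-1 range.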

CodePudding user response:

If you have many rows of dates, it would be best to work out whether each date falls on a weekend and, if so, shift() the Friday value into Val2:

import numpy as np
import pandas as pd

# convert each date to a day number (Monday=0 ... Sunday=6);
# assumes 'Date' holds parseable dates and there is one row per calendar day
df['day_of_week'] = pd.to_datetime(df['Date']).dt.dayofweek

# if Saturday, get the value from the previous row (Friday)
df['sat_value'] = df.loc[df['day_of_week'].shift(-1) == 5, 'Val2']
df['sat_value'] = df['sat_value'].shift(1)

# if Sunday, get the value from two rows earlier (Friday)
df['sun_value'] = df.loc[df['day_of_week'].shift(-2) == 6, 'Val2']
df['sun_value'] = df['sun_value'].shift(2)

# fill Val2 from sat_value or sun_value; otherwise keep Val2 itself
conds = [
    df['sat_value'].notnull(),
    df['sun_value'].notnull()
]
choices = [df['sat_value'], df['sun_value']]
df['Val2'] = np.select(conds, choices, default=df['Val2'])
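
As a possible alternative to the shift() approach, the weekend test can be written as a boolean mask and combined with ffill and a start-date condition. This is only a sketch under assumptions: the question does not state a year, so the dates are read here as 7 to 16 January 2022 (chosen only because that puts the empty rows on a Saturday and Sunday), and the cut-off of 10 January mirrors the "date > 10.1" example from the question:

```python
import numpy as np
import pandas as pd

# Assumption: "7.1" ... "16.1" are treated as 7-16 January 2022.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-07", periods=10, freq="D"),
    "Val2": [12, np.nan, np.nan, 10, 12, 11, 10, 13, np.nan, np.nan],
})

# Saturday is 5 and Sunday is 6 in pandas' dayofweek numbering.
is_weekend = df["Date"].dt.dayofweek >= 5

# Only start filling after the (assumed) cut-off date.
after_start = df["Date"] > pd.Timestamp("2022-01-10")

# Fill a cell only if it is empty, on a weekend, and after the cut-off.
to_fill = is_weekend & after_start & df["Val2"].isna()
df.loc[to_fill, "Val2"] = df["Val2"].ffill()

print(df)
```

Unlike the shift() version, this takes the last available value rather than a fixed number of rows back, so it behaves differently if Friday itself is missing or if rows are not one per calendar day.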