I am trying to fill NaN values only if a certain condition is met. I want to use an ffill of whatever the 'REVENUE_STATUS_FLAG' value is prior to listed NaN values. The time series dataset is split into 'SCU_KEY's representing companies and 'PRODUCT' representing type of product. So I don't want the values being filled for one 'SCU_KEY' or 'PRODUCT' to populate on NaN values for other 'SCU_KEY's or 'PRODUCT's. Here is the code I attempted and the result I am looking for:
My code:
cross_6 = df_SR5['REVENUE_STATUS_FLAG'].notna()
cross_7 = df_SR5['REVENUE_STATUS_FLAG'].shift(-1).isna()
cross_8 = df_SR5['SCU_KEY'] == df_SR5['SCU_KEY'].shift(-1)
cross_9 = df_SR5['PRODUCT'] == df_SR5['PRODUCT'].shift(-1)
df_SR5['REVENUE_STATUS_FLAG'] = np.where((cross_6) & (cross_7) & (cross_8) & (cross_9), df_SR5['REVENUE_STATUS_FLAG'].fillna(method='ffill'), df_SR5['REVENUE_STATUS_FLAG'])
Input:
df = pd.DataFrame({'year': ['2019-02', '2019-03', '2019-04', '2019-05',
'2018-03', '2018-04', '2018-05', '2017-04', '2017-05', '2019-06', '2019-07', '2019-08'], 'SCU_KEY': [3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5], 'PRODUCT': ['HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'EPM', 'EPM', 'SCM', 'SCM', 'SCM'], 'REVENUE_STATUS_FLAG': [1, 1, np.NaN, np.NaN, np.NaN, 0, np.NaN, np.NaN, np.NaN, np.NaN, 0, 0]})
Output:
df = pd.DataFrame({'year': ['2019-02', '2019-03', '2019-04', '2019-05',
'2018-03', '2018-04', '2018-05', '2017-04', '2017-05', '2019-06', '2019-07', '2019-08'], 'SCU_KEY': [3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5], 'PRODUCT': ['HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'HCM', 'EPM', 'EPM', 'SCM', 'SCM', 'SCM'], 'REVENUE_STATUS_FLAG': [1, 1, 1, 1, np.NaN, 0, 0, np.NaN, np.NaN, np.NaN, 0, 0]})
CodePudding user response:
df['filled'] = df.groupby(['PRODUCT', 'SCU_KEY'])['REVENUE_STATUS_FLAG'].fillna(method='ffill')
year SCU_KEY PRODUCT REVENUE_STATUS_FLAG filled
0 2019-02 3 HCM 1.0 1.0
1 2019-03 3 HCM 1.0 1.0
2 2019-04 3 HCM NaN 1.0
3 2019-05 3 HCM NaN 1.0
4 2018-03 4 HCM NaN NaN
5 2018-04 4 HCM 0.0 0.0
6 2018-05 4 HCM NaN 0.0
7 2017-04 4 EPM NaN NaN
8 2017-05 4 EPM NaN NaN
9 2019-06 5 SCM NaN NaN
10 2019-07 5 SCM 0.0 0.0
11 2019-08 5 SCM 0.0 0.0