I am following the pandas documentation for df.fillna(method="ffill") here.
How to add a new column with comments?
df_1 = pd.DataFrame([['2021-01-01', 'Supp_1', 'Product_1', 1],
['2021-02-01', 'Supp_1', 'Product_1', ''],
['2021-03-01','Supp_1', 'Product_1', np.nan],
['2021-04-01', 'Supp_1', 'Product_1', 1.25],
['2021-01-01', 'Supp_1', 'Product_2', 1.5],
['2021-02-01', 'Supp_1', 'Product_2', ''],
['2021-03-01','Supp_1', 'Product_2', np.nan],
['2021-04-01', 'Supp_1', 'Product_2', 1.75]],
columns=['Date','Supplier','Product','Cost'])
Date Supplier Product Cost
0 2021-01-01 Supp_1 Product_1 1
1 2021-02-01 Supp_1 Product_1
2 2021-03-01 Supp_1 Product_1 NaN
3 2021-04-01 Supp_1 Product_1 1.25
4 2021-01-01 Supp_1 Product_2 1.5
5 2021-02-01 Supp_1 Product_2
6 2021-03-01 Supp_1 Product_2 NaN
7 2021-04-01 Supp_1 Product_2 1.75
Expected df_2:
Date Supplier Product Cost Cost_Assumption
0 2021-01-01 Supp_1 Product_1 1.00 Actual
1 2021-02-01 Supp_1 Product_1 1.00 Cost per 2021-01-01
2 2021-03-01 Supp_1 Product_1 1.00 Cost per 2021-01-01
3 2021-04-01 Supp_1 Product_1 1.25 Actual
4 2021-01-01 Supp_1 Product_2 1.50 Actual
5 2021-02-01 Supp_1 Product_2 1.50 Cost per 2021-01-01
6 2021-03-01 Supp_1 Product_2 1.50 Cost per 2021-01-01
7 2021-04-01 Supp_1 Product_2 1.75 Actual
CodePudding user response:
import numpy as np
import pandas as pd

df_1 = pd.DataFrame([['2021-01-01', 'Supp_1', 'Product_1', 1],
['2021-02-01', 'Supp_1', 'Product_1', ''],
['2021-03-01','Supp_1', 'Product_1', np.nan],
['2021-04-01', 'Supp_1', 'Product_1', 1.25],
['2021-01-01', 'Supp_1', 'Product_2', 1.5],
['2021-02-01', 'Supp_1', 'Product_2', ''],
['2021-03-01','Supp_1', 'Product_2', np.nan],
['2021-04-01', 'Supp_1', 'Product_2', 1.75]],
columns=['Date','Supplier','Product','Cost'])
df_1 = df_1.replace('', np.nan)
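# Keep the date only on rows that have an actual cost; the other rows stay NaN for now.
df_1['Cost_assumptions'] = np.where(df_1['Cost'] > 0, df_1['Date'], np.nan)
# Forward-fill both the cost and the date it was last known.
df_1.loc[:, ['Cost', 'Cost_assumptions']] = df_1.loc[:, ['Cost', 'Cost_assumptions']].ffill()
# A row whose carried date equals its own date holds an actual cost.
df_1['Cost_assumptions'] = np.where(df_1['Cost_assumptions'] == df_1['Date'], 'Actual Cost', 'Cost as of ' + df_1['Cost_assumptions'])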
Output:
Date Supplier Product Cost Cost_assumptions
0 2021-01-01 Supp_1 Product_1 1.00 Actual Cost
1 2021-02-01 Supp_1 Product_1 1.00 Cost as of 2021-01-01
2 2021-03-01 Supp_1 Product_1 1.00 Cost as of 2021-01-01
3 2021-04-01 Supp_1 Product_1 1.25 Actual Cost
4 2021-01-01 Supp_1 Product_2 1.50 Actual Cost
5 2021-02-01 Supp_1 Product_2 1.50 Cost as of 2021-01-01
6 2021-03-01 Supp_1 Product_2 1.50 Cost as of 2021-01-01
7 2021-04-01 Supp_1 Product_2 1.75 Actual Cost
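Note that a plain ffill() works here only because every product starts with an actual cost; if a product's first row were empty, it would inherit the cost of the product above it. A minimal sketch of a per-group variant, assuming the rows are already sorted by Date within each Supplier/Product pair (the helper name add_cost_assumption is made up for illustration):

```python
import numpy as np
import pandas as pd


def add_cost_assumption(df):
    """Forward-fill Cost within each Supplier/Product pair and label filled rows."""
    out = df.copy()
    # Treat '' (and any other non-numeric entry) as a missing cost.
    out['Cost'] = pd.to_numeric(out['Cost'], errors='coerce')
    actual = out['Cost'].notna()
    keys = [out['Supplier'], out['Product']]
    # Date of the most recent actual cost, carried forward inside each group only.
    as_of = out['Date'].where(actual).groupby(keys).ffill()
    out['Cost'] = out['Cost'].groupby(keys).ffill()
    out['Cost_assumptions'] = np.where(actual, 'Actual Cost', 'Cost as of ' + as_of)
    return out
```

Calling add_cost_assumption(df_1) on the original, unmodified df_1 should give the same table as above.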
CodePudding user response:
Could you not create the Cost_Assumption column first based on the Cost column?
df_1.loc[df_1['Cost'] == '', 'Cost_Assumption'] = 'Cost per 2021-01-01'
df_1.loc[df_1['Cost'].isnull(), 'Cost_Assumption'] = 'Cost per 2021-01-01'
df_1['Cost_Assumption'] = df_1['Cost_Assumption'].fillna('Actual')
And then ffill your Cost column, e.g. df_1['Cost'] = df_1['Cost'].replace('', np.nan).ffill()
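The order of operations suggested above (comment column first, ffill second) can be sketched end to end as follows. This is only an illustration on the first product's rows: instead of hard-coding 2021-01-01 it derives the date from the last row that had a cost, and it assumes the first row has an actual cost, as in the sample data. The names missing and last_actual_date are not from the original answer.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(
    [['2021-01-01', 'Supp_1', 'Product_1', 1],
     ['2021-02-01', 'Supp_1', 'Product_1', ''],
     ['2021-03-01', 'Supp_1', 'Product_1', np.nan],
     ['2021-04-01', 'Supp_1', 'Product_1', 1.25]],
    columns=['Date', 'Supplier', 'Product', 'Cost'])

# Step 1: flag the rows with no cost ('' or NaN) before filling anything.
missing = df_1['Cost'].eq('') | df_1['Cost'].isnull()

# Step 2: build the comment column from that flag.
# Hide the dates of missing rows, then carry the last visible date forward.
last_actual_date = df_1['Date'].mask(missing).ffill()
df_1['Cost_Assumption'] = 'Actual'
df_1.loc[missing, 'Cost_Assumption'] = 'Cost per ' + last_actual_date[missing]

# Step 3: only now forward-fill the cost column itself.
df_1['Cost'] = pd.to_numeric(df_1['Cost'], errors='coerce').ffill()
```

Doing the labelling before the fill matters: once Cost has been forward-filled, the filled rows can no longer be told apart from the actual ones.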