Pandas missing values: fill with ffill and add a comment

Time:10-13

I am following the pandas documentation for df.fillna(method="ffill"). How can I add a new column with comments that say whether each cost is an actual value or a forward-filled one?

    df_1 = pd.DataFrame([['2021-01-01', 'Supp_1', 'Product_1', 1],
                         ['2021-02-01', 'Supp_1', 'Product_1', ''],
                         ['2021-03-01', 'Supp_1', 'Product_1', np.nan],
                         ['2021-04-01', 'Supp_1', 'Product_1', 1.25],
                         ['2021-01-01', 'Supp_1', 'Product_2', 1.5],
                         ['2021-02-01', 'Supp_1', 'Product_2', ''],
                         ['2021-03-01', 'Supp_1', 'Product_2', np.nan],
                         ['2021-04-01', 'Supp_1', 'Product_2', 1.75]],
                        columns=['Date', 'Supplier', 'Product', 'Cost'])

      Date     Supplier Product     Cost
0   2021-01-01  Supp_1  Product_1   1
1   2021-02-01  Supp_1  Product_1   
2   2021-03-01  Supp_1  Product_1   NaN
3   2021-04-01  Supp_1  Product_1   1.25
4   2021-01-01  Supp_1  Product_2   1.5
5   2021-02-01  Supp_1  Product_2   
6   2021-03-01  Supp_1  Product_2   NaN
7   2021-04-01  Supp_1  Product_2   1.75

Expected df_2:

       Date     Supplier Product Cost   Cost_Assumption
0   2021-01-01  Supp_1  Product_1   1.00    Actual
1   2021-02-01  Supp_1  Product_1   1.00    Cost per 2021-01-01
2   2021-03-01  Supp_1  Product_1   1.00    Cost per 2021-01-01
3   2021-04-01  Supp_1  Product_1   1.25    Actual
4   2021-01-01  Supp_1  Product_2   1.50    Actual
5   2021-02-01  Supp_1  Product_2   1.50    Cost per 2021-01-01
6   2021-03-01  Supp_1  Product_2   1.50    Cost per 2021-01-01
7   2021-04-01  Supp_1  Product_2   1.75    Actual

CodePudding user response:

    import numpy as np
    import pandas as pd

    df_1 = pd.DataFrame([['2021-01-01', 'Supp_1', 'Product_1', 1],
                         ['2021-02-01', 'Supp_1', 'Product_1', ''],
                         ['2021-03-01', 'Supp_1', 'Product_1', np.nan],
                         ['2021-04-01', 'Supp_1', 'Product_1', 1.25],
                         ['2021-01-01', 'Supp_1', 'Product_2', 1.5],
                         ['2021-02-01', 'Supp_1', 'Product_2', ''],
                         ['2021-03-01', 'Supp_1', 'Product_2', np.nan],
                         ['2021-04-01', 'Supp_1', 'Product_2', 1.75]],
                        columns=['Date', 'Supplier', 'Product', 'Cost'])
    # Treat empty strings as missing values so ffill can fill them
    df_1 = df_1.replace('', np.nan)
    # Record the date of each actual cost; NaN where the cost is missing
    df_1['Cost_assumptions'] = np.where(df_1['Cost'] > 0, df_1['Date'], np.nan)
    # Forward-fill the cost together with the date it came from
    df_1.loc[:, ['Cost', 'Cost_assumptions']] = df_1.loc[:, ['Cost', 'Cost_assumptions']].ffill()
    # A row whose source date equals its own date holds an actual cost
    df_1['Cost_assumptions'] = np.where(df_1['Cost_assumptions'] == df_1['Date'], 'Actual Cost', 'Cost as of' + ' ' + df_1['Cost_assumptions'])

Output:

    Date    Supplier    Product     Cost    Cost_assumptions
0   2021-01-01  Supp_1  Product_1   1.00    Actual Cost
1   2021-02-01  Supp_1  Product_1   1.00    Cost as of 2021-01-01
2   2021-03-01  Supp_1  Product_1   1.00    Cost as of 2021-01-01
3   2021-04-01  Supp_1  Product_1   1.25    Actual Cost
4   2021-01-01  Supp_1  Product_2   1.50    Actual Cost
5   2021-02-01  Supp_1  Product_2   1.50    Cost as of 2021-01-01
6   2021-03-01  Supp_1  Product_2   1.50    Cost as of 2021-01-01
7   2021-04-01  Supp_1  Product_2   1.75    Actual Cost
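
A plain ffill carries values across product boundaries, which only works here because every product starts with an actual cost. A minimal sketch of the same idea applied per group, assuming Supplier and Product together identify a series (the helper column name Cost_as_of is just an illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['2021-01-01', 'Supp_1', 'Product_1', 1],
     ['2021-02-01', 'Supp_1', 'Product_1', ''],
     ['2021-03-01', 'Supp_1', 'Product_1', np.nan],
     ['2021-04-01', 'Supp_1', 'Product_1', 1.25],
     ['2021-01-01', 'Supp_1', 'Product_2', 1.5],
     ['2021-02-01', 'Supp_1', 'Product_2', ''],
     ['2021-03-01', 'Supp_1', 'Product_2', np.nan],
     ['2021-04-01', 'Supp_1', 'Product_2', 1.75]],
    columns=['Date', 'Supplier', 'Product', 'Cost'])

# Turn '' into NaN and get a proper float column in one step
df['Cost'] = pd.to_numeric(df['Cost'], errors='coerce')

# Keep the date only on rows that carry an actual cost
df['Cost_as_of'] = df['Date'].where(df['Cost'].notna())

# Forward-fill cost and source date within each Supplier/Product group
group_cols = ['Supplier', 'Product']  # assumption: these define one series
fill_cols = ['Cost', 'Cost_as_of']
df[fill_cols] = df.groupby(group_cols)[fill_cols].ffill()

# Rows whose source date is their own date are actual costs
df['Cost_assumptions'] = np.where(df['Cost_as_of'] == df['Date'],
                                  'Actual Cost',
                                  'Cost as of ' + df['Cost_as_of'])
df = df.drop(columns='Cost_as_of')
print(df)
```

With this variant, a group whose first row has no cost stays unfilled instead of inheriting the previous product's cost.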

CodePudding user response:

Could you not create the Cost_Assumption column first based on the Cost column?

df_1.loc[df_1['Cost'] == '', 'Cost_Assumption'] = 'Cost per 2021-01-01'
df_1.loc[df_1['Cost'].isnull(), 'Cost_Assumption'] = 'Cost per 2021-01-01'
df_1['Cost_Assumption'] = df_1['Cost_Assumption'].fillna('Actual')    

And then ffill your Cost column (replace the empty strings with NaN first, since ffill only fills missing values).
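
The order of steps in this answer (label first, fill second) could look roughly like the sketch below. It uses a cut-down, single-product version of the sample data, and the date in the label is hard-coded as in the answer, so it is only right when every gap is filled from 2021-01-01:

```python
import numpy as np
import pandas as pd

# Cut-down sample: one product, with both kinds of missing value
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01'],
    'Cost': [1, '', np.nan, 1.25],
})

# Convert '' to NaN so a single mask covers both cases
df['Cost'] = pd.to_numeric(df['Cost'], errors='coerce')
missing = df['Cost'].isna()

# Label the rows BEFORE filling, while the gaps are still visible
df['Cost_Assumption'] = np.where(missing, 'Cost per 2021-01-01', 'Actual')

# Only now forward-fill the cost itself
df['Cost'] = df['Cost'].ffill()
print(df)
```

Building the mask before the fill matters: once the Cost column has been filled, nothing distinguishes an actual value from a carried-forward one.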
